Hi everyone!

I need some help with string manipulation using sed and regex. I spent a whole day on trial and error and searching the web for a solution, but it’s way beyond my abilities. Maybe there are some sed/regex gurus here who are willing to give me a helping hand!

From everything I gathered around the web, it seems to be a rather complicated regex and sed substitution. Here we go!

What am I trying to achieve?

I have a lot of Markdown guides that I want to host on a self-hosted, Forgejo-based Git instance. However, classic Markdown links are not the same as the ones GitHub/Forgejo uses…

Convert the following string:

[Some text](#Header%20Linking%20MARKDOWN.md)

Into

[Some text](#header-linking-markdown.md)

As you can see, these are the requirements:

  • Pattern: [Some text](#link%20to%20header.md)
  • Only edit what’s between parentheses
  • Replace space (%20) with -
  • Everything as lowercase
  • Links are sometimes in nested parentheses
    • e.g. (look here [Some text](#link%20to%20header.md))
  • Do not change a line that begins with https (external links)

While the whole thing is probably a bit complex, the trickiest part is probably the nested parentheses :/

What I tried

The furthest I got was the following:

sed -Ei 's|\(([^\)]+)\)|\L&|g' test3.md #make everything between parentheses lowercase

sed -i '/https/ ! s/%20/-/g' test3.md #change every %20 occurrence to -

These sed/regex substitutions are what I put together while roaming the web, but they have a lot of flaws and don’t work with nested parentheses. Also, this changes every %20 occurrence in the file, not just those inside links.
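To make those flaws concrete, here is a small reproduction. It is only a sketch and assumes GNU sed, since `\L` is a GNU extension. The sample lines and the function name `naive_fix` are made up for illustration; the two sed commands are the ones from above, run as a pipeline instead of in place.

```shell
# Hypothetical helper: the two attempts from the post, chained on stdin.
naive_fix() {
  sed -E 's|\(([^\)]+)\)|\L&|g' | sed '/https/ ! s/%20/-/g'
}

# Made-up sample input: one line without any link, one nested link.
printf '%s\n' \
  'See (Appendix A) and file%20name.txt' \
  '(look here [Some text](#Link%20To%20Header.md))' | naive_fix
# prints:
#   See (appendix a) and file-name.txt
#   (look here [some text](#link-to-header.md))
```

The first line has no link at all, yet both the parenthetical and the %20 are rewritten. In the second line the match starts at the outer parenthesis, so the link text ("Some text") is lowercased along with the target.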

The closest solution I found on Stack Overflow looks similar, but it didn’t fit my needs, and my lack of regex/sed understanding makes it impossible for me to adapt it to my requirements.


I would appreciate any help, even if a change of tool is needed. However, I’m more interested in the learning process, so a script or CLI alternative would be very welcome :) Actually, any help is appreciated :D

Thanks in advance.

  • tuna@discuss.tchncs.de
    19 hours ago

    Annotated, it works like this:

    # use a loop to iteratively replace the %20 with -, since doing s/%20/-/g would replace too much. we loop until it can't substitute any more

    # label for looping
    :loop
    # skip the following substitute command if the line contains an http link in markdown format;
    # otherwise capture each part of the link and join it together with -
    # (the address and the s command are kept on one line so the script can be run with sed -f)
    /\[[^]]*\](http/!s/\(\[[^]]*\]\)\(([^)]*\)%20\([^)]*)\)/\1\2-\3/g
    # if the substitution made a change, loop again, otherwise break
    t loop

    # convert the inside of the link to lowercase if the line doesn't contain an http link
    # this is outside the loop rather than in the s command above, because a link that doesn't contain %20 at all would otherwise never be converted to lowercase
    /\[[^]]*\](http/!s/\(\[[^]]*\]\)\(([^)]*)\)/\1\L\2/g
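For anyone who wants to try the commands above without a script file, here is a sketch that wraps them in a shell function. It assumes GNU sed (`\L` is a GNU extension); the function name `fix_links` and the sample input are made up for illustration.

```shell
# Hypothetical wrapper around the annotated sed commands.
# Reads the files given as arguments, or stdin when there are none.
fix_links() {
  sed -e ':loop' \
      -e '/\[[^]]*\](http/!s/\(\[[^]]*\]\)\(([^)]*\)%20\([^)]*)\)/\1\2-\3/g' \
      -e 't loop' \
      -e '/\[[^]]*\](http/!s/\(\[[^]]*\]\)\(([^)]*)\)/\1\L\2/g' "$@"
}

# Made-up sample: a nested internal link and an external link.
printf '%s\n' \
  '(look here [Some text](#Header%20Linking%20MARKDOWN.md))' \
  '[Example](https://example.com/#Some%20Link)' | fix_links
# prints:
#   (look here [Some text](#header-linking-markdown.md))
#   [Example](https://example.com/#Some%20Link)
```

Each pass of the loop replaces the last remaining %20 in every link on the line, so a target with two %20 needs two passes before `t loop` stops branching.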
    
    • bizdelnick@lemmy.ml
      12 hours ago

      skip the following substitute command if the line contains an http link in markdown format

      Why do you assume there’s only one link in the line?

      Also, you perform substitutions in the whole URL instead of only the fragment component.

      • tuna@discuss.tchncs.de
        10 hours ago

        Why do you assume there’s only one link in the line?

        They did not want external (http) links to be modified, as that would break them:

        • [Example](https://example.com/#Some%20Link)
        • [Example](https://example.com/#some-link)

        I compromised by thinking that it might be unlikely enough to have an external http link AND an internal link within the same line. You could probably still do it. My first thought was [^h][^t][^t][^p], but that would cause issues for #ttp and #A, so I just gave up. Instead, I think you’d want a different approach, like breaking each link onto its own line, doing the same external/internal check before the substitution, and joining the lines afterward.
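A rough sketch of that split, check, and join idea, assuming GNU sed (for `\n` in the replacement and for `\L`). The function name `fix_links_per_link` is made up, and the use of the byte \001 to remember the original line ends is an assumption: it only works if that byte never occurs in the input.

```shell
# Hypothetical helper: process each link on its own line, then rejoin.
fix_links_per_link() {
  marker=$(printf '\001')  # assumption: this byte never occurs in the input
  # 1. mark each original line end, then put every link on its own line
  sed -e "s/\$/$marker/" -e 's/\[[^]]*\]([^)]*)/\n&\n/g' "$@" |
  # 2. same commands as before; the http check now applies per link
  sed -e ':loop' \
      -e '/\[[^]]*\](http/!s/\(\[[^]]*\]\)\(([^)]*\)%20\([^)]*)\)/\1\2-\3/g' \
      -e 't loop' \
      -e '/\[[^]]*\](http/!s/\(\[[^]]*\]\)\(([^)]*)\)/\1\L\2/g' |
  # 3. drop all newlines, then turn the markers back into line ends
  tr -d '\n' | tr '\001' '\n'
}

# Made-up sample: an external and an internal link on the same line.
printf '%s\n' 'a [X](https://e.com/#A%20B) b [Y](#C%20D) c' | fix_links_per_link
# prints: a [X](https://e.com/#A%20B) b [Y](#c-d) c
```

Because every link sits on its own line during step 2, an external link no longer shields an internal link that shares its original line.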

        Also, you perform substitutions in the whole URL instead of the fragment component

        That requirement I missed. I just assumed the filename would be replaced the same way too, lol. Not too hard to fix though :)
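One possible way to limit the changes to the fragment, sketched under the assumption that the fragment is everything from the first # up to the closing parenthesis. It again assumes GNU sed for `\L`, and it keeps the per-line http check, so the one-link-per-line compromise still applies. The function name `fix_fragments` and the sample links are made up.

```shell
# Hypothetical variant: only touch the part of the target after '#'.
fix_fragments() {
  sed -e ':loop' \
      -e '/\[[^]]*\](http/!s/\(\[[^]]*\]([^)#]*#[^)]*\)%20\([^)]*)\)/\1-\2/g' \
      -e 't loop' \
      -e '/\[[^]]*\](http/!s/\(\[[^]]*\]([^)#]*\)\(#[^)]*)\)/\1\L\2/g' "$@"
}

# Made-up sample: a file name with its own %20, followed by a fragment.
printf '%s\n' '[Guide](My%20File.md#Some%20Header%20Here)' | fix_fragments
# prints: [Guide](My%20File.md#some-header-here)
```

The first group now has to contain a # before the %20 it replaces, so anything in front of the # (here the file name) is left as it is.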