Hi everyone!
I need some help with string manipulation using sed and regex. I spent a whole day on trial and error and searching the web for a solution, but it's way beyond my abilities, so maybe there are some sed/regex gurus here who are willing to give me a helping hand!
From everything I gathered around the web, it seems to be a rather complicated regex and sed substitution. Here we go!
What am I trying to achieve?
I have a lot of markdown guides I want to host on a self-hosted, Forgejo-based git instance. However, the classic markdown links are not the same as the ones on GitHub/Forgejo…
Convert the following string:
[Some text](#Header%20Linking%20MARKDOWN.md)
Into
[Some text](#header-linking-markdown.md)
As you can see, these are the requirements:
- Pattern: `[…](…)`. Only edit what's between the parentheses
- Replace space (`%20`) with `-`
- Everything as lowercase
- Links are sometimes in nested parentheses
  - e.g. `(look here […](…))`
- Do not change a line that begins with `https` (external links)
While the whole thing is probably a bit complex, the trickiest part is probably the nested parentheses :/
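The requirements above can be made concrete with a small sample; this is only a sketch. The `sample_input` helper and its second and third lines are invented for illustration (only the first link comes from the example above), and it assumes that "begins with https" refers to the link target.

```shell
# Print a hypothetical sample that covers the cases listed above.
sample_input() {
    cat <<'EOF'
[Some text](#Header%20Linking%20MARKDOWN.md)
(look here [Some text](#Header%20Linking%20MARKDOWN.md) )
[External](https://example.com/Some%20Page)
EOF
}

# Expected result after conversion:
#   [Some text](#header-linking-markdown.md)
#   (look here [Some text](#header-linking-markdown.md) )
#   [External](https://example.com/Some%20Page)    <- unchanged, external link
```

Piping `sample_input` through a candidate command and comparing against the expected lines gives a quick check before touching the real files.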
What I tried
The furthest I got was the following:
sed -Ei 's|\(([^\)]+)\)|\L&|g' test3.md #make everything between parentheses lowercase
sed -i '/https/ ! s/%20/-/g' test3.md #change every %20 occurrence to -
These sed/regex substitutions are what I put together while roaming the web, but they have a lot of flaws and don't work with nested parentheses. Also, this would change every %20 occurrence in the file.
The closest solution I found on Stack Overflow looks similar, but I wasn't able to fit it to my needs. My lack of regex/sed understanding makes it impossible for me to adapt it to my requirements.
I would appreciate any help, even if a change of tool is needed. However, I'm more interested in the learning process, so a script or CLI alternative would be very welcome :) Actually, any help is appreciated :D
Thanks in advance.
Here's a solution with `perl` (assuming you don't want to change http/https after the start of `(` instead of the start of a line):
- The `e` flag allows you to use Perl code in the substitution portion.
- `\[[^]]+\]\(\K` matches the square brackets and uses `\K` to mark the start of the matching portion (text before that won't be part of `$&`).
- `(?!https?)` doesn't match if `http` or `https` is found.
- `[^)]+(?=\))` matches non-`)` characters and asserts that `)` is present after those characters.
- `$&=~s|%20|-|gr` changes `%20` to `-` for the matching portion found; the `r` flag is used to return the modified string instead of changing `$&` itself.
- `lc` is a function to change text to lowercase.
I didn't test this, but it will change the whole URL, while changes are only needed in its fragment component (after the first `#`).
Hmm, OP mentioned "Only edit what's between parentheses" - don't see anywhere that whole URL shouldn't be changed…
Paths are constant, only anchors are generated by forgejo.