I don't mind at all. Beyond my explanation, you might like to try an online regular expression checker to explore how small changes to the regex change what it matches.
Headings always match #\s+ because that's the character # followed by whitespace (\s) one or more times (+). Other text can match this too, so not every match is a heading, but every heading matches. (You might have # blah in the middle of the text, which would match.) If that's a problem, then you can change the regex to ^#\s+ where ^ means "from the beginning of a line".
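Here is a small sketch of the difference between the two heading patterns. The sample text is made up for illustration, and note one assumption about Python specifically: ^ only means "beginning of a line" when you pass re.MULTILINE, otherwise it means "beginning of the whole string".

```python
import re

# Made-up sample: one heading, one "# " in the middle of a line, one tag.
text = "# Title\nSome text with # blah in the middle\n#tag at line start\n"

# Unanchored: matches "# " anywhere, including the middle of a line.
loose = re.findall(r"#\s+", text)

# Anchored: with re.MULTILINE, ^ matches at the start of every line.
anchored = re.findall(r"^#\s+", text, flags=re.MULTILINE)

print(len(loose))     # 2
print(len(anchored))  # 1
```

The tag on the last line matches neither pattern, because its # is followed by a letter rather than whitespace.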
Tags always match #[^\s] which means the character # followed by one non-whitespace character. Be careful: tags match this regex, but this regex doesn't match the entire tag. It only says "there is a tag here".
Fortunately, that doesn't hurt, because your Python code could match #[^\s] and then turn that # into \# and thereby successfully avoid escaping the #s at the beginning of headings. You could even use regex to do this by capturing the non-whitespace character at the beginning of the tag and "putting it back" using regex search and replace.
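To see what "there is a tag here" means in practice, here is a minimal sketch with a made-up line of text. The match covers only the # and the first character of the tag, and the heading's # is skipped because a space follows it.

```python
import re

# Made-up sample: a heading that also contains a tag.
line = "# Heading with a #tag inside"

# Find the first "#" that is followed by a non-whitespace character.
match = re.search(r"#[^\s]", line)

print(match.group())  # "#t" (not the whole tag)
print(match.start())  # position of the tag's "#", not the heading's
```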
Replace #([^\s]) with \#\1
The parentheses capture the matching character (the first character of the tag) and \1 echoes back the captured character. It would replace #a with \#a and so on.
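In Python, that search and replace might look something like this. The function name escape_tags and the sample text are my own inventions for illustration, so treat this as a sketch rather than the one right way to do it.

```python
import re

def escape_tags(text: str) -> str:
    """Put a backslash in front of each '#' that starts a tag.

    Hypothetical helper: a '#' followed by whitespace (a heading) is left alone.
    """
    # ([^\s]) captures the tag's first character; \1 puts it back.
    # In the replacement, \\ produces one literal backslash.
    return re.sub(r"#([^\s])", r"\\#\1", text)

sample = "# Heading\nSee #a and #b1 for details"
result = escape_tags(sample)
print(result)
```

One caveat: a line such as ## Subheading also matches, because the second # counts as a non-whitespace character, so the pattern needs adjusting if your text has deeper heading levels.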
I hope I explained this clearly enough. I see the other folks also tried, so I hope that together, you found an explanation that works well enough for you.
Peace.
Excellent! Indeed, I'd completely forgotten about H2, H3, and so on, so I'm glad you were comfortable figuring that out!
I read Mastering Regular Expressions about 25 years ago and it's one of the best and simplest investments I ever made in my own programming practice. Regex never goes out of style.
Enjoy!