I'm trying to write a sed command that uses a regex to substitute something in a text file only if it is not commented out, but I'm running into trouble because I know almost nothing about sed's commands.
I found solutions for small parts of the problem but some aren't complete enough or I just cannot put them together. TL;DR version available at the end.
Let's first determine my ultimate goal
I'd like to match anything (like any regular regex (hehe)) in a text file only if it is NOT commented. As I'd like to do it for multiple languages, let's just take the common C comments.
So, in this case, words or lines can be commented out in different ways. We have // to comment out the rest of the line, and we also have the /* */ comment block.
Environment
I'm currently working on Mac OS X, which only ships a POSIX-style (BSD) sed, but I installed GNU sed, which I find better. (Thanks to Homebrew. The package is gnu-sed and the command is gsed.) So both of them are available to me if you prefer using one or the other.
I'm writing this assuming GNU sed is used.
Ignoring a case
First problem, how to ignore some cases. I found that quite easily in this topic.
Now, the // part seems easy for me to do, and I would just have to add an OR (|) condition to join it with the other condition.
It would look something like this:
sed -E "/\/\/.*/! s/foo/bar/" file
Then, if the input file is:
foo
42
test
//foo
//42
// foo
//something foo
foo
42
something foo
foo
The output is:
bar
42
test
//foo
//42
// foo
//something foo
bar
42
something bar
bar
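One caveat I noticed with this approach: the address matches any line that contains // anywhere, so code sitting before a trailing comment is skipped as well. A small sketch to show it, using the same idea as the command above in portable syntax (the function name skip_slash_lines is made up for this example):

```shell
# Hypothetical helper: substitute foo with bar on lines that contain no "//".
skip_slash_lines() {
  sed '/\/\//!s/foo/bar/'
}

# The first line has code before the comment, yet it is left untouched;
# only the second line is substituted.
printf 'foo // trailing comment\nfoo\n' | skip_slash_lines
```

So this is good enough for lines that are entirely commented out, but not for a comment that follows code on the same line.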
So now, I'm going to focus only on the /* */ comment block.
Matching through multiple lines
Second problem, how to make the regex match across multiple lines. Well, I think this is the major problem. I found this topic talking about how to match across only one newline character. It took me a moment to understand how it works, but this partial solution brings me a new problem and new questions.
It can obviously ignore only one newline (\n). So I now want to do the same but for an unknown number of lines (from 0 to infinity (*)). I bet I have to loop through the lines, but this is where I'm stuck because I know nothing about sed's commands and they feel really awkward to me.
During my searches, I found a little script meant to replace the tail command, and it uses a loop (I guess), but I fail to understand how it works.
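From what I could piece together since, the loop idiom in sed seems to be a label, the N command and a conditional branch. Here is a minimal sketch of that idiom on its own, not of the script I found; the function name join_lines and the sample input are made up for this example:

```shell
# Hypothetical helper: pull every input line into the pattern space,
# then run a single substitution over the whole text.
join_lines() {
  # :a        define a label named "a"
  # N         append the next input line to the pattern space
  # $!ba      if the current line is not the last one, branch back to "a"
  # s/\n/ /g  runs once the pattern space holds the whole input
  sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g'
}

printf 'one\ntwo\nthree\n' | join_lines
```

If I read it correctly, once the loop ends a regex can match across any number of newlines, which is the part I was missing. I have not checked how it behaves on very large files, since everything is held in memory at once.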
Make it so it matches only before the */ part
The third part would be to make sure the ignored case only matches things before the end of the comment block (*/). So, in the end, the ignore case would only match things between /* and */. The final command would then completely ignore things written inside a comment block.
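One thing that looks promising for this is sed's address range (two addresses separated by a comma), negated with !. A rough sketch, assuming each comment delimiter sits on a line of its own; the function name skip_block_comments is made up, and foo/bar are just the placeholder pattern and replacement from earlier:

```shell
# Hypothetical helper: substitute foo with bar outside /* ... */ blocks.
skip_block_comments() {
  # /\/\*/,/\*\//  selects lines from one containing /* to the next containing */
  # !              negates the selection, so s only runs outside that range
  sed '/\/\*/,/\*\//!s/foo/bar/'
}

printf 'foo\n/*\nfoo\n*/\nfoo\n' | skip_block_comments
```

This is only a partial answer as far as I can tell. The end of the range is looked for starting on the line after the one that opened it, so a one-line /* ... */ comment would swallow everything up to the next line containing */. Code on the same line as either delimiter is skipped too.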
I did no real research on this part as I didn't solve the previous point, and it seems to me that this */ problem depends on the previous /* problem.
Final part: Putting all this together
Well, it is obvious I completely failed at this at the moment.
TL;DR
My question is: What would be the sed command to substitute anything we want in a text file only if it is not commented out?
Appendix
If you know an easier way to do it, using any other language, it's also very welcome. So, if you know how to do it with awk, python or anything else, feel free to share it.
The // bit is easy, but the /*…*/ is impossible. You will think you have done it, after hours of pain, and then you will try /* /* */ or // /* or /* // */. If there is a version of sed that has a comments extension, then that would work. To do it yourself, try awk or python. – ctrl-alt-delor Apr 10 '15 at 10:24
awk or python, so if anything made with those would work, solutions made with those languages are very welcome. – Vrakfall Apr 10 '15 at 10:41