
I'm trying to write a sed command that uses a regex to substitute something in a text file only where it is not commented out, but I'm running into some trouble because of my almost non-existent knowledge of sed's commands.

I found solutions for small parts of the problem, but some aren't complete enough or I just can't put them together. A TL;DR version is available at the end.

Let's first define my ultimate goal

I'd like to match anything (like any regular regex (hehe)) in a text file only if it is NOT commented. As I'd like to do it for multiple languages, let's just take the common C comments.

So, in this case, words or lines can be commented out in different ways: // comments out only the rest of the line, and /* */ comments out a whole block, which may span several lines.


Environment

I'm currently working on Mac OS X, which ships with BSD sed (close to plain POSIX sed), but I installed GNU sed, which I find better. (Thanks to Homebrew: the package is gnu-sed and the command is gsed.) So both of them are available to me if you prefer one over the other.

I'm writing this assuming a GNU-sed is used.


Ignoring a case

First problem: how to ignore some cases. I found that quite easily in this topic.

Now, the // part seems easy to me: I would just have to add an OR ( | ) condition to join it with the other condition.

It would look something like this:

    sed -E "/\/\/.*/! s/foo/bar/" file

Then, if the input file is:

foo
42
test
//foo
//42
//    foo
//something foo
foo
42
something foo
  foo

The output is:

bar
42
test
//foo
//42
//    foo
//something foo
bar
42
something bar
  bar
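The command above skips every line that contains //, so a line such as "foo // foo" keeps both occurrences, and it only replaces the first foo per line. A minimal sketch of an alternative, assuming GNU sed (gsed on macOS), rewrites only the part of each line before the first //. The function name replace_before_line_comment is hypothetical, and // inside string literals is not taken into account:

```shell
#!/bin/sh
# Sketch, assuming GNU sed (use gsed on macOS).
# replace_before_line_comment is a hypothetical helper name: on every line
# it replaces foo with bar, but only in the text before the first //.
replace_before_line_comment() {
  sed '
    # Put a newline in front of the first // to mark where the comment starts.
    s|//|\n//|
    :loop
    # Replace the last foo that comes before the marker (or anywhere on
    # the line when there is no //); [^\n]* cannot run past the marker.
    s|^\([^\n]*\)foo|\1bar|
    t loop
    # Drop the marker again.
    s|\n||
  '
}

printf 'foo // foo\nsomething foo\n//foo\n' | replace_before_line_comment
```

The t loop is what makes it replace every foo before the comment; each pass rewrites the last remaining one, so it stops once none is left.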

So now, I'm just going to focus on the /* */ comment block.


Matching through multiple lines

Second problem: how to make the regex match across multiple lines. I think this is the major problem. I found this topic about how to match across a single newline character. It took me a moment to understand how it works, but this partial solution brings me a new problem and new questions.

It can obviously skip only one newline ( \n ). So now I want to do the same for an unknown number of lines (from 0 to infinity ( * )). I bet I have to loop through the lines, but this is where I'm stuck: I know nothing about sed's commands and they're really awkward to me.

During my searches, I found a small script meant to replace the tail command; it uses a loop (I guess), but I fail to understand how it works.
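In sed, looping over an unknown number of lines is usually done with a label (:a), the N command (append a newline plus the next input line to the pattern space) and a branch (b) back to the label. A minimal sketch, assuming GNU sed; the function name join_lines is hypothetical, and the final substitution is only a placeholder showing that one regex now sees every line at once:

```shell
#!/bin/sh
# Sketch of a sed loop, assuming GNU sed (use gsed on macOS).
# join_lines is a hypothetical helper name: it reads ALL input lines into
# the pattern space, then runs one substitution across line boundaries.
join_lines() {
  sed '
    # ":a" defines a label named "a".
    :a
    # On every line except the last ("$!"), append the next line
    # (N adds a newline plus that line) and jump back to "a".
    $!{
      N
      ba
    }
    # The pattern space now holds the whole input; newlines show up as \n.
    s/\n/ | /g
  '
}

printf 'one\ntwo\nthree\n' | join_lines
```

GNU sed also has a -z option that reads NUL-separated records, which usually puts a whole text file into the pattern space without an explicit loop.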

Make it so it matches only before the */ part

The third part would be to make sure the ignored case only extends up to the end of the comment block ( */ ). In the end, the ignored case would only match things between /* and */, so the final command would completely ignore anything written inside a comment block.

I haven't really researched this part, as I haven't solved the previous point, and it seems to me that this */ problem depends on the previous /* problem.
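Once the whole file is in the pattern space, the */ part can be handled by a regex that describes a complete comment ending at the first */: /\*[^*]*\*+([^/*][^*]*\*+)*/ is a widely used ERE for a C block comment. A sketch under stated assumptions (GNU sed, small files, hypothetical function name) uses it in a loop so that foo is only replaced when it can be reached from the start of the text without stepping into a comment. // comments and string literals are deliberately not handled here:

```shell
#!/bin/sh
# Sketch, assuming GNU sed (use gsed on macOS) and small files.
# replace_outside_block_comments is a hypothetical helper name: it replaces
# foo with bar everywhere except inside /* ... */ (// comments and string
# literals are NOT handled).
replace_outside_block_comments() {
  sed -E '
    # Slurp the whole input into the pattern space.
    :slurp
    $!{
      N
      b slurp
    }
    # The text before foo may only be made of these units:
    #   [^/]            any character except a slash
    #   /+[^*/]         slashes followed by neither a star nor a slash
    #   /+\*...\*+/     slashes plus a whole comment, up to the first */
    # followed by optional slashes. An unclosed /* can never be crossed.
    # The longest match wins, so each pass replaces the last eligible foo,
    # and t repeats the substitution until none is left.
    :again
    s#^(([^/]|/+[^*/]|/+\*[^*]*\*+([^/*][^*]*\*+)*/)*/*)foo#\1bar#
    t again
  '
}
```

Because the comment unit has to end with */, the prefix can never stop halfway through a comment, and an unterminated /* protects everything after it. Each pass re-scans the whole buffer, so this gets slow on large files.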


Final part: Putting all this together

Well, obviously I have completely failed at this so far.
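Putting the pieces together, the same slurp-and-loop approach can be given one more kind of prefix unit: a whole // comment, up to and including its newline. A sketch under the same assumptions (GNU sed, small files, hypothetical function name, string literals not handled):

```shell
#!/bin/sh
# Sketch, assuming GNU sed (use gsed on macOS) and small files.
# replace_outside_comments is a hypothetical helper name: it replaces foo
# with bar outside both /* ... */ and // comments (string literals are
# still NOT handled).
replace_outside_comments() {
  sed -E '
    # Slurp the whole input into the pattern space.
    :slurp
    $!{
      N
      b slurp
    }
    # Allowed units before foo:
    #   [^/]           any character except a slash
    #   /[^*/]         one slash followed by neither a star nor a slash
    #   //[^\n]*\n     a whole // comment, including its newline
    #   /\*...\*+/     a whole block comment, up to the first */
    # followed by at most one slash (as in a/foo).
    :again
    s#^(([^/]|/[^*/]|//[^\n]*\n|/\*[^*]*\*+([^/*][^*]*\*+)*/)*/?)foo#\1bar#
    t again
  '
}
```

Requiring the newline matters: without it, the prefix could stop in the middle of a // comment and then match a foo inside it. A /* that appears inside a // comment is swallowed by the line-comment unit, and a // inside a block comment is swallowed by the block-comment unit.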


TL;DR

My question is: what sed command would substitute anything we want in a text file, but only where it is not commented out?


Appendix

If you know an easier way to do it in any other language, that's also very welcome. So, if you know how to do it with awk, Python or anything else, feel free to share it.
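Since awk is welcome too, here is a minimal sketch of a different technique: instead of a regex, a small awk state machine scans each line character by character and remembers, across lines, whether it is inside /* ... */. Only the code stretches between comments go through gsub. The function name replace_outside_comments_awk is hypothetical, and string and character literals are not handled:

```shell
#!/bin/sh
# Sketch of a character-scanning approach (works with POSIX awk).
# replace_outside_comments_awk is a hypothetical helper name: it replaces
# foo with bar outside /* ... */ and // comments.
replace_outside_comments_awk() {
  awk '
    # Apply the substitution to a stretch of code (never to comments).
    function fix(s) { gsub(/foo/, "bar", s); return s }
    {
      out = ""; code = ""
      i = 1; n = length($0)
      while (i <= n) {
        two = substr($0, i, 2)
        if (inblock) {
          # Inside /* ... */: copy verbatim until the first */.
          if (two == "*/") { out = out two; i += 2; inblock = 0 }
          else { out = out substr($0, i, 1); i++ }
        } else if (two == "/*") {
          # Start of a block comment: flush the pending code first.
          out = out fix(code) two; code = ""; i += 2; inblock = 1
        } else if (two == "//") {
          # Line comment: the rest of the line is copied verbatim.
          out = out fix(code) substr($0, i); code = ""; i = n + 1
        } else {
          code = code substr($0, i, 1); i++
        }
      }
      print out fix(code)
    }
  '
}
```

Because the scanner tracks state instead of matching patterns, inputs such as /* /* */, // /* and /* // */ come out right without special cases.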

Vrakfall
  • sed uses regex, regex is a context-free grammar, comments are a context. This question comes up a lot, on this site and elsewhere. As you say, the // bit is easy, but the /*…*/ is impossible. You will think you have done it, after hours of pain, and then you will try /* /* */ or // /* or /* // */. If there is a version of sed that has a comments extension, then that would work. To do it yourself, try awk or Python. – ctrl-alt-delor Apr 10 '15 at 10:24
  • I see why it is impossible to do it only with regex. What I'm trying to do here is use special sed commands that would allow us to add loops and conditions to that lookaround. Please have a look at this for a better understanding of what commands could do. It appears possible to me to do it this way because it becomes more like a little program. Even if it's impossible, I'm still looking for a solution that would be close to it, like trying to avoid simple comments without special cases like those you mentioned. – Vrakfall Apr 10 '15 at 10:31
  • I can also add this: how do you think compilers and syntax coloring in code editors are made? They are able to ignore commented parts, and that is done through programming. Sed's commands are a kind of programming language. That is why it still appears possible to me. – Vrakfall Apr 10 '15 at 10:33
  • Well, I know nothing about awk or Python, so if anything made with those would work, solutions in those languages are very welcome. – Vrakfall Apr 10 '15 at 10:41
  • @richard - regular expressions of the pure, mathematically-descriptive language type are context free. regexp - especially w/ back-references - is not the same thing. – mikeserv Apr 10 '15 at 15:32

1 Answer


You should not believe them if they tell you it cannot be done. You should believe them, however, if they tell you it's not easy.

sed '\|*/|!{ s|/\*|\n&|              #if ! */ repl 1st /* w/ \n/*
     h;      s|foo|bar|g;/\n/!b      #hold; repl all foo/bar; if ! \n branch
     G;      s|\n.*\n||;:n           #Get; clear difference; :new label
     n;      \|*/|!bn;s|^|\n/*|      #new line; if ! */ branch new label
     };s|*/|\n&|g                    #repl all */ w/ \n*/
       s|foo|&\nbar|g;:r             #repl all foo w/ foo\nbar
       s|\(/\*[^\n]*\)\nbar|\1|g;tr  #repl all /*[^\n]*\nbar w/ foo
       s|foo\n\(b\)|\1|g             #repl all foo\nbar w/ bar
       s|^\n/.||;s|\n||g             #clear any \n inserts
'    <<\INPUT
asfoo   /* asdfooasdfoo


asdfasdfoo
asdfasdfoo
foo */foo /*foo*/ foo
/*.
foo*/
foo
hello

INPUT

OUTPUT

asbar   /* asdfooasdfoo


asdfasdfoo
asdfasdfoo
foo */bar /*foo*/ bar
/*.
foo*/
bar
hello
mikeserv
  • Awesome! I'd like to upvote you, but I don't have enough reputation for that yet. Of course I knew it was not easy at all, but it was hard for me to believe it was impossible. Your code seems really huge and powerful and I'm not able to understand it. xD Thank you very much for your time. However, I tested it with some different inputs and found one that made it fail. In this case: /*.foo */ foo hello the foo outside the comment block isn't substituted and the last line is even printed twice. I don't know if it's fixable. Anyway, thanks again, that's such a great job! – Vrakfall Apr 10 '15 at 12:19
  • In the previous comment, each block is a new line. I don't know if the problem is due to the fact I'm using gsed on mac osx btw. – Vrakfall Apr 10 '15 at 12:20
  • @Vrakfall - each block? what's a block? if I do just /*.foo */ foo it works... But I think I need $!N anyway... It also works with several blank lines above it....? I wrote this for GNU sed because that's what you mentioned in the question. – mikeserv Apr 10 '15 at 13:51
  • Yeah, I'm still using gnu sed. By block, I meant this because I cannot enter newlines in comments. So /*.foo*/ foo does work. Let's say you replace \n with a new line in the following, it won't work and will double the last line: /*.\nfoo*/\nfoo\nhello. – Vrakfall Apr 10 '15 at 14:05
  • @Vrakfall - thanks for that. It helped me see an easier way to do it actually - if not much easier. It's fixed now. – mikeserv Apr 10 '15 at 14:41
  • Hehe, you're welcome =P Such a great job you did there, thank you again. I might be pushing it to the details, but I found another little case acting weirdly. Here's my example. I guess it comes from the way you're dealing with */. No worries if it cannot be changed, I'm already really satisfied. By the way, I tried mixing your solution with the exclusion of // and I made this; it seems to work fine even if I'm not sure of what I've done. I hope the gist is ok for you. – Vrakfall Apr 13 '15 at 08:34
  • Just a little up to make sure you didn't forget me. =P This will be deleted at some point. – Vrakfall May 06 '15 at 09:10
  • @Vrakfall - hey. dont delete this. i was thinking about this only the other day... beck said... "its all in... youre mined..." or something. anyway, i may or may not (redundant?) repair to this. as is, the method for an approach which could resolve your parse nightmare can be found, even authored by yours truly, here at or about the giant viacom stack exchange. look for DP – mikeserv Oct 16 '18 at 19:06
  • I don't plan on deleting this question, at all. :P For the rest, I don't think I understood what you meant. – Vrakfall Nov 22 '18 at 14:16