
I'm trying to convert single line 'C' style comments to 'C++' style. The 'sed' below isn't bad, but of course it fails if any leading code (code before the comment) has any ' / ' in it at all:

sed -i 's,\(^[^\/]*\)\/\*\([^\*]*\)\*\/[ ]*$,\1\/\/\2,' filename

What I wish I could do is this:

... [^\\/\\*] ...

i.e. negate ' /* ' which doesn't work of course, but after several hours of searching, I can't find a simple explanation of how to do that properly :( It doesn't seem like it should be rocket science.

For example, these strings:

blah blah        /* comment */
blah blah / blah /* comment */
blah blah        /* comment */ blah
blah blah / blah /* comment */ blah 

... should convert thusly:

blah blah        // comment 
blah blah / blah // comment 
blah blah        /* comment */ blah  (this CAN'T be converted)
blah blah / blah /* comment */ blah  (this CAN'T be converted)

... obviously no conversion can take place if there is code AFTER the 'C' comment.

I will do a close visual comparison between the file, before and after, so there's no need to handle ' /* ' inside a literal, nor do I want to convert anything multi-line.

Note I think of this as a 'negation' problem but maybe there is another way. I just need to capture everything before a ' /* ' and I don't care how.

FOLLOW UP ON ANSWER BELOW

Well damn! I see that I've completely misunderstood something fundamental:

.*/\*

... reads: "anything except slash star followed by slash star", so actually I get my 'negation' for free :-)

So, going even further than Barmar:

sed -i 's,^\(.*\)/\*\(.*\)\*/\s*$,\1//\2,' filename

... will even catch this:

blah / * blah        /* co / mme * nt */

and output this:

blah / * blah        // co / mme * nt 

Enlightenment.
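
A quick way to check the command above without touching a file is to pipe the sample lines through it. This is only a sketch: the function name to_cpp_comment is made up for illustration, and [[:space:]] stands in for \s so that it does not depend on GNU sed.

```shell
# Hypothetical helper: the same substitution as above, reading stdin
# instead of editing a file in place.
# [[:space:]] replaces \s (a GNU extension) so any POSIX sed should accept it.
to_cpp_comment() {
    sed 's,^\(.*\)/\*\(.*\)\*/[[:space:]]*$,\1//\2,'
}

# Sample lines from the question; only lines that END with a comment change.
printf '%s\n' \
    'blah blah        /* comment */' \
    'blah blah / blah /* comment */' \
    'blah blah        /* comment */ blah' \
    'blah / * blah        /* co / mme * nt */' |
    to_cpp_comment
```

The third line comes out unchanged because there is code after the comment; the others get // in place of the last /* on the line.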

Jeff Schaller
Ray Andrews
  • Why do you need to negate anything before the /*? Just capture everything before /*. – Barmar Oct 22 '14 at 19:25
  • How will you handle converting multi-line /* */ comments? What about literal strings that happen to contain "... /* ..."? – Greg Hewgill Oct 22 '14 at 19:27
  • Please edit your question and show us an example of your input and your desired output. Don't assume that text parsing experts are necessarily familiar with C or C++ syntax. – terdon Oct 22 '14 at 19:32
  • Regular expressions are useful for context-free grammars. C comments are not context-free, as comments can be in comments, /* can appear in strings without being a comment, etc. This question is asked regularly; unfortunately I cannot remember where. The answers will tell you that regexps cannot do it alone. Therefore you will need something more powerful such as awk. – ctrl-alt-delor Oct 22 '14 at 19:33
  • How do you want to deal with a comment in the middle of the line? E.g.: cout<<"a"<<endl; /* foo */ cout<<"b"<<endl;. – jimmij Oct 22 '14 at 19:36
  • http://unix.stackexchange.com/questions/72429/deleting-all-c-comments-with-sed?rq=1 – ctrl-alt-delor Oct 22 '14 at 19:37
  • @Barmar, that's exactly the problem: how? – Ray Andrews Oct 22 '14 at 21:30
  • this .*/\* does not read "anything except /*", but rather "anything except the last /*". it will happily match as many intervening /*'s as you could want to provide it. this [^/*]*/\* is "anything except / or * then /*". – mikeserv Oct 23 '14 at 08:30
  • @mikeserv, right you are, but your way fails to convert if there's any star or slash in the leading code, whereas my way turns " /* /* comment */ " into the illegal " /* // comment " (which would at least flag a compiler error). More study ... – Ray Andrews Oct 23 '14 at 15:48
  • yeah, thats what i meant by not knowing which cases to test for. i only hoped to show how you might. if you already know that always matching the last /* is the way to go, then youre good to go, because .*/\* will get the last every time. i just wanted to make it clear that it will gobble any preceding occurrences. – mikeserv Oct 23 '14 at 16:00
  • Yup. Very educational tho, and thanks. I'll study your answer below, I think there's meat in that that I can now digest. – Ray Andrews Oct 23 '14 at 18:36

3 Answers


Try this:

sed 's,^\(.*\)/\*\([^/]*\)\*/$,\1//\2,'

This won't convert comments that contain embedded / characters. Alternatively, you could use:

sed 's,^\(.*\)/\*\(.*\)\*/$,\1//\2,'

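Because the first .* is greedy, this always picks up the last /* on the line. With two comments on the same line, e.g.

blah blah        /* comment1 */ blah /* comment2 */

only the final comment is converted:

blah blah        /* comment1 */ blah // comment2 

That happens to be fine, but the same greediness turns /* /* comment */ into /* // comment, which is wrong.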

It might be possible to do better with a PCRE version of sed, as you could then use negative lookahead to test for embedded comments.

Note also that using , as the delimiter in the s command means that you don't have to escape all the / characters in the regexp -- that's the point of using some character other than / as the delimiter when the regexp will contain lots of /.

Barmar

Probably the safest way is to first test for lines you don't want to affect and branch out of the script if you have a match.

sed '\|\*/.*/\*|b'

That's a little hard to read with all of the * stars in there, but basically if /* occurs after */, sed will quit executing its script, autoprint the line, and pull in the next line to begin the next line cycle. Any commands following that are not executed for a matching line.
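
To see the branch in action it needs a command after it to skip. A minimal sketch, where the second -e expression and the function name skip_then_convert are assumptions added for illustration only:

```shell
# Hypothetical helper: lines where "/*" appears after "*/" hit the b command
# and are printed untouched; every other line reaches the substitution.
skip_then_convert() {
    sed -e '\|\*/.*/\*|b' \
        -e 's|/\*\(.*\)\*/[[:space:]]*$|//\1|'
}

printf '%s\n' \
    'blah /* c1 */ blah /* c2 */' \
    'blah / blah /* comment */' |
    skip_then_convert
```

The first line passes through as it was, and the second has its trailing comment converted.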

Another way to do this is with test, which will similarly branch out of a script if it is provided no branch label following a successful s///ubstitution:

sed 's|/\*|&|2;t'

That attempts to replace the second occurrence of the pattern on the line with itself, and, if successful, it branches out in the same manner b does.

And so...

sed 's|/\*|&|2;s|\*/|&|2;t
     s|/\*\(.*\)\*/ *$|//\1|'

...will replace the first and only occurrence of /* with // on lines which end with the first and only occurrence of */ and any number of trailing space characters. This works because t applies to any substitution occurring before it, and so if one or the other test substitution succeeds, sed branches out.
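
The same script can be wrapped up and run against a few sample lines to watch which ones it leaves alone. A small sketch, with the function name only_single_comment made up for illustration and the script split across two -e expressions instead of a literal newline:

```shell
# Hypothetical wrapper around the script above.
# The two "replace the 2nd occurrence with itself" commands act as tests;
# t branches to the end of the script if either of them succeeded.
only_single_comment() {
    sed -e 's|/\*|&|2;s|\*/|&|2;t' \
        -e 's|/\*\(.*\)\*/ *$|//\1|'
}

printf '%s\n' \
    'blah blah        /* comment */' \
    'blah /* c1 */ blah /* c2 */' \
    'blah blah        /* comment */ blah' |
    only_single_comment
```

Only the first line is converted. The second is skipped by t, and the third simply does not match the final substitution.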

It may be that I blunder here, though, as I'm not very familiar with C or C++ and am uncertain what might happen in a /\*.*\*/.*\*/ case, which the above script branches away from. Perhaps you should instead be testing for only 2 */ or only 2 /*. Hopefully, though, I have at least managed to convey the concept to one who knows better.

mikeserv

I happened to need the above, but for multi-line comments as well, so I merged Barmar's answer with some sed of my own to achieve this:

sed -e '/\/\*/,/\*\//{s/^\( *\)\/\*/\1~~/g;s/^\( *\) \*\//\1~~/g;s/^\( *\) \*/\1~~/g;s/~~/\/\//g};s/\*\*/\/\//g;s,^\(.*\)/\*\(.*\)\*/\s*$,\1//\2,'

(BTW if on a mac, you need gsed else you will get an error from the above)

Here is a diff before/after

***************
*** 1,13 ****

! /*
     FOO
!  */

! /*
!  * BAR
!  */

! blah blah        /* comment */
! blah blah / blah /* comment */
  blah blah        /* comment */ blah
  blah blah / blah /* comment */ blah
--- 1,13 ----

! //
     FOO
! //

! //
! // BAR
! //

! blah blah        // comment
! blah blah / blah // comment
  blah blah        /* comment */ blah
  blah blah / blah /* comment */ blah