
I need to use a bash script to delete full-line old-style comments from a C program, i.e., comments that begin (/*) and end (*/) on the same line, with no code on the same line.  This is an example of what the C program looks like:

/* Comment 1 */
printf("It is /* Comment 2 */\n");
x = 5; /* Comment 3 */
            /* Comment 4 */
/* Comment 5 */ y = 0;
            /*
             * Comment 6
             */
            // Comment 7

But I need it to look like this:

printf("It is /* Comment 2 */\n");
x = 5; /* Comment 3 */
/* Comment 5 */ y = 0;
            /*
             * Comment 6
             */
            // Comment 7

I know how to delete all comments, but I'm not sure how to remove only certain ones.

The script should read its input from a text file and write its output to another file, with both file names given on the command line.

  • I edited my question, is that what you were looking for? @Jesse_b – Implicit May 22 '19 at 14:14
  • No it isn't. Should the script always delete the 1st and 4th comment or should it delete the comments literally named Comment 1 and Comment 4? Should it delete random comments? Is there any rhyme or reason to what you want to accomplish? – jesse_b May 22 '19 at 14:16
  • Oh sorry I understand you now. I need to be able to delete any comments that don't have a statement before or after the comment. @Jesse_b – Implicit May 22 '19 at 14:20
  • So any comment that is not inline with code? Does this include multiline comments? – jesse_b May 22 '19 at 14:21
  • Relating https://unix.stackexchange.com/q/317795/117549 and https://unix.stackexchange.com/q/503784/117549 and https://unix.stackexchange.com/q/33131/117549 – Jeff Schaller May 22 '19 at 14:22
  • Yes that is correct. No it does not need to include multiline comments. @Jesse_b – Implicit May 22 '19 at 14:24

4 Answers

1

This sed command is portable:

sed '\_^[[:blank:]]*/\*.*\*/[[:blank:]]*$_d' file.c

All lines that begin (^) with zero or more blanks ([[:blank:]]*), start a comment (/\*), contain anything else (.*), close the comment (\*/), and have nothing but blanks in the rest of the line ([[:blank:]]*) will be deleted. Of course, you can also do that with grep -v.
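The question also asks for the input and output file names to come from the command line. A minimal sketch, assuming a POSIX shell; `strip_comments_sed` and `strip_comments_grep` are hypothetical helper names wrapping the sed command above and its `grep -v` equivalent:

```shell
#!/bin/sh
# strip_comments_sed INPUT OUTPUT  (hypothetical helper name)
# Deletes lines that hold nothing but one /* ... */ comment plus optional
# blanks, and writes everything else to OUTPUT.
strip_comments_sed() {
  if [ "$#" -ne 2 ]; then
    echo "usage: strip_comments_sed input.c output.c" >&2
    return 1
  fi
  sed '\_^[[:blank:]]*/\*.*\*/[[:blank:]]*$_d' "$1" > "$2"
}

# strip_comments_grep INPUT OUTPUT  (hypothetical helper name)
# Same filter with grep: -v prints only the lines that do NOT match.
# Note: grep exits with status 1 if nothing is printed (every line matched).
strip_comments_grep() {
  grep -v '^[[:blank:]]*/\*.*\*/[[:blank:]]*$' "$1" > "$2"
}
```

For example, `strip_comments_sed test.c out.c`. Both versions should produce the same output; grep simply inverts the match instead of deleting.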

Be aware that this will also delete lines like:

/* between two comments */ x = 0; /* could be some code */
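A quick way to see that caveat, assuming a POSIX shell and using the example line above as input:

```shell
# The regex only checks the first and last non-blank tokens, so the code
# between the two comments does not stop the line from matching.
printf '%s\n' '/* between two comments */ x = 0; /* could be some code */' |
  sed '\_^[[:blank:]]*/\*.*\*/[[:blank:]]*$_d'
# Nothing is printed: the whole line, including x = 0;, was deleted.
```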
Philippos
  • Why was this downvoted?  It’s a fine answer.  It fails in one, somewhat obscure edge case, but (1) the OP didn’t request that the answer handle that case, and (2) the author of the answer documented / disclosed the imperfection. – G-Man Says 'Reinstate Monica' Jun 17 '19 at 21:42
1

This is the same as Philippos’s answer except that:

  • It uses | as the regular expression delimiter (my personal preference).
  • It uses [[:space:]] instead of [[:blank:]].  [[:space:]] includes such non-graphic characters as vertical tab, form feed, and carriage return (in addition to space and tab); since C treats all of those whitespace characters as blank space, [[:space:]] is really the better character class for handling C code.  And
  • It handles the edge case of multiple comments on the same line.
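One practical case for [[:space:]] is a file saved with Windows (CRLF) line endings: sed keeps the carriage return as the last character of each line, and \r is in [[:space:]] but not in [[:blank:]]. A small comparison, assuming a POSIX shell; the two function names are hypothetical and both read from standard input:

```shell
# [[:blank:]] matches only space and tab.
blank_filter() { sed '\|^[[:blank:]]*/\*.*\*/[[:blank:]]*$|d'; }
# [[:space:]] also matches \r, \v and \f.
space_filter() { sed '\|^[[:space:]]*/\*.*\*/[[:space:]]*$|d'; }

# A full-line comment followed by a CRLF line ending.
printf '/* Comment 1 */\r\n' | blank_filter   # kept: the trailing \r is not a blank
printf '/* Comment 1 */\r\n' | space_filter   # deleted: \r counts as whitespace
```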

sed '\|^[[:space:]]*/\*.*\*/[[:space:]]*$| { \|\*/.*[^[:space:]]|!d; }' file.c

As in Philippos’s answer, it checks if the first non-blank thing on the line is /* and the last non-blank thing on the line is */.  If that’s true, we have a possible full-line comment; a candidate for removal.  In that case, enter the {} and look for a */ followed by something non-blank; i.e., a */ that’s not the last non-blank thing on the line.  If we find that, then we know that we have found the end of the first comment, and that there’s something else on the line.  In that case, do nothing.  If we don’t find a */ in the interior of the line, then delete the line.
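To meet the question's requirement that both file names come from the command line, the command can be wrapped as below. This is a sketch assuming a POSIX shell; `strip_comments_space` is a hypothetical name, and the `;` before `}` is there because POSIX requires it (BSD sed rejects `d }` without it, while GNU sed accepts either form):

```shell
# strip_comments_space INPUT OUTPUT  (hypothetical helper name)
# Outer address: the first non-blank token is /* and the last is */.
# Inner address: a */ followed later by a non-space character means code
# (or a second comment) follows the first comment, so the line is kept.
strip_comments_space() {
  sed '\|^[[:space:]]*/\*.*\*/[[:space:]]*$| { \|\*/.*[^[:space:]]|!d; }' "$1" > "$2"
}
```

Note that a line holding two comments and nothing else, such as `/* a */ /* b */`, is also kept, which errs on the side of not deleting code.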

0

Tested with sed and it worked fine:

command:

  sed -r "s/^\s+//g" filename| sed '/^\/\*.*\*\/$/d'

output:

printf("It is /* Comment 2 */\n");
x = 5; /* Comment 3 */
/* Comment 5 */ y = 0;
/*
* Comment 6
*/
// Comment 7
  • If I'm reading this correctly, it will strip leading whitespace from every line (above and beyond removing one-line comments). – Jeff Schaller May 22 '19 at 15:35
  • Thanks for your help but unfortunately when I put in this command this is what I get $ sed -r "s/^\s+//g" test.c|sed '/^\/\*.*\*\/$/d' sed: illegal option -- r usage: sed script [-Ealn] [-i extension] [file ...] sed [-Ealn] [-i extension] [-e script] ... [-f script_file] ... [file ...] – Implicit May 23 '19 at 11:43
  • -r is only available for GNU sed. Most sed flavors (including GNU) understand -E for extended regular expressions instead. Anyhow, this code will modify c indention. – Philippos May 23 '19 at 19:32
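Following the comments above, here is a sketch of the same two-step pipeline written so that a non-GNU sed should also accept it: -E replaces -r, and [[:space:]] replaces \s, which POSIX does not define. The /g flag is dropped because ^ can match at most once per line. It still strips the indentation of every line, as noted above; `strip_indent_and_comments` is a hypothetical name:

```shell
# strip_indent_and_comments INPUT OUTPUT  (hypothetical helper name)
strip_indent_and_comments() {
  # Step 1: remove leading whitespace from EVERY line (changes indentation).
  # Step 2: delete lines that are now exactly one /* ... */ comment.
  sed -E 's/^[[:space:]]+//' "$1" | sed '/^\/\*.*\*\/$/d' > "$2"
}
```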
0

This should handle the case of code bracketed between two comments on the same line:

sed  -E '/^\s*\/\*/!bx ; /\*\/\s*$/!bx ; /\*\/\s*\S+.*\/\*/bx ; d;  :x' draft 

If a line doesn't start with a comment marker preceded only by whitespace, then it starts with code, so branch past the delete to x:

/^\s*\/\*/!bx    

If a line that starts with a comment doesn't end with a closing comment marker followed only by whitespace, then there is code at the end, so branch past the delete to x:

/\*\/\s*$/!bx

These first two tests can be combined as:

/^\s*\/\*.*\*\/\s*$/!bx

If the comment line contains a closing comment marker followed by at least one non-whitespace character and then another comment start, then there is code in between, so branch past the delete to x:

/\*\/\s*\S+.*\/\*/bx

Since no code was found, delete the line:

d

Lines that branched to x skip the delete; x is the label at the end of the script, so those lines are printed unchanged:

:x

Tested on:

/* Comment 1 */
printf("It is /* Comment 2 */\n");
x = 5; /* Comment 3 */
            /* Comment 4 */
/* Comment 5 */ y = 0;
            /*
             * Comment 6
             */
            // Comment 7
/* between two comments */ x = 0;  /*some code */

Output:

printf("It is /* Comment 2 */\n");
x = 5; /* Comment 3 */
/* Comment 5 */ y = 0;
            /*
             * Comment 6
             */
            // Comment 7
/* between two comments */ x = 0;  /*some code */
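\s and \S are GNU extensions rather than POSIX, so the one-liner may fail on other seds, such as the BSD sed on macOS. A sketch of a possibly more portable translation, assuming [[:space:]] and [^[:space:]] are acceptable stand-ins; each command goes in its own -e so that the branch label is ended by a newline, as POSIX expects. `strip_comments_posix` is a hypothetical name:

```shell
# strip_comments_posix INPUT OUTPUT  (hypothetical helper name)
# 1st -e: line does not start with /* (after whitespace): code first, keep.
# 2nd -e: line does not end with */ (before whitespace): code last, keep.
# 3rd -e: */, then non-space text, then /*: code between comments, keep.
# d:      anything left is a full-line comment.
# :x      branch target; lines that jump here are printed unchanged.
strip_comments_posix() {
  sed -E \
    -e '/^[[:space:]]*\/\*/!bx' \
    -e '/\*\/[[:space:]]*$/!bx' \
    -e '/\*\/[[:space:]]*[^[:space:]]+.*\/\*/bx' \
    -e 'd' \
    -e ':x' \
    "$1" > "$2"
}
```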
bu5hman
  • (1) Given that the OP explicitly wants some comments to be retained (not deleted), apparently in order to avoid any risk of accidentally deleting code, it seems like deleting even malformed code might be contrary to the OP’s wishes.  (2) I don’t understand why you’re using [^;]+.  Can you give an example of an input where your script would give the wrong result if it didn’t have the [^;]+? – G-Man Says 'Reinstate Monica' Jun 21 '19 at 05:37
  • The caveat is no different from that of @Phillippos. In his case embedded valid code is deleted; in mine only valid code is retained. If the OP wanted to retain 'any text that may be malformed code', then the regex just changes from [^;]+.*\/ to .+\/]. It's the OP's decision which to use – bu5hman Jun 21 '19 at 06:26
  • @G-Man the [^;]+ is a lazy match for the ; at the end of a code statement. it is the test for valid code in this syntax. if you match on .*; and there is a ; within a comment it will fail – bu5hman Jun 21 '19 at 06:37
  • (3) I believe that your “correction” made the answer worse.  It will now delete this line:  /* infinite */  while (1) {  /* loop */. (4) I’m still struggling to identify non-trivial examples where [^;]+; and .*; will give different results. – G-Man Says 'Reinstate Monica' Jun 21 '19 at 18:19
  • Fair comment on the ; for wrapped lines. Will edit – bu5hman Jun 21 '19 at 18:34