sed works on "records" (lines), each of which is terminated by a newline (\n) character. This means you cannot match past a \n because, as far as sed is concerned, the \n is the end of the record. In GNU sed, you can get around this by using -z, which makes the NUL byte (\0) the record separator instead of the newline. A text file normally contains no NUL bytes, so the entire file is slurped and treated as a single record (if your file does have \0 bytes in it, each \0 will end a record):
$ sed -zE 's|/\*.*\n.*\*/||' file.c
#include <stdio.h>
int main()
{
// this is a dummy function
float sum = 0;
// testing the sed commands
int x = 6; // single-line comment
x = x + 5;
char y = 'n';
}
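To see the record boundary in action, here is a minimal sketch, assuming GNU sed is installed as sed (-z is a GNU extension, and other implementations may reject it). The variable names per_line and slurped are only illustrative:

```shell
# Default mode: every line is its own record, so a\nb can never match.
per_line=$(printf 'a\nb\n' | sed 's/a\nb/X/')

# GNU sed -z: the whole input is one record, so the \n can be matched.
slurped=$(printf 'a\nb\n' | sed -z 's/a\nb/X/')

printf 'per line: [%s]\nslurped:  [%s]\n' "$per_line" "$slurped"
```

The first substitution leaves the two lines unchanged, while the second replaces both of them with a single X.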
However, the sed -zE command above will fail if you have multiple multi-line comments in the same file because sed cannot do non-greedy matching: it always finds the longest possible match, which means it would match from the first /* to the last */ and delete any code in between. So use a tool that can do non-greedy matching, like perl:
$ perl -0777 -pe 's|/\*.*?\n.*?\*/||gs' file.c
#include <stdio.h>
int main()
{
// this is a dummy function
float sum = 0;
// testing the sed commands
int x = 6; // single-line comment
x = x + 5;
char y = 'n';
}
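The perl command, however, will fail if you have a single-line /* */ comment: the pattern requires a \n inside the match, so the match starts at the single-line comment's /* and runs on to the first */ found on a later line, deleting the code in between. The safest way I can think of is to forget about trying to do this with regular expressions and instead write a little script that keeps count of opening and closing comment tags and deletes accordingly.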
Another problem is that a string literal containing /* or */ will also break these regular expressions. For example, what if you have something like:
char foo [ ] = "A comment starts with /*";
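As a rough sketch of the "little script" idea, here is a character-by-character scanner written in awk that tracks whether it is inside a /* */ comment, a string literal or a character literal. Since C comments do not nest, a single state variable stands in for the counter. The function name strip_c_comments is hypothetical, and this is an illustration under simplifying assumptions rather than a complete solution: it ignores backslash-newline continuations and trigraphs, leaves // comments in place, and removes a comment outright instead of replacing it with a space as a real preprocessor would:

```shell
# strip_c_comments: hypothetical helper, reads files or stdin, writes stdout.
strip_c_comments() {
    awk '
    BEGIN { state = "code" }    # one of: code, block, string, char
    {
        out = ""
        removed = (state == "block")    # line starts inside a comment
        n = length($0)
        i = 1
        while (i <= n) {
            c = substr($0, i, 1)
            two = substr($0, i, 2)
            if (state == "block") {
                if (two == "*/") { state = "code"; i += 2 } else { i++ }
            } else if (state == "string" || state == "char") {
                if (c == "\\") {
                    out = out two; i += 2    # copy escaped character verbatim
                } else {
                    if (state == "string" && c == "\"") state = "code"
                    else if (state == "char" && c == "\047") state = "code"
                    out = out c; i++
                }
            } else if (two == "/*") {
                state = "block"; removed = 1; i += 2
            } else if (two == "//") {
                out = out substr($0, i); i = n + 1    # keep // comments as they are
            } else {
                if (c == "\"") state = "string"
                else if (c == "\047") state = "char"    # \047 is a single quote
                out = out c; i++
            }
        }
        if (removed) sub(/[ \t]+$/, "", out)    # trim space left before a comment
        # Drop lines that held nothing but comment text.
        if (!removed || out ~ /[^ \t]/) print out
    }' "$@"
}
```

Calling it as strip_c_comments file.c prints the file with the /* */ comments removed; the string on the example line above passes through unchanged because the scanner is in the string state when it sees the /*.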
At the end of the day, the only safe way of doing this will be something like this SO answer by Ed Morton which uses a C preprocessor:
If this is in a C file then you MUST use a C preprocessor for this in
combination with other tools to temporarily disable specific
preprocessor functionality like expanding #defines or #includes, all
other approaches will fail in edge cases. This will work for all
cases:
[ $# -eq 2 ] && arg="$1" || arg=""
eval file="\$$#"
sed 's/a/aA/g; s/__/aB/g; s/#/aC/g' "$file" |
gcc -P -E $arg - |
sed 's/aC/#/g; s/aB/__/g; s/aA/a/g'
Put it in a shell script and call it with the name of the file you
want parsed, optionally prefixed by a flag like "-ansi" to specify the
C standard to apply.
See https://stackoverflow.com/a/35708616/1745001 for details.
sed you have. – terdon Apr 15 '21 at 09:11
s//*(.|n)*?*///. See https://mywiki.wooledge.org/Quotes, https://unix.stackexchange.com/q/68694/170373, https://unix.stackexchange.com/q/400447/170373, https://unix.stackexchange.com/q/503013/170373 – ilkkachu Apr 15 '21 at 09:33
sed: -e expression #1, char 14: unknown option to 's' because there's s//...///, i.e. extra slashes after the s///. Unless your shell is something funky, that is, but the other common not-so-POSIX shells like fish, Zsh and tcsh would complain about that glob not matching anything. – ilkkachu Apr 15 '21 at 10:06
\/\*(.|\n)*?\*\/ looks a lot like a Perl-style regex that would match single and multi-line comments. Please [edit] your answer to include the constraints, comments are mostly just good for stuffing information out of sight. – ilkkachu Apr 15 '21 at 10:11
// hi /* there is not the start of a /*-style comment, and printf("/* hello */"); also contains comments. – ilkkachu Apr 15 '21 at 10:58