3

What a long title. Essentially, what I have is a collection of files that need to be searched recursively with a regex, and replaced.

What I have so far works without capture groups, however it does nothing when using them. I am currently using a command that I found on another question:

grep -rlP "/\* *(\d+) *\*/ (.*)" . | xargs sed -i "s/\/\* *(\d+) *\*\/ (.*)/$2 \/\/ JD $1/g"

This regex is very confusing because it contains a lot of escaped asterisks and slashes, but essentially it takes in the string (for example)

/*  73 */   private static int last = -1000;

and replaces it with

private static int last = -1000; // JD 73

However, as I said earlier, it simply does not work, and the files are unchanged.

It works fine with an alternate regex that does not utilize capture groups

grep -rl "/\* *\*/ " . | xargs sed -i "s/\/\* *\*\/ //g"

but as soon as I try to introduce capture groups, it just silently fails.

I can tell it's searching through the files, as I can hear the drive spin up for a moment like with the successful one, but in the end the files remain unchanged.

Could it be possible to modify the command such that it works, or must I do it in a completely different way? Also, ideally the solution wouldn't require a bash loop. Thanks.

Moiré
  • 75

3 Answers

4
  • Replace -P with -E in grep and use [[:digit:]]+ or [0-9]+ instead of (\d+), since you don't use any other Perl-compatible features and you don't need the parentheses
  • Remove (.*) from grep; it is redundant
  • Add -E to sed, or you have to escape your capturing groups (...) and the +
  • Sed doesn't understand \d+; replace it with [[:digit:]]+ or [0-9]+
  • Replace the backreferences $1 with \1 and $2 with \2
  • I think you can safely remove the g, since JD only creates one comment at the beginning of the line.

grep -Erl '/\* *[[:digit:]]+ *\*/' . |
  xargs sed -Ei 's/\/\* *([[:digit:]]+) *\*\/ (.*)/\2 \/\/ JD \1/'
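
To check the pipeline before pointing it at real sources, it can be run against a scratch tree first. The following is only a sketch: the directory layout and the file name Sample.java are made up for illustration, and GNU grep and GNU sed are assumed (for -r and -i).

```shell
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/src/pkg"
# Hypothetical sample file containing the line from the question.
printf '/*  73 */   private static int last = -1000;\n' > "$tmpdir/src/pkg/Sample.java"

# Same pipeline as above, pointed at the scratch tree instead of "."
grep -Erl '/\* *[[:digit:]]+ *\*/' "$tmpdir" |
  xargs sed -Ei 's/\/\* *([[:digit:]]+) *\*\/ (.*)/\2 \/\/ JD \1/'

# Read the rewritten line back, then clean up.
result=$(cat "$tmpdir/src/pkg/Sample.java")
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The pattern consumes only one space after the closing */, so any further spaces stay at the start of \2 and the code keeps part of its indentation.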
Freddy
  • 25,565
  • Ah, so you recognized it's from JD! I was wondering if anybody would. I was under the impression the g would be required in order for it to work multiple times on the same file. And replacing the $ with \ creates an error: sed: -e expression #1, char 42: invalid reference \2 on `s' command's RHS. (Command is grep -rlE "/\* *[0-9]+ *\*/ .*" . | xargs sed -iE "s/\/\* *([0-9]+) *\*\/ (.*)/\2 \/\/ JD \1/") – Moiré Apr 29 '20 at 01:05
  • 1
    Yes, guilty, Java guy. No, you need the g to apply the substitution more than once on each line. Your command looks okay, you even have removed the parentheses I forgot. Just replace -iE with -Ei: with -iE, sed treats E as the backup suffix, so the expression is run without the -E option. You may also remove the .* in grep. – Freddy Apr 29 '20 at 01:21
  • Hmm. That almost worked. My command is now grep -rlE "/\* *[0-9]+ *\*/ " . | xargs sed -Ei "s/\/\* *([0-9]+) *\*\/ (.*)/\2 \/\/ JD \1/". However, it now puts \1 on a new line after \2 and I am not sure why. I assume that it is picking up the newline at the end and inserting it in \2. However, I don't know how to get rid of this. I tried putting a \n at the end of the capture group (grep -rlE "/\* *[0-9]+ *\*/ .*\n" . | xargs sed -Ei "s/\/\* *([0-9]+) *\*\/ (.*)\n/\2 \/\/ JD \1/"), however that made it never activate; I assume the newline isn't given to sed. Maybe I could trim it? – Moiré Apr 29 '20 at 03:20
  • 1
    Check if your files have DOS CRLF line endings (cat -A file on a modified file). This would "move" the modified comment to the beginning of each line when printed with cat without -A. – Freddy Apr 29 '20 at 03:49
  • (Deleted comment was irrelevant) - Yes. It is DOS CRLF, as it displays ^M$ at the end of the line. After realizing this, I simply added a \r after the capture group so as not to capture it, and that solved it! Thank you for the help. – Moiré Apr 29 '20 at 16:46
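
The CRLF fix from the comments can be sketched as follows. This is a variant, not the exact command used above: it matches the carriage return outside the capture group and then writes it back at the end of the line, so the file keeps consistent DOS line endings. It assumes GNU sed, which understands \r in both the pattern and the replacement; the temporary file is only for illustration.

```shell
tmpfile=$(mktemp)
# One CRLF-terminated line, as a file with DOS line endings would contain.
printf '/*  73 */   private static int last = -1000;\r\n' > "$tmpfile"

# \r$ keeps the carriage return out of \2; the replacement re-appends it
# after the new comment so the line still ends in CRLF.
sed -Ei 's/\/\* *([0-9]+) *\*\/ (.*)\r$/\2 \/\/ JD \1\r/' "$tmpfile"

# Inspect the result: the text without CRs, and the last two bytes via od.
fixed=$(tr -d '\r' < "$tmpfile")
crlf_tail=$(tail -c 2 "$tmpfile" | od -An -c | tr -d ' \n')
printf '%s\n%s\n' "$fixed" "$crlf_tail"
rm -f "$tmpfile"
```

Dropping the trailing \r from the replacement instead would convert each rewritten line to a Unix line ending while leaving untouched lines as CRLF.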
1

In sed, captured groups are referenced with \1, \2, etc. instead of $1, $2, etc. See the "Back-references and Subexpressions" section of the GNU sed manual.
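
A small illustration of the difference, as a sketch: the key=value input and the FIRST/SECOND placeholder values are invented for the demo. Inside double quotes the shell replaces $1 and $2 with its own positional parameters before sed ever sees the script, which is a second reason the $-style references cannot work.

```shell
# Single quotes: \1 and \2 reach sed untouched and refer to the groups.
swapped=$(printf 'key=value\n' | sed -E 's/(key)=(value)/\2=\1/')
printf '%s\n' "$swapped"

# Double quotes: the shell expands $1 and $2 first. They are set to
# placeholder values here so the effect is visible; in an interactive
# shell they are usually empty.
set -- FIRST SECOND
expanded=$(printf 'key=value\n' | sed -E "s/(key)=(value)/$2=$1/")
printf '%s\n' "$expanded"
```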

simonz
  • 131
1

Use only sed, as in this example:

echo "/*  73 */   private static int last = -1000;" | 
    sed 's#^/\*[[:blank:]]*\([0-9]*\)[[:blank:]]*\*/[[:blank:]]*\(.*\)$#\2 // JD \1#g'
private static int last = -1000; // JD 73
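
To apply the same expression recursively without grep or a shell loop, one option is find with -exec. This is a sketch under assumptions: the '*.java' filter and the scratch directory are illustrative only, and sed -i is a GNU extension. The trailing g is left out because the pattern is anchored with ^ and can match at most once per line.

```shell
# Scratch tree with one hypothetical Java file two levels down.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/a/b"
printf '/*  73 */   private static int last = -1000;\n' > "$tmpdir/a/b/Sample.java"

# find walks the tree and hands batches of matching files to a single sed call.
find "$tmpdir" -type f -name '*.java' -exec \
  sed -i 's#^/\*[[:blank:]]*\([0-9]*\)[[:blank:]]*\*/[[:blank:]]*\(.*\)$#\2 // JD \1#' {} +

found=$(cat "$tmpdir/a/b/Sample.java")
printf '%s\n' "$found"
rm -rf "$tmpdir"
```

Unlike the grep pipeline, this passes every matching file name to sed, including files that contain no such comment, and sed -i rewrites each of them even when no line changes.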