
I suspect the following has been answered already, but I don't know the terminology for the issue I'm having well enough to find an existing answer.

I'm working on a command to go through a list of files and output, on each line, the filename followed by the count of lines that start with "P " followed by a number. I've gotten this far:

find -type f | xargs -I % sh -c '{ echo %; grep -P "^P \d+" % | wc -l; }  | tr "\n" ","; echo ""; '

(The actual find command is a bit more involved, but the short story is that it finds about 11k files of interest in the directory tree below where I'm running this.)

This command works for about 98% of my cases, but I discovered a small subset of files with parentheses in their names, and I can't ignore them or permanently replace the parentheses with something else.

As a result I'm getting some cases like this:

sh: -c: line 0: syntax error near unexpected token `('

I know parentheses are special characters in the shell, so, for example, if I were running grep directly on a single file with parentheses in its name, I'd have to enclose the filename in single quotes or escape the parentheses. I tried swapping the quote types in my command (double quotes outermost, single quotes inner) so I could put the % in the grep call in single quotes, but that didn't help.

Is there a way to deal with parentheses in the find -> xargs -> sh chain so that they are handled correctly in the sh call?

SSilk

3 Answers


It's better not to embed data (filenames) directly in code (the shell scriptlet). Instead, pass the filename as an argument to the shell that xargs runs:

find -type f | xargs -I % \
  sh -c '{ echo "$1"; grep -c -P "^P \d+" "$1"; } | tr "\n" ","; echo' sh %

Also, you should be able to use grep -c instead of grep | wc -l; it at least makes the command a bit shorter.
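A small self-contained check of the difference, run in a throwaway directory. The file name report(1).txt and its contents are hypothetical, and the sketch uses find . and a POSIX [0-9] instead of \d so it doesn't depend on GNU find or grep -P:

```shell
#!/bin/sh
# Sketch: embedding the placeholder in the code vs passing it as "$1".
set -u
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Hypothetical test file: two lines start with "P <digits>", one does not.
printf 'P 1\nP 22\nQ 3\n' > 'report(1).txt'

# 1) Placeholder embedded in the code: sh sees a bare "(" and reports a
#    syntax error (exact wording depends on which sh you have).
find . -type f | xargs -I % sh -c 'echo %' 2>&1

# 2) Filename passed as "$1": the inner shell never parses the name as code.
find . -type f | xargs -I % \
  sh -c '{ echo "$1"; grep -c "^P [0-9]" "$1"; } | tr "\n" ","; echo' sh %
```

The second command prints ./report(1).txt,2, (the trailing comma comes from tr converting the newline after the count, as in the original command).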

ilkkachu
  • Thanks for the quick reply! I tried your approach and it's giving me the error sh: -c: line 1: syntax error: unexpected end of file. Am I missing something? – SSilk Apr 19 '23 at 12:44
  • @SSilk, there was a missing ; before the } which I've added. – Stéphane Chazelas Apr 19 '23 at 12:46
  • @StéphaneChazelas Yes, just saw your edit after commenting. It's working for me now. Thanks! – SSilk Apr 19 '23 at 12:47
  • @ilkkachu Re: grep -c, thanks for the reminder. I was pretty sure grep had a built-in counting function, but when I searched for how to count matching lines with grep, the first thread I found only showed piping grep matches into wc -l. So I just ran with that. – SSilk Apr 19 '23 at 12:48
  • Acknowledging that there are some good recommendations in other answers about using find -exec rather than xargs, I'm accepting this answer as it directly answers the initial question of how to handle parentheses in filenames in this specific scenario. – SSilk Apr 24 '23 at 07:38

Since you omitted the . in find . -type f, I assume your find is GNU find, in which case you can do:

find . -type f -printf %p, -exec grep -cP '^P \d' {} ';'
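A runnable sketch of that -printf/-exec combination, assuming GNU find (needed for -printf) and a hypothetical test file. It uses a POSIX [0-9] in place of \d so plain grep -c works without -P:

```shell
#!/bin/sh
# Sketch: assumes GNU find for -printf; the test file name is hypothetical.
set -u
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf 'P 1\nP 22\nQ 3\n' > 'report(1).txt'

# -printf writes "path," with no newline; grep -c then prints the count
# followed by a newline. {} reaches grep as a plain argument, so no shell
# ever parses the file name.
find . -type f -printf %p, -exec grep -c '^P [0-9]' {} ';'
```

Each file yields one line such as ./report(1).txt,2, and the parentheses need no special handling.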

If the file paths don't contain : characters, you could also do (with GNU grep):

grep -rcP '^P \d' . | tr : ,

If they may contain : characters but don't contain newline characters, that can be worked around by replacing only the last : in the line with ,:

grep -rcP '^P \d' . | LC_ALL=C sed 's/\(.*\):/\1,/'
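To see the greedy match at work, here is a small sketch with a hypothetical file name containing both : and parentheses (again using [0-9] rather than \d):

```shell
#!/bin/sh
# Sketch: grep -r prints "path:count"; sed rewrites only the last ":".
set -u
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Hypothetical file whose name contains both ":" and parentheses.
printf 'P 7\n' > 'a:b(2).txt'

# The greedy \(.*\) swallows everything up to the LAST ":", which is the
# separator grep added before the count.
grep -rc '^P [0-9]' . | LC_ALL=C sed 's/\(.*\):/\1,/'
```

This prints ./a:b(2).txt,1; the colon inside the file name is left alone.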

If you still need to use find, for instance because you have more selection criteria, the same approach can also be used with:

find ... -type f -exec grep -cHP '^P \d' {} + | ...
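A sketch of that variant, where ! -name '* *' is a hypothetical stand-in for extra selection criteria (here, skipping names that contain spaces) and the file names are made up:

```shell
#!/bin/sh
# Sketch: extra find criteria, then grep on batches of files via -exec ... {} +.
# The file names and the "! -name '* *'" filter are hypothetical examples.
set -u
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf 'P 1\nP 22\nQ 3\n' > 'report(1).txt'
printf 'P 5\n' > 'a:b.txt'
printf 'P 9\n' > 'skip me.txt'   # excluded by the filter below

# -H forces the "path:" prefix even if grep happens to get a single file;
# sed then turns the last ":" of each line into ",".
find . -type f ! -name '* *' -exec grep -cH '^P [0-9]' {} + |
  LC_ALL=C sed 's/\(.*\):/\1,/'
```

With {} +, find passes many file names to each grep invocation, so fewer processes are started than with one sh per file. The lines come out in find's traversal order; pipe through sort if you need a stable order.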

  • Thanks for the suggestions. I think the issue with this approach is that my real find command is a bit involved. (It finds files recursively, keeps only those with numerical extensions, and rejects those with spaces and certain special characters, but does allow a few special characters like parentheses. So it's not something I know how to do with grep's built-in filename filtering.) It just didn't have much bearing on the question I was asking, so I shortened it to what's shown above. – SSilk Apr 19 '23 at 12:57
  • @SSilk, you can still use find with that approach, see edit. – Stéphane Chazelas Apr 19 '23 at 13:00

ilkkachu's answer looks like a fine improvement and is probably what you should do.

For information purposes, here is a lighter-touch fix that shows where your problem lies:

find -type f | xargs -I % sh -c '{ echo "%"; grep -P "^P \d+" "%" | wc -l; }  | tr "\n" ","; echo ""; '

Basically, wrap the % placeholder that xargs replaces in quotes.
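Quoting the % fixes the parentheses case, but the file name is still pasted into the code that sh parses. A quick way to see this, using a harmless hypothetical file name in a throwaway directory:

```shell
#!/bin/sh
# Sketch: why a quoted placeholder is still not safe.
# The file name is a harmless, hypothetical stand-in for something like $(reboot).
set -u
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

: > 'x$(echo INJECTED).txt'   # create an empty file with $(...) in its name

# xargs pastes the name into the code, so the inner sh sees
#   echo "./x$(echo INJECTED).txt"
# and runs the command substitution, printing ./xINJECTED.txt
find . -type f | xargs -I % sh -c 'echo "%"'

# Passing the name as an argument instead prints it unchanged.
find . -type f | xargs -I % sh -c 'echo "$1"' sh %
```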

bxm
  • Even with quotes, that's still a command injection vulnerability (like with a $(reboot) file). The placeholder should never be embedded in the code argument. – Stéphane Chazelas Apr 19 '23 at 12:45
  • Thanks for the tip. – bxm Apr 19 '23 at 12:47
  • Thanks for the info. These finer details of embedding placeholders vs. passing them as arguments are new to me. I'll have to study up on them. The method I was using was demonstrated in the following post https://linuxize.com/post/linux-xargs-command/#:~:text=To%20run%20multiple%20commands%20with,the%20argument%20passed%20to%20xargs. – SSilk Apr 19 '23 at 12:52
  • Also: there's no need for xargs (and find's output is not compatible with xargs's expected input format unless you use -print0/-0), as you can use find's -exec. There's no need to run one sh per file either, as sh can loop over its arguments. echo can't be used for arbitrary data. See paste -sd , - to join lines with commas. – Stéphane Chazelas Apr 19 '23 at 12:53
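Putting those suggestions together, a minimal sketch (hypothetical file names, and [0-9] instead of \d to avoid relying on grep -P) that uses find -exec ... {} +, lets one sh loop over each batch of names, prints with printf rather than echo, and joins each name and count with paste:

```shell
#!/bin/sh
# Sketch: find -exec ... {} + with one sh looping over a batch of names.
set -u
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Hypothetical test files, including names that broke the embedded-% version.
printf 'P 1\nP 22\nQ 3\n' > 'report(1).txt'
printf 'no match here\n' > 'x$(echo INJECTED).txt'

# Names arrive as positional parameters, never as code.
find . -type f -exec sh -c '
  for f do
    # printf rather than echo, which can mangle some names;
    # paste -sd , - joins the name line and the count line with a comma.
    { printf "%s\n" "$f"; grep -c "^P [0-9]" "$f"; } | paste -sd , -
  done' sh {} +
```

The expected lines are ./report(1).txt,2 and ./x$(echo INJECTED).txt,0, in whatever order find visits the files. This still assumes the file names contain no newline characters, since paste joins lines.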