
I'd like to run two piped commands on the results of find over some nested CSV files, but I'm failing miserably.

Here is the idea:

$ find ./tmp/*/ -name '*.csv' -exec tail -n +2 {} | wc -l \;

in order not to count the header row of each CSV file.

The command is failing on:

wc: ';': No such file or directory
find: missing argument to `-exec'

Do I really need to do a for loop in that case?
E.g.:

$ for f in ./tmp/*/*.csv; do tail -n +2 "$f" | wc -l; done

but with that I'm losing the nice output of find, which includes the filename alongside the count.

I'm also losing the filename when using the solution from "pipe commands inside find -exec?":

$ find ./tmp/*/ -type f -name "*.csv" -print0 | while IFS= read -r -d '' f; do tail -n +2 "${f}" | wc -l; done

To clarify: when I speak about the filename that gets printed, it's because I'm used to the following result when calling the commands on a single file:

$ tail -n +2 | wc -l ./tmp/myfile.csv 
2434 ./tmp/myfile.csv

I use Ubuntu 18.04.

s.k
  • Are you sure about tail -n +2 | wc -l ./tmp/myfile.csv? There, wc does not read from the pipe. (... | wc -l or ... | wc -l file - would, but the former won't print the name of the file(s) whose lines come from the pipe, while the latter won't read file's lines from the pipe). – fra-san Feb 28 '21 at 11:42
  • Do any of the CSV files have (or could they have) embedded newlines in any fields? – Kusalananda Mar 05 '21 at 10:02

2 Answers


If you write

find ... -exec foo | bar \;

the vertical bar is interpreted by your shell before find is invoked. The left-hand side of the resulting pipeline is find ... -exec foo, which gives a "missing argument to `-exec'" error because the ; terminator is no longer part of the find command; the right-hand side of the pipeline is bar \;, so bar receives ; as an argument.

Protecting the vertical bar from the shell, as in

find ... -exec foo \| bar \;

is of no help, because the first token after -exec is interpreted by find as a command and all the following tokens, up to (but not including) the ; or + terminator, are taken as arguments to that command.
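As a quick check of that behaviour, the sketch below runs echo through -exec with an escaped bar. The scratch directory and the variable names are made up for illustration; -prune is only there so that find visits the starting directory and nothing else.

```shell
# Hypothetical scratch directory, used only as a harmless starting point.
demo_dir=$(mktemp -d)

# -prune stops find from descending, so -exec runs once, for demo_dir itself.
# The escaped bar reaches echo as an ordinary argument; no pipeline is created.
literal_args=$(find "$demo_dir" -prune -exec echo foo \| bar \;)
printf '%s\n' "$literal_args"   # prints: foo | bar

rm -r "$demo_dir"
```

Here echo simply receives foo, | and bar as its three arguments.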

See Understanding the -exec option of `find` for a thorough explanation.

To use a pipeline with -exec you need to invoke a shell. For instance:

find ./tmp/*/ -name '*.csv' -exec sh -c '
  printf "%s %s\n" "$(tail -n +2 "$1" | wc -l)" "$1"' mysh {} \;
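As a hedged sketch, the same inline shell can also loop over several files per invocation by combining sh -c with {} +. The scratch directory, the sample files a.csv and b.csv, and the variable names below are assumptions made up for illustration, not part of the question's data.

```shell
# Build a hypothetical sample tree: tmp/one/a.csv (header + 2 rows)
# and tmp/two/b.csv (header + 1 row).
work_dir=$(mktemp -d)
mkdir -p "$work_dir/tmp/one" "$work_dir/tmp/two"
printf 'id,name\n1,x\n2,y\n' > "$work_dir/tmp/one/a.csv"
printf 'id,name\n1,z\n' > "$work_dir/tmp/two/b.csv"

# One sh handles a whole batch of files; "for f do" iterates over "$@".
# The pipeline lives inside the inline script, so find never sees the bar.
row_counts=$(cd "$work_dir" && find ./tmp -path './tmp/*/*' -name '*.csv' \
  -exec sh -c '
    for f do
      printf "%s %s\n" "$(tail -n +2 "$f" | wc -l)" "$f"
    done' mysh {} + | sort)
printf '%s\n' "$row_counts"

rm -r "$work_dir"
```

Each output line holds a data-row count followed by the file path; some wc implementations pad the count with leading spaces.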

Then, to avoid risking an "argument list too long" error, ./tmp/*/ can be rewritten as

find ./tmp -path './tmp/*/*' ...

or, more precisely, to also exclude tmp's hidden subdirectories (as ./tmp/*/ would likely do by default), as

find ./tmp -path './tmp/.*' -prune -o -path './tmp/*/*' ...
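To see which files that expression selects, here is a small sketch on a throwaway tree. The directory layout and the names .hidden, sub, and top.csv are hypothetical, and an explicit -print stands in for the elided part of the command.

```shell
work_dir=$(mktemp -d)
mkdir -p "$work_dir/tmp/.hidden" "$work_dir/tmp/sub"
: > "$work_dir/tmp/.hidden/a.csv"   # inside a hidden subdirectory: pruned
: > "$work_dir/tmp/sub/b.csv"       # inside a regular subdirectory: kept
: > "$work_dir/tmp/top.csv"         # directly under tmp: not matched

found=$(cd "$work_dir" && find ./tmp -path './tmp/.*' -prune -o \
  -path './tmp/*/*' -name '*.csv' -print)
printf '%s\n' "$found"   # prints: ./tmp/sub/b.csv

rm -r "$work_dir"
```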

Finally, you may use the faster -exec ... {} + variant, which avoids invoking a shell for each file found. For instance, with awk in place of tail and wc:

find ./tmp -path './tmp/.*' -prune -o -path './tmp/*/*' \
  -name '*.csv' -exec awk '
    BEGIN { skip = 1 }
    FNR > skip { lc[FILENAME] = (FNR - skip) }
    END { for (f in lc) print lc[f],f }' {} +

(Note that awk also counts a malformed last line that does not end in a newline character, while wc does not.)
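The difference can be seen on a two-line input whose last line lacks the trailing newline. This is only an illustrative sketch; the variable names are made up.

```shell
# Two lines of text, the second one without a terminating newline.
# $(( ... )) only strips the padding that some wc implementations add.
wc_count=$(( $(printf 'a\nb' | wc -l) ))
awk_count=$(printf 'a\nb' | awk 'END { print NR }')

echo "wc: $wc_count awk: $awk_count"   # prints: wc: 1 awk: 2
```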

fra-san

If all you want is essentially to subtract 1 from each wc -l count, this is very simple and clean:

find [whatever you want] -exec wc -l {} + | perl -pe 's/(\d+)/$1-1/e'