
I have multiple FASTA files, and I want to count the lines starting with ">" (no quotes) in each of them.

What I usually do is

grep ">" file.fasta | wc -l

This works for one file at a time. I have tried several alternatives using the find command, but nothing seems to do the trick. What I want in the end is one line per file, showing the file name and the number of lines starting with >. Ideally it should be a one-liner.

user1532587

1 Answer


grep can do the counting for you with its -c flag, so wc -l is not needed. Also, grep takes multiple files as input, if you provide them.

For example,

grep -c '^>' some/dir/*.fa
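As a rough illustration of the output format, here is a minimal sketch that builds two made-up FASTA files in a temporary directory (the file names and sequences are assumptions for the demo only) and runs the same grep -c on them:

```shell
# Build a throwaway directory with two tiny FASTA files (made-up data).
demo=$(mktemp -d)
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$demo/a.fa"
printf '>seq3\nTTAA\n' > "$demo/b.fa"

# With several input files, grep -c prints one "path:count" line per file.
counts=$(cd "$demo" && grep -c '^>' *.fa)
printf '%s\n' "$counts"

# Clean up the demo directory.
rm -rf "$demo"
```

Each output line has the form path:count, which already gives the file name and the header count on a single line.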

To do it recursively, use grep -Rc '^>' dirname if you have a grep that knows the -R option (this would run over all files), otherwise use find:

find dirname -type f \( -name '*.fa' -o -name '*.fasta' \) -exec grep -c '^>' /dev/null {} +

The extra /dev/null in the command above ensures that grep gets at least two input files, which in turn ensures that it always displays the name of each file that it processes (it does not do that with a single input file). Note that, with -c, grep also reports /dev/null itself as a /dev/null:0 line, once per grep invocation. One could also use -H with grep instead, although this is a non-standard option.
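To see the effect of the extra /dev/null operand, the following sketch compares the two cases on a single made-up file (the file name and contents are assumptions for the demo). It also shows one possible way to drop the /dev/null:0 line that grep -c prints for /dev/null itself:

```shell
# One tiny FASTA file in a throwaway directory (made-up data).
demo=$(mktemp -d)
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$demo/a.fa"

# One input file: only the count is printed, without the filename.
single=$(grep -c '^>' "$demo/a.fa")

# Adding /dev/null forces the "path:count" format,
# but /dev/null gets a line of its own.
paired=$(cd "$demo" && grep -c '^>' /dev/null a.fa)

# Filter the /dev/null line out to keep only the real files.
filtered=$(printf '%s\n' "$paired" | grep -v '^/dev/null:')

printf 'single:   %s\n' "$single"
printf 'paired:\n%s\n' "$paired"
printf 'filtered: %s\n' "$filtered"

# Clean up the demo directory.
rm -rf "$demo"
```

The same grep -v '^/dev/null:' filter could be appended to the find pipeline if the extra lines are unwanted.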

Or, with your original command plugged into a loop that is fed with pathnames from find:

find dirname -type f \( -name '*.fa' -o -name '*.fasta' \) -exec sh -c '
    for pathname do
        printf "Counting in %s...\n" "$pathname"
        grep "^>" "$pathname" | wc -l
    done' sh {} +

Since your command does not report the filename by itself, I added a printf statement that mentions it.
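If the name and the count should end up on the same line, the loop body can be adapted. The following is only a sketch of that variation, not part of the commands above: it uses grep -c inside the loop instead of grep | wc -l, runs on made-up files in a temporary directory, and sorts the result because find does not guarantee any particular order:

```shell
# Throwaway directory tree with two FASTA files and one unrelated file
# (all names and contents are made up for this demo).
demo=$(mktemp -d)
mkdir "$demo/sub"
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$demo/a.fa"
printf '>seq3\nTTAA\n' > "$demo/sub/b.fasta"
printf 'not a fasta file\n' > "$demo/notes.txt"

# For each matching file, print "pathname<TAB>count" on one line.
report=$(cd "$demo" && find . -type f \( -name '*.fa' -o -name '*.fasta' \) -exec sh -c '
    for pathname do
        printf "%s\t%s\n" "$pathname" "$(grep -c "^>" "$pathname")"
    done' sh {} + | sort)
printf '%s\n' "$report"

# Clean up the demo directory.
rm -rf "$demo"
```

Because grep is called with exactly one file per iteration here, it prints only the count, and printf supplies the pathname.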


Kusalananda