If you stand quietly in the hallways of Unix & Linux and listen carefully,
you’ll hear a ghostly voice, pitifully wailing,
“What about filenames that contain newlines?”
ls -d *snp* | wc -l
or, equivalently,
printf "%s\n" *snp* | wc -l
will output all the filenames that contain snp,
each followed by a newline,
but also including any newlines in the filenames,
and then count the number of lines in the output.
If there is a file whose name is
foosnp(\n)bar.tsv
(that is, foosnp, then a newline character, then bar.tsv),
then that name will be written out as
foosnp
bar.tsv
which, of course, will be counted as two lines.
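The overcount is easy to reproduce. What follows is only a minimal sketch: it assumes a scratch directory made with mktemp, and the names demo_dir, nl and naive_count are invented for the illustration.

```shell
# Work in a scratch directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# A variable holding a single newline character (portable across POSIX shells).
nl='
'

# Create exactly one file whose name has an embedded newline.
touch "foosnp${nl}bar.tsv"

# The name is written out on two lines, so wc -l reports 2 for one file.
naive_count=$(printf "%s\n" *snp* | wc -l)
naive_count=$((naive_count))   # drop any padding some wc implementations add
echo "files: 1, lines counted: $naive_count"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```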
There are a few alternatives that do better in at least some cases:
printf "%s\n" * | grep -c snp
which counts the lines that contain snp,
so the foosnp(\n)bar.tsv example from above counts only once.
A slight variation on this is
ls -f | grep -c snp
The above two commands differ in that:
- The ls -f will include files whose names begin with . ;
  the printf … * does not, unless the dotglob shell option is set.
- printf is a shell builtin; ls is an external command.
  Therefore, the ls might use slightly more resources.
- When the shell processes a *, it sorts the filenames;
  ls -f does not sort the filenames.
  Therefore, the ls might use slightly less resources.
But they have something in common:
they will both give wrong results in the presence of filenames
that contain a newline and have snp both before and after the newline.
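Both the good case and the bad case of the grep -c approach can be seen side by side. This is a sketch under the same assumptions as any quick experiment: a throwaway directory from mktemp, and made-up names (demo_dir, nl, one_match, after_second).

```shell
# Work in a scratch directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# A variable holding a single newline character.
nl='
'

# snp appears only before the newline: two lines of output, one matching line.
touch "foosnp${nl}bar.tsv"
one_match=$(printf "%s\n" * | grep -c snp)

# snp appears on both sides of the newline: one file, two matching lines.
touch "snp1${nl}snp2.tsv"
after_second=$(printf "%s\n" * | grep -c snp)

# Two files exist, but the second count comes out as 3.
echo "one file: $one_match; two files: $after_second"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```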
Another:
filenamelist=(*snp*)
echo "${#filenamelist[@]}"
This creates a shell array variable listing all the filenames that
contain snp, and then reports the number of elements in the array.
The filenames are treated as strings, not lines,
so embedded newlines are not an issue.
It is conceivable that this approach could have a problem
if the directory is huge,
because the list of filenames must be held in shell memory.
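One wrinkle with the array approach is worth checking for yourself: when nothing matches, bash by default leaves the pattern unexpanded, so the array holds the literal string *snp* and the count is 1 rather than 0. The sketch below assumes bash (arrays, shopt and $'...' quoting are not POSIX) and a scratch directory; the variable names other than filenamelist are invented for the illustration.

```shell
#!/usr/bin/env bash
# Work in an empty scratch directory.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# No matching files: the unexpanded pattern itself becomes the only element.
filenamelist=(*snp*)
default_count=${#filenamelist[@]}

# With nullglob set, a pattern that matches nothing expands to nothing.
shopt -s nullglob
filenamelist=(*snp*)
empty_count=${#filenamelist[@]}

# A name with an embedded newline is still a single array element.
touch $'foosnp\nbar.tsv' plainsnp.txt
filenamelist=(*snp*)
full_count=${#filenamelist[@]}
shopt -u nullglob

echo "no match, default: $default_count; no match, nullglob: $empty_count; two files: $full_count"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

Whether to leave nullglob set afterwards is a judgment call; it changes how every later glob in the same shell behaves, which is why the sketch unsets it again.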
Yet another:
Earlier, when we said printf "%s\n" *snp*,
the printf command repeated (reused) the "%s\n" format string
once for each argument in the expansion of *snp*.
Here, we make a small change in that:
printf "%.0s\n" *snp* | wc -l
This will repeat (reuse) the "%.0s\n" format string
once for each argument in the expansion of *snp*.
But "%.0s" means to print the first zero characters of each string,
i.e., nothing.
This printf command will output only a newline (i.e., a blank line)
for each file that contains snp in its name;
and then wc -l will count them.
And, again, you can include the . files
(those whose names begin with a dot) by setting dotglob.
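To see the "%.0s\n" trick and dotglob working together, here is a small sketch. It assumes bash (dotglob is a bash shell option, set with shopt) and a scratch directory; the file name .hidden_snp and the variable names are made up for the illustration.

```shell
#!/usr/bin/env bash
# Work in a scratch directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# One file with an embedded newline in its name, and one dot file.
touch $'foosnp\nbar.tsv' .hidden_snp

# Default globbing skips the dot file; the newline in the name is never printed.
visible_count=$(printf "%.0s\n" *snp* | wc -l)
visible_count=$((visible_count))   # drop any padding some wc implementations add

# With dotglob set, *snp* also matches names that begin with a dot.
shopt -s dotglob
all_count=$(printf "%.0s\n" *snp* | wc -l)
all_count=$((all_count))
shopt -u dotglob

echo "without dotglob: $visible_count; with dotglob: $all_count"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

As with the array version, a pattern that matches nothing is passed to printf unexpanded by default, so an empty directory would report 1 unless nullglob is also set.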