I occasionally see things like:
cat file | wc | cat > file2
Why do this?
When will the results (or performance) differ (favourably) from simply:
cat file | wc > file2
Both of those examples are useless uses of cat. Both are equivalent to wc < file > file2. There is no reason to use cat in this example, unless you are using cat file as a temporary stand-in for something that dynamically generates output.
cat is not necessarily useless here. The command wc file prints the counters followed by the name of the file. The command cat file | wc does not print the name of the file. The second cat is useless. wc file1 file2 prints two lines of counts, one for each file (plus the file names), followed by a total line. cat file1 file2 | wc prints one line with the total counts, and no file names.
– alephzero
Aug 24 '15 at 22:34
cat file | wc separates the line/word/character counts with more spaces than wc < file does. I don't know why.
– Nate Eldredge
Aug 25 '15 at 02:12
wc < file1 causes wc to run with stdin being a file descriptor for a regular, seekable, mmappable file, file1. cat file1 | wc causes wc to run with a non-seekable pipe on stdin.
– R.. GitHub STOP HELPING ICE
Aug 25 '15 at 02:19
cat is a handy placeholder to be able to pop other commands in and out without rearranging the pipeline.
– Reid
Aug 25 '15 at 02:54
If file1 does not end in a linefeed, then cat file1 file2 | wc will count one fewer lines, and potentially one fewer words, than would wc file1 file2 (which sees the "break" between file1 and file2).
– Iwillnotexist Idonotexist
Aug 26 '15 at 02:52
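The file-name difference described in the comments above can be checked with a short sketch. The scratch files and variable names here are hypothetical, created only for the demonstration; the exact column padding of wc varies between implementations, so only the presence or absence of the name is compared.

```shell
#!/bin/sh
# Hypothetical scratch files, created only for this demonstration.
tmpdir=$(mktemp -d)
printf 'one two\nthree\n' > "$tmpdir/file1"
printf 'four\n' > "$tmpdir/file2"

# With a file operand, wc prints the file name after the count.
with_name=$(wc -l "$tmpdir/file1")

# Reading from stdin (pipe or redirection), wc prints the count only.
from_pipe=$(cat "$tmpdir/file1" | wc -l)
from_redirect=$(wc -l < "$tmpdir/file1")

# Concatenating first yields a single combined count and no names.
combined=$(cat "$tmpdir/file1" "$tmpdir/file2" | wc -l)

echo "wc -l file1:              $with_name"
echo "cat file1 | wc -l:        $from_pipe"
echo "wc -l < file1:            $from_redirect"
echo "cat file1 file2 | wc -l:  $combined"

rm -rf "$tmpdir"
```

So the first cat only changes the output when wc would otherwise be given file operands; compared with a plain redirection, the counts are the same.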
cat file | wc | cat > file2 would usually be two useless uses of cat, as that's functionally equivalent to:
< file wc > file2
However, there may be a case for:
cat file | wc -c
over
< file wc -c
That is to disable the optimisation that many wc implementations do for regular files.
For regular files, the number of bytes in the file can be obtained without reading the whole content of the file, just by doing a stat() system call on it and retrieving the size as stored in the inode.
Now, one may want the file to be read for instance because:
the stat() information cannot be trusted (like for some files in /proc or /sys on Linux):
$ < /sys/class/net/lo/mtu wc -c
4096
$ cat /sys/class/net/lo/mtu | wc -c
6
Of course, those are exceptions. In the general case, you'd rather use < file wc -c for performance reasons.
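As a rough sketch of the point above: for an ordinary file both forms report the same byte count, and only the way wc obtains it may differ. The temporary file is a hypothetical stand-in. The /sys path is the one from the example and is only probed if it exists, because its output depends on the system.

```shell
#!/bin/sh
# Hypothetical scratch file holding exactly 6 bytes ("hello" plus a newline).
tmpfile=$(mktemp)
printf 'hello\n' > "$tmpfile"

# stdin is the regular file itself: wc may take the size from stat().
size_redirect=$(wc -c < "$tmpfile")

# stdin is a pipe: wc has no choice but to read and count every byte.
size_pipe=$(cat "$tmpfile" | wc -c)

echo "wc -c < file:      $size_redirect"
echo "cat file | wc -c:  $size_pipe"

# On Linux, some /sys files report a size that does not match their content.
sysfile=/sys/class/net/lo/mtu
if [ -r "$sysfile" ]; then
    echo "redirect: $(wc -c < "$sysfile")  pipe: $(cat "$sysfile" | wc -c)"
fi

rm -f "$tmpfile"
```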
Now, you can imagine even more far-fetched scenarios where one may want to use cat file | wc | cat > file2:
wc has an AppArmor profile or other security mechanism that prohibits it from reading or writing files, while this is allowed for cat (that would be unheard of).
cat is able to deal with large (as in > 2^32 bytes) files, but not wc on that system (things like that have been needed for some commands on some systems in the past).
One wants wc (and the first cat) to run and read the whole file (and be killed at the very last minute) even if file2 can't be opened for writing.
One wants to ignore a failure to open file. Though wc < file > file2 || : would make more sense.
One wants to hide (from the output of lsof (list open files)) the fact that one is getting a word count from file or storing a word count in file2.
While I don't disagree with the argument for calling it a 'useless use of cat', there can be reasons for it:
In many languages (including English) words and sentences are read from left to right, so showing the flow of data in the same way can appear more natural to the reader.
A reason for the second cat could be to mask the return code. Such as:
$ wc < /etc/passw
sh: /etc/passw: Cannot find or open the file.
$ echo $?
1
Whereas with cat:
$ wc < /etc/passw | cat
sh: /etc/passw: Cannot find or open the file.
$ echo $?
0
This can come into play if the shell is running with set -e. In the first example, this would abort the shell after wc, whereas in the latter example it would continue on. Obviously there are other ways of dealing with this.
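The effect can be reproduced with a minimal sketch, assuming a shell without pipefail enabled (the default). The missing path is hypothetical and chosen so that the redirection fails; the || true variant shows one of the other ways of ignoring the failure.

```shell
#!/bin/sh
# Hypothetical path that is assumed not to exist, so the redirection fails.
missing=/nonexistent/hypothetical-file

# Plain redirection: the failure shows up in the exit status.
status_plain=0
( wc < "$missing" ) 2>/dev/null || status_plain=$?

# Piped into cat: the pipeline's status is that of cat, which succeeds.
status_piped=0
( wc < "$missing" | cat ) 2>/dev/null || status_piped=$?

# Under set -e, the plain form aborts the script before the echo runs...
reached_plain=$(sh -c 'set -e; wc < "$1"; echo reached' sh "$missing" 2>/dev/null) || true

# ...while the piped form and the "|| true" form both carry on.
reached_piped=$(sh -c 'set -e; wc < "$1" | cat; echo reached' sh "$missing" 2>/dev/null)
reached_or_true=$(sh -c 'set -e; wc < "$1" || true; echo reached' sh "$missing" 2>/dev/null)

echo "plain status:  $status_plain"
echo "piped status:  $status_piped"
echo "set -e plain:  '$reached_plain'"
echo "set -e piped:  '$reached_piped'"
echo "set -e ||true: '$reached_or_true'"
```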
Also, the performance difference between the two statements (i.e. with or without cat) is negligible (especially on today's machines), and if it were important, shell would be the wrong language to use.
wc < /etc/passw giving two completely different messages in contexts that are essentially indistinguishable.
– Scott - Слава Україні
Aug 25 '15 at 01:48
< file1 wc > file2
– pabouk - Ukraine stay strong
Aug 25 '15 at 07:15
cat instead of a redirection.
– user
Aug 25 '15 at 08:12
wc :p Thanks for an answer on "the other side", leaving larsk's accepted as for me - and it seems generally - cat is not necessary here. But yours is a useful reference for behaviour with it, or scripting where it can easily be replaced by something else. (In particular I can see it being useful for say a "variable command" - might want to default, or change it to "nothing" without modifying the rest of the script.)
– OJFord
Aug 25 '15 at 09:40
|| true is a lot more idiomatic and obvious than | cat.
– Kevin
Aug 25 '15 at 16:51
set +o pipefail, which is POSIX, but a common bash recommendation (and good idea IMO) is to set it, i.e. false | true (note pipe not ||) should fail.
– OJFord
Mar 30 '22 at 09:29
Let's suppose prog forks a new subprocess and exits, and the new subprocess writes something to its standard output and then exits.
Then the command
prog
won't wait for the subprocess to exit, and it will display the shell prompt early. But the command
prog | cat
will wait for an EOF on the stdin of cat, which effectively waits for the subprocess to exit. So this is a useful use of cat.
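The waiting behaviour can be sketched as follows. The prog function is a hypothetical stand-in for such a program: it starts a background subprocess that writes after one second, while the parent exits at once. Output goes to scratch files so that what has been written at the moment each command returns can be inspected.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

# Hypothetical stand-in for "prog": the background subprocess inherits
# stdout and writes to it after one second; the parent exits immediately.
prog() {
    sh -c '( sleep 1; echo late ) &'
}

# Without cat: prog returns at once, before the subprocess has written.
prog > "$tmpdir/direct"
direct_now=$(cat "$tmpdir/direct")

# With cat: cat only sees EOF once the subprocess has closed the pipe,
# so the pipeline returns after the output has arrived.
prog | cat > "$tmpdir/piped"
piped_now=$(cat "$tmpdir/piped")

echo "immediately after 'prog':        '$direct_now'"
echo "immediately after 'prog | cat':  '$piped_now'"

rm -rf "$tmpdir"
```

Strictly speaking, cat waits for every writer to close the pipe rather than for the subprocess to exit, which is the distinction raised in the comments.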
bash. To test it one may run ( (while ((i<10)); do echo $((i++)); sleep 1; done) & exit ; ) | cat with and without the final cat.
– jimmij
Aug 25 '15 at 19:04
prog | cat could return before prog has returned (In those shells, it will return as soon as cat returns, which will happen as soon as prog (and its children if any) has closed all its fds to the pipe)). Try for instance with prog being sh -c 'echo A; exec >&-; sleep 2; echo >&2 B'.
– Stéphane Chazelas
Aug 26 '15 at 16:00
The statement contains two uses of cat.
cat file | wc | cat > file2
Clearly the 2nd cat is of no value, as
cat file | wc > file2
has the same meaning in all shells I have ever used.
However
< file wc > file2
does not work in all shells.
Not everyone is using a modern shell on a modern version of Unix. (It can be of benefit to write pipelines in a way that works on all systems that have the commands in the pipe installed, even if some of these commands don't ship as standard with the given OS.)
< file wc > file2 is Bourne and POSIX and also works in (t)csh, rc and es. The only shell I could find that doesn't support it is fish (which is the most modern of them all). It even worked in the pre-Bourne sh of Unix V1 in 1970!
– Stéphane Chazelas
Aug 26 '15 at 15:08
cat (a typical Unix command) is available.
– Stéphane Chazelas
Aug 26 '15 at 15:52
cat but I like for the input file to be the leftmost thing and for the commands that operate on it to appear after it.
– chicks
Aug 25 '15 at 19:55
cat would still be the leftmost thing and the file it's operating on would appear after it
– Eric Renouf
Aug 25 '15 at 21:58