The "definitive" answer is of course brought to you by The Useless Use of cat Award.
The purpose of cat is to concatenate (or "catenate") files. If it's only one file, concatenating it with nothing at all is a waste of time, and costs you a process.
Spawning cat just so your code reads differently adds one more process and one more set of input/output streams that are not needed. Typically the real hold-up in your scripts is going to be inefficient loops and the actual processing. On most modern systems, one extra cat is not going to kill your performance, but there is almost always another way to write your code.
Most programs, as you note, are able to accept an argument for the input file. However, there is always the shell's redirection operator <, which can be used wherever a STDIN stream is expected. It saves you one process because the shell that is already running opens the file itself.
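As a rough illustration of the three options (cat into a pipe, a file argument, and input redirection), here is a minimal sketch. The temporary file and its contents are made up for the demonstration, and grep -c stands in for whatever command you are actually running.

```shell
#!/bin/sh
# Hypothetical sample file, created only for this demonstration.
data=$(mktemp)
printf 'blah one\nblah two\nother\n' > "$data"

# 1. Extra process: cat reads the file and copies it into a pipe.
with_cat=$(cat "$data" | grep -c blah)

# 2. File argument: grep opens the file itself.
with_arg=$(grep -c blah "$data")

# 3. Redirection: the shell opens the file and hands it to grep as stdin.
with_redirect=$(grep -c blah < "$data")

echo "$with_cat $with_arg $with_redirect"   # prints: 2 2 2

rm -f "$data"
```

All three produce the same count; only the first one starts a second process to get there.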
You can even get creative with WHERE you write it. Normally it would be placed at the end of a command before you specify any output redirects or pipes like this:
sed s/blah/blaha/ < data | pipe
But it doesn't have to be that way. It can even come first. For instance your example code could be written like this:
< data \
sed s/bla/blaha/ |
grep blah |
grep -n babla
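To check that the position of the redirection really does not matter for a simple command, a small sketch like the following can help. The sample file is hypothetical, and the same sed expression from above is run with the redirection at the end, at the start, and in the middle.

```shell
#!/bin/sh
# Hypothetical sample file, created only for this demonstration.
data=$(mktemp)
printf 'bla\nblah\n' > "$data"

# Redirection after the command and its arguments (the usual spot).
at_end=$(sed s/bla/blaha/ < "$data")

# Redirection before the command name.
at_start=$(< "$data" sed s/bla/blaha/)

# Redirection between the command name and its argument.
in_middle=$(sed < "$data" s/bla/blaha/)

if [ "$at_end" = "$at_start" ] && [ "$at_start" = "$in_middle" ]; then
    echo "all three placements give the same output"
fi

rm -f "$data"
```

One caveat, as far as I know: this freedom applies to simple commands. In POSIX sh and bash, a redirection for a compound command such as a while loop has to come after the closing keyword (done < data), not before while.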
If script readability is your concern and your code is messy enough that adding a line for cat is expected to make it easier to follow, there are other ways to clean up your code. One that I use a lot, and that helps make scripts easy to figure out later, is breaking up pipes into logical sets and saving them in functions. The script code then becomes very natural, and any one part of the pipeline is easier to debug.
function fix_blahs () {
    sed s/bla/blaha/ |
    grep blah |
    grep -n babla
}
fix_blahs < data
You could then continue with fix_blahs < data | fix_frogs | reorder | format_for_sql. A pipeline that reads like that is really easy to follow, and the individual components can be debugged easily in their respective functions.
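Here is a minimal sketch of what such a chain could look like end to end. Only the function names come from the pipeline above. The bodies of fix_frogs, reorder and format_for_sql, the table name t, and the sample data are all hypothetical placeholders, and fix_blahs is shortened to two stages so the toy input survives the filter. The functions use the portable name() form rather than the function keyword.

```shell
#!/bin/sh
# Every stage body below is a hypothetical placeholder for illustration.

fix_blahs() {
    sed s/bla/blaha/ |
    grep blah
}

fix_frogs() {
    # Hypothetical stage: rename toads to frogs.
    sed s/toad/frog/
}

reorder() {
    # Hypothetical stage: sort the lines.
    sort
}

format_for_sql() {
    # Hypothetical stage: wrap each line in an INSERT statement.
    # Naive on purpose: no quoting or escaping of the values.
    sed "s/.*/INSERT INTO t VALUES ('&');/"
}

# Hypothetical sample file, created only for this demonstration.
data=$(mktemp)
printf 'bla toad\nzzz\nbla ant\n' > "$data"

# The whole pipeline reads left to right, with no cat in sight.
result=$(fix_blahs < "$data" | fix_frogs | reorder | format_for_sql)
echo "$result"

# Debugging one stage in isolation: feed it a line by hand.
stage_check=$(printf 'a toad\n' | fix_frogs)
echo "$stage_check"

rm -f "$data"
```

Because each stage only reads stdin and writes stdout, any of them can be tested on its own the same way as the fix_frogs check at the end.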
[…] cat. However, I think the bigger question here is code readability, which often is a priority over performance. When faster can actually be written prettier, why not? Pointing out the issue with cat usually leads to the user having a better understanding of pipelines and processes in general. It's worth the effort so they write comprehensible code next time around. – Caleb Jul 08 '11 at 15:03
[…] cat; Caleb's point about using functions and redirection solves that as well.) – Cascabel Jul 17 '11 at 07:15