Why is there such a difference in execution time between echo and cat?

Question

Answering this question caused me to ask another question:
I thought the following scripts do the same thing and the second one should be much faster, because the first one uses cat that needs to open the file over and over but the second one opens the file only one time and then just echoes a variable:

(See update section for correct code.)

First:

#!/bin/sh
for j in seq 10; do
  cat input
done >> output

Second:

#!/bin/sh
i=`cat input`
for j in seq 10; do
  echo $i
done >> output

while input is about 50 megabytes.

But when I tried the second one, it was far too slow, because echoing the variable i was a massive process. I also had some problems with the second script; for example, the size of the output file was lower than expected.

I also checked the man page of echo and cat to compare them:

echo - display a line of text

cat - concatenate files and print on the standard output

But I didn't get the difference.

So:

Why is cat so fast and echo so slow in the second script?
Or is the problem with the variable i? (The man page of echo says it displays "a line of text", so I guess it is optimized only for short variables, not for very long variables like i. However, that is only a guess.)
And why did I get problems when I used echo?

UPDATE

I mistakenly used seq 10 instead of `seq 10`. This is the edited code:

First:

#!/bin/sh
for j in `seq 10`; do
  cat input
done >> output

Second:

#!/bin/sh
i=`cat input`
for j in `seq 10`; do
  echo $i
done >> output

(Special thanks to roaima.)

However, that is not the point of the problem. Even if the loop runs only once, I get the same problem: cat works much faster than echo.

and what about cat $(for i in $(seq 1 10); do echo "input"; done) >> output ? :) — netmonk, Sep 17 '15 at 08:09
The echo is faster. What you're missing is that you're making the shell do far too much work by not quoting the variables when you use them. — Chris Davies, Sep 17 '15 at 08:14
Quoting the variables is not the problem; the problem is the variable i itself (i.e. using it as an intermediate step between input and output). — Aleksander, Sep 17 '15 at 09:24
echo $i -- don't do this. Use printf and quote the argument. — Petr Skocik, Sep 17 '15 at 10:11
@Aleksander: Would you explain more? Why is using variable i as an intermediate step between input and output a problem? Do you mean the same as what has been said in the answers? – Mohammad, Sep 17 '15 at 12:48
@PSkocik: In comparison to the time that execution of loop takes, using the subshell is not a problem and does not take much time and memory. — Mohammad, Sep 17 '15 at 12:51
@PSkocik What I'm saying is you want printf '%s' "$i", not echo $i. @cuonglm explains some of the problems of echo well in his answer. For why even quoting isn't enough in some cases with echo, see http://unix.stackexchange.com/questions/65803/why-is-printf-better-than-echo — Petr Skocik, Sep 17 '15 at 13:25
@mohammad.k reading 50MB into a variable just to then dump it to stdout is useless, and cpu consuming. — Aleksander, Sep 17 '15 at 15:53

Stéphane Chazelas · Accepted Answer · 2015-09-17T15:02:01.783

There are several things to consider here.

i=`cat input`

can be expensive, and there is a lot of variation between shells.

That's a feature called command substitution. The idea is to store the whole output of the command minus the trailing newline characters into the i variable in memory.
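As a minimal sketch of that trimming (the scratch file and the variable names are illustrative, not from the question), the following shows that command substitution drops every trailing newline, not just one:

```shell
# Hypothetical scratch file standing in for "input".
tmp=$(mktemp)
printf 'line\n\n\n' > "$tmp"            # 4 characters of text plus 3 newlines

file_bytes=$(( $(wc -c < "$tmp") ))     # size on disk: 7 bytes
v=$(cat "$tmp")                         # command substitution strips all trailing newlines
var_chars=${#v}                         # what is left in the variable: 4 characters

echo "file: $file_bytes bytes, variable: $var_chars characters"
rm -f "$tmp"
```

This is one reason the output of the second script can end up smaller than ten copies of the input.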

To do that, shells fork the command in a subshell and read its output through a pipe or socketpair. You see a lot of variation here. On a 50MiB file here, I can see for instance bash being 6 times as slow as ksh93 but slightly faster than zsh and twice as fast as yash.

The main reason for bash being slow is that it reads from the pipe 128 bytes at a time (while other shells read 4KiB or 8KiB at a time) and is penalised by the system call overhead.
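To get a feel for the system call overhead, here is a rough count of how many read calls a 50MiB input needs at each buffer size. It is only arithmetic: it assumes every read returns a full buffer and says nothing about the cost of each call.

```shell
size=$((50 * 1024 * 1024))      # 50 MiB, the input size from the question

reads_128=$((size / 128))       # 128-byte reads (bash, per the answer above)
reads_4k=$((size / 4096))       # 4KiB reads
reads_64k=$((size / 65536))     # 64KiB reads

echo "128-byte reads: $reads_128"
echo "4KiB reads:     $reads_4k"
echo "64KiB reads:    $reads_64k"
```

That gives 409600 reads at 128 bytes against 12800 at 4KiB and 800 at 64KiB.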

zsh needs to do some post-processing to escape NUL bytes (other shells break on NUL bytes), and yash does even more heavy-duty processing by parsing multi-byte characters.

All shells need to strip the trailing newline characters which they may be doing more or less efficiently.

Some may want to handle NUL bytes more gracefully than others and check for their presence.

Then once you have that big variable in memory, any manipulation on it generally involves allocating more memory and copying data across.

Here, you're passing (were intending to pass) the content of the variable to echo.

Luckily, echo is built-in in your shell, otherwise the execution would have likely failed with an arg list too long error. Even then, building the argument list array will possibly involve copying the content of the variable.

The other main problem in your command substitution approach is that you're invoking the split+glob operator (by forgetting to quote the variable).

For that, shells need to treat the string as a string of characters (though some shells don't and are buggy in that regard), so in UTF-8 locales that means parsing UTF-8 sequences (if not done already, as yash does) and looking for $IFS characters in the string. If $IFS contains space, tab or newline (which is the case by default), the algorithm is even more complex and expensive. Then the words resulting from that splitting need to be allocated and copied.

The glob part will be even more expensive. If any of those words contain glob characters (*, ?, [), then the shell will have to read the content of some directories and do some expensive pattern matching (bash's implementation for instance is notoriously very bad at that).

If the input contains something like /*/*/*/../../../*/*/*/../../../*/*/*, that will be extremely expensive as that means listing thousands of directories and that can expand to several hundred MiB.
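A small sketch of split+glob at work, assuming a Bourne-like shell other than zsh with the default $IFS and globbing enabled. The scratch directory, the file names and the sample string are made up for illustration.

```shell
dir=$(mktemp -d)          # hypothetical scratch directory holding two files
cd "$dir" || exit 1
touch a.txt b.txt

i='x   y *'               # blanks to split on, plus one glob character

set -- $i                 # unquoted: split on $IFS, then glob each resulting word
unquoted=$#               # x, y, a.txt, b.txt

set -- "$i"               # quoted: passed through as a single argument
quoted=$#

echo "unquoted: $unquoted words, quoted: $quoted word"
```

On a 50MB variable the same two steps run over the whole content every time $i is expanded unquoted.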

Then echo will typically do some extra processing. Some implementations expand \x sequences in the argument it receives, which means parsing the content and probably another allocation and copy of the data.
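A quick way to see the difference, as a sketch with an illustrative string: printf '%s' passes its argument through untouched, while what echo prints for the same string depends on the shell's echo implementation, so no expected output is given for the last line.

```shell
s='a\nb'                          # a, backslash, n, b: four characters, no real newline

out=$(printf '%s' "$s")           # printf %s does no escape processing on the argument
echo "printf kept ${#out} characters"

echo "$s"                         # may print a\nb on one line, or a and b on two lines
```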

cat, on the other hand, is not built in to most shells, so running it means forking a process and executing it (loading the code and the libraries). After the first invocation, though, that code and the content of the input file will be cached in memory. And there is no intermediary: cat reads large amounts at a time and writes them straight away without processing, and it doesn't need to allocate a huge amount of memory, just the one buffer that it reuses.

It also means that it's a lot more reliable as it doesn't choke on NUL bytes and doesn't trim trailing newline characters (and doesn't do split+glob, though you can avoid that by quoting the variable, and doesn't expand escape sequence though you can avoid that by using printf instead of echo).

If you want to optimise it further, instead of invoking cat several times, just pass input several times to cat.

yes input | head -n 100 | xargs cat

Will run 3 commands instead of 100.
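The pipeline can be tried on a tiny file first. This sketch works in a scratch directory and uses a 6-byte stand-in for input, both of which are assumptions for illustration, and checks that the output is 100 copies.

```shell
dir=$(mktemp -d)                  # hypothetical scratch directory
cd "$dir" || exit 1
printf 'hello\n' > input          # 6-byte stand-in for the real input file

# yes prints the word "input" forever, head keeps 100 lines,
# and xargs hands those 100 file names to cat.
yes input | head -n 100 | xargs cat > output

bytes=$(( $(wc -c < output) ))
echo "output is $bytes bytes"     # 100 copies of 6 bytes
```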

To make the variable version more reliable, you'd need to use zsh (other shells can't cope with NUL bytes) and do:

zmodload zsh/mapfile
var=$mapfile[input]
repeat 10 print -rn -- "$var"

If you know the input doesn't contain NUL bytes, then you can reliably do it POSIXly (though it may not work where printf is not builtin) with:

i=$(cat input && echo .) || exit # add an extra .\n to avoid trimming newlines
i=${i%.} # remove that trailing dot (the \n was removed by cmdsubst)
n=10
while [ "$n" -gt 0 ]; do
  printf %s "$i"
  n=$((n - 1))
done

But that is never going to be more efficient than using cat in the loop (unless the input is very small).

It's worth mentioning that with a long argument you can run out of memory. Example: /bin/echo $(perl -e 'print "A"x999999') – cuonglm, Sep 17 '15 at 09:01
You are mistaken with the assumption that read size has a significant influence, so read my answer to understand the real reason. — schily, Sep 17 '15 at 12:13
@schily, doing 409600 reads of 128 bytes takes more time (system time) than 800 reads of 64k. Compare dd bs=128 < input > /dev/null with dd bs=64k < input > /dev/null. Of the 0.6s it takes bash to read that file, 0.4 are spent in those read system calls in my tests, while other shells spend a lot less time there. – Stéphane Chazelas, Sep 17 '15 at 12:18
Well, you do not seem to have run a real performance analysis. The influence of the read call (when comparing different read sizes) is approx. 1% of the whole time, while the functions readwc() and trim() in the Bourne Shell take 30% of the whole time, and this is most likely underestimated as there is no libc with gprof annotation for mbtowc(). – schily, Sep 17 '15 at 12:30
@mohammad.k The x in \x was meant as a placeholder. By \x, I mean all the \015, \n, \r... that Unix conformant echo implementations are meant to expand. The behaviour for \x alone is not specified but some echo implementations expand it to the NUL byte (short for \x00). — Stéphane Chazelas, Sep 17 '15 at 14:42
@schily, you're talking of the Bourne shell, and I'm talking of bash. I'm saying that in i=`cat input`, at least on Linux, bash spends a large proportion of the time doing many small read system calls. It's not doing any readwc() nor trim() nor mbtowc(). – Stéphane Chazelas, Sep 17 '15 at 14:55
@StéphaneChazelas: What is the best way to see the difference between shells in the time spent reading output from command substitution? – cuonglm, Sep 17 '15 at 15:33
@cuonglm, You can use times: bash -c 'a=$(cat a); times' — Stéphane Chazelas, Sep 17 '15 at 16:21
@cuonglm That will roughly be the system time, I'd say. Note how it's the same as the system time for dd bs=128. – Stéphane Chazelas, Sep 17 '15 at 16:47
@StéphaneChazelas: Ah, of course, my bad. Got it now. Thanks. — cuonglm, Sep 17 '15 at 16:49
Well, that would rather be more like dd ibs=128 obs=1M > /dev/null since otherwise dd would also do a lot of write() system calls (to /dev/null). In that case though, the system time for dd is less than for bash in my tests, though from strace -c it's not clear why. — Stéphane Chazelas, Sep 17 '15 at 16:56

cuonglm · Answer 2 · 2015-09-17T12:18:14.023

12

The problem is not about cat and echo; it's about the unquoted variable $i.

In Bourne-like shell scripts (except zsh), leaving a variable unquoted applies the glob+split operators to it.

$var

is actually:

glob(split($var))

So on each loop iteration, the whole content of input (excluding trailing newlines) is expanded, split and globbed. The whole process requires the shell to allocate memory and parse the string again and again. That's the reason for the bad performance.

You can quote the variable to prevent glob+split, but it won't help you much, since the shell still needs to build the big string argument and scan its content for echo (replacing the builtin echo with the external /bin/echo will give you an argument list too long or out of memory error, depending on the size of $i). Most echo implementations aren't POSIX compliant; they expand backslash \x sequences in the arguments they receive.

With cat, the shell only needs to spawn a process each loop iteration and cat will do the copy i/o. The system can also cache the file content to make the cat process faster.

edited Sep 17 '15 at 12:18

answered Sep 17 '15 at 08:02

@roaima: You didn't mention the glob part, which can be a huge factor; imagine something like /*/*/*/*../../../../*/*/*/*/../../../../ in the file content. Just want to point out the details. – cuonglm Sep 17 '15 at 08:11
Gotcha thank you. Even without that, the timing doubles when using an unquoted variable – Chris Davies Sep 17 '15 at 08:12
time echo $( <xdditg106) >/dev/null
real 0m0.125s user 0m0.085s sys 0m0.025s

time echo "$( <xdditg106)" >/dev/null
real 0m0.047s user 0m0.016s sys 0m0.022s
– netmonk Sep 17 '15 at 08:17
I didn't get why quoting cannot solve the problem. I need more explanation. – Mohammad Sep 17 '15 at 08:25
@mohammad.k: As I wrote in my answer, quoting the variable prevents the glob+split part, and that will speed up the loop. I also noted that it won't help you much, since the echo behavior of most shells isn't POSIX compliant. printf '%s' "$i" is better. – cuonglm Sep 17 '15 at 08:28
@cuonglm: You say quoting won't help me much. This is what I do not understand. Why do you say echo behavior is not POSIX compliant? Why do you say printf is better? I tried it but I got almost the same execution time. – Mohammad Sep 17 '15 at 09:17
I added some details. For why printf is better than echo, you can see this answer. – cuonglm Sep 17 '15 at 09:30
This link gives more detail about glob+split: http://unix.stackexchange.com/questions/108963/expansion-of-a-shell-variable-and-effect-of-glob-and-split-on-it – Mohammad Sep 17 '15 at 13:52

schily · Answer 3 · 2015-09-17T12:51:31.170

2

If you call

i=`cat input`

this lets your shell process grow by 50MB up to 200MB (depending on the internal wide character implementation). This may make your shell slow but this is not the main problem.

The main problem is that the command above needs to read the whole file into shell memory and the echo $i needs to do field splitting on that file content in $i. In order to do field splitting, all text from file needs to be converted into wide characters and this is where most of the time is spent.

I did some tests with the slow case and got these results:

Fastest is ksh93
Next is my Bourne Shell (2x slower than ksh93)
Next is bash (3x slower than ksh93)
Last is ksh88 (7x slower than ksh93)

The reason why ksh93 is the fastest seems to be that ksh93 does not use mbtowc() from libc but rather its own implementation.

BTW: Stephane is mistaken that the read size has some influence, I compiled the Bourne Shell to read in 4096 byte chunks instead of 128 bytes and got the same performance in both cases.

edited Sep 17 '15 at 12:51

answered Sep 17 '15 at 12:10

The i=`cat input` command does not do field splitting, it's the echo $i that does. The time spent on i=`cat input` will be negligible compared to echo $i, but not compared to cat input alone, and in the case of bash, the difference is for the best part due to bash doing small reads. Changing from 128 to 4096 will have no influence on the performance of echo $i, but that was not the point I was making. – Stéphane Chazelas Sep 17 '15 at 12:24
Also note that the performance of echo $i will vary considerably depending on the content of the input and the filesystem (if it contains IFS or glob characters), which is why I did not do any comparison of shells on that in my answer. For instance, here on the output of yes | ghead -c50M, ksh93 is the slowest of all, but on yes | ghead -c50M | paste -sd: -, it's the fastest. – Stéphane Chazelas Sep 17 '15 at 12:40
When talking about the total time, I was talking about the whole implementation and yes, of course the field splitting happens with the echo command. and this is where most of the whole time is spent. – schily Sep 17 '15 at 12:50
You are of course correct that the performance depends on the contents of $i. – schily Sep 17 '15 at 13:21

Chris Davies · Answer 4 · 2015-09-17T08:05:38.943

1

In both cases, the loop will be run just twice (once for the word seq and once for the word 10).

Furthermore, both will merge adjacent whitespace and drop leading/trailing whitespace, so the output is not necessarily two copies of the input.

First

#!/bin/sh
for j in $(seq 10); do
    cat input
done >> output

Second

#!/bin/sh
i="$(cat input)"
for j in $(seq 10); do
    echo "$i"
done >> output

One reason why the echo is slower may be that your unquoted variable is being split at whitespace into separate words. For 50MB that will be a lot of work. Quote the variables!
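The effect of the missing quotes on whitespace can be seen on a tiny string. This is a minimal sketch with the default $IFS; the sample string and variable names are illustrative.

```shell
i='  two   spaces  '

collapsed=$(echo $i)              # unquoted: split into words, echo rejoins them with one space
preserved=$(printf '%s' "$i")     # quoted: every blank survives

echo "[$collapsed]"               # [two spaces]
echo "[$preserved]"               # [  two   spaces  ]
```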

I suggest you fix these errors and then re-evaluate your timings.

I have tested this locally. I created a 50MB file using the output of tar cf - | dd bs=1M count=50. I also extended the loops to run by a factor of x100 so that the timings were scaled to a reasonable value (I added a further loop around your entire code: for k in $(seq 100); do ... done). Here are the timings:

time ./1.sh

real    0m5.948s
user    0m0.012s
sys     0m0.064s

time ./2.sh

real    0m5.639s
user    0m4.060s
sys     0m0.224s

As you can see there is no real difference, but if anything the version containing echo does run marginally faster. If I remove the quotes and run your broken version 2, the time doubles, showing that the shell is having to do far more work than should be expected.

time ./2original.sh

real    0m12.498s
user    0m8.645s
sys     0m2.732s

edited Sep 17 '15 at 08:05

answered Sep 17 '15 at 06:26

Actually the loop runs 10 times, not twice. – fpmurphy Sep 17 '15 at 07:06
I did as you said, but the problem has not been solved. cat is very much faster than echo. The first script runs in an average of 3 seconds, but the second one runs in an average of 54 seconds. – Mohammad Sep 17 '15 at 07:26
@fpmurphy1 :No. I tried my code. The loop runs only twice, not 10 times. – Mohammad Sep 17 '15 at 08:02
@mohammad.k for the third time: if you quote your variables, the problem goes away. – Chris Davies Sep 17 '15 at 08:07
@roaima :What does the command tar cf - | dd bs=1M count=50 do? Does it make a regular file with same characters inside it? If so, in my case the input file is completely irregular with all kind of characters and whitespaces. And again, I used time as you have used, and the result was the one that I said: 54 seconds vs 3 seconds. – Mohammad Sep 17 '15 at 08:15
@mohammad.k Please just quote your variables and try it again. – Chris Davies Sep 17 '15 at 08:19

score -2 · Answer 5 · answered Sep 17 '15 at 06:27

The echo is meant to put 1 line on the screen. What you do in the second example is that you put the content of the file in a variable and then you print that variable. In the first one you immediately put the content on the screen.

cat is optimised for this usage. echo is not. Also, putting 50MB in an environment variable is not a good idea.

Marco

Curious. Why wouldn't echo be optimised for writing out text? – Chris Davies Sep 17 '15 at 07:01
There is nothing in the POSIX standard that says echo is meant to put one line on a screen. – fpmurphy Sep 17 '15 at 07:09

Aleksander · Answer 6 · 2015-09-17T09:25:40.543

-2

It's not about echo being faster, it's about what you're doing:

In one case you're reading from input and writing to output directly. In other words, whatever is read from input through cat, goes to output through stdout.

input -> output

In the other case you're reading from input into a variable in memory and then writing the contents of the variable in output.

input -> variable
variable -> output

The latter will be much slower, especially if input is 50MB.

edited Sep 17 '15 at 09:25

answered Sep 17 '15 at 09:08

I think you have to mention that cat has to open the file in addition to copying from stdin and writing it to stdout. This is the excellence of the second script, but overall the first one is much better than the second. – Mohammad Sep 17 '15 at 13:03
There's no excellence in the second script; cat needs to open the input file in both cases. In the first case the stdout of cat goes directly to the file. In the second case the stdout of cat goes first to a variable, and then you print the variable into the output file. – Aleksander Sep 17 '15 at 15:56
@mohammad.k, there is emphatically no "excellence" in the second script. – Wildcard May 14 '16 at 01:08
