33

I was testing the speed of Bash and Python by running a loop 1 billion times.

$ cat python.py
#!/bin/python
# python v3.5
i=0;
while i<=1000000000:
    i=i+1;

Bash code:

$ cat bash2.sh
#!/bin/bash
# bash v4.3
i=0
while [[ $i -le 1000000000 ]]
do
let i++
done

Using the time command, I found that the Python code took just 48 seconds to finish, while the Bash code ran for over an hour before I killed the script.

Why is this so? I expected Bash to be faster. Is there something wrong with my script, or is Bash really that much slower with this script?
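For reference, the timing was done along these lines (a sketch; the exact invocation isn't shown above and may have differed):

time python3 python.py
time bash bash2.sh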

Alex Jones
  • 6,353
  • 54
    I'm not quite sure why you expected Bash to be faster than Python. – Kusalananda Aug 13 '16 at 08:49
  • 1
There aren't even any external programs used in that bash script, so it shouldn't need to be that slow. Though if it parses the commands in the loop again on every iteration, that could explain it. – ilkkachu Aug 13 '16 at 08:57
  • 1
@ilkkachu you mean the let i++ command is slowing down this bash script? – Alex Jones Aug 13 '16 at 08:59
  • Yes, bash parses line by line, every line, and repeats the process for each iteration. You could even change the script while it is running and bash would change its execution (or crash if you messed with offsets where it expected to continue) – Matija Nalis Aug 13 '16 at 09:02
  • Does Python even actually run the loop? – OrangeDog Aug 13 '16 at 09:42
  • 9
    @MatijaNalis no you can't! The script is loaded into memory, editing the text file it was read from (the script file) will have absolutely no effect on the running script. A good thing too, bash is already slow enough without having to open and re-read a file every time a loop is run! – terdon Aug 13 '16 at 12:05
  • 2
    @Matija you're thinking of Windows cmd.exe which does exactly that. – cat Aug 13 '16 at 16:39
  • 1
@terdon well I can (bash 4.3.30(1))! Try this (each on a new line): echo start, sleep 60, echo foo. Now run the script with bash, edit it in a new window with an editor which preserves the inode (I used joe), and change the last line to say echo bar instead. Then go watch the original window - it will say start, bar instead of start, foo. At least it does for me in Debian Jessie. – Matija Nalis Aug 13 '16 at 21:08
@cat I wouldn't know about cmd.exe; I last used Windows while it was still command.com. See the example for @terdon and try it yourself and share results. strace(8) for me shows bash llseeking and reading again after execing sleep. Now, maybe it doesn't ALWAYS do that, but sometimes at least it does. – Matija Nalis Aug 13 '16 at 21:24
  • I'm... wildly guessing bash doesn't actually reopen / reparse the file between statements, but does something like not keep tokens in memory, but instead points into a (mmaped?) file? Which would explain the need to preserve the inode. – millimoose Aug 14 '16 at 03:02
  • 5
    Bash reads the file line-by-line as it executes, but it remembers what it read if it comes to that line again (because it's in a loop, or a function). The original claim about re-reading each iteration isn't true, but modifications to yet-to-be-reached lines will be effective. An interesting demonstration: make a file containing echo echo hello >> $0, and run it. – Michael Homer Aug 14 '16 at 10:28
  • 3
    @MatijaNalis ah, OK, I can understand that. It was the idea of changing a running loop that threw me. Presumably, each line is read sequentially and only after the last one has finished. However, a loop is treated as a single command and will be read in its entirety, so changing it won't affect the running process. Interesting distinction though, I had always assumed that the entire script is loaded into memory before execution. Thanks for pointing it out! – terdon Aug 14 '16 at 13:57
  • I don't see anything wrong with your scripts... – EKons Aug 14 '16 at 14:14
Judging by the last name of the OP, I'm guessing he's just trolling for funny answers... haha – Chad G Nov 19 '18 at 22:39
  • Not relevant here, but to put things in perspective any decent optimizing compiler or interpreter would realize the loop is a no-op and optimize it out completely, setting i to 1000000001. – trr Mar 26 '24 at 22:27

8 Answers

55

Shell loops are slow, and bash's are the slowest. Shells aren't meant to do heavy work in loops; they're meant to launch a few external, optimized processes on batches of data.
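As a quick illustration of that pattern (a sketch; the numbers are only placeholders), summing a range with one awk process instead of iterating in the shell:

# One pipeline, two optimized processes:
seq 10000000 | awk '{ s += $1 } END { print s }'

# The shell-loop anti-pattern, for comparison:
s=0; for ((i=1; i<=10000000; i++)); do ((s+=i)); done; echo "$s"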


Anyway, I was curious how shell loops compare so I made a little benchmark:

#!/bin/bash

export IT=$((10**6))   # iteration count; exported so the invoked shells can see it

echo POSIX:
for sh in dash bash ksh zsh; do
    TIMEFORMAT="%RR %UU %SS $sh"
    time $sh -c 'i=0; while [ "$IT" -gt "$i" ]; do i=$((i+1)); done'
done


echo C-LIKE:
for sh in bash ksh zsh; do
    TIMEFORMAT="%RR %UU %SS $sh"
    time $sh -c 'for ((i=0;i<IT;i++)); do :; done'
done

G=$((10**9))
TIMEFORMAT="%RR %UU %SS 1000*C"   # the C version runs 1000x as many iterations
echo 'int main(){ int i, sum = 0; for(i=0;i<IT;i++) sum+=i; printf("%d\n", sum); return 0; }' |
   gcc -include stdio.h -O3 -x c -DIT=$G -    # compiles the C source from stdin into ./a.out
time ./a.out

( Details:

  • CPU: Intel(R) Core(TM) i5 CPU M 430 @ 2.27GHz
  • ksh: version sh (AT&T Research) 93u+ 2012-08-01
  • bash: GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)
  • zsh: zsh 5.2 (x86_64-unknown-linux-gnu)
  • dash: 0.5.7-4ubuntu1

)

The (abbreviated) results (time per iteration) are:

POSIX:
 5.8 µs dash
 8.5 µs ksh
14.6 µs zsh
22.6 µs bash

C-LIKE:
 2.7 µs ksh
 5.8 µs zsh
11.7 µs bash

C:
 0.4 ns C

From the results:

If you want a faster shell loop and your shell supports the [[ syntax, you're in an advanced shell that also has the C-like for (( loop, so use that instead. It can be roughly two to three times as fast as a while [ loop in the same shell.

  • ksh has the fastest for (( loop, at about 2.7 µs per iteration
  • dash has the fastest while [ loop, at about 5.8 µs per iteration

C for loops can be 3 to 4 decimal orders of magnitude faster. (I hear Torvalds loves C.)

The optimized C for loop is about 56500 times faster than bash's while [ loop, the slowest shell loop (22.6 µs vs 0.4 ns per iteration), and about 6750 times faster than ksh's for (( loop, the fastest shell loop.


Again, the slowness of shells usually shouldn't matter much, because the typical pattern with shells is to offload the heavy work to a few external, optimized programs.

With this pattern, shells often make it much easier to write scripts whose performance beats that of python scripts (last time I checked, creating process pipelines in python was rather clumsy).
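For comparison, a typical shell pipeline (a sketch; access.log and the field layout are placeholder assumptions) wires several optimized programs together in a single line:

# Top requesting hosts in a web log: five cooperating processes, one line.
cut -d' ' -f1 access.log | sort | uniq -c | sort -rn | head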

Another thing to consider is startup time.

time python3 -c ' '

takes 30 to 40 ms on my PC, whereas shells take around 3 ms. If you launch a lot of scripts, that quickly adds up: small scripts can finish several times over in the extra 27 to 37 ms that python needs just to start.

(NodeJS is probably the worst scripting runtime in this department, taking about 100 ms just to start, even though once it has started you'd be hard pressed to find a better performer among scripting languages.)
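You can estimate the startup overhead yourself with something along these lines (a sketch; 100 cold starts each, and the absolute numbers depend heavily on your machine):

time for i in {1..100}; do python3 -c ''; done
time for i in {1..100}; do dash    -c ''; done
time for i in {1..100}; do bash    -c ''; done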

Petr Skocik
  • 28,816
  • For ksh, you may want to specify the implementation (AT&T ksh88, AT&T ksh93, pdksh, mksh...) as there's quite a lot of variation between them. For bash, you may want to specify the version. It made some progress lately (that applies also to other shells). – Stéphane Chazelas Aug 13 '16 at 15:42
  • @StéphaneChazelas Thanks. I added the versions of the used software and hardware. – Petr Skocik Aug 13 '16 at 15:50
For reference: to create a process pipeline in python you have to do something like: from subprocess import *; p1 = Popen(['echo', 'something'], stdout=PIPE); p2 = Popen(['grep', 'pattern'], stdin=p1.stdout, stdout=PIPE); p3 = Popen(['wc', '-c'], stdin=p2.stdout). This is indeed clumsy, but it shouldn't be hard to code a pipeline function that does this for you for any number of processes, resulting in pipeline(['echo', 'something'], ['grep', 'pattern'], ['wc', '-c']). – Bakuriu Aug 13 '16 at 18:11
  • 2
    I thought maybe the gcc optimizer was totally eliminating the loop. It's not, but it's still doing an interesting optimization: it uses SIMD instructions to do 4 adds in parallel, reducing the number of loop iterations to 250000. – Mark Plotnick Aug 13 '16 at 21:35
@MarkPlotnick Interesting. I hadn't checked out the assembly. I first tried tcc -O3 -run but that performed about 10 times worse (that's why the post says 3 to 4 decimal orders of magnitude; 3 is for tcc), so I used the gcc result. tcc outputs much more usual-looking assembly. – Petr Skocik Aug 13 '16 at 21:48
@MarkPlotnick I've learned that summation is a pretty cheap op that prevents gcc from eliminating my benchmarking for loops, especially if I use the result later in a non-optimizable way (= if I print it). – Petr Skocik Aug 13 '16 at 21:52
  • 1
    @PSkocik: It's right on the edge of what optimizers can do in 2016. It looks like C++17 will mandate that compilers must be able to calculate similar expressions at compile time (not even as an optimization). With that C++ capability in place, GCC may pick it up as an optimization for C as well. – MSalters Aug 15 '16 at 12:18
  • Minor correction: I made a typo when I submitted your C program to godbolt to look at the asm code. I used 1000000 instead of 1000000000. Same optimization happened, though, but my comment should've said 250000000 instead of 250000. – Mark Plotnick Aug 17 '16 at 21:41
20

This is a known bug in bash; see the man page and search for "BUGS":

BUGS
       It's too big and too slow.

;)


For an excellent primer on the conceptual differences between shell scripting and other programming languages, I highly recommend reading Stéphane Chazelas's answer to Why is using a shell loop to process text considered bad practice?

The most pertinent excerpts:

Shells are a higher level language. One may say it's not even a language. They're, before all, command line interpreters. The job is done by those commands you run and the shell is only meant to orchestrate them.

...

IOW, in shells, especially to process text, you invoke as few utilities as possible and have them cooperate to the task, not run thousands of tools in sequence waiting for each one to start, run, clean up before running the next one.

...

As said earlier, running one command has a cost. A huge cost if that command is not builtin, but even if they are builtin, the cost is big.

And shells have not been designed to run like that, they have no pretension to being performant programming languages. They are not, they're just command line interpreters. So, little optimisation has been done on this front.


Don't use big loops in shell scripting.
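To make those excerpts concrete, here is a sketch of the same transformation done both ways (file.txt is a placeholder name):

# Anti-pattern: one tr process per input line.
while IFS= read -r line; do
    printf '%s\n' "$line" | tr a-z A-Z
done < file.txt

# Shell-idiomatic: a single tr process handles the whole file.
tr a-z A-Z < file.txt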

Wildcard
  • 36,499
18

I did a bit of testing on my system and ran the following variants. None produced the order-of-magnitude speedup needed to be competitive, but you can make the loop faster:

Test 1: 18.233s

#!/bin/bash
i=0
while [[ $i -le 4000000 ]]
do
    let i++
done

Test 2: 20.45s

#!/bin/bash
i=0
while [[ $i -le 4000000 ]]
do 
    i=$(($i+1))
done

Test 3: 17.64s

#!/bin/bash
i=0
while [[ $i -le 4000000 ]]; do let i++; done

Test 4: 26.69s

#!/bin/bash
i=0
while [ $i -le 4000000 ]; do let i++; done

Test 5: 12.79s

#!/bin/bash
export LC_ALL=C

for ((i=0; i != 4000000; i++)) { 
:
}

The important part in this last one is export LC_ALL=C. I've found that many bash operations end up significantly faster when it is set, in particular anything using regular expressions. It also shows an undocumented for syntax: braces { } in place of do ... done, with the no-op command : as the body.
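A rough way to see the locale effect yourself (a sketch; the loop count and pattern are arbitrary, timings will vary, and it assumes your default locale is a UTF-8 one):

time (LC_ALL=C; for ((i=0; i<100000; i++)); do [[ line$i =~ ^line[0-9]+$ ]]; done)
time (          for ((i=0; i<100000; i++)); do [[ line$i =~ ^line[0-9]+$ ]]; done)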

  • 3
    +1 for the LC_ALL suggestion, I did not know that. – einpoklum Aug 13 '16 at 16:58
  • +1 Interesting how the [[ is so much faster than [. I didn't know LC_ALL=C (BTW you don't need to export it) made a difference. – Petr Skocik Aug 13 '16 at 20:06
@PSkocik As far as I know, [[ is a bash builtin, and [ is really /bin/[, which is the same as /bin/test -- an external program. Which is why that's slower. – tomsmeding Aug 14 '16 at 07:29
@tomsmeding [ is a builtin in all common shells (try type [). The external program is mostly unused now. – Petr Skocik Aug 14 '16 at 09:09
I wonder why nobody tried this (I shortened the test by a factor of 1000, because time is valuable, and the environment is heating up anyway): (LC_ALL=C; i=0; time while [[ ++i -lt 1000000 ]]; do :; done) used 1.392 seconds of user time, while (i=0; time while [[ ++i -lt 1000000 ]]; do :; done) used 1.503 seconds of user time. (Laptop with AMD Ryzen 7 mobile 4700U running openSUSE Leap 15.2) – U. Windl May 06 '21 at 10:12
11

A shell is efficient if you use it for what it has been designed for (though efficiency is rarely what you look for in a shell).

A shell is a command-line interpreter: it is designed to run commands and have them cooperate to accomplish a task.

If you want to count to 1000000000, you invoke a (one) command to count, like seq, bc, awk or python/perl... Running 1000000000 [[...]] commands and 1000000000 let commands is bound to be terribly inefficient, especially with bash, which is the slowest shell of all.
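For instance, the billion-iteration count from the question can be handed to a single awk process (a sketch; run time will of course vary):

time awk 'BEGIN { for (i = 0; i <= 1000000000; i++); print i }'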

In that regard, a shell will be a lot faster:

$ time sh -c 'seq 100000000' > /dev/null
sh -c 'seq 100000000' > /dev/null  0.77s user 0.03s system 99% cpu 0.805 total
$ time python -c 'i=0
> while i <= 100000000: i=i+1'
python -c 'i=0 while i <= 100000000: i=i+1'  12.12s user 0.00s system 99% cpu 12.127 total

Though of course, most of the job is done by the commands that the shell invokes, as it should be.

Now, you could of course do the same with python:

python -c '
import os
os.dup2(os.open("/dev/null", os.O_WRONLY), 1);
os.execlp("seq", "seq", "100000000")'

But that's not really how you'd do things in python as python is primarily a programming language, not a command line interpreter.

Note that you could do:

python -c 'import os; os.system("seq 100000000 > /dev/null")'

But, python would actually be calling a shell to interpret that command line!

I love your answer. So many other answers discuss improved "how" techniques, while you cover both the "why" and, perceptively, the "why not", addressing the error in the OP's methodology. – greg.arnott Nov 30 '18 at 01:51
7

Answer: Bash is much slower than Python.

One small example is in the blog post Performance of several languages.

steve
  • 21,892
3

Nothing is wrong (except your expectations): python is really rather fast for a non-compiled language. See https://wiki.python.org/moin/PythonSpeed

Matija Nalis
  • 3,111
2

Aside from what has been noted in the comments, you could optimize the code a little, e.g.:

#!/bin/bash
for (( i = 0; i <= 1000000000; i++ ))
do
    : # null command
done

This code should take a bit less time.

But obviously not fast enough to be actually usable.

-3

I've noticed a dramatic difference in bash between what appear to be logically equivalent "while" and "until" expressions:

time (i=0 ; while ((i<900000)) ; do  i=$((i+1)) ; done )

real    0m5.339s
user    0m5.324s
sys 0m0.000s

time (i=0 ; until ((i=900000)) ; do  i=$((i+1)) ; done )

real    0m0.000s
user    0m0.000s
sys 0m0.000s

Not that it really bears tremendous relevance to the question, other than that sometimes small differences make a big difference even where we'd expect equivalence. Note the catch, though: ((i=900000)) is an assignment, not the comparison ((i==900000)), so the condition is immediately true (900000 is non-zero) and the until loop exits before running its body even once. That is where the 0.000s comes from; the two loops are not actually equivalent.