
I am evaluating the expression 6^6^6 using python and bc separately.

The content of the python file is print 6**6**6. When I execute time python test.py, I get the output as

real        0m0.067s
user        0m0.050s
sys         0m0.011s

And then, I ran the command time echo 6^6^6 | bc which gave me the following output

real        0m0.205s
user        0m0.197s
sys         0m0.005s

From these results it is clear that the sys time taken by python and bc was 11ms and 5ms respectively. The bc command outperformed python at the sys level, but when it comes to user and real time, python was almost 4 times faster than bc. What might have happened there? I haven't given any special priority to either process. I am trying to understand this situation.

Braiam
  • 35,991
ganessh
  • 435
  • So do you mean the sys component only gives the time it takes to load, and the runtime will be given in the user component of the output? – ganessh Feb 21 '14 at 17:28
  • I am really not sure, that's why I posted a comment. It is just a guess. – terdon Feb 21 '14 at 17:30
  • Nope, my guess was not really right, see my answer. – terdon Feb 21 '14 at 17:41
  • echo | bc involves launching a subshell because of the pipe -- that's where some of your extra user time probably came from. To make this an equitable test, the python script should read from stdin so you can time echo 6**6**6 | whatever.py. – goldilocks Feb 21 '14 at 17:55
  • That's a good catch. I shall try that and update the results. – ganessh Feb 21 '14 at 17:58
  • I'd rather put the bc command line into a script and time the execution of that. Or use echo 6^6^6 | time bc. – daniel kullmann Feb 21 '14 at 18:22
  • Side note: in python the 6**6**6 expression is actually computed at compile time. However, since you are launching the file directly instead of importing it as a module, this shouldn't matter. To see the difference, put 10**12345678 into an a.py file and try to import it from the interactive interpreter. Then close the interpreter, restart it and import a again. The first time it should take a noticeable amount of time (because python is compiling the module), while the second time it loads the .pyc, which should be instantaneous. – Bakuriu Feb 21 '14 at 19:07
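Bakuriu's point about compile-time evaluation can be checked directly: in CPython the compiler folds constant expressions and stores the result in the code object's constants. A minimal Python 3 sketch (unlike the Python 2 code elsewhere in this thread):

```python
# Compile a constant expression and inspect the code object's constants.
# If the compiler folded 6**6 at compile time, the computed value 46656
# appears as a stored constant rather than the operands 6 and 6.
code = compile("6**6", "<expr>", "eval")
print(code.co_consts)  # in CPython 3 this contains 46656
```

For a value as large as 6**6**6 the optimizer may decline to fold (it caps the size of folded constants), which is part of why the runtime still does real work.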

4 Answers


Python imports a large number of files at startup:

% python -c 'import sys; print len(sys.modules)'
39

Each of these requires an even greater number of attempts at opening a Python file, because there are many ways to define a module:

% python -vv -c 'pass'
# installing zipimport hook
import zipimport # builtin
# installed zipimport hook
# trying site.so
# trying sitemodule.so
# trying site.py
# trying site.pyc
# trying /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site.so
# trying /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sitemodule.so
# trying /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site.py
# /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site.pyc matches /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site.py
import site # precompiled from /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site.pyc
# trying os.so
# trying osmodule.so
# trying os.py
# trying os.pyc
# trying /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.so
# trying /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/osmodule.so
# trying /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py
# /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.pyc matches /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.py
import os # precompiled from /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/os.pyc
    ...

Each "trying", except for the builtins, requires an OS-level system call, and each "import" seems to trigger about 8 "trying" messages. (There are ways to reduce this using zipimport, and each path in your PYTHONPATH may require another call.)

This means there are almost 200 stat system calls before Python even starts on my machine, and "time" assigns that to "sys" rather than "user", because that work is done in the kernel on the process's behalf.
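You can count this import activity yourself by capturing the -vv trace of a no-op interpreter. A sketch (exact counts vary by version, and newer CPython freezes many startup modules, which shrinks the numbers considerably):

```python
import subprocess
import sys

# Run a no-op interpreter with double-verbose import tracing and count
# the "trying" probes and "import" lines it writes to stderr.
result = subprocess.run(
    [sys.executable, "-vv", "-c", "pass"],
    capture_output=True, text=True,
)
lines = result.stderr.splitlines()
tries = sum("# trying" in line for line in lines)
imports = sum(line.startswith("import ") for line in lines)
print(tries, imports)
```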

By comparison, and as terdon said, "bc" doesn't have that high a startup cost. Looking at the dtruss output (I have a Mac; use "strace" on a Linux-based OS), I see that bc doesn't make any open() or stat() system calls of its own, except for loading a few shared libraries at the start, which of course Python does as well. In addition, Python has more files to read before it's ready to process anything.

Waiting for disk is slow.

You can get a sense for Python's startup cost by doing:

time python -c pass

It's 0.032s on my machine, while 'print 6**6**6' is 0.072s, so startup cost is about half of the overall time and the calculation plus conversion to decimal is the other half. While:

time echo 1 | bc

takes 0.005s, and "6^6^6" takes 0.184s, so bc's exponentiation is over 4x slower than Python's even though it's 7x faster to get started.
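To see how Python's share splits between the exponentiation itself and the conversion to decimal, timeit can measure each in isolation. A Python 3 sketch; the set_int_max_str_digits guard is needed because newer CPython (3.11+) caps int-to-str conversion length by default:

```python
import sys
import timeit

# CPython 3.11+ limits int-to-str conversion length; raise it so the
# roughly 36,306-digit result can be converted at all.
if hasattr(sys, "set_int_max_str_digits"):
    sys.set_int_max_str_digits(60000)

n = 6 ** 6 ** 6
calc = timeit.timeit("6 ** 6 ** 6", number=20)               # big-int exponentiation
conv = timeit.timeit("str(n)", globals={"n": n}, number=20)  # base-10 conversion
print(len(str(n)))  # 36306 digits
print(calc, conv)
```

The base-10 conversion is quadratic in the number of digits, which is why it tends to dominate (see Steve Barnes's comment below).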

  • You kinda buried the lede there. You may want to move the ending bit to the top. – Riking Feb 22 '14 at 01:45
  • Just out of interest, on my machine: time python -c 'pass' is 0m0.025s, time python -c 'print 6**6**6' is 0m0.087s, but time python -c 'x=6**6**6' is 0m0.028s. So most of the time is spent outputting the large number. – Steve Barnes Feb 28 '14 at 18:56
  • Yes, conversion to base 10 takes quadratic time in the number of digits. As an extreme case, try printing one of the larger Mersenne primes. It's very fast to calculate, but takes a long time to print in base 10. – Andrew Dalke Mar 01 '14 at 05:06

I found a nice answer on SO explaining the different fields:

  • Real is wall clock time - time from start to finish of the call. This is all elapsed time including time slices used by other processes and time the process spends blocked (for example if it is waiting for I/O to complete).

  • User is the amount of CPU time spent in user-mode code (outside the kernel) within the process. This is only actual CPU time used in executing the process. Other processes and time the process spends blocked do not count towards this figure.

  • Sys is the amount of CPU time spent in the kernel within the process. This means executing CPU time spent in system calls within the kernel, as opposed to library code, which is still running in user-space. Like 'user', this is only CPU time used by the process. See below for a brief description of kernel mode (also known as 'supervisor' mode) and the system call mechanism.

So, in your specific example, the python version is faster in terms of the actual time it takes to complete. However, the python approach spends more of its time in kernel-space, making calls to kernel functions. The bc command spends essentially no time in kernel-space: all its time is spent in user-space, presumably running internal bc code.

This makes no real difference to you; the only information you actually care about is real, which is the time elapsed between launching the command and getting its output.
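These counters can also be read from inside a process with Python's resource module, which exposes the same user/sys split that time reports. A Python 3 sketch (Unix only; loop sizes are arbitrary):

```python
import os
import resource

def cpu_times():
    # ru_utime / ru_stime are this process's user and sys CPU times.
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_utime, ru.ru_stime

u0, s0 = cpu_times()
total = sum(i * i for i in range(2_000_000))  # pure user-space computation
for _ in range(20_000):
    os.getpid()                               # each call enters the kernel
u1, s1 = cpu_times()
print(f"user += {u1 - u0:.4f}s, sys += {s1 - s0:.4f}s")
```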

You should also be aware that these tiny differences are not stable; they depend on the load of your system and will change every time you run the command:

$ for i in {1..10}; do ( time python test.py > /dev/null ) 2>&1; done | grep user
user    0m0.056s
user    0m0.052s
user    0m0.052s
user    0m0.052s
user    0m0.060s
user    0m0.052s
user    0m0.052s
user    0m0.056s
user    0m0.048s
user    0m0.056s

$ for i in {1..10}; do ( time echo 6^6^6 | bc > /dev/null ) 2>&1; done | grep user
user    0m0.188s
user    0m0.188s
user    0m0.176s
user    0m0.176s
user    0m0.172s
user    0m0.176s
user    0m0.180s
user    0m0.172s
user    0m0.172s
user    0m0.172s

terdon
  • 242,166

I will explain it from another perspective.

To be fair, bc has an advantage since it doesn't have to read anything from the disk and only needs its binary, while python has to import a series of modules plus read a file. So your test might be biased towards bc. To actually test it you should use bc -q file, where file contains:

6^6^6
quit

Changing just that modified the time of using echo:

bc  0.33s user 0.00s system 80% cpu 0.414 total

To using the file:

bc -q some  0.33s user 0.00s system 86% cpu 0.385 total

(you will have to use terdon's method to notice larger differences, but at least we know they exist)

Now, from the python perspective, python needs to read the file from disk, compile it and execute it each time, plus load modules as Andrew points out, which makes the execution time slower. If you compile the python script to byte code, you will notice that it takes about 30% less total time to execute the code:

python some.py > /dev/null  0.25s user 0.01s system 63% cpu 0.413 total

compiled:

./some.pyc  0.22s user 0.00s system 77% cpu 0.282 total

As you can see, there are several factors that can affect time execution between different tools.
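The precompilation step above can be reproduced with the standard py_compile module; running the resulting .pyc skips the parse/compile phase on each execution. A Python 3 sketch (the file names are made up for the example):

```python
import os
import py_compile
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "some.py")
    with open(src, "w") as f:
        f.write("print(6**6)\n")
    # Compile the source to byte code once; py_compile returns the
    # path of the .pyc it wrote.
    pyc = py_compile.compile(src, cfile=os.path.join(d, "some.pyc"))
    # CPython can execute a .pyc file directly.
    out = subprocess.run([sys.executable, pyc], capture_output=True, text=True)
    print(out.stdout.strip())  # 46656
```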

Braiam
  • 35,991

I've had the benefit of reading the other answers. For starters, people like me should know that the reason we're dealing with such a huge integer here is that both Python and bc do right-associative exponentiation, which means we are not evaluating 6^36 but rather 6^46656, which is considerably larger.1

Using variations on the following commands, we can extract an average for a specific element of the output of both the time shell keyword and the /usr/bin/time command:

for i in {1..1000}; do (time echo 6^6^6 | bc > /dev/null) 2>&1; done | grep 'rea' | sed -e s/.*m// | awk '{sum += $1} END {print sum / NR}'

for i in {1..1000}; do (/usr/bin/time -v sh -c 'echo 6^6^6 | bc > /dev/null') 2>&1; done | grep 'Use' | sed -e s/.*:// | awk '{sum += $1} END {print sum / NR}'

It is possible to go another route and remove the file entirely from the comparison. Also, we can compare bc's timing with something like the dc command, as historically the former is a "front end processor" to the latter. The following commands were timed:

echo 6^6^6 | bc
echo 6 6 6 ^ ^ p | dc
echo print 6**6**6 | python2.7

Note that dc is stack-based, so the order of evaluation is determined by how the operands are pushed; 6 6 6 ^ ^ p computes the right-associative 6^(6^6).2

We have some results with time(bash) for 1000 iterations (in seconds):

0.229678 real bc
0.228348 user bc
0.000569 sys bc
0.23306  real dc
0.231786 user dc
0.000395 sys dc
0.07 real python
0.065907 user python
0.003141 sys python

bc and dc offer comparable performance in this context.

Less accurate3 results from /usr/bin/time, i.e. the GNU time command (the displayed precision is lower here, but the results are similar):

0.2224 user bc
0 sys bc
0.23 Elapsed bc
0.22998 user dc
0 sys dc
0.23 Elapsed dc
0.06008 user python
0 sys python
0.07 Elapsed python

An advantage of /usr/bin/time is that it offers the -v option, which yields much more information that could eventually be useful.

It is also possible to evaluate this internally, so to speak, with Python's timeit module:

python2.7 -m timeit -n 1000 -r 1 'print 6**6**6' | grep 'loops'
1000 loops, best of 1: 55.4 msec per loop

That's a bit faster than what we saw before. Let's try the interpreter itself:

>>> import timeit
>>> import sys
>>> import os
>>> T = timeit.Timer("print 6**6**6")
>>> n = int(1000)
>>> f = open(os.devnull, 'w')
>>> sys.stdout = f
>>> t = T.timeit(n)
>>> sys.stdout = sys.__stdout__
>>> print t/n
0.0553743481636

That's the quickest I've seen.


If we evaluate a lesser exponentiation like 6^6, then the time command yields surprising results. Using the same for loop commands, we now have:

0.001001 real bc
0.000304 user
0.000554 sys

0.014    real python (i.e. ~14x more than bc??)
0.010432 user
0.002606 sys

So with a smaller integer bc is all of a sudden much faster?? Rebooting the system and rerunning makes no difference. Yet at the same time, if we use timeit for Python, we get:

python2.7 -m timeit -n 100000 -r 1 'print 6**6' | grep loops  
100000 loops, best of 1: 0.468 usec per loop

These are microseconds, not milliseconds, so this doesn't match the much slower results using the for loop. Maybe other tools are required to test this further, and as others have explained, there's more than meets the eye here. It seems Python was faster in the question's scenario, but it's not clear whether conclusions can be drawn beyond that...
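Much of that gap is process startup. Measured in-process, one 6**6 evaluation is sub-microsecond, while launching a fresh interpreter to do the same thing costs many milliseconds. A sketch comparing the two:

```python
import subprocess
import sys
import time
import timeit

# Per-operation cost of 6**6 measured inside a running interpreter.
per_op = timeit.timeit("6**6", number=100_000) / 100_000

# Cost of starting a whole new interpreter for the same computation.
start = time.perf_counter()
subprocess.run([sys.executable, "-c", "x = 6**6"], check=True)
launch = time.perf_counter() - start

print(per_op, launch)  # launch dwarfs per_op
```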


1. Needless to say, this is beyond the scope of something like echo's arithmetic expansion, i.e. echo $((6**6**6)); bash also happens to be right-associative there, i.e. 6^6^6 = 6^(6^6).

2. Compare with this: 6 6 ^ 6 ^ p.

3. It's possible the GNU time command provides more info when run on BSD UNIX (GNU time info document): Most information shown by 'time' is derived from the 'wait3' system call. The numbers are only as good as those returned by 'wait3'. Many systems do not measure all of the resources that 'time' can report on; those resources are reported as zero. The systems that measure most or all of the resources are based on 4.2 or 4.3BSD. Later BSD releases use different memory management code that measures fewer resources. -- On systems that do not have a 'wait3' call that returns status information, the 'times' system call is used instead. It provides much less information than 'wait3', so on those systems 'time' reports most of the resources as zero.