How is ksh93 so fast?

Question

So, in general, I tend to look to sed for text processing - especially for large files - and usually avoid doing those sorts of things in the shell itself.

I think, though, that may change. I was poking around at man ksh and I noticed this:

<#pattern     Seeks forward to the beginning of the
              next line containing pattern.

<##pattern    The same as <# except that  the  por‐
              tion  of  the file that is skipped is
              copied to standard output.

Skeptical of real-world usefulness, I decided to try it out. I did:

seq -s'foo bar
' 1000000 >/tmp/file

...for a million lines of data that look like:

1foo bar
...
999999foo bar
1000000

...and pitted it against sed like:

p='^[^0-8]99999.*bar'
for c in "sed '/$p/q'" "ksh -c ':<##@(~(E)$p)'"
do </tmp/file eval "time ( $c )"
done | wc -l

So both commands should get up to 999999foo bar and their pattern matching implementation must evaluate at least the beginning and end of each line in order to do so. They also have to verify the first char against a negated pattern. This is a simple thing, but... The results were not what I expected:

( sed '/^[^0-8]99999.*bar/q' ) \
    0.40s user 0.01s system 99% cpu 0.419 total
( ksh -c ':<##@(~(E)^[^0-8]99999.*bar)' ) \
    0.02s user 0.01s system 91% cpu 0.033 total
1999997

ksh uses ERE here and sed a BRE. I did the same thing with ksh and a shell pattern before but the results did not differ.
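The 1999997 reported by wc -l can be sanity-checked without ksh. The sketch below assumes GNU seq (for -s with a multi-character separator) and uses a temporary file from mktemp rather than the file above. The second sed invocation is only a stand-in for what the man page says <## copies, namely everything before the matching line; it is an assumption for illustration, not ksh's own code path.

```sh
p='^[^0-8]99999.*bar'

# Rebuild the test data in a scratch file (hypothetical path from mktemp).
f=$(mktemp)
seq -s'foo bar
' 1000000 >"$f"

# Line number and text of every line the pattern matches.
first=$(grep -n "$p" <"$f")

# sed prints every line up to and including the match, then quits.
with_match=$(sed "/$p/q" <"$f" | wc -l)

# Stand-in for <##: print only the lines before the match.
before_match=$(sed -n -e "/$p/q" -e p <"$f" | wc -l)

total=$(( with_match + before_match ))
echo "match:        $first"
echo "sed lines:    $(( with_match ))"
echo "before match: $(( before_match ))"
echo "total:        $total"

rm -f "$f"
```

If the stand-in is a fair model, the total is 999999 + 999998 = 1999997, which agrees with the wc -l figure: both commands stop at the same line, and only sed prints the matching line itself.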

Anyway, that's a fairly significant discrepancy - ksh outperforms sed more than 10 times over. I've read before that David Korn wrote his own I/O library and uses it in ksh - possibly this is related? - but I know next to nothing about it. How is it that the shell does this so well?

Even more amazing to me is that ksh really does leave its offset right where you ask it. To get (almost) the same out of (GNU) sed you have to use -u - very slow.

Here's a grep v. ksh test:

1000000         #grep + head
( grep -qm1 '^[^0-8]99999.*bar'; head -n1; ) \
    0.02s user 0.00s system 90% cpu 0.026 total
999999foo bar   #ksh + head
( ksh -c ':<#@(~(E)^[^0-8]99999.*bar)'; head -n1; )  \
    0.02s user 0.00s system 73% cpu 0.023 total

ksh beats grep here - but it doesn't always - they're pretty much tied. Still, that's pretty excellent, and ksh provides lookahead - head's input starts before its match.

It just seems too good to be true, I guess. What are these commands doing differently under the hood?

Oh, and apparently there's not even a subshell here:

ksh -c 'printf %.5s "${<file;}"'

score 8 · Accepted Answer · answered Dec 22 '14 at 13:48

Not only does ksh use sfio but it uses its own custom memory allocator.

Nevertheless, my guess is that sfio makes the difference in this case. I just ran your example under strace and can see that ksh calls read/write ~200 times (65 KB blocks) while sed does so ~3400 times (4 KB blocks). With sed -u my laptop almost melted: reads are done per byte and writes per line. ksh simply uses lseek instead. grep calls read ~400 times (32 KB blocks).
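Those counts line up with simple arithmetic on the file size. The sketch below assumes the file has exactly the layout shown in the question (GNU seq output with a trailing newline), that each tool reads the whole file in full blocks, and that "65 KB", "32 KB" and "4 KB" mean 65536, 32768 and 4096 bytes. The size is derived from the layout, not measured, and the reads helper is a hypothetical name used only here.

```sh
# Digits used by the numbers 1..999999 (9 one-digit, 90 two-digit, ...).
digits=$(( 9*1 + 90*2 + 900*3 + 9000*4 + 90000*5 + 900000*6 ))

# Lines 1..999999 add "foo bar" plus a newline (8 bytes each);
# the final line is "1000000" plus a newline (8 bytes).
size=$(( digits + 8*999999 + 8 ))

# Number of read() calls needed to cover $size bytes with a given
# block size (ceiling division).
reads() { echo $(( (size + $1 - 1) / $1 )); }

echo "file size:     $size bytes"
echo "4096-byte reads:  $(reads 4096)"
echo "32768-byte reads: $(reads 32768)"
echo "65536-byte reads: $(reads 65536)"
```

Under those assumptions the file is 13888889 bytes, which needs 3391 reads at 4096 bytes, 424 at 32768 and 212 at 65536. That is in line with the ~3400, ~400 and ~200 seen under strace, although the ksh figure above also includes writes.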

Miroslav Franc

Yeah - unbuffered is not for the faint of heart. I wonder if ksh's regex engine is as efficient as its I/O? Anyway, thanks very much for the answer. My apologies to your laptop. What about the custom memory allocator, though? Do you have any more on that? – mikeserv Dec 22 '14 at 14:01
Sadly, no. You can of course download the source code from the AT&T website, but that's about it. The library is called AST and contains an allocator, a regex engine and many other things. So it's entirely possible that the combination of all those things makes ksh much faster. – Miroslav Franc Dec 22 '14 at 14:32
http://www2.research.att.com/~astopen/download/ast/ast.html – Miroslav Franc Dec 22 '14 at 14:36
Thank you - this looks promising, too: *Some of the components available in the AST software collection are: POSIX commands: Most of the standard POSIX commands are available in the AST collection. Many are coded as library functions which can be added to ksh as built-in command which dramatically improves performance.* Now I've just gotta figure out how to build it. – mikeserv Dec 22 '14 at 14:44
@mikeserv ksh can be built to use Phong Vo's vmalloc allocator. Journal articles available at that link. – Mark Plotnick Dec 22 '14 at 15:35
@MarkPlotnick - dang, you guys. I'm going to be knocking at this stuff all day, now. – mikeserv Dec 22 '14 at 15:38
