eval limitation with piped commands

Question

We have a shell script that builds a long piped command chain in a variable and executes it with eval (the following code is simplified to the essentials):

 cmd="cat /some/files | grep -v \"this\" | grep -v \"that\""
 cmd="$cmd | grep -v \"much more dynamical filter with variables\""
 ...
 result=`eval $cmd`

This has worked fine so far, but now it seems that the content of the cmd variable exceeds some limit. When it grows beyond about 95970 bytes, I get the following error (although the syntax is correct):

eval: line ...: syntax error near unexpected token `|'

I did some research but found no clue (getconf ARG_MAX prints 2621440, and ulimit -a didn't help either).

Could someone please explain which limit this could be and maybe how to increase the limit or what is the best way to avoid it?

EDIT: I have now tested it on three different servers (CentOS) with a dedicated script. On all servers the limit was 3333 pipes in one command with eval.

And I have found another page where someone experienced the same but without eval. So it seems to be just a limit of pipes.

Knowing that the limitation is probably caused by the number of pipes will help me work around the problem, so that is no longer the question.

But I am still interested in how this limit is set, or at least how to detect its value (probably not 3333 on every system) without running a script for that.

It can be reproduced with:

yes cat | head -n 3334 | paste -sd '|' - | bash

Why do it through eval? Is there some objection to doing it directly with result=$(grep -v "this" /some/files | grep -v "that") or through a function cmd(){......}? — Costas, Feb 20 '15 at 13:53
You can shorten grep -v this | grep -v that to grep -v 'this\|that'. — choroba, Feb 20 '15 at 14:13
@Costas: The command has to be built dynamically from external data. Therefore I have to collect the different grep commands in a variable. The example above is extremely simplified. — hellcode, Feb 20 '15 at 14:32
How many greps do you need? And does eval "$sixkstring" "$sixkstring" work? Also, since you're building it programmatically, what is the problem with { printf %s\\n 'grep -v this |'; echo grep -v that \|; ...; } | sh — mikeserv, Feb 20 '15 at 15:11
@mikeserv: actually there are more than 3000 greps with dynamic filter values in it. — hellcode, Feb 20 '15 at 15:14
@hellcode - that has got to be the ugliest idea I've ever heard. You've got to come up with something else - you need some organizational process (I'd use sed) to parse all of that into something sensible. You don't just toss 3000 greps in the process pool for this and that. — mikeserv, Feb 20 '15 at 15:15
@Archemar: 1) With considerable effort I could group many of the patterns into one big pattern file, but as long as I don't know the reason for the limitation I wouldn't do that. 2) Yes, I am sure. — hellcode, Feb 20 '15 at 15:21
It's difficult to tell without seeing your real code. I do see at least one mistake (though it wouldn't cause this problem with this exact code): missing double quotes in eval "$cmd". To debug this, arrange to build the string with plenty of newlines, print out the string before evaluating it, and check what the line number in the error corresponds to. If you need more help, post code that would allow us to reproduce the error. — Gilles 'SO- stop being evil', Feb 20 '15 at 22:18
@hellcode If you need more than 3333 pipes for your program, there is a good chance that there is something really wrong with your design. Maybe you could tell us more about what you really have to do? To get back to the topic at hand, it is possible to do the bulk of what you want in a single grep command: cat file.txt | grep -v 'str1' | grep -v 'str2' is equivalent to grep -v -e 'str1' -e 'str2' file.txt. — user43791, Feb 20 '15 at 22:24
@Gilles: Why is it a mistake? As I can see the double quotes don't change anything. The line number in the error corresponds to the code line with eval. — hellcode, Feb 20 '15 at 22:41
@hellcode http://unix.stackexchange.com/questions/131766/why-does-my-shell-script-choke-on-whitespace-or-other-special-characters The quotes wouldn't change anything in the snippet you posted, but they might have an effect on your real code, depending on what's actually in there. — Gilles 'SO- stop being evil', Feb 20 '15 at 22:48
@Gilles: Thank you for the link and the explanations. In my case it doesn't matter. — hellcode, Feb 20 '15 at 22:58
@hellcode: are all the grep commands really grep -v "some pattern"? If so, you can use grep -v -e "some pattern" -e "some other pattern" -e "yet another pattern", and that grep command could be built up using an array instead of eval, which would require less quoting. — rici, Feb 21 '15 at 04:47
@rici - definitely - or even just as a $(command substitution) in a heredoc w/ one pattern per line. 3000 patterns might be a little much for a single grep, but that kind of thing can be distributed - at least it wouldn't require a shell to ask the kernel to allocate 3000 pipes and start 3000 subshells and 3000 grep processes. — mikeserv, Feb 21 '15 at 05:40
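
The suggestions in the comments above (many patterns, one grep) can be sketched roughly as follows. This is only an illustration under assumptions: the sample patterns and input lines are made up, the temporary pattern file is a hypothetical stand-in for wherever the real filter values come from, and it only applies if every stage of the original chain really is a grep -v.

```shell
# Hypothetical sample patterns, one per line, written to a temporary file.
patterns_file=$(mktemp)
printf '%s\n' 'this' 'that' 'much more' > "$patterns_file"

# Hypothetical input standing in for the contents of /some/files.
input=$(printf '%s\n' 'keep me' 'drop this' 'drop that' 'also keep')

# A single grep process: -v inverts the match, -f reads the patterns from the file.
result=$(printf '%s\n' "$input" | grep -v -f "$patterns_file")
printf '%s\n' "$result"

rm -f "$patterns_file"
```

With this shape there is one pipe and one grep process no matter how many patterns are collected, so the number of pipes no longer grows with the number of filters.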

rici · Accepted Answer · 2015-02-23T23:36:19.790

The problem here is actually an issue with the bash parser. There is no workaround other than editing and recompiling bash, and the 3333 limit is likely to be the same on all platforms.

The bash parser is generated with yacc (or, typically, with bison but in yacc mode). yacc parsers are bottom-up parsers, using the LALR(1) algorithm which builds a finite state machine with a pushdown stack. Loosely speaking, the stack contains all not-yet-reduced symbols, along with enough information to decide which productions to use to reduce the symbols.

Such parsers are optimized for left-recursive grammar rules. In the context of an expression grammar, a left-recursive rule applies to a left-associative operator, such as subtraction in ordinary mathematics. Subtraction is left-associative because the expression a−b−c groups ("associates") to the left, making it equal to (a−b)−c rather than a−(b−c). By contrast, exponentiation is right-associative, so that a^b^c is by convention evaluated as a^(b^c) rather than (a^b)^c.

bash operators are process operators, rather than arithmetic operators; these include short-circuit booleans (&& and ||) and pipes (| and |&), as well as sequencing operators ; and &. Like mathematical operators, most of these associate to the left, but the pipe operators associate to the right, so that cmd1 | cmd2 | cmd3 is parsed as though it were cmd1 | { cmd2 | cmd3 ; } as opposed to { cmd1 | cmd2 ; } | cmd3. (Most of the time the difference is not important, but it is observable. [See Note 1])

To parse an expression which is a sequence of left associative operators, you only need a small parser stack. Every time you hit an operator, you can reduce (parenthesize, if you like) the expression to the left of it. By contrast, parsing an expression which is a sequence of right associative operators requires that you put all of the symbols onto the parser stack until you reach the end of the expression, because only then can you start reducing (inserting parentheses). (That explanation involves quite a bit of hand-waving, since it was intended to be non-technical, but it is based on the working of the real algorithm.)
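
As a rough illustration of the stack-depth argument above, here is a toy model in shell. It is not the LALR(1) algorithm and not bash's parser: the function name max_depth and the one-slot-per-symbol accounting are assumptions made purely for illustration. It counts how deep a stack gets for an input of the form "a op a op ... a" when reductions happen as early as possible (left-associative) versus only at the end (right-associative).

```shell
# Toy model: print the maximum stack depth reached while reading
# "a op a op ... a" with n operands.
# $1 = left | right (associativity), $2 = number of operands.
max_depth() {
  assoc=$1; n=$2; depth=0; max=0; i=1
  while [ "$i" -le "$n" ]; do
    depth=$((depth + 1))                      # shift an operand
    if [ "$depth" -gt "$max" ]; then max=$depth; fi
    if [ "$assoc" = left ] && [ "$depth" -ge 3 ]; then
      depth=$((depth - 2))                    # reduce "expr op operand" to "expr"
    fi
    if [ "$i" -lt "$n" ]; then
      depth=$((depth + 1))                    # shift an operator
      if [ "$depth" -gt "$max" ]; then max=$depth; fi
    fi
    i=$((i + 1))
  done
  echo "$max"
}

max_depth left 1000     # prints 3
max_depth right 1000    # prints 1999
```

In this model the left-associative case stays at a constant depth, while the right-associative case grows linearly with the length of the input, which is the behaviour the paragraph above describes.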

Yacc parsers will resize their parser stack at runtime, but there is a compile-time maximum stack size, which by default is 10000 slots. If the stack reaches the maximum size, any attempt to expand it will trigger an out-of-memory error. Because | is right associative, an expression of the form:

statement | statement | ... | statement

will eventually trigger this error. If it were parsed in the obvious way, that would happen after 5,000 pipe symbols (with 5,000 statements). But because of the way the bash parser handles newlines, the actual grammar used is (roughly):

pipeline: command '|' optional_newlines pipeline

with the consequence that there is an optional_newlines grammar symbol after every |, so each pipe occupies three stack slots. Hence, the out-of-memory error is generated after 3,333 pipe symbols.
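
The arithmetic behind that figure, and a way to probe a given bash binary, can be sketched as below. The two constants are simply the numbers quoted in this answer, not values read from bash, and parses_ok is a hypothetical helper name. Other bash builds or versions may behave differently, so no particular result is claimed for large counts.

```shell
# Figures assumed from the explanation above, not queried from bash itself.
max_stack=10000        # default maximum yacc/bison parser stack, in slots
slots_per_pipe=3       # command, '|', optional_newlines
predicted=$(( max_stack / slots_per_pipe ))
echo "$predicted"      # prints 3333

# Hypothetical helper: succeeds if bash can parse a chain containing $1 pipe symbols.
# bash -n only checks syntax, so none of the cat commands are actually started.
parses_ok() {
  awk -v n="$1" 'BEGIN { s = "cat"; for (i = 0; i < n; i++) s = s " | cat"; print s }' |
    bash -n 2>/dev/null
}

if parses_ok 10; then echo "10 pipes: parsed"; fi
```

Calling parses_ok with values on either side of the predicted limit (for example 3332 and 3333) would show where a particular build starts to fail, without running a 3,000-stage pipeline.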

The yacc parser detects the stack overflow and signals it by calling yyerror("memory exhausted"). However, the bash implementation of yyerror throws away the provided error message and substitutes a message like "syntax error near unexpected token...". That's a bit confusing in this case.

Notes

Note 1: The difference in associativity is most easily observed using the |& operator, which pipes both stderr and stdout. (Or, more accurately, it makes stderr a duplicate of stdout after establishing the pipe, as in 2>&1 |.) For a simple example, suppose that the file foo does not exist in the current directory. Then

# There is a race condition in this example. But it's not relevant.
$ ls foo | ls foo |& tr n-za-m a-z
ls: cannot access foo: No such file or directory
yf: pnaabg npprff sbb: Nb fhpu svyr be qverpgbel
# Associated to the left:
$ { ls foo | ls foo ; } |& tr n-za-m a-z
yf: pnaabg npprff sbb: Nb fhpu svyr be qverpgbel
yf: pnaabg npprff sbb: Nb fhpu svyr be qverpgbel
# Associated to the right:
$ ls foo | { ls foo |& tr n-za-m a-z ; }
ls: cannot access foo: No such file or directory
yf: pnaabg npprff sbb: Nb fhpu svyr be qverpgbel

Can the same limitation be expected in other shells? Fantastic answer, by the way. — mikeserv, Feb 23 '15 at 22:35
@mikeserv: not necessarily. I don't know my way around the source code of other shells as well as bash; I do know that zsh uses a hand-built parser which probably does not have the same limitation. Any parser will run out of memory eventually if given a sufficiently complex input; the only real question is how it manifests the problem (and, if it is controlled, at what point.) — rici, Feb 23 '15 at 23:05
Great answer. Same issue with yes 1 | head -n 5000 | paste -sd '^' | bc or gawk "$(yes 1 | head -n 5000 | paste -sd '^')" — Stéphane Chazelas, Feb 24 '15 at 10:04
@StéphaneChazelas: Worth noting that, unlike bash, both bc and gawk display the "memory exhausted" message produced by the bison-generated parser. One might prefer "parser stack overflow", but it's still more informative than a generic "syntax error". — rici, Feb 24 '15 at 15:27
