
In a shell script, I need to parse the output of a command line-by-line. The output may include empty lines, and these are relevant. I am using ash, not bash, so I cannot resort to process substitution. I am trying this:

    OUT=`my_command`
    IFS=$'\n'
    i=1
    for line in $OUT; do
        echo $line
        eval VAL$i=$line
        i=$((i+1))
    done

However, this is discarding empty lines in $OUT. How can I fix this so that empty lines are also processed?

3 Answers


A workable shell loop could look like...

    set -f -- "-$-"' -- "$@" '"
        ${IFS+IFS=\$2} ${out+out=\$3}" \
        "$IFS" "$out" "$@"
    IFS='
    ';for out in $(my_command|grep -n '.\|')
    do  : something with "${out%%:*}" and "${out#*:}"
    done
    unset IFS out
    eval "set +f $1"
    shift 3

You only need to arrange things so there aren't any blank lines. Though I initially suggested nl for this purpose, on second thought there is a slight chance that nl's logical-page delimiter could occur in the input and distort its output (it would wind up producing a blank line, in fact, and would affect which lines get numbered; it is a very handy feature for other purposes, though). Other than not interpreting logical page breaks, the results of grep -n '.\|' are identical.
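As a quick illustration (using printf to stand in for the command being parsed): grep -n '.\|' matches every line, blank ones included, so no line it emits is ever empty; a blank input line comes out as just its number and a colon.

```shell
# '.' matches any non-empty line; the empty alternative after \|
# matches blank lines too, so every line gets numbered.
printf 'alpha\n\nbeta\n' | grep -n '.\|'
# 1:alpha
# 2:
# 3:beta
```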

With a pipeline like that and a little parameter substitution, you not only avoid the blank-line issue, but every iteration also arrives pre-numbered: the current iteration's number is at the head of every value handed to you in $out, followed by a :.

The set ... IFS=... lines are there to ensure the shell's state is restored to what it was before you altered it. Those precautions may be overkill if it is a script rather than a function. Still, you should at least set -f before word-splitting to avoid unintentional globbing of your input.
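A stripped-down sketch of the same loop, with printf standing in for the parsed command and the state-saving preamble reduced to set -f and an IFS assignment:

```shell
set -f                  # no globbing while we word-split
IFS='
'                       # split the numbered output on newlines only
for out in $(printf 'foo\n\nbar\n' | grep -n '.\|'); do
    # ${out%%:*} is the line number, ${out#*:} the (possibly empty) line
    printf '%s -> [%s]\n' "${out%%:*}" "${out#*:}"
done
unset IFS
set +f
# 1 -> [foo]
# 2 -> []
# 3 -> [bar]
```

Because grep -n guarantees every field is non-empty (it always carries at least a number and a colon), nothing is lost to the word-splitting.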


But about (d)ash and <(process substitution)

Then again, in a Debian-derived ash (dash, or busybox ash) you might find that its handling of file-descriptor links and here-documents provides a superior alternative to what you might be accustomed to doing with <(process substitution).

Consider this example:

    exec "$((i=3))"<<R "$((o=4))"<<W 3<>/dev/fd/3 4<>/dev/fd/4
    R
    W

    sed -u 's/.*/here I am./' <&"$o" >&"$i" &
    echo "hey...sed?" >&"$o"
    head -n1 <&"$i"

Dash and its derivatives back here-documents with anonymous pipes rather than with regular files (as most other shells do), and the /dev/fd/[num] links on Linux systems provide an indirect way of referring to a file-descriptor's backing file, even when it cannot be referenced in a file-system, as with anonymous pipes. Together these make the above sequence a very simple means of setting up what some shells call a coprocess. In busybox ash or dash on a Linux system (I won't vouch for others) the above will print:

    here I am.

...and will continue to do so until the shell closes its $i and $o file-descriptors. It takes advantage of the -u (unbuffered) switch GNU sed offers to avoid buffering issues, but even without it the backgrounded process's input could be filtered and synchronized on blocks of NUL bytes with dd conv=sync in a pipeline if necessary.

Here's a way in which I typically use the above with sed in an interactive shell:

    : & SEDD=$$$!
    sed -un "/^$SEDD$/!H;//!d;s///;x;/\n/!q;s///;s/%/&&/g;l" <&"$o" >&"$i" &

...which backgrounds a sed that will read and store input until it encounters a unique delimiter, at which time it doubles any occurrence of % in its hold buffer and prints to my exec'd anonymous pipe a printf-format-friendly, C-escaped string on a single line (or on multiple lines if the result is longer than 80 chars). That last case can, with GNU sed, be handled by sed -l0 (a switch instructing sed never to wrap lines on \), or else like:

    fmt=
    while IFS= read -r r <&"$i"
          case $r in (*$)
          ! fmt=$fmt$r ;;esac
    do    fmt=$fmt${r%?}
    done

Anyway, I build its buffer like:

    echo something at sed >&"$o"
    printf '%s\n' more '\lines%' at sed "$SEDD" >&"$o"

Then I pull it in like...

    IFS= read -r fmt <&"$i"

This is what $fmt's contents look like afterward:

    printf %s\\n "$fmt"
    something at sed\nmore\n\\lines%%\nat\nsed$

sed will also do C-style octal escapes for non-printable chars.

So I can I use it like...

    printf "%d\n${fmt%$}\n" 1 2 3

...which prints...

    1
    something at sed
    more
    \lines%
    at
    sed
    2
    something at sed
    more
    \lines%
    at
    sed
    3
    something at sed
    more
    \lines%
    at
    sed

And I can kill sed and release the pipes as needed like...

    printf %s\\n "$SEDD" "$SEDD" >&"$o"
    exec "$i">&- "$o">&-

This is the kind of thing you can do when you get to hold onto an fd rather than use it only once. You can maintain a back-pipe for as long as you might need to, and it is more secure than a named pipe would be: the kernel doesn't offer those links up to any process but the one that owns them (your shell), whereas a named pipe can be found (and tapped or stolen) in a file-system by any process with permission on its reference file.

To do similar things in a shell which does process substitution you can probably do like...

    eval "exec [num]<>"<(:)

...but I've never tried it.

mikeserv

Do it this way:

    i=1
    my_command | while read line; do
        echo $line
        eval VAL$i="$line"
        i=$((i+1))
    done

As the command's output is read line by line, those lines are processed individually (including empty lines) without having to store them in a variable first. This also saves memory, as the output doesn't end up in memory twice, and the script can start processing lines as soon as they are output, not only after the command has completed.

EDIT: As the VALx variables are set in a subshell above, a modification is needed:

    eval `i=1
    my_command | while read line; do
        # echo $line
        echo "VAL$i=\"$line\""
        i=$((i+1))
    done`

If you really need the echo $line as well, some modifications would be needed.
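As the comments on this answer point out, eval-ing a command's raw output is dangerous, and the plain pipeline loses the variables to a subshell. A sketch of the grouping pattern Gilles suggests in the comments, with printf standing in for my_command: everything that needs the VALx variables stays inside the same group as the loop, and \$line keeps the data out of the code that eval executes.

```shell
printf 'a\n\nb\n' | {
    i=1
    while IFS= read -r line; do
        # \$line is expanded by eval itself, inside a plain
        # assignment, so the line's contents never run as code
        eval "VAL$i=\$line"
        i=$((i+1))
    done
    # Still inside the group: VAL1..VAL3 are visible here
    printf '[%s][%s][%s]\n' "$VAL1" "$VAL2" "$VAL3"
}
# [a][][b]
```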

wurtel
  • Doesn't work. The pipeline is processed in a subshell, so the VALx variables won't be available once the while loop exits. That's why I mentioned that I cannot use process substitution (otherwise I could do while read line; do ... done < <(cmd)) – Grodriguez Feb 13 '15 at 15:48
  • Argh, you're right, I didn't reckon with the eval thing. See my edit. – wurtel Feb 13 '15 at 15:55
  • You need while IFS= read -r line to avoid mangling the output. The eval bit is nonsense; to keep the values after the loop, use … | { while … done; …more stuff… } – Gilles 'SO- stop being evil' Feb 14 '15 at 00:09
  • @Gilles The eval bit is required because the "variable name" is dynamic. – Grodriguez Feb 16 '15 at 07:50
  • @Grodriguez But only on the assignment line, like in the question, not on the whole block of code. – Gilles 'SO- stop being evil' Feb 16 '15 at 13:33
  • @Gilles that's to make the pipeline work in ash which lacks process substitution (your approach |{ ... } is another possibility) – Grodriguez Feb 16 '15 at 14:50
  • @Grodriguez - it doesn't make it work, it makes it dangerous. I am a little astounded that this answer was upvoted at all. If it does work in any scenario, then that is only a side-effect of irresponsible coding practice. This stuffs a bunch of unknowns together, asks them to come up with a value, then executes the result as shell code. – mikeserv Feb 17 '15 at 13:16
  • @mikeserv I may be completely wrong here but I'd say that the code between the backticks just outputs (via echo) a set of "VALi=value" lines, then these lines are evaluated via eval so that the variables are actually assigned. Is this wrong? – Grodriguez Feb 17 '15 at 14:15
  • @Grodriguez - The code between the backticks echos: echo VAL="[unknown value]". [unknown value] could be anything - it could be $(rm your face). It could be ";rm your face. Using eval without input validation is irresponsible. The above answer is an exploit waiting to happen - and nothing less. – mikeserv Feb 17 '15 at 18:00
  • @Grodriguez - if you wanted to do the above safely, you would use ' hardquotes, and not while read. Like: eval "$(sed "s/'"'/&\\&&/g;s/.*/'"'&'/;=" | sed "N;s/\(.*\)\n/line\1=/")". Or, if you really wanted to use a shell loop, then you'd use alias: eval "$(i=0;while IFS= read -r line; do alias "line$((i+=1))=$line" "line$i"; done)" because alias is spec'd for safe shell reinput – mikeserv Feb 17 '15 at 18:45

I have implemented this with a here doc:

    i=1
    while read -r line; do
        eval "VAL$i=\$line"
        i=$((i+1))
    done <<EOF
$(my_command)
EOF

Works just fine.
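For instance, with printf standing in for my_command: because no pipeline subshell is involved, the variables survive the loop.

```shell
i=1
while read -r line; do
    eval "VAL$i=\$line"     # \$line defers expansion to the assignment
    i=$((i+1))
done <<EOF
$(printf 'first\n\nthird\n')
EOF
# The loop ran in the current shell, so these are set:
printf '[%s][%s][%s]\n' "$VAL1" "$VAL2" "$VAL3"
# [first][][third]
```

Note that the command substitution inside the here-document strips trailing newlines, so trailing blank lines of output are lost (which, per the comments, is acceptable here).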

Update: Incorporated feedback from Gilles and mikeserv.

  • The command-sub drops trailing blank lines and the read drops leading and trailing $IFS whitespace. The eval needs (at least) to be eval "VAL$i=\$line", but you could also do i=0; eval "VAL$((i+=1))=\$line". – mikeserv Feb 13 '15 at 16:26
  • It is OK to drop leading and trailing whitespace within each line. Trailing blank lines are also OK. Can you elaborate on the \$line bit (vs. just $line)? – Grodriguez Feb 13 '15 at 16:32
  • eval $line results first in $line's expansion and then in the evaluation of $line's value as shell code - so possibly multiple expansions. That is not your goal: you want to expand VAL$i into VAL[num]= and have eval perform the assignment. What you're doing is VAL[num]=[$line's expansion]. Try line='value;echo rm your face'; eval VAL$i=$line and then eval "VAL$i=\$line" to see the difference. This is the same problem as in the other answer. – mikeserv Feb 13 '15 at 16:37
  • I see. Thank you for the detailed explanation! – Grodriguez Feb 13 '15 at 16:55
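mikeserv's point in the comments can be reproduced with a harmless payload: the quoted form with \$line assigns the string verbatim, while the unquoted, pre-expanded form would execute it.

```shell
line='value; echo INJECTED'
i=1
# Unsafe (do not run): eval VAL$i=$line
#   $line is expanded before eval, so "echo INJECTED" would run as code.
# Safe: only VAL$i is expanded up front; $line is expanded by eval
#   itself inside a plain assignment, where no splitting occurs.
eval "VAL$i=\$line"
printf '%s\n' "$VAL1"
# value; echo INJECTED
```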