sed with while read misses line

Question

Can someone please explain this?

I have a file:

cat listi.txt
sdfasdfsf123sadfasdf123
jlkjh2345ljkh245lkh4325
57hghf456ghf457gf467

Here sed misses the first line:

while read line ; do sed 's/[^0-9]//g'; done < listi.txt
23452454325
57456457467

Seen here:

while read line ; do echo $line; done < listi.txt
sdfasdfsf123sadfasdf123
jlkjh2345ljkh245lkh4325
57hghf456ghf457gf467

This works but feels redundant and I could have missed it because I assumed sed would deliver all lines:

while read line ; do echo $line | sed 's/[^0-9]//g'; done < listi.txt
123123
23452454325
57456457467

Why is this? Please restore my trust in bash, as this makes me skeptical.

You read one line, you do nothing with it, then sed consumes/processes the remaining two lines. That's all. The proper way to do it is sed 's/[^0-9]//g' listi.txt , you don't need while..read — don_crissti, Feb 03 '23 at 11:36
@ilkkachu Conceptually, they would read a line, process it, and then output it. It's common to see code like this from users that are not fully familiar with "the Unix way" of using the provided tools as filters, and that may be more familiar with scripting languages like Python or Perl, etc. — Kusalananda, Feb 03 '23 at 11:52
This was a dummy example, I was working with multiple files so I thought of using 'while read' ... my bad, it was irrelevant anyway. — AWE, Feb 03 '23 at 12:00
Writing a buggy one-liner is scarcely a valid reason to lose trust in Bash. — Paul_Pedant, Feb 03 '23 at 21:12
A sleepless idiot with vibrating arteries full of coffee loses trust in anything buggy, self-inflicted or not — AWE, Feb 04 '23 at 22:36

score 6 · Accepted Answer · edited Feb 03 '23 at 11:52

Your initial loop:

while read line; do
    sed 's/[^0-9]//g'
done <listi.txt

Here, read reads one line from the loop's input stream, which comes from the listi.txt file. The value is stored in the variable line (with some caveats: without IFS= and -r, read strips leading and trailing whitespace and treats backslashes as escape characters) and is not used further.

The call to sed is then done without mentioning an input file, which means sed will read from its standard input stream.

The standard input stream of sed is inherited from the loop, so it reads and processes the second line from listi.txt along with all other lines until the end of the file is reached.

The loop then executes read again, but since there's nothing more to read, the call fails and the loop terminates.

The overall effect of the above is that the first line of the file listi.txt is ignored, while sed is processing the file from the second line onward, removing non-digits from each of these and outputting them to the terminal.
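
The sequence described above can be observed directly. The following is only a sketch: it recreates the sample file in a temporary location (the names tmpfile and iterations are made up for this illustration), counts how many times the loop body runs, and prints what read stored before sed takes over the rest of the input.

```shell
# Recreate the sample file from the question in a temporary location
# (tmpfile is a hypothetical name used only for this demonstration).
tmpfile=$(mktemp)
printf '%s\n' sdfasdfsf123sadfasdf123 jlkjh2345ljkh245lkh4325 57hghf456ghf457gf467 >"$tmpfile"

iterations=0
while read line; do                  # same plain "read line" as in the question
    iterations=$((iterations + 1))
    printf 'read consumed: %s\n' "$line"
    sed 's/[^0-9]//g'                # no file operand: reads the rest of the loop's stdin
done <"$tmpfile"

printf 'iterations: %s\n' "$iterations"
rm -f "$tmpfile"
```

With the three-line sample, the loop body should run exactly once: read takes the first line, sed drains the remaining two, and the second call to read hits end of file.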

If you simply want to apply the sed expression to all lines in listi.txt, you would use

sed 's/[^0-9]//g' listi.txt

That is, there is no need to use a separate shell loop since sed will apply its editing expression(s) to each line in the input file(s) by default.
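
The same holds when there are several input files, since sed accepts more than one file operand and processes them in order. A minimal sketch, assuming two small made-up files (first.txt and second.txt are hypothetical names, created in a temporary directory):

```shell
# Create two small sample files; names and contents are invented for illustration.
dir=$(mktemp -d)
printf '%s\n' 'abc123' 'x4y5' >"$dir/first.txt"
printf '%s\n' '67zz8' >"$dir/second.txt"

# A single sed invocation applies the expression to every line of every file operand.
result=$(sed 's/[^0-9]//g' "$dir/first.txt" "$dir/second.txt")
printf '%s\n' "$result"

rm -rf "$dir"
```

No shell loop over lines is involved; the only iteration is the one sed performs internally.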

If what you want to do is to delete all non-digits, then you may also do that with tr, which is a tool that does single character transformations:

tr -d -c '0-9\n' <listi.txt

This deletes (-d) any character from the input that is part of the complement (-c) of the mentioned set of characters (0-9\n; we probably want to keep the newline characters that divide the input into lines, which is why that is included here). The 0-9\n bit could also be written [:digit:]\n, which would match any digit in the current locale, and the newline character.
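
As a quick check, the two tr spellings and the sed command can be run on the question's sample data and compared. This is a sketch only; the variable names (input, with_range, with_class, with_sed) are invented for the comparison.

```shell
# The three sample lines from the question.
input=$(printf '%s\n' sdfasdfsf123sadfasdf123 jlkjh2345ljkh245lkh4325 57hghf456ghf457gf467)

# Delete everything that is not a digit or a newline, in three equivalent ways.
with_range=$(printf '%s\n' "$input" | tr -d -c '0-9\n')
with_class=$(printf '%s\n' "$input" | tr -d -c '[:digit:]\n')
with_sed=$(printf '%s\n' "$input" | sed 's/[^0-9]//g')

printf '%s\n' "$with_range"
```

For this ASCII-only input all three results should be identical.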

Also related:

Why is using a shell loop to process text considered bad practice?

So while..read could be a 'worst practice' way of skipping the header, a common issue — AWE, Feb 03 '23 at 12:03
@AWE To skip the first line, yes, but that is more commonly done with tail -n +2 or sed 1d. Note that a "header" could contain newline characters and thus span multiple lines, if it's a file in a structured document format such as CSV. — Kusalananda, Feb 03 '23 at 12:07
@AWE, a worst practice indeed, since the loop doesn't do anything there. A slightly less bad one would be something like (read x; sed ... ) < file. But yeah, using tail or doing it in sed would be better, esp. as you can do e.g. sed -e 1d -e 's/[^0-9]//g' < file in one command — ilkkachu, Feb 04 '23 at 15:47
