Delete lines in each text file if the first field value is greater than 400

Question

I have a large number of txt files. The format of each file is similar to this:

200 0.2 0.1 0.5 0.4
500 0.4 0.9 0.9 0.1

I am trying to delete every line whose first field value is greater than 400 from each txt file. So the above file should only contain this afterwards:

200 0.2 0.1 0.5 0.4

Code

for file in *.txt; do 
        echo "$(awk '{ if ($1 < 401) print }' *.txt)" > tmp && mv tmp *.txt 
done 
rm -f tmp

But this doesn't work, as it moves all the files to the next text file.

Rarely is echo $( some_command ) materially different from just writing some_command — Chris Davies, May 03 '22 at 21:33
Certainly no material benefit, but it can make things at least slightly worse. It depends on how much and what kind of output it produces. For starters, unquoted it would convert newlines in the output to spaces, so the entire output is only one line. And, whether quoted or unquoted, it also wastes CPU & real time - e.g. on my system time echo "$(find /usr/share/doc/ | wc)" takes about 1.7 seconds vs about 0.8 seconds as time find /usr/share/doc/ | wc (with repeated runs to eliminate caching differences). — cas, May 04 '22 at 00:55
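
To illustrate the point about quoting, here is a minimal sketch. The function some_command is a hypothetical stand-in that prints two lines; it is not part of the question.

```shell
# Hypothetical two-line output standing in for some_command.
some_command() { printf 'line one\nline two\n'; }

quoted=$(echo "$(some_command)")    # newlines inside the output are preserved
unquoted=$(echo $(some_command))    # word splitting joins everything with spaces

printf 'quoted:\n%s\n' "$quoted"
printf 'unquoted:\n%s\n' "$unquoted"
```

Running some_command directly avoids both the extra subshell and the quoting pitfall.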

score 5 · Answer 1 · edited May 04 '22 at 12:35

If you're using GNU awk (which you almost certainly are if you're using Linux), you can use GNU awk's in-place edit library, and you don't even need a shell for loop or any temporary files to do it.

 awk -i inplace '$1 < 401' ./*.txt

This removes every line whose first field is 401 or greater from each text file; for integer values, that is the same as removing lines where field 1 is > 400. It works by first loading GNU awk's inplace library, and then only outputting lines where $1 < 401 evaluates to true.

If you want awk to make a backup copy of each original file (e.g. with a .bak filename extension) before it changes it, you can use awk's INPLACE_SUFFIX variable:

 awk -i inplace -v INPLACE_SUFFIX=.bak '$1 < 401' ./*.txt

Note: unlike some other programs (e.g. sed and perl) that have a -i option for in-place editing, GNU awk's -i option is short for --include, i.e. it includes the gawk library named in the next argument. It is this library (called "inplace") that provides the in-place edit functionality.

Beware using gawk -i inplace like that introduces a security vulnerability unless you modify $AWKPATH not to include . or make sure you run that command from within a working directory where nobody could create a file called inplace or inplace.awk. — Stéphane Chazelas, Jun 24 '23 at 21:18
Yeah, I personally wouldn't use awk for this, I'd use perl. e.g. something like perl -i -lane 'print if $F[0] <= 400' ./*.txt. — cas, Jun 25 '23 at 10:06

score 4 · Accepted Answer · edited May 04 '22 at 12:36

You need to refer to file in your loop; there’s also no need to use echo:

for file in *.txt; do 
        awk '{ if ($1 < 401) print }' < "$file" > tmp && mv -- tmp "$file"
done 
rm -f tmp

The AWK code can be simplified too:

for file in *.txt; do 
        awk '$1 < 401' < "$file" > tmp && mv -- tmp "$file"
done 
rm -f tmp

To match your requirement exactly, the test should be changed:

for file in *.txt; do 
        awk '!($1 > 400)' < "$file" > tmp && mv -- tmp "$file"
done 
rm -f tmp
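
As a quick check of why the changed test matters, here is a minimal sketch on made-up sample data. The 400.5 row is an assumption added for illustration and does not come from the question.

```shell
# Sample data: 400.5 is greater than 400 but less than 401.
sample=$(printf '%s\n' '200 0.2 0.1' '400.5 0.3 0.3' '500 0.4 0.9')

# Keeps the 400.5 line, because 400.5 < 401 is true.
kept_lt401=$(printf '%s\n' "$sample" | awk '$1 < 401')

# Drops the 400.5 line, because 400.5 > 400 is true.
kept_not_gt400=$(printf '%s\n' "$sample" | awk '!($1 > 400)')

printf 'with $1 < 401:\n%s\n' "$kept_lt401"
printf 'with !($1 > 400):\n%s\n' "$kept_not_gt400"
```

If the first field only ever holds integers, both tests select the same lines.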

score 0 · Answer 3 · edited May 04 '22 at 12:37

You can change the code like this to do the work:

for file in *.txt; do 
        awk '{ if ($1 < 401) print }' < "$file" > tmp && mv -- tmp "$file" 
done 
rm -f tmp

But be careful when overwriting the original files. It is safer to create the modified files in a different directory, with something like mv tmp "modified/$file", and do not forget to create that directory before moving files into it.

And if you define the logic as "greater than 400", your if condition should be $1 <= 400, since the number can be 400.1.
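
Putting both suggestions together, a rough sketch could look like the following. It runs in a scratch directory with two hypothetical sample files (a.txt and b.txt are made up for illustration), and it writes the filtered copies directly into modified/ rather than going through tmp.

```shell
# Work in a scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical sample files.
printf '%s\n' '200 0.2 0.1 0.5 0.4' '500 0.4 0.9 0.9 0.1' > a.txt
printf '%s\n' '400 0.1 0.1 0.1 0.1' '400.1 0.2 0.2 0.2 0.2' > b.txt

# Create the output directory first, then write one filtered copy per file.
mkdir -p modified
for file in *.txt; do
    awk '$1 <= 400' < "$file" > "modified/$file"
done
```

The originals stay untouched, so the results in modified/ can be inspected before anything is replaced.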
