Is it possible to find any lines in a file that exceed 79 characters?
2 Answers
In order of decreasing speed (on a GNU system in a UTF-8 locale and on ASCII input) according to my tests:
grep '.\{80\}' file
perl -nle 'print if length$_>79' file
awk 'length>79' file
sed -n '/.\{80\}/p' file
Except for the perl¹ one (or for awk/grep/sed implementations (like mawk or busybox) that don't support multi-byte characters), that counts the length in terms of number of characters (according to the LC_CTYPE setting of the locale) instead of bytes.
If there are bytes in the input that don't form part of valid characters (which happens sometimes when the locale's character set is UTF-8 and the input is in a different encoding), then depending on the solution and tool implementation, those bytes will either count as 1 character, or as 0, or not match the . regex operator at all.
For instance, a line that consists of 30 a's, a 0x80 byte, 30 b's, a 0x81 byte and 30 UTF-8 é's (encoded as 0xc3 0xa9), in a UTF-8 locale would not match .\{80\} with GNU grep/sed (as that standalone 0x80 byte doesn't match .), would have a length of 30+1+30+1+2*30=122 with perl or mawk, and 3*30=90 with gawk.
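The arithmetic in that example can be checked with a small Python 3 sketch. It is only an illustration: the variable names are made up here, and Python's decoder error modes are stand-ins for the different ways a tool may treat invalid bytes, not a description of how grep, awk or perl work internally.

```python
# Build the example line: 30 a's, a stray 0x80 byte, 30 b's,
# a stray 0x81 byte, and 30 é's encoded in UTF-8 (0xc3 0xa9 each).
line = b"a" * 30 + b"\x80" + b"b" * 30 + b"\x81" + "é".encode("utf-8") * 30

# Length in bytes: 30+1+30+1+2*30.
byte_count = len(line)

# Length in characters when the invalid bytes are dropped (count as 0).
chars_ignoring_invalid = len(line.decode("utf-8", errors="ignore"))

# Length in characters when each invalid byte counts as 1 character.
chars_replacing_invalid = len(line.decode("utf-8", errors="replace"))

print(byte_count, chars_ignoring_invalid, chars_replacing_invalid)
# prints: 122 90 92
```

The first two figures are the 122 and 90 quoted above; the third shows what a tool that counts every stray byte as one character would report.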
If you want to count in terms of bytes, fix the locale to C with LC_ALL=C grep/awk/sed....
That would make all 4 solutions treat the line above as containing 122 characters. Except in perl and GNU tools, you'd still have potential issues for lines that contain NUL characters (0x0 byte).
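For comparison, byte-based counting can also be sketched in Python 3 by reading the file in binary mode, which avoids any dependence on the locale. The function name long_lines_bytes and its limit parameter are hypothetical, chosen for this illustration only.

```python
def long_lines_bytes(path, limit=79):
    """Yield each line of `path` whose length in bytes exceeds `limit`."""
    # Binary mode: nothing is decoded, so every byte counts as 1,
    # including invalid UTF-8 sequences and NUL (0x0) bytes.
    with open(path, "rb") as f:
        for raw in f:
            body = raw.rstrip(b"\n")  # leave the line terminator out of the count
            if len(body) > limit:
                yield body
```

Iterating over long_lines_bytes("input.txt") yields the offending lines as bytes objects; they can be written out unchanged with sys.stdout.buffer.write().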
¹ the perl behaviour can be affected by the PERL_UNICODE environment variable though

Shell approach:
while IFS= read -r line || [ -n "$line" ]; do
    [ "${#line}" -gt 79 ] && printf "%s\n" "$line"
done < input.txt
Python approach:
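python -c 'import sys;f=open(sys.argv[1]);print("\n".join([ l.rstrip("\n") for l in f if len(l.rstrip("\n")) > 79 ]));f.close()' input.txt
Or as a short script for readability:
#!/usr/bin/env python
import sys
with open(sys.argv[1]) as f:
    for line in f:
        line = line.rstrip("\n")  # keep the newline out of the count
        if len(line) > 79:
            print(line)
Both versions strip the trailing newline character \n before measuring, so a line of exactly 79 characters is not reported. Using rstrip("\n") rather than strip() also leaves leading whitespace in place, both in the count and in the output.
Side note: print() with a single argument behaves the same in Python 2.7 and Python 3, so this runs under either. Note that len() counts bytes under Python 2.7 and characters under Python 3.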
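If line numbers are wanted as well (as with grep -n), a Python 3 variant might look like the sketch below. The function name long_lines and its parameters are hypothetical, and decoding as UTF-8 with errors="replace" is an assumption made here so that stray bytes count as one character each instead of raising an error.

```python
def long_lines(path, limit=79, encoding="utf-8"):
    """Yield (line_number, text) for lines longer than `limit` characters."""
    # errors="replace" turns each undecodable byte into U+FFFD
    # rather than raising UnicodeDecodeError.
    with open(path, encoding=encoding, errors="replace") as f:
        for number, line in enumerate(f, start=1):
            text = line.rstrip("\n")  # measure without the newline
            if len(text) > limit:
                yield number, text
```

Here the length is counted in characters, so a line of 79 é's (158 bytes in UTF-8) is not reported.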

awk can come closer if you drop ($0), which is implicit anyway ;). – Thor Jul 12 '12 at 18:36
grep had a surprise for me: it beat awk. So I had to edit it. – manatwork Jul 12 '12 at 18:38
sed seems to be doing something wrong. – Thor Jul 13 '12 at 11:23
^, it's slightly faster: e.g. grep '^.\{80\}' file. – cas Jul 29 '12 at 09:32
grep '^.\{1000\}' file returns grep: invalid repetition count(s), while awk 'length>1000' file succeeds.) – mdahlman Dec 18 '14 at 21:00
grep -n '.\{80\}' file | cut -f1 -d: – Anthony Hatzopoulos Sep 23 '15 at 16:07