exactly 120 characters
With grep:
grep -xE '.{120}' < your-file
grep -x '.\{120\}' < your-file # more portable
With awk:
awk 'length == 120' < your-file
from 0 to 120 characters
With grep:
grep -xE '.{0,120}' < your-file
grep -x '.\{0,120\}' < your-file # more portable
With awk:
awk 'length <= 120' < your-file
For strictly less than 120, replace 120 with 119 or <= with <.
120 characters or over:
With grep:
grep -E '.{120}' < your-file # lines that contain a sequence of 120 characters
grep '.\{120\}' < your-file # more portable
And some more alternatives:
grep -E '^.{120}' < your-file # lines that start with a sequence of 120 characters
grep '^.\{120\}' < your-file # more portable
grep -xE '.{120,}' < your-file # lines that have 120 or more characters
# between start and end.
grep -x '.\{120,\}' < your-file # more portable
With awk:
awk 'length >= 120' < your-file
For strictly more than 120, replace 120 with 121 or >= with >.
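The three cases above combine into a range filter. A minimal sketch, assuming you want lines whose length falls between two bounds; lines_between and lines_between_grep are hypothetical helper names, not standard commands:

```shell
# lines_between MIN MAX: hypothetical helper, prints the stdin lines whose
# length in characters is between MIN and MAX inclusive.
lines_between() {
  awk -v min="$1" -v max="$2" 'length >= min && length <= max'
}

# Same filter with grep, interpolating the bounds into a BRE interval.
lines_between_grep() {
  grep -x ".\{$1,$2\}"
}

# Sample input: lines of 5, 10 and 15 characters; only the 10-character
# line falls in the 8..12 range.
printf '%s\n' aaaaa bbbbbbbbbb ccccccccccccccc | lines_between 8 12
printf '%s\n' aaaaa bbbbbbbbbb ccccccccccccccc | lines_between_grep 8 12
```

Both calls should print only bbbbbbbbbb. The same caveats about valid text apply to these as to the commands above.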
Those assume that the input is valid text, properly encoded as per the locale's charmap. If the input contains NUL characters, sequences of bytes that don't form valid characters, lines larger than LINE_MAX (in number of bytes), or a non-delimited last line (in the case of grep; awk would add the missing delimiter), your mileage may vary.
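The awk behaviour on a non-delimited last line can be checked with a small experiment. This is only an illustration with made-up input, not something to rely on for the grep side, where the outcome depends on the implementation:

```shell
# The last line, "abc", has no trailing newline in the input.
# awk still treats it as a record and print appends the output
# record separator, so the output is 4 bytes: "abc" plus a newline.
printf 'ab\nabc' | awk 'length == 3' | wc -c
```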
If you want to do that filtering based on the number of bytes instead of characters, set the locale to C or POSIX (LC_ALL=C grep...).
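As a sketch of the byte-based variant, assuming a helper named lines_of_n_bytes (a hypothetical name), the locale override can be kept local to the grep invocation so that the rest of the script keeps the user's locale:

```shell
# lines_of_n_bytes N: hypothetical helper, prints the stdin lines that are
# exactly N bytes long, regardless of the caller's locale.
lines_of_n_bytes() {
  LC_ALL=C grep -x ".\{$1\}"
}

# "été" with precomposed é (UTF-8 bytes 0xC3 0xA9, octal 303 251):
# 3 characters in a UTF-8 locale, but 5 bytes, so it matches here.
printf '\303\251t\303\251\n' | lines_of_n_bytes 5
```

The same LC_ALL=C prefix works in front of the awk commands, though some awk implementations count bytes in length whatever the locale.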
To do the filtering based on the number of grapheme clusters instead of characters, and if your grep supports a -P option, you can replace the E with P above and . with \X.
Compare:
$ locale charmap
UTF-8
$ echo $'e\u0301te\u0301' | grep -xP '\X{3}'
été
$ echo $'e\u0301te\u0301' | grep -xE '.{5}'
été
$ echo $'e\u0301te\u0301' | LC_ALL=C grep -xE '.{7}'
été
(that été is 3 grapheme clusters, 5 characters, 7 bytes).
Not all grep -P implementations support \X. Some only support the UTF-8 multibyte charmap.
Note that filtering based on display width is yet another matter, and display width for a given string of characters depends on the display device. See Get the display width of a string of characters for more on that.
length==120 is unquoted here - nice code-golfing trick :) – Sergiy Kolodyazhnyy Jul 14 '20 at 07:05
awk command lines. – Stéphane Chazelas Jul 14 '20 at 07:15
grep -xE '.{120,}'? – glenn jackman Jul 14 '20 at 15:21
grep -E '.{120}' for lines that contain at least 120 characters looks simpler to me. grep -E '^.{120}' could possibly improve performance for lines that contain fewer than 120 characters, though it could change the outcome if there are non-characters in the input. – Stéphane Chazelas Jul 14 '20 at 15:31