Grep is the wrong tool for this. You should use a tool that is designed to deal with fields, like awk. For example, to get all lines whose 5th field is greater than 1:
$ awk -F, '$5 > 1' file
Wonderwall,Oasis,Creation,Oct-95,2,1502270
Or whose 6th field is at least two million:
$ awk -F, '$6 >= 2000000' file
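As a rough sketch of how such field tests combine, here is a self-contained run. The temporary file is my own assumption: it holds only the three rows quoted in this answer, not your full file, and the 1700000 threshold is picked purely for illustration.

```shell
# Hypothetical sample file: just the three rows shown in this answer.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Wonderwall,Oasis,Creation,Oct-95,2,1502270
Imagine,John Lennon,Apple,Oct-75,1,1714351
Uptown Funk,Mark Ronson featuring Bruno Mars,RCA,Dec-14,1,1647310
EOF

# Lines whose 5th field is numerically greater than 1.
not_top=$(awk -F, '$5 > 1' "$sample")

# Lines whose 5th field is 1 AND whose 6th field is at least 1700000;
# print only the 1st and 6th fields.
big_top=$(awk -F, '$5 == 1 && $6 >= 1700000 {print $1 ": " $6}' "$sample")

printf '%s\n' "$not_top" "$big_top"
# prints:
# Wonderwall,Oasis,Creation,Oct-95,2,1502270
# Imagine: 1714351

rm -f "$sample"
```

Because awk compares the fields as numbers, there is no need to encode the comparison as a text pattern.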
It is not possible to do such things with grep, since that doesn't let you compare values. The best you can do is some horrible hack like this to get those lines with 1 as the 5th field:
$ grep -E '([^,]+,){4}1,' file
Imagine,John Lennon,Apple,Oct-75,1,1714351
Uptown Funk,Mark Ronson featuring Bruno Mars,RCA,Dec-14,1,1647310
And reverse the match to get those which were not number 1:
$ grep -vE '([^,]+,){4}1,' file
Wonderwall,Oasis,Creation,Oct-95,2,1502270
That means "find exactly 4 repetitions of one or more non-comma characters ([^,]+) followed by a comma, and then a 1 with a comma after it".
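Your attempt was looking for something completely different. In regular expressions, [ ] denotes a character class. So [abc] means "one of a, b, or c" and [^abc] means "one of anything except a, b, or c". Repeating a character inside the brackets changes nothing, and a ] that is not the first character of the class closes it. So [^*,*,*,*,[1],] is parsed as the class [^*,*,*,*,[1] followed by a literal , and a literal ]. That class is equivalent to [^*,[1] and matches any single character that is not a *, a comma, a [, or a 1. I think you were trying to do something like this: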
$ grep -vE '^.*?,.*?,.*?,.*?,1,' file
Wonderwall,Oasis,Creation,Oct-95,2,1502270
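As an aside on character classes, here is a small sketch you can run to see how they behave. The input strings are made up, and the -o option (print only the matched text) is an extension found in GNU and BSD grep rather than something POSIX guarantees.

```shell
# [abc] matches one character out of a, b, c; -o prints each match on its own line.
one_of=$(printf 'cab-xyz\n' | grep -o '[abc]' | tr -d '\n')

# [^abc] matches one character that is NOT a, b, or c.
none_of=$(printf 'cab-xyz\n' | grep -o '[^abc]' | tr -d '\n')

# Repeating characters inside a class changes nothing:
# [^*,*,*] and [^*,] both mean "not a * and not a comma".
dup=$(printf 'a*b,c\n' | grep -o '[^*,*,*]' | tr -d '\n')
nodup=$(printf 'a*b,c\n' | grep -o '[^*,]' | tr -d '\n')

echo "$one_of $none_of $dup $nodup"
# prints: cab -xyz abc abc
```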
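The * is a quantifier: it means "0 or more of the previous item", so it makes no sense by itself. To match any character 0 or more times, you would use .* and not * alone. Next, a single .* would match as much as it can, all the way to the end of the line if need be. This is called "greedy matching". For non-greedy matching, which finds the shortest match possible instead of the longest, you add a ?, which is why I used .*? above. Note that non-greedy quantifiers are a Perl-compatible (PCRE) feature: POSIX extended regular expressions, which is what grep -E uses, do not define them, so you need something like GNU grep's -P option for .*? to really be non-greedy. Also, since . matches commas too, [^,]+ remains the more precise way to match a single field.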
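To make the greedy behaviour visible, here is a minimal sketch on a made-up line. It again relies on grep's -o option (a GNU/BSD extension) to print only what was matched, and it uses a negated character class rather than a non-greedy quantifier, so it does not depend on PCRE support.

```shell
line='a,b,c'   # made-up input

# Greedy: .* grabs as much as possible, so the match runs to the LAST comma.
greedy=$(printf '%s\n' "$line" | grep -oE '.*,')

# Negated class: [^,]* cannot cross a comma, so each match is a single field.
# head -n 1 keeps only the first match.
per_field=$(printf '%s\n' "$line" | grep -oE '[^,]*,' | head -n 1)

echo "greedy: $greedy"        # prints: greedy: a,b,
echo "per field: $per_field"  # prints: per field: a,
```

For plain line filtering, greedy versus non-greedy does not change which lines are selected, only which part of the line is reported as the match.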
grep pattern file.txt without needing cat file.txt | grep pattern. – terdon Nov 15 '22 at 19:30