Using 'grep' to find lines that contain all of three specified characters in any order

Question

We have a file containing several lines, and we need to find out whether each of the characters a, b and c appears at least once in every line, in any order.

For example:

Input:

abc
bca
cab
hhfdhdhfabjfdjdjff
acjfdjdfjdf
fhfhfhfabcjdfjdjfk
ahfhfbkjfjdjffc

Desired output (the fourth and fifth lines are absent: the fourth contains a and b but no c, and the fifth contains a and c but no b):

abc
bca
cab
fhfhfhfabcjdfjdjfk
ahfhfbkjfjdjffc

We are using Linux.

@terdon That many people want to answer a question does not mean it is a good one. The wording is bad (see the comments to wurtel's answer) and so was the formatting. Why upvote that? BTW: I don't consider your erasing the highlighting an improvement. — Hauke Laging, Feb 04 '15 at 16:06
@HaukeLaging my edit was so that the text can be copied directly into a file for testing (I could have used quotes instead of codeblocks to keep the formatting, true). The previous version had lists which were harder to read and much harder to copy. Also remember that this is a new user who doesn't know the formatting tools but gave all information needed to answer the question. Anyway, people's votes are their own but usually if you find a question interesting enough to answer, it is also good enough for an upvote. — terdon, Feb 04 '15 at 16:08
@terdon: Note that Why don't people upvote questions they answer? addresses your last remark. — G-Man Says 'Reinstate Monica', Feb 08 '15 at 20:09
@G-Man yes, I know, we also have http://meta.unix.stackexchange.com/q/3133/ here. I left that comment (which I have now deleted) because at the time, there was a question with 0 upvotes and 4 answers. It was clear and easy to understand and only lacked formatting. All information needed to answer was provided, so I felt it was strange that nobody had upvoted it. Unfortunately, we have an issue with not upvoting questions on the site and that has been bugging me for a while which is why I made that comment. — terdon, Feb 08 '15 at 20:23
See Check if all of multiple strings or regexes exist in a file. — codeforester, Apr 20 '18 at 02:16

score 26 · Answer 1 · answered Feb 04 '15 at 13:58


Pipe it:

grep a file | grep b | grep c
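The comments point out that writing the pipe by hand gets tedious as the number of characters grows. A minimal sketch, assuming a POSIX shell, wraps the same idea in a recursive function; grep_all is a hypothetical name, not a standard command, and each argument is treated as a grep regex.

```shell
# grep_all PATTERN...   (hypothetical helper name)
# Read standard input and print only the lines matching every PATTERN,
# by chaining one grep per pattern, exactly like the pipe above.
grep_all() {
  if [ "$#" -le 1 ]; then
    grep -e "${1-}"                    # last pattern ends the chain
  else
    first=$1
    shift
    grep -e "$first" | grep_all "$@"   # filter, then hand on the rest
  fi
}
```

For example, grep_all a b c < file prints the same lines as grep a file | grep b | grep c.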

answered Feb 04 '15 at 13:58

muru


Simple and neat. – vy32 Feb 05 '15 at 02:02

I'm surprised at the convoluted solutions below. This should be an obvious answer to anyone who knows about grep. – Clearer Feb 05 '15 at 03:31

@Clearer it doesn't scale well at all – shadowtalker Feb 05 '15 at 07:57
@ssdecontrol: Expand on why not? – Nathan Tuggy Feb 05 '15 at 11:12
@NathanTuggy I guess it depends on what you mean by "scale." I was thinking "scales to more letters" but I realize now people mean "scales to longer files." – shadowtalker Feb 05 '15 at 14:40

score 8 · Answer 2 · edited Feb 05 '15 at 14:00


The advantage of sed over grep is easy to see in examples like this:

sed -n '/a/{/b/{/c/p;};}' file

or:

sed '/a/!d;/b/!d;/c/!d' file
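The second sed form extends naturally to more characters, since it is just one /X/!d command per character. This sketch, assuming a POSIX shell and plain letters or digits as arguments (each one is used as a regex), builds that script in a loop; sed_all is a hypothetical name.

```shell
# sed_all CHAR...   (hypothetical helper name)
# Print the lines of standard input that contain every CHAR, by building
# a script with one "/CHAR/!d;" command per argument.
sed_all() {
  script=
  for ch in "$@"; do
    # single quotes keep "!" safe from interactive history expansion
    script="$script/$ch/"'!d;'
  done
  sed "$script"
}
```

For a, b and c the generated script is /a/!d;/b/!d;/c/!d;, the same as the command above.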

edited Feb 05 '15 at 14:00

Stéphane Chazelas


answered Feb 04 '15 at 13:59

Costas


score 7 · Answer 3 · answered Feb 04 '15 at 13:55


awk '/a/ && /b/ && /c/' file

Or with grep (which wouldn't scale well, though):

grep -e 'a.*b.*c' -e 'a.*c.*b' -e 'b.*a.*c' -e 'b.*c.*a' -e 'c.*a.*b' -e 'c.*b.*a'  file
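The awk version can be generalized by passing the characters in as a variable. This is a sketch with a hypothetical function name, awk_all; it uses index(), which does a literal substring search, so characters such as . or * need no escaping (awk -v does, however, interpret backslash escapes in the value).

```shell
# awk_all CHARS [FILE...]   (hypothetical helper name)
# Print the lines that contain every character of CHARS.
awk_all() {
  chars=$1
  shift
  awk -v chars="$chars" '
    {
      for (i = 1; i <= length(chars); i++)
        if (index($0, substr(chars, i, 1)) == 0)
          next          # a character is missing: skip this line
      print             # every character was found
    }' "$@"
}
```

For example, awk_all abc file. Unlike the brute-force grep, whose pattern count grows factorially, the work per line here grows only linearly with the number of characters.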

answered Feb 04 '15 at 13:55

Hauke Laging


score 7 · Answer 4 · edited Apr 13 '17 at 12:37


Let's compare all proposed solutions!

I have a text file, test.txt, of about 230 MB. I'm on a Mac mini running OS X 10.10.

1) awk solution by Hauke Laging (better not...):

$ time bash -c "awk '/a/ && /b/ && /c/' test.txt >> /dev/null"
19.51 real        19.23 user         0.20 sys

2) "bruteforced" grep by Raghuraman R and Hauke Laging (better, but not really...):

$ time bash -c "grep -e 'a.*b.*c' -e 'a.*c.*b' -e 'b.*a.*c' -e 'b.*c.*a' -e 'c.*a.*b' -e 'c.*b.*a' test.txt >> /dev/null"
10.02 real         9.93 user         0.07 sys

3) chained grep by muru (ok!):

$ time bash -c "grep a test.txt | grep b | grep c >> /dev/null"
1.61 real         3.08 user         0.29 sys

4) perl solution by terdon (even better!):

$ time bash -c "perl -ne 'print if /a/ && /b/ && /c/' test.txt >> /dev/null"
0.83 real         0.75 user         0.07 sys

So I think the chained grep is OK, but you can use Perl for even better performance. I could not test the sed approach, because the program provided by Costas does not work as-is in the Mac OS X terminal.

BTW I'm no expert on benchmarking, sorry if I did something wrong.

edited Apr 13 '17 at 12:37

Community


answered Feb 05 '15 at 13:13

dragn



This answer is being discussed on meta. – Shog9 Feb 05 '15 at 17:26

@dragn I'd suggest you add in credits for all the proposed solutions. E.g., where you currently just say "4) perl", say "4) terdon's perl solution". – derobert Feb 05 '15 at 17:32
Agreed, I should have done that... I'll edit the post later, when I'm at my PC. – dragn Feb 05 '15 at 17:36
Your answer would have a much higher chance of being found via a search engine if you had made it into a separate Q&A pair referring to the OP's question and the solutions you benchmarked, but clearly titling the Q as a benchmark of text search strategies. That would also make it easier for others to comment on the specifics or provide additional speed optimisations (e.g. the effect that fixed-string grep -F might have), which is more difficult to do in comments on this answer (due to size and format limitations). – Anthon Feb 05 '15 at 21:05
If comparing with perl, you'd need to pass -Mopen=locale to perl, or use LC_ALL=C for the other solutions. – Stéphane Chazelas Feb 10 '15 at 09:29

score 6 · Answer 5 · answered Feb 04 '15 at 17:15


With a grep that accepts the -P (Perl-compatible regular expressions) option:

$ grep -P '^(?=.*a)(?=.*b)(?=.*c)' file
abc
bca
cab
fhfhfhfabcjdfjdjfk
ahfhfbkjfjdjffc

Explanation:

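^ matches the start of the line
(?=.*a) is a positive lookahead: it succeeds only if an a appears somewhere later on the line, without consuming any characters
(?=.*b) likewise requires the line to contain b
(?=.*c) likewise requires the line to contain c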
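For more than three characters the lookahead pattern can be generated rather than typed. A sketch in POSIX shell, assuming the characters are letters or digits (regex metacharacters would need escaping); lookahead_re is a hypothetical helper name.

```shell
# lookahead_re CHARS   (hypothetical helper name)
# Print ^(?=.*X)(?=.*Y)... with one lookahead per character of CHARS.
lookahead_re() {
  re='^'
  rest=$1
  while [ -n "$rest" ]; do
    c=${rest%"${rest#?}"}        # first character of $rest
    re="$re(?=.*$c)"
    rest=${rest#?}               # drop that character
  done
  printf '%s\n' "$re"
}
```

With a grep that supports -P, grep -P "$(lookahead_re abc)" file runs the same pattern as above.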

answered Feb 04 '15 at 17:15

Avinash Raj


GNU grep accepts -P; to install it on a Mac, run "brew install grep". See also https://unix.stackexchange.com/questions/563092/is-gnu-greps-p-option-safe-to-use-in-production – George Colpitts Feb 20 '23 at 17:15

terdon · Answer 6 · answered Feb 04 '15 at 16:07

I would do this in perl instead:

$ perl -ne 'print if /a/ && /b/ && /c/' file 
abc
bca
cab
fhfhfhfabcjdfjdjfk
ahfhfbkjfjdjffc

If you just want to check whether every line contains all three letters (printing yes or no instead of the matching lines), you could do:

$ perl -lne '$k++ if /a/ && /b/ && /c/; 
 END{$k==$. ? print "yes" : print "no"}' file

Or

$ awk '(/a/ && /b/ && /c/){k++} END{if(k==NR){print "yes"} else{print "no"}}' file
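In a script it may be more convenient to get an exit status than to print yes or no. This awk sketch (all_lines_match is a hypothetical name) stops at the first line lacking any of the three letters instead of reading the whole file; note that an empty file counts as matching.

```shell
# all_lines_match FILE   (hypothetical helper name)
# Exit status 0 if every line of FILE contains a, b and c; awk exits
# with status 1 at the first line that does not, without reading further.
all_lines_match() {
  awk '!(/a/ && /b/ && /c/) { exit 1 }' "$1"
}
```

Usage: if all_lines_match file; then echo yes; else echo no; fi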

score 3 · Answer 7 · edited May 23 '17 at 12:39


If it is just a, b and c, then we can use grep with several -e options, one pattern for each possible order:

grep -e "a.*b.*c" -e "a.*c.*b" -e "b.*a.*c" -e "b.*c.*a" -e "c.*a.*b" -e "c.*b.*a" file
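To see why this approach does not scale, the patterns can be generated instead of written out: n characters need n! orderings (6 for three, 24 for four). A sketch in POSIX shell with hypothetical function names; it passes all patterns to a single -e as a newline-separated list, which POSIX grep accepts, and assumes the characters are not regex metacharacters.

```shell
# perms PREFIX REST   (hypothetical helper name)
# Print every ordering of the characters in REST, appended to PREFIX and
# joined with ".*" (perms '' abc prints a.*b.*c, a.*c.*b, ..., c.*b.*a).
# The body is a subshell, so the recursion does not clobber variables.
perms() (
  prefix=$1 rest=$2
  if [ -z "$rest" ]; then
    printf '%s\n' "$prefix"
    exit 0
  fi
  used=
  while [ -n "$rest" ]; do
    c=${rest%"${rest#?}"}        # character placed next
    rest=${rest#?}
    perms "$prefix${prefix:+.*}$c" "$used$rest"
    used=$used$c
  done
)

# grep_any_order CHARS FILE   (hypothetical helper name)
# One pattern per ordering, all passed to a single -e, newline-separated.
grep_any_order() {
  grep -e "$(perms '' "$1")" "$2"
}
```

For example, grep_any_order abc file builds the same six patterns as the command above.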

See also the related question at https://stackoverflow.com/questions/1546711/can-grep-show-only-words-that-match-search-pattern

edited May 23 '17 at 12:39

Community


answered Feb 04 '15 at 14:01

Raghuraman R


That answer has already been given. – Hauke Laging Feb 04 '15 at 14:02

@HaukeLaging your answer and this one were posted within 2 minutes of each other. It is safe to assume that you both wrote them at the same time and nobody copied from anyone. – terdon Feb 04 '15 at 15:50
@terdon My intention was not to claim this was copied from me. I have been on the other side often enough. But in such a case I would delete my answer if it was the later one. – Hauke Laging Feb 04 '15 at 16:02

score 0 · Answer 8 · answered Feb 04 '15 at 14:01


if [ "$(grep a file | grep b | grep -c c)" -eq "$(wc -l < file)" ]; then
    echo yes
else
    echo no
fi

answered Feb 04 '15 at 14:01

wurtel


Are you sure you have understood the question correctly...? – Hauke Laging Feb 04 '15 at 14:03
The question was whether the combination of a, b and c occurs in every line. My snippet checks that... If the question was meant to be "show those lines where a, b and c occur" then the other answers answer that :) – wurtel Feb 04 '15 at 14:48
