10

We have a document containing lines of text, and we have to find whether each of a, b, and c appears at least once in every line, in any order.

For example:

Input:

abc
bca
cab
hhfdhdhfabjfdjdjff
acjfdjdfjdf
fhfhfhfabcjdfjdjfk
ahfhfbkjfjdjffc

Desired output (the fourth line is absent because it contains a and b but no c; the fifth is absent because it contains a and c but no b):

abc
bca
cab
fhfhfhfabcjdfjdjfk
ahfhfbkjfjdjffc

We are using Linux.

vy32
  • 194
  • 1
    @terdon That many people want to answer a question does not mean it is a good one. The wording is bad (see the comments to wurtel's answer) and so was the formatting. Why upvote that? BTW: I don't consider your erasing the highlighting an improvement. – Hauke Laging Feb 04 '15 at 16:06
  • @HaukeLaging my edit was so that the text can be copied directly into a file for testing (I could have used quotes instead of codeblocks to keep the formatting, true). The previous version had lists which were harder to read and much harder to copy. Also remember that this is a new user who doesn't know the formatting tools but gave all information needed to answer the question. Anyway, people's votes are their own but usually if you find a question interesting enough to answer, it is also good enough for an upvote. – terdon Feb 04 '15 at 16:08
  • @G-Man yes, I know, we also have http://meta.unix.stackexchange.com/q/3133/ here. I left that comment (which I have now deleted) because at the time, there was a question with 0 upvotes and 4 answers. It was clear and easy to understand and only lacked formatting. All information needed to answer was provided, so I felt it was strange that nobody had upvoted it. Unfortunately, we have an issue with not upvoting questions on the site and that has been bugging me for a while which is why I made that comment. – terdon Feb 08 '15 at 20:23

8 Answers

26

Pipe it:

grep a file | grep b | grep c
muru
  • 72,889
8

The advantage of sed over grep is easy to see in such examples:

sed -n '/a/{/b/{/c/p;};}' file

or:

sed '/a/!d;/b/!d;/c/!d' file
Costas
  • 14,916
7
awk '/a/ && /b/ && /c/' file

Or with grep (which wouldn't scale well, though):

grep -e 'a.*b.*c' -e 'a.*c.*b' -e 'b.*a.*c' -e 'b.*c.*a' -e 'c.*a.*b' -e 'c.*b.*a'  file
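
The number of alternatives grows factorially with the number of patterns (6 for three letters, 24 for four), so for longer lists a loop or recursion is easier to maintain. A minimal sketch, assuming the patterns are fixed strings rather than regular expressions; `contains_all` is a hypothetical helper name, not a standard tool:

```shell
# Hypothetical helper: read lines on stdin and print only those that
# contain every fixed string given as an argument, in any order.
contains_all() {
    # No patterns left: pass the remaining lines through unchanged.
    if [ "$#" -eq 0 ]; then
        cat
        return
    fi
    first=$1
    shift
    # Filter on the first string, then recurse on the remaining ones.
    grep -F -e "$first" | contains_all "$@"
}
```

It could be called as `contains_all a b c < file`, which amounts to the chained `grep a | grep b | grep c` pipeline with one grep per argument.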
Hauke Laging
  • 90,279
7

Let's compare all proposed solutions!

I have a text file test.txt of size ~230M. I'm on Mac Mini, updated to 10.10.

1) awk solution by Hauke Laging (better not...):

$ time bash -c "awk '/a/ && /b/ && /c/' test.txt >> /dev/null"
19.51 real        19.23 user         0.20 sys

2) "bruteforced" grep by Raghuraman R and Hauke Laging (better, but not really...):

$ time bash -c "grep -e 'a.*b.*c' -e 'a.*c.*b' -e 'b.*a.*c' -e 'b.*c.*a' -e 'c.*a.*b' -e 'c.*b.*a' test.txt >> /dev/null"
10.02 real         9.93 user         0.07 sys

3) chained grep by muru (ok!):

$ time bash -c "grep a test.txt | grep b | grep c >> /dev/null"
1.61 real         3.08 user         0.29 sys

4) perl solution by terdon (even better!):

$ time bash -c "perl -ne 'print if /a/ && /b/ && /c/' test.txt >> /dev/null"
0.83 real         0.75 user         0.07 sys

So, I think "chained grep" is ok, but you can also use Perl for even better performance. I could not test the sed approach, because the program provided by Costas does not work "as is" in the macOS terminal.

BTW I'm no expert on benchmarking, sorry if I did something wrong.

dragn
  • 187
  • 2
    This answer is being discussed on meta. – Shog9 Feb 05 '15 at 17:26
  • 3
    @dragn I'd suggest you add in credits for all the proposed solutions. E.g., where you currently just say "4) perl", say "4) terdon's perl solution". – derobert Feb 05 '15 at 17:32
  • Agreed, I should have done that... I'll edit the post later, when I'm at my PC – dragn Feb 05 '15 at 17:36
  • Your answer would have a much higher chance of being found via a search engine if you had made it into a separate Q&A pair referring to the OP's question and the solutions you benchmarked, but clearly titling the Q as a benchmark of text search strategies. That would also make it easier for others to comment on the specifics or provide additional speed optimisation (e.g. what effect the non-pattern grep -F might have), which is more difficult to do in comments on this answer (due to size and format limitations). – Anthon Feb 05 '15 at 21:05
  • If comparing with perl, you'd need to pass -Mopen=locale to perl, or use LC_ALL=C for the other solutions. – Stéphane Chazelas Feb 10 '15 at 09:29
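
To make the locale suggestion in the last comment concrete, here is a rough sketch that forces the C locale for the awk and grep variants and reports how many lines each one keeps, so the results can be cross-checked before timing. The function names `count_awk` and `count_grep` are made up for illustration, and this is not a rerun of the benchmark above:

```shell
# Hypothetical helpers: count the lines containing a, b and c,
# with LC_ALL=C so that awk and grep work byte-wise.
count_awk() {
    LC_ALL=C awk '/a/ && /b/ && /c/' "$1" | wc -l | tr -d ' '
}

count_grep() {
    LC_ALL=C grep a "$1" | LC_ALL=C grep b | LC_ALL=C grep c | wc -l | tr -d ' '
}
```

In bash, `time count_grep test.txt` and `time count_awk test.txt` would then time each variant under the same locale; both counts should agree.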
6

Using a grep that accepts the -P (Perl-regexp) option:

$ grep -P '^(?=.*a)(?=.*b)(?=.*c)' file
abc
bca
cab
fhfhfhfabcjdfjdjfk
ahfhfbkjfjdjffc

Explanation:

  • ^ Matches the start of a line
  • (?=.*a) Lookahead: matches only if the letter a appears somewhere after this point in the line
  • (?=.*b) Must contain b
  • (?=.*c) Must contain c
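
Because every required letter contributes one lookahead of the same shape, the pattern can be assembled from a list. A small sketch, assuming each argument is already a valid PCRE fragment (nothing is escaped) and that the grep in use supports -P; `lookahead_pattern` is a hypothetical helper name:

```shell
# Hypothetical helper: build '^(?=.*X)(?=.*Y)...' from its arguments.
# Arguments are inserted verbatim, so regex metacharacters are not escaped.
lookahead_pattern() {
    pattern='^'
    for term in "$@"; do
        pattern="${pattern}(?=.*${term})"
    done
    printf '%s\n' "$pattern"
}
```

With it, `grep -P "$(lookahead_pattern a b c)" file` runs the same pattern as the command above.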
Avinash Raj
  • 3,703
  • GNU grep accepts -P; to install it on a Mac, run "brew install grep". See also https://unix.stackexchange.com/questions/563092/is-gnu-greps-p-option-safe-to-use-in-production – George Colpitts Feb 20 '23 at 17:15
4

I would do this in perl instead:

$ perl -ne 'print if /a/ && /b/ && /c/' file 
abc
bca
cab
fhfhfhfabcjdfjdjfk
ahfhfbkjfjdjffc

If you just want to check whether each line matches all three letters (without printing the line itself), you could do:

$ perl -lne '$k++ if /a/ && /b/ && /c/; 
 END{$k==$. ? print "yes" : print "no"}' file

Or

$ awk '(/a/ && /b/ && /c/){k++} END{if(k==NR){print "yes"} else{print "no"}}' file
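
Both checks read the whole file before answering. If an early answer matters, one possible variation stops at the first line that is missing a letter. A sketch, with `all_lines_match` as a made-up function name:

```shell
# Hypothetical helper: print "yes" if every line of the file contains
# a, b and c, otherwise "no". Reading stops at the first failing line.
all_lines_match() {
    awk '
        !(/a/ && /b/ && /c/) { bad = 1; exit }   # exit still runs the END block
        END { print (bad ? "no" : "yes") }
    ' "$1"
}
```

On the sample input from the question this prints "no", since the fourth line has no c. An empty file prints "yes", as there is no line that fails.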
terdon
  • 242,166
3

If it is just a, b, and c, then we can pass every ordering to grep with multiple '-e' options, as below:

grep -e "a.*b.*c" -e "a.*c.*b" -e "b.*a.*c" -e "b.*c.*a" -e "c.*a.*b" -e "c.*b.*a" file

You can also check the previously asked question at https://stackoverflow.com/questions/1546711/can-grep-show-only-words-that-match-search-pattern

Raghuraman R
  • 131
  • 2
  • 1
    That answer has already been given. – Hauke Laging Feb 04 '15 at 14:02
  • 1
    @HaukeLaging your answer and this one were posted within 2 minutes of each other. It is safe to assume that you both wrote them at the same time and nobody copied from anyone. – terdon Feb 04 '15 at 15:50
  • @terdon My intention was not to claim this was copied from me. I have been on the other side often enough. But in such a case I would delete my answer if it was the later one. – Hauke Laging Feb 04 '15 at 16:02
0
if [ $(grep a file | grep b | grep -c c) -eq $(wc -l < file) ]; then
    echo yes
else
    echo no
fi
wurtel
  • 16,115
  • Are you sure you have understood the question correctly...? – Hauke Laging Feb 04 '15 at 14:03
  • The question was whether the combination of a, b and c occurs in every line. My snippet checks that... If the question was meant to be "show those lines where a, b and c occur" then the other answers answer that :) – wurtel Feb 04 '15 at 14:48