Grep fail on multiple matches

Question

I want to grep a search pattern but only succeed (and output the matching line) if there is only one unique match. If two lines match, grep should fail or output nothing.

What have you tried and where are you stuck? – Kamil Maciorowski Jul 15 '22 at 11:37

terdon · Accepted Answer · 2022-07-16T15:49:30.743

7

You can't do this with grep, but you can simply count the matches. I don't know what shell, what grep or what operating system you are using, but here's an example of a bash function that can do that:

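maxOne() (
    # The body runs in a subshell, so the IFS and noglob changes stay local.
    pattern="$1"
    file="$2"
    IFS=$'\n'   # split grep's output on newlines only
    set -f      # disable globbing while the output is split
    results=( $(grep -m2 -- "$pattern" "$file") )
    if [ "${#results[@]}" -eq 1 ]; then
        printf -- '%s\n' "${results[@]}"
        return 0
    else
        return 1
    fi
)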

Add those lines to your ~/.bashrc or just paste them into a terminal with a running bash session, and you can then do:

maxOne foo file

That searches for foo in file. Note that the -m option (maximum number of matches), used here for efficiency so that grep exits after two matches, isn't supported by all versions of grep; if it gives you an error, just remove it. It isn't needed, it just speeds things up.

Important: this will not work for multi-line search strings which you can use with grep -z if your grep supports that. If you need to be able to handle multi-line search patterns, you will need a different approach. Also, this will not work with patterns that match empty lines (e.g. grep '^$' file). Stéphane's solution will handle empty lines, so that would be a better option if this is an issue. His will also work on multiple files, unlike mine, which is a nice perk.
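
For the empty-line case specifically, a bash-only variant can read grep's output with readarray -t instead of an unquoted $(...). What follows is a minimal sketch, not part of the original function: the name maxOneLines is made up for illustration, readarray needs bash 4 or later, and -m2 carries the same portability caveat as above.

```shell
#!/usr/bin/env bash
# Hypothetical variant of maxOne (the name maxOneLines is an assumption).
maxOneLines() {
    local pattern=$1 file=$2
    local -a results
    # readarray -t stores each output line as one array element, keeps
    # empty lines, and does no word splitting or globbing, so IFS and
    # the noglob option are left untouched.
    readarray -t results < <(grep -m2 -- "$pattern" "$file")
    if [ "${#results[@]}" -eq 1 ]; then
        printf '%s\n' "${results[0]}"
        return 0
    fi
    return 1
}
```

Because nothing global is modified, this version can use { } rather than a subshell. It still handles a single file and single-line patterns only.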

edited Jul 16 '22 at 15:49

answered Jul 15 '22 at 11:51

terdon

(The question said "only succeed if there is only one unique match", which I took to mean that zero matches should not succeed. Anyway, printf -- '%s\n' "${results[@]}" would still print one empty line if the array was empty. Not because the array expansion would conjure up an empty element, but because printf prints the format string at least once.) – ilkkachu Jul 15 '22 at 12:35
aaand set -f has a different meaning in zsh (but isn't really necessary). Not sure if it's worth making the function usable in both with that issue... – ilkkachu Jul 15 '22 at 12:43
@ilkkachu set -o noglob works the same in zsh and bash (and is more legible IMO) – Stéphane Chazelas Jul 15 '22 at 12:58
Beware array=( $(grep...) ) would remove empty lines from the output of grep, so you can't use that if the pattern may match empty lines. With bash, you can use readarray -t array < <(grep...) instead which avoids having to mess with IFS and noglob. See also the f parameter expansion flag in zsh. – Stéphane Chazelas Jul 15 '22 at 14:07
@ilkkachu fair point about printf but the rest of your edits seem to only have made it worse: I want the function () since I don't want this to be run on shells that don't support function. As you said, set -f doesn't do the same thing in zsh, so why add it? Where do you want to disable globbing? – terdon Jul 15 '22 at 14:30
@StéphaneChazelas I was always thinking that this would not handle patterns with newlines (but I forgot to make that explicit). Is there any reason to mess with IFS if I do not need to handle newlines? – terdon Jul 15 '22 at 14:34
@terdon, well, I expect you'd want to disable word-splitting and globbing when splitting the output of the $(...) to the array. Consider a file where the lines have multiple words or consist of e.g. a lone asterisk. echo hello world > test.txt; maxOne hello test.txt and it fails since hello and world produce two elements in the array. Or echo '*' > test.txt; maxOne . test.txt, where the glob gets expanded probably giving more than one array element. – ilkkachu Jul 15 '22 at 15:56
@terdon, as for function maxOne(), that's not supported in ksh (where function foo and foo() are both supported but subtly different). The rest of the array stuff required would work in ksh, though (and Bash's arrays are borrowed from ksh anyway). So I'm not sure why you'd want to make that part an arbitrary filter. (It looks to me that arrays are the feature actually needed here, and a shell that doesn't support (ksh) arrays would likely croak at "${#results[@]}" or one of the others anyway.) But sure, it's your answer. – ilkkachu Jul 15 '22 at 15:58
@ilkkachu ah! Of course, in the array. Absolutely yes, thanks. I'll add set -o noglob as Stéphane suggested. As for function, if removing it makes it work in ksh as well, then thank you again and I'll do that. I had thought it was the POSIX shells like sh and dash that would choke on function and since I didn't want this to work for them anyway, I saw no point. I learned a few things today, thanks! – terdon Jul 15 '22 at 16:03
@terdon, you need to change IFS too, since the default would split on any whitespace, splitting words within a line, not just the lines from each other. And then there's the issue that those changes affect global state, so you'd need to reset IFS and the noglob flag at the end to avoid messing up other parts of the script... So easiest to wrap the whole function in ( ) instead of { } to run it in a subshell. Or use local - IFS; in Bash (the - makes noglob and other flags local too), but local is where ksh is different and I'm not sure it can localize the flags... – ilkkachu Jul 15 '22 at 16:19
@terdon, Um, yeah. I thought about writing a longer comment at first, before (or instead of) editing, but, I guess, I thought it was an obvious enough word-splitting issue anyway and wanted to spare everyone from the verbose explanation... Sorry. – ilkkachu Jul 15 '22 at 16:30
Let us continue this discussion in chat. – terdon Jul 15 '22 at 16:46
Adding the -m2 option to grep (GNU-specific though) would avoid it looking for occurrences past the second like in my answer to make it more efficient. – Stéphane Chazelas Jul 16 '22 at 11:59
Of course! Thanks, @StéphaneChazelas! – terdon Jul 16 '22 at 15:49
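
The word-splitting problem raised in the comments above can be reproduced with a small bash sketch. The helper names count_naive and count_safe are made up for illustration; each one only reports how many array elements a single line of text turns into.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: count the array elements produced from one line.

# Default IFS and globbing enabled: the line is split on any whitespace,
# and a word such as * may be expanded to file names.
count_naive() (
    results=( $(printf '%s\n' "$1") )
    echo "${#results[@]}"
)

# Newline-only IFS and noglob: one line stays one element. The ( ) body
# is a subshell, so neither setting leaks into the calling shell.
count_safe() (
    IFS=$'\n'
    set -o noglob
    results=( $(printf '%s\n' "$1") )
    echo "${#results[@]}"
)

count_naive 'hello world'   # prints 2
count_safe 'hello world'    # prints 1
count_safe '*'              # prints 1
```

The result of count_naive '*' depends on what is in the current directory, which is the reason globbing has to be switched off before splitting.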

Stéphane Chazelas · Answer 2 · 2022-07-16T10:04:21.550

6

You could do it with:

unique_egrep() (
  export ERE="$1"; shift
  exec gawk -e '
    BEGIN               {ret = 1}
    BEGINFILE           {n = 0}
    $0 ~ ENVIRON["ERE"] {if (n++) nextfile; found = $0}
    ENDFILE             {if (n == 1) {print FILENAME":"found; ret = 0}}
    END                 {exit ret}' -E /dev/null "$@"
)

And then unique_egrep pattern *.txt for instance.

Here using the -e 'code' -E /dev/null (in place of 'code') trick to be able to process arbitrary file paths.

All of -e, -E, BEGINFILE, ENDFILE and nextfile are GNU extensions (though nextfile is now found in many other implementations as well).

edited Jul 16 '22 at 10:04

answered Jul 15 '22 at 12:41

Stéphane Chazelas

This answer looks very good, but could you briefly explain what is the purpose of -E /dev/null in the last line of the gawk script? – user000001 Jul 16 '22 at 09:45
@user000001, see Why does awk stop and wait if the filename contains = and how to work around that? – Stéphane Chazelas Jul 16 '22 at 09:47
