4

I know that | is the logical "OR" operator inside a RegExp expression. But what is the equivalent "AND" operator (again, inside a RegExp)?

Note:

  • This is not about the "AND" operator for combining multiple expressions, which is just &&.
  • I mean something like /A&B/, a single RegExp that matches only if both A and B match.
AdminBee
  • 22,803
lylklb
  • 303

2 Answers

11

There is no such operator in any of the regular expression flavors I am familiar with. If you want to match inputs that have both A and B, you can write A.*B or B.*A, each of which requires that particular order; or combine both expressions with | to accept either order: A.*B|B.*A.
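
As a minimal sketch of the combined form (the sample lines are made up for illustration, and A and B stand in for any two patterns that cannot overlap), only the lines containing both letters are kept, whichever comes first:

```shell
# Hypothetical sample input: two lines with both A and B, two with only one of them.
matches=$(printf '%s\n' 'xAyBz' 'xByAz' 'only A' 'only B' |
  grep -E 'A.*B|B.*A')

# Prints the two lines that contain both A and B.
printf '%s\n' "$matches"
```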

Alternatively, do two separate matches. For example, in awk:

awk '/A/ && /B/' file

or manually with two grep instances:

grep A file | grep B
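
A rough comparison of the three approaches, assuming made-up input lines and the words alpha and beta as the two patterns. Because each pattern is tested on its own in the first two variants, they also accept a line where the two words share a character, which the single alternation does not:

```shell
# Hypothetical input; 'betalpha' contains both words, but they overlap on the 'a'.
input=$(printf '%s\n' 'alpha then beta' 'beta then alpha' 'betalpha' 'alpha only')

# Two independent matches per line: a true AND.
two_step=$(printf '%s\n' "$input" | awk '/alpha/ && /beta/')
piped=$(printf '%s\n' "$input" | grep alpha | grep beta)

# Single regex with both orders spelled out: misses the overlapping case.
one_regex=$(printf '%s\n' "$input" | grep -E 'alpha.*beta|beta.*alpha')

printf 'awk &&:\n%s\n\ngrep | grep:\n%s\n\nalternation:\n%s\n' \
  "$two_step" "$piped" "$one_regex"
```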

You don't really need an AND operator in regular expressions. The idea of a regex is that it describes a string: by definition, you put in the regex the thing you are trying to match. So an OR is needed to allow matching either A or B, but the AND is basically built in: everything you write in a regex needs to be matched, so everything is already joined by implicit AND operators, which makes a dedicated AND kind of pointless.

terdon
  • 242,166
  • Sometimes that "and" might be useful, though. E.g. you could have input where the lines contain lists of strings, and you want to find the ones where the list has both foo and bar and just need the quick solution instead of parsing properly. Especially if you want the filenames and line numbers too, piping greps would have the second match on those, and with AWK you'd need more work to print them in the first place. – ilkkachu Feb 03 '23 at 12:06
  • With Perl regexes, you could do grep -P '(?=.*A)(?=.*B)' or something like that, not that that's very pretty either. – ilkkachu Feb 03 '23 at 12:08
  • Of course, A.*B|B.*A won't work the same as && if there's possible overlap between A and B - for example, if you're searching for lines containing both "alpha" and "beta" then according to that specification you should match the string "betalpha". – Daniel Schepler Feb 03 '23 at 20:20
3

Note: As comments by Stéphane Chazelas suggest, this answer is somewhat invalidated by the existence of RegEx implementations that do allow an AND-Operator. The reasoning below is still correct in that such an operator only makes sense if you ensure that the imposed conditions are mutually compatible.


I think the answer is that there cannot be an "AND" equivalent of the |-operator in RegExes, because in the end, regular expressions perform matching on the character level of the input string (albeit sometimes implicitly via repetition operators), and each part of the pattern is thereby directly tied to a particular position in the string (see e.g. this Q&A for a similar discussion).

The point is that if you have an expression of the form (I'm explicitly using awk syntax here because of your question title)

$0 ~ /something(A|B)somethingelse/

this requires the string to have either A or B at the specific position immediately after something and before somethingelse in order to match. The position requirement can be more dynamic if you have patterns with repetition operators, such as

$0 ~ /[a-f]+(A|B)[0-9]+/

but still, the point is that the occurrence of either A or B is tied specifically to the position after the pattern consisting of only lowercase a ... f(1) and before the pattern consisting of only digits 0 ... 9.
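
To illustrate the position requirement, here is a small sketch with made-up input lines. LC_ALL=C is set only so that [a-f] means plain lowercase a to f, as in the footnote. A line is printed only when the A or B sits right between the lowercase run and the digits:

```shell
# Hypothetical sample lines; only the first two have A or B in the required position.
result=$(printf '%s\n' 'abcA123' 'abcB123' 'abcC123' 'A-abc-123' |
  LC_ALL=C awk '$0 ~ /[a-f]+(A|B)[0-9]+/')

# 'A-abc-123' contains an A, but not directly between [a-f]+ and [0-9]+.
printf '%s\n' "$result"
```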

There cannot be a corresponding "AND" condition

$0 ~ /something(A&B)somethingelse/

because that would mean that the input string would have to contain A as well as B at the very same position - which obviously wouldn't work.

The only use case where an "AND" operator is useful is therefore in describing general properties of the string, where each of the desired properties can be expressed by a single RegEx, e.g. "the string must contain at least one A and at least one B, regardless of their absolute and relative positions". But that would again leave us at the && operator for combining multiple expressions, which you said you are not interested in, and of course at the various alternative formulations of this workaround in @terdon's answer.
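
For completeness, a sketch of that "general property" formulation in awk, again with invented sample lines. Each condition is its own RegEx, and && combines them without tying A and B to any position:

```shell
# Hypothetical sample lines; the first two contain both A and B somewhere.
result=$(printf '%s\n' 'xxAyyBzz' 'B first, then A' 'only A here' 'neither' |
  awk '$0 ~ /A/ && $0 ~ /B/')

# Prints the lines that satisfy both conditions, in either order.
printf '%s\n' "$result"
```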


(1) in C collating order, at least

AdminBee
  • 22,803