102
#!/bin/bash
INT=-5

if [[ "$INT" =~ ^-?[0-9]+$ ]]; then
    echo "INT is an integer."
else
    echo "INT is not an integer." >&2
    exit 1
fi

What does the leading ~ do in the starting regular expression?

jasonwryan
  • 73,126
ragnarok
  • 1,029

2 Answers

122

The ~ is actually part of the operator =~, which matches the string on its left against the extended regular expression on its right.

[[ "string" =~ pattern ]]

Note that the string should be quoted, and the regular expression shouldn't be quoted (unless you want to match literal strings).
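To illustrate the quoting rule, here is a minimal sketch, assuming bash 3.2 or later with default settings. The function names matches_regex and matches_literal are made up for this example.

```bash
#!/bin/bash
# Unquoted pattern: the dot is a regex metacharacter (any single character).
matches_regex() { [[ "$1" =~ ^a.c$ ]]; }

# Quoted pattern: every character is taken literally, so the dot must be a dot.
matches_literal() { [[ "$1" =~ "a.c" ]]; }

for s in abc a.c; do
    if matches_regex "$s"; then
        echo "$s: matched by the unquoted pattern"
    fi
    if matches_literal "$s"; then
        echo "$s: matched by the quoted (literal) pattern"
    fi
done
```

Here abc is matched only by the unquoted pattern, while a.c is matched by both.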

A similar operator is used in the Perl programming language and several other general-purpose and domain-specific languages to perform regular expression matching.

The regular expressions understood by bash are POSIX extended regular expressions, matched with the system's regex(3) library. On GNU systems this is broadly the same syntax that GNU grep understands with the -E flag.


Somewhat off-topic, but good to know:

When matching against a regular expression containing capturing groups, the part of the string captured by each group is available in the BASH_REMATCH array. The entry at index 0 corresponds to & in the replacement pattern of sed's substitution command (or $& in Perl), which is the part of the string that matches the whole pattern. The entries at index 1 and onwards correspond to \1, \2, etc. in a sed replacement pattern (or $1, $2 etc. in Perl), i.e. the parts matched by each parenthesized group.

Example:

string=$( date +%T )

if [[ "$string" =~ ^([0-9][0-9]):([0-9][0-9]):([0-9][0-9])$ ]]; then
    printf 'Got %s, %s and %s\n' \
        "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
fi

This may output

Got 09, 19 and 14

if the current time happens to be 09:19:14.

The REMATCH bit of the BASH_REMATCH array name comes from "Regular Expression Match", i.e. "RE-Match".


In non-bash Bourne-like shells, one may also use expr for limited regular expression matching (using only basic regular expressions).

A small example:

$ string="hello 123 world"
$ expr "$string" : ".*[^0-9]\([0-9][0-9]*\)"
123
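As a sketch of how the integer test from the question might look with expr in a plain POSIX shell: the helper name is_integer is hypothetical, and the x prefix is a common precaution so that expr never sees an operand starting with a minus sign.

```sh
#!/bin/sh
# is_integer is a hypothetical helper, not part of any standard tool.
# expr anchors the pattern at the start of the string implicitly;
# the trailing $ anchors it at the end. \{0,1\} is the BRE form of "?".
is_integer() {
    expr "x$1" : 'x-\{0,1\}[0-9][0-9]*$' >/dev/null
}

INT=-5
if is_integer "$INT"; then
    echo "INT is an integer."
else
    echo "INT is not an integer." >&2
fi
```

expr prints the number of matched characters and returns a non-zero status when nothing matches, which is why the output is discarded and only the exit status is used.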
Kusalananda
  • 333,661
  • 2
    It's the same as what grep -E understands only on GNU systems and only when using an unquoted variable as the pattern [[ $var =~ $pattern ]] (see [[ 'a b' =~ a\sb ]] vs p='a\sb'; [[ 'a b' =~ $p ]]). Also beware that shell quoting affects the meaning of RE operators and that some characters need to be quoted for the shell tokenising that may affect the RE processing. [[ '\' =~ [\/] ]] returns false. ksh93 has even worse issues. See zsh (or bash 3.1) for a saner approach where shell and RE quoting are clearly separate. The [ builtins of zsh and yash also have a =~ operator. – Stéphane Chazelas Jan 27 '17 at 10:06
  • @StéphaneChazelas How is it "saner" that both of these match in zsh?: [[ "This is a fine mess." =~ T.........fin*es* ]]; [[ "This is a fine mess." =~ T.........fin\*es\* ]]. Or that a quoted * also matches? [[ "This is a fine mess." =~ "T.........fin*es*" ]]. –  Jan 30 '17 at 00:28
  • 1
    It's saner (IMO) in that its rules are much simpler. Shell quoting and RE escaping are clearly separate. In [[ a =~ .* ]] or [[ a =~ '.*' ]] or [[ a =~ \.\* ]], the same .* RE is passed to the =~ operator. OTOH, in bash, [[ '\' =~ [)] ]] returns an error, would you know without trying it whether [[ '\' =~ [\)] ]] matches? How about [[ '\' =~ [\/] ]] (it does in ksh93). How about c='a-z'; [[ a =~ ["$c"] ]] (compare with the = operator)? See also: [[ '\' =~ [^]"."] ]] which returns false... Note that you can do shopt -s compat31 in bash to get the zsh behaviour. – Stéphane Chazelas Jan 30 '17 at 07:41
  • zsh/bash -o compat31's behaviour for [[ a =~ '.*' ]] is also consistent with [ a '=~' '.*' ] (for [ implementations that support =~) or expr a : '.*'. OTOH, it's not consistent with [[ a = '*' ]] vs [[ a = * ]] (but then, globs are part of the shell language, while REs are not). – Stéphane Chazelas Jan 30 '17 at 08:00
  • 1
    To deal with characters in the pattern that might be interpreted by the shell, it's often recommended to do something like this: pat="..."; if [[ "$string" =~ $pat ]]; then .... (@StéphaneChazelas's topmost comment suggested it, I'm just emphasizing it.) – dubiousjim Sep 30 '17 at 22:43
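The pattern-in-a-variable idiom from the last comment could look roughly like this. It is only a sketch: the pattern, the variable name pat and the function name is_item are invented for illustration.

```bash
#!/bin/bash
# Store the regex in a variable using ordinary single quotes, so characters
# such as spaces, parentheses or backslashes need no extra shell escaping.
pat='^[a-z]+ [0-9]+$'

# Expand the variable UNQUOTED on the right of =~ so it is treated as a regex.
# Inside [[ ]] the expansion is not word-split, so the space is safe.
is_item() { [[ "$1" =~ $pat ]]; }

for s in "item 42" "item42"; do
    if is_item "$s"; then
        echo "'$s' matches"
    else
        echo "'$s' does not match"
    fi
done
```

Writing the same pattern inline would require escaping the space (for example ^[a-z]+\ [0-9]+$), which is where mistakes tend to creep in.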
11

You should read the bash man pages, under the [[ expression ]] section.

An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered an extended regular expression and matched accordingly (as in regex(3)).

Long story short, =~ is an operator, just like == and !=. It has nothing to do with the actual regex in the string to its right.
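A small sketch of how the operators differ in practice: inside [[ ]], an unquoted right-hand side of == or != is a shell glob pattern, while the right-hand side of =~ is an extended regular expression. The function names glob_says and regex_says are hypothetical.

```bash
#!/bin/bash
# The same text, file*, means different things to each operator.

# As a glob: "file" followed by anything.
glob_says() { [[ "$1" == file* ]]; }

# As a regex: "fil" followed by zero or more "e" characters.
regex_says() { [[ "$1" =~ ^file*$ ]]; }

for s in file42.txt fil fileee; do
    if glob_says "$s"; then g=yes; else g=no; fi
    if regex_says "$s"; then r=yes; else r=no; fi
    echo "$s: glob=$g regex=$r"
done
```

For these inputs, file42.txt satisfies only the glob, fil satisfies only the regex, and fileee satisfies both.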

Sokel
  • 1,964
  • 1
    Can you figure out some examples demonstrating the use of =~ in real life...? – George Vasiliou Jan 27 '17 at 06:44
  • 1
    @GeorgeVasiliou I use it fairly often in scripts that put the output from a command into a variable. Then the variable is checked to see if it matches some string pattern. This is useful for example if you want to take some action based on some error output from that command. – Michael Martinez Jan 15 '19 at 19:09
  • 4
    @Sokel For some, “RTFM” is easier said than done. ⋯ man [[ expression ]] and man [[ return nothing. help [[ returns useful information—since [[ is an internal bash command—but does not say whether =~ uses basic or extended regex syntax. ⋯ The text you quoted is from the bash man page. I realize you said “read the bash man pages” but at first, I thought you meant read the man pages within bash. At any rate, man bash returns a huge file, which is 4139 lines (72 pages) long. It can be searched by pressing /▒▒▒, which takes a regex, the flavor of which—like =~—is not specified. – Alex Quinn Jul 05 '19 at 20:10
  • A link to the page where that text appears would have improved the answer. – KansaiRobot Apr 21 '23 at 04:43
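As a rough illustration of the real-life use described in the comments (capturing a command's output and acting on it), here is a sketch. The function classify_error, the patterns and the path /nonexistent-path-for-demo are all invented for this example, and the exact wording of error messages varies between systems.

```bash
#!/bin/bash
# Hypothetical patterns, kept in variables so the spaces need no escaping.
missing_re='[Nn]o such file'
denied_re='[Pp]ermission denied'

# classify_error is a made-up helper: map an error message to a short label.
classify_error() {
    if [[ "$1" =~ $missing_re ]]; then
        echo missing
    elif [[ "$1" =~ $denied_re ]]; then
        echo denied
    else
        echo other
    fi
}

# Capture the error output of a command that is expected to fail.
# LC_ALL=C asks for untranslated messages; "|| true" ignores the failure status.
output=$(LC_ALL=C ls /nonexistent-path-for-demo 2>&1) || true
echo "ls failed with: $(classify_error "$output")"
```

A script could then branch on the label, for example retrying on one kind of error and aborting on another.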