
I have two questions about using regular expressions inside a shell.

1. Use of *

According to the Wikipedia page on regular expressions:

*: The asterisk indicates there is zero or more of the preceding element. For example, ab*c matches "ac", "abc", "abbc", "abbbc", and so on.

However, when I write rm test*.iso, it deletes every file whose name begins with "test" and ends with ".iso", whatever there is (or is not) between "test" and ".iso". So the file "tes.iso" will not be deleted.

Take the abc example: according to Wikipedia, "ab*c" matches "ac". Therefore rm ab*c should delete a file named "ac". Why does rm not use regular expressions as Wikipedia describes them?

2. Use of + and ?

Still according to the Wikipedia page on regular expressions:

?: The question mark indicates there is zero or one of the preceding element. For example, colou?r matches both "color" and "colour".

+: The plus sign indicates there is one or more of the preceding element. For example, ab+c matches "abc", "abbc", "abbbc", and so on, but not "ac".

"?" seems to work like "*": the "?" symbol stands for one or zero characters of its own, not for zero or one of the preceding element as Wikipedia says.

I don't know how to use "+", and this is my second question. I'm also interested in any tricks concerning regular expressions and rm, cp, ....

ppr
  • This question is not about rm using regular expressions (rm doesn't know anything about regexes). It is about the shell's pathname expansion and regexes. What makes you believe that the shell treats whatever* as a regex? – Hauke Laging May 11 '14 at 14:29
  • I reformulated my question. I hope it's correct now. – ppr May 11 '14 at 14:32
  • It seems you don't get it. The shell does not consider such code as regex at all. Thus it doesn't matter what you have read about regexes. The question is: How to make the shell use regex instead of plain globbing? – Hauke Laging May 11 '14 at 14:39
  • @HaukeLaging You said that the shell does not consider such code as regex, but the shell does use * (for example) as a tool for manipulating files. * means "whatever"; so * is a regular expression in a shell. – ppr May 11 '14 at 14:47
  • I did not downvote but in general, you should not ask multiple questions in a single post. Please break them up into separate questions. – terdon May 11 '14 at 15:09
  • @ppr "so * is a regular expression in a shell" No, that is plain wrong. There is a certain meaning of the term "regular expression" and the behaviour you point out does not fit this description. You cannot act like "Oh, there is something. I will refer to it as 'regular expression'." That does not make any sense at all. The shell uses * but that has nothing to do with regular expressions. It is called globbing. – Hauke Laging May 11 '14 at 15:16
  • @HaukeLaging On the content, you're wrong. A regular expression is "a sequence of characters that forms a search pattern, mainly for use in pattern matching with strings, or string matching" (wikipedia). So * in a shell is a regex (in a broad sense), even if its effect is a little different from that of * in the most common regex syntax. * is not treated as plain text by the shell... I also have some remarks on the form of your comment, but this is not the place, so I created a chatroom for this. – ppr May 11 '14 at 16:12

2 Answers


The shell's pattern matching notation is described in this standards document.

As the document says in its introduction, pattern matching notation is related to, but slightly different from, regular expression notation.

In particular, ? in the shell acts like . in a regular expression, and * in the shell acts like .* in a regular expression. (But neither of them will match a . at the beginning of a filename.) + in the shell does not have any special pattern-matching ability. However, as @HaukeLaging says in his answer, certain shells can optionally have regular expression notation enabled, although doing so is nonstandard.
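To make the difference concrete, here is a small sketch that can be run in a scratch directory. The file names are made up for illustration, and LC_ALL=C is set only so that the sort order of the expansions is predictable.

```shell
#!/usr/bin/env bash
export LC_ALL=C                 # predictable sort order for glob results

# Work in a throwaway directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch test.iso test1.iso test-old.iso tes.iso .test.iso

# Shell *  : any string, including the empty one (regex equivalent: .*)
star=$(echo test*.iso)
# Shell ?  : exactly one character, never zero (regex equivalent: .)
qmark=$(echo test?.iso)
# A leading dot is only matched when it is written explicitly.
hidden=$(echo .test*.iso)

echo "test*.iso  -> $star"
echo "test?.iso  -> $qmark"
echo ".test*.iso -> $hidden"

cd / && rm -rf "$dir"           # clean up the scratch directory
```

Note that tes.iso is never matched: the literal "test" must be present in full, because * does not refer to the preceding character the way it does in a regex.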

Mark Plotnick

The shell uses regexes under certain conditions only (which probably differ from shell to shell).

In bash you have to activate them with: shopt -s extglob

After that you can use something like:

echo a*(b)c

See the block Pattern Matching in man bash.
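As a rough illustration of how the extglob operators line up with the regex quantifiers from the question, the following sketch uses made-up file names in a scratch directory. These are ksh-style extended patterns rather than true regular expressions, and LC_ALL=C is set only to fix the sort order.

```shell
#!/usr/bin/env bash
export LC_ALL=C                 # predictable sort order for glob results
shopt -s extglob                # must be set before the patterns below are parsed

dir=$(mktemp -d)
cd "$dir" || exit 1
touch ac abc abbc abbbc adc

# *(b) : zero or more b   (regex ab*c)
zero_or_more=$(echo a*(b)c)
# +(b) : one or more b    (regex ab+c)
one_or_more=$(echo a+(b)c)
# ?(b) : zero or one b    (regex ab?c)
zero_or_one=$(echo a?(b)c)

echo "a*(b)c -> $zero_or_more"
echo "a+(b)c -> $one_or_more"
echo "a?(b)c -> $zero_or_one"

cd / && rm -rf "$dir"           # clean up the scratch directory
```

With extglob enabled, rm a+(b)c would therefore remove abc, abbc and abbbc but leave ac alone, which is the behaviour the question expected from "+".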

Hauke Laging
  • That is not what I asked. I asked why the shell's regular expressions (available out of the box) don't work as regexes. And Mark Plotnick answered well. – ppr May 11 '14 at 14:54
  • No. @(ab*c) is not regex. @(...) is used for alternation (like @(foo|bar)) in ksh regexp (a subset of which bash understands with the extglob option). The equivalent of regexp ab*c with ksh patterns is a*(b)c. – Stéphane Chazelas May 11 '14 at 15:13
  • To get regexp with ksh patterns, it's ~(G:basic-regexp), ~(E:extended-regexp), ~(P:PCRE) or ~(X:augmented-regexp) (ksh93/ast specific). Those are not supported by bash or zsh though. – Stéphane Chazelas May 11 '14 at 15:16
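Since bash lacks the ~(E:...) forms, one possible workaround (not something proposed in this thread) is to loop over a plain glob and filter the names with bash's [[ string =~ regex ]] operator, which takes a POSIX extended regular expression. A minimal sketch, with made-up file names and a hypothetical variable called matched:

```shell
#!/usr/bin/env bash
export LC_ALL=C                 # predictable sort order for glob results

dir=$(mktemp -d)
cd "$dir" || exit 1
touch ac abc abbc adc

re='^ab+c$'                     # extended regex: a, one or more b, then c
matched=()
for f in *; do
    # Keep only the names that match the regex in full.
    [[ $f =~ $re ]] && matched+=("$f")
done

echo "matched: ${matched[*]}"
# A command such as rm -- "${matched[@]}" could then act on the selection.

cd / && rm -rf "$dir"           # clean up the scratch directory
```

Keeping the regex in a variable avoids quoting surprises, because a quoted right-hand side of =~ is matched as a literal string.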