Basically, the issue is with the doubled repetition of the a
in the pattern. The part a*
allows for any number of a
's, while the surrounding (·)*
also allows any number of the contained pattern.
This allows for a huge number of possible ways to match the pattern against a string of a
's. Ignoring the b
for now, a string like aaaaa
(five a
's) could be matched as (aaaaa)
, (aaaa)(a)
, (aaa)(aa)
, (aaa)(a)(a)
, (aa)(aaa)
, (aa)(aa)(a)
... There's an exponential number of ways to match the string.
With the b
at the end, a backtracking engine will try one way of matching the a
's, realizes it doesn't find the b
, goes back one step, tries another way, realizes it can't find the b
, ... and takes a long time to exhaust all possible arrangements, after which it fails.
There are much better texts on this subject online than I could ever write. Go read these:
In practice, if you can, avoid patterns that allow for multiple ways to match a string. The example here, (a*)*c
, is obviously silly, since it's equivalent to a*c
which doesn't have the nested repetition.