10

Why does this fail?

touch "$(printf "a\nb")"; find . -regex './.\n.'

I also tried these, none of which work:

find . -regextype posix-extended -regex '.\n.'
find . -regextype posix-awk -regex '.\n.'
find . -regextype posix-basic -regex '.\n.'
find . -regextype posix-egrep -regex '.\n.'

The only way it seems to work is (thanks @MichaelMrozek)

find . -regex './.'$'\n''.'

Which is cumbersome to say the least. So, why do find's regular expressions seem to be unable to deal with \n?


Update in response to answers so far:

OK, I understand that \n is not part of ERE, and that was one of my misunderstandings. But find claims to support posix-awk, and both gawk and mawk match \n as expected:

$ printf "f1l1\nhas newline:f2l1#f1l2 does not:f2l2#" | 
    mawk -F: 'BEGIN{RS="#"}; ($1~/\n/){print $1}' 
f1l1
has newline

I don't have a pure awk to test with, so perhaps POSIX awk does not match? Otherwise, is find not actually implementing posix-awk regular expressions?

terdon
  • 242,166

3 Answers

18

Because GNU find doesn't support \n as an escape sequence. The regexp \n matches the character n. GNU find copies the traditional Emacs syntax, which doesn't have this feature either¹.

While GNU find supports other regex syntaxes, none of them supports backslash-letter or backslash-octal to denote control characters. You need to include the control character literally in the argument.
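A quick way to see this for yourself is to create two files, one with a real newline in its name and one with a plain n in the same position, and check which one the escaped pattern picks up. This is only a sketch: it assumes GNU find with its default regex syntax, and the scratch directory and file names are made up for the demonstration.

```shell
# Work in a scratch directory so nothing else can match.
dir=$(mktemp -d) && cd "$dir" || exit 1

# A variable holding exactly one newline (works in any POSIX sh).
nl='
'

# One name contains a real newline, the other a plain letter n.
touch "a${nl}b" anb

# The single quotes pass backslash-n to find untouched.
# With GNU find this is expected to list ./anb, not the newline file.
escaped=$(find . -regex './.\n.')
printf '%s\n' "$escaped"
```

Other find implementations may treat the backslash differently, so the result is worth checking on your own system.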

There are many different regex syntaxes around. Neither POSIX basic regular expressions (BRE) nor extended regular expressions (ERE) include \n or backslash-octal escapes. Both definitions leave undefined the meaning of a backslash that is not followed by a special character. The utilities awk and sed both support \n to mean a newline; this is specific to these utilities (and commonplace, but as you see not universal).

From a shell script, you can write

find . -regex $'./.\n.'     # ksh/bash/zsh only
find . -regex './.
.'
find . -name '*
*'
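If a line break in the middle of a quoted argument is hard to read, and $'...' is not available in your shell, one possible variant is to keep the newline in a variable. The sketch below assumes a POSIX sh and a find that supports -regex; the variable name nl and the scratch directory are only illustrative.

```shell
# Work in a scratch directory so nothing else can match.
dir=$(mktemp -d) && cd "$dir" || exit 1

# A variable holding exactly one newline.
nl='
'

# Same two test files: one with a newline, one with a plain n.
touch "a${nl}b" anb

# Double quotes expand ${nl}, so the pattern contains a real newline.
matches=$(find . -regex "./.${nl}.")
printf '%s\n' "$matches"
```

Here only the file whose name contains the newline should be listed, since the pattern itself carries the control character.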

¹ Quite logically: for interactive use, you can type any character with C-q; for programming use, \n exists as part of the string literal syntax.

8

You can't match a newline with '\n' because \n has no special meaning (such as a line break) in a regular expression. You can, however, match the end of a line with the $ anchor.

babasbot
  • 157
  • \n most certainly does have a meaning in a regex. Try printf "aa\nbb" | perl -ne 'print if /\n/': that will only match aa\n and skip the bb, for example. There do seem to be differences in implementation though, because grep -P won't match that. But how is $ relevant here? I want to match a literal newline, and $ matches even in the absence of one: printf "aa" | grep 'a$' – terdon Mar 10 '14 at 17:17
  • @terdon \n has no special meaning, even in Perl regular expressions. It does, however, have special meaning in interpolated perl strings, of which qr// is one type. Search for \n in man perlre... – derobert Mar 10 '14 at 17:20
  • @derobert fair point, I expressed myself badly. I meant that \n matches newlines in regular expressions. You and babaslovesyou are quite right that it has no special meaning as such, I just mean that it is "matchable". – terdon Mar 10 '14 at 17:22
  • @terdon There are many different regex syntaxes, so claiming that “\n has a meaning in a regex” without specifying the regex syntax doesn't make sense. – Gilles 'SO- stop being evil' Mar 10 '14 at 17:24
  • @terdon Well, except you're trying to match the character 0x0A (newline), and you're trying to do it with the character sequence 0x5C (backslash) 0x6E (n). Since \n has no special meaning, it tries to match itself. The \ may or may not get stripped out (invalid escape) depending on the RE engine, but either way you're matching against \n or n, and neither matches a newline. – derobert Mar 10 '14 at 17:27
  • @terdon in your Perl example, what's actually happening is that the string parsing turns \n into a literal newline before passing it off to the regexp engine. That's a feature of Perl string parsing. – derobert Mar 10 '14 at 17:28
  • @derobert Oh. Thanks, had no idea. And +1 to babasloves you. – terdon Mar 10 '14 at 17:29
1

I think this is because find uses the fnmatch function from the standard C library, so if FNM_NOESCAPE is not set, a backslash in the pattern followed by any other character matches that second character in the string.

FNM_NOESCAPE

Don't treat the `\' character specially in patterns. Normally, `\' quotes
the following character, turning off its special meaning (if any) so that it 
matches only itself. When quoting is enabled, the pattern `\?' matches only 
the string `?', because the question mark in the pattern acts like an 
ordinary character. If you use FNM_NOESCAPE, then `\' is an ordinary character.
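The quoting behaviour described in that excerpt is easy to observe with find's -name test, which takes a shell pattern rather than a regular expression. The following is only a sketch with made-up file names, assuming a find whose -name follows fnmatch-style quoting (as GNU find does).

```shell
# Work in a scratch directory containing a file literally named "?".
dir=$(mktemp -d) && cd "$dir" || exit 1
touch '?' a

# An unquoted ? is a wildcard: both one-character file names match.
wild_count=$(find . -type f -name '?' | wc -l)

# \? quotes the question mark, so only the file named ? matches.
quoted=$(find . -type f -name '\?')

printf 'wildcard matches: %s\nquoted match: %s\n' "$wild_count" "$quoted"
```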

I checked with find (GNU findutils) 4.4.2 and glibc 2.15, and this option is off. See line 42 in fnmatch.h:

#define FNM_NOESCAPE    (1 << 1) /* Backslashes don't quote special chars.  */
cuonglm
  • 153,898