
Is this a bug, or is this how it is supposed to work?

find ./frontend -mindepth 1 -regex '^./dir1/dir2\(/.*\)?' works on Ubuntu but not Alpine (docker)

find ./frontend -mindepth 1 -regex '^./dir1/dir2\(/.*\)\?' works on Alpine (docker) but not Ubuntu

Alpine: 3.14

Ubuntu: 18.04

1 Answer


They use different syntaxes for regular expressions.

GNU find's -regex uses Emacs regular expressions by default. This can be changed with the option -regextype which is specific to GNU find; other choices include POSIX BRE (basic regular expressions, as in grep and sed) and POSIX ERE (extended regular expressions, as in grep -E and (almost) awk).

BusyBox find's -regex uses POSIX BRE (the default for the regcomp function). Because BusyBox is designed to be small, there is no option to use a different regex syntax.

FreeBSD, macOS and NetBSD default to BRE, and can use ERE with the -E option.

POSIX does not standardize -regex.

For your command:

  • In BRE (basic), grouping is \(…\). The zero-or-one operator is \? if present, but it is an optional feature, present in BusyBox when built with Glibc (I'm not sure about other libc) but not on BSD. Zero-or-one can also be spelled \{0,1\}.
  • In Emacs RE, grouping is \(…\) and the zero-or-one operator is ?. Although Emacs itself also supports \{0,1\} to mean zero-or-one, GNU find's Emacs regex syntax doesn't.
  • In ERE (extended), grouping is (…) and the zero-or-one operator is ?.
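The difference between the BRE and ERE spellings can be tried out without find at all. The following is a minimal sketch using grep, which takes BRE by default and ERE with -E; the sample paths are made up for illustration, and -x is used so the pattern has to match the whole line, much as -regex has to match the whole path.

```shell
# Sample paths, one per line (hypothetical names, for illustration only).
paths='./dir1/dir2
./dir1/dir2/file
./dir1/dir2x'

# BRE: grouping is \(...\); zero-or-one is spelled portably as \{0,1\}.
bre_matches=$(printf '%s\n' "$paths" | grep -x '\./dir1/dir2\(/.*\)\{0,1\}')

# ERE: grouping is (...); zero-or-one is ?.
ere_matches=$(printf '%s\n' "$paths" | grep -xE '\./dir1/dir2(/.*)?')

# Both patterns should select the same two lines and reject ./dir1/dir2x.
printf 'BRE:\n%s\nERE:\n%s\n' "$bre_matches" "$ere_matches"
```

Both patterns describe the same set of strings; only the spelling of the grouping and of the zero-or-one operator differs.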

If you need portability between the various implementations of find that implement -regex, you need to stick to POSIX BRE constructs (for the sake of BusyBox) that are spelled the same in GNU find's Emacs syntax. This means there's no zero-or-one operator.

find ./frontend -mindepth 1 \( -regex '^./dir1/dir2/.*' -o -regex '^./dir1/dir2' \)
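As a rough way to check the two-pattern approach, here is a sketch that builds a throwaway tree and runs the same kind of command against it. The directory layout (sub, file, dir2x) is hypothetical. Since -regex is matched against the whole path as find prints it, the patterns in this sketch spell out the ./frontend starting point as well; that is an assumption about the intended layout, not something stated in the question.

```shell
# Build a scratch tree (hypothetical layout) in a temporary directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/frontend/dir1/dir2/sub" "$tmp/frontend/dir1/dir2x"
touch "$tmp/frontend/dir1/dir2/sub/file"

# Two plain patterns joined with -o stand in for the zero-or-one operator.
# The subshell keeps the cd from affecting the calling shell.
found=$(cd "$tmp" && find ./frontend -mindepth 1 \
    \( -regex '\./frontend/dir1/dir2/.*' -o -regex '\./frontend/dir1/dir2' \) |
    LC_ALL=C sort)

printf '%s\n' "$found"

# Clean up the scratch tree.
rm -rf "$tmp"
```

The sibling directory dir2x is there to confirm that the second pattern only matches dir2 itself and not names that merely start with it.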

Or, alternatively, arrange to pass -regextype posix-basic to GNU find.

case $(find --help 2>/dev/null) in
  *-regextype*) find_options='-regextype posix-basic';;
  *) find_options=;;
esac
find ./frontend $find_options -mindepth 1 -regex '^./dir1/dir2\(/.*\)\{0,1\}'

If dir1 and dir2 are plain strings and not regexes, you're not getting any use from -regex and you can just write

find ./frontend/dir1/dir2 -maxdepth 1
  • There’s also a fourth option that’s just as portable as the last one but still works if you need the regex syntax, and it avoids the extra boilerplate of the second approach: just use find to generate the base list of files, and then pipe that to grep to do the regex filtering. Not as efficient, but guaranteed to work on BusyBox, GNU findutils, and any POSIX-compliant system. For example: find ./frontend -maxdepth 1 | grep '^./dir1/dir2\(/.*\)\{0,1\}'. – Austin Hemmelgarn Dec 01 '21 at 00:03