
I am just starting to learn regex and want to use it everywhere, instead of other tools, for practice.

I ran into this situation when I tried to find files with the extension sh or md:

$ find . regex ".*\.(sh|md)$"
.
./bogus.py
./cofollow.py
./data8.txt
./example.sh
./longest_word_2.sh
./posit_param.sh
./cobroadcast2.py

Unfortunately, it also output ./bogus.py (and every other file).

I noticed the BRE rules and tried escaping the parentheses:

$ find . -regex ".*\.\(sh|md\)$"
# returns nothing

After a series of searches, I found the -regextype solution in "Regular Expressions - Finding Files":

$ find . -regextype posix-extended -iregex ".*\.(sh|md)$"
./example.sh
./longest_word_2.sh
./posit_param.sh

$ find . -regextype egrep -iregex ".*\.(sh|md)$"
./example.sh
./longest_word_2.sh
./posit_param.sh
./table_regex_bat.md

Additionally, here is a nice modular solution:

$ find -type f | egrep ".*\.(sh|md)$"
./example.sh
./longest_word_2.sh
./posit_param.sh
./table_regex_bat.md

However, BSD find has a shortcut for this task: the -E option.

$ /usr/bin/find -E . -regex ".*\.(sh|md)$"
./example.sh
./longest_word_2.sh
./posit_param.sh

I am determined to use only the GNU tools so that my code and skills are portable.

So I started aliasing 'find -regextype egrep'.
Unfortunately, find needs the path ($1) before that predicate, so the alias breaks.

How can I solve this problem in a handy way?

Inian
Wizard

3 Answers


Don't use an alias to pass arguments around. Aliases are not portable and are useful only in interactive shells. Use a function instead, and pass the paths as arguments:

regexFind() {
    # Require at least one argument: the directory to search.
    (( "$#" )) || { printf 'Insufficient arguments provided\n' >&2; return 1; }
    find "$1" -regextype egrep -iregex '.*\.(sh|md)$'
}

and call the function as

regexFind "/home/foo/bar"

Also, to add to your findings: bash has a built-in way to glob files. You just need to enable a shell option to make it work; shopt -s enables an option and shopt -u disables it.

The nullglob option makes a glob that matches nothing expand to nothing, instead of being kept as the literal pattern. So, assuming you want to match files ending in .sh and .md, just navigate to that directory and run:

shopt -s nullglob
fileList=(*.sh)
fileList+=(*.md)
shopt -u nullglob

Then print the results as shown below. Remember to quote the expansion to prevent the filenames from undergoing word splitting.

printf '%s\n' "${fileList[@]}"
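To show what nullglob actually changes, here is a small bash sketch. The scratch directory and file names are made up for the demo, and mktemp -d is assumed to be available (it is not POSIX, but GNU and BSD systems ship it):

```shell
#!/usr/bin/env bash
# Demo of nullglob; the directory and file names are hypothetical.
dir=$(mktemp -d)
touch "$dir/example.sh" "$dir/posit_param.sh"
cd "$dir" || exit 1

# Without nullglob, a pattern that matches nothing stays as literal text.
without=(*.md)

shopt -s nullglob
with=(*.md)        # no .md files here: the array is empty
scripts=(*.sh)     # expands to the matching names, sorted
shopt -u nullglob

printf 'without: %s\n' "${without[@]}"
printf 'with: %d element(s)\n' "${#with[@]}"
printf '%s\n' "${scripts[@]}"

cd / && rm -rf "$dir"
```

Without nullglob, a loop over fileList would run once with the bogus name *.md.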
Inian

Note that GNU find's default regexps are not BRE, but regexps from some ancient versions of GNU emacs: a hybrid of BRE and ERE where, for instance, + is supported but grouping needs \(...\), and | is supported but only as \|.
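As an illustration, here is a sketch of the escaped attempt from the question next to its emacs-syntax fix. It assumes GNU find (whose default -regextype is emacs) and mktemp -d; the file names are hypothetical demo data:

```shell
#!/bin/sh
# Assumes GNU find with its default (emacs) regex syntax.
# The scratch directory and file names are hypothetical.
dir=$(mktemp -d)
touch "$dir/bogus.py" "$dir/example.sh" "$dir/table_regex_bat.md"
cd "$dir" || exit 1

# The question's escaped attempt: here '|' is a literal character,
# so no path matches and nothing is printed.
none=$(find . -regex '.*\.\(sh|md\)$')

# In emacs syntax, grouping is \(...\) and alternation is \|.
hits=$(find . -regex '.*\.\(sh\|md\)$' | sort)
printf '%s\n' "$hits"

cd / && rm -rf "$dir"
```

That explains why the escaped command in the question returned nothing: the parentheses were fine, but the bar also needed a backslash.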

With BSD find, the default is BRE, and you can use the -E option to enable EREs, so there, it's just a matter of:

alias efind='find -E'

or:

efind() { find -E "$@"; }

In GNU find, EREs are enabled with a -regextype posix-extended predicate, not an option. That predicate must appear after the file names (which, if present, must come after the options) and before the -regex or -iregex predicates that rely on it.

The GNU find syntax is:

find [options] [files] [predicates]
                      ^

So you need to insert it there (at the position marked with ^).
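A quick sketch of the ordering rule, assuming GNU find and mktemp -d (file names are made up). The second command mimics what an alias like 'find -regextype egrep' produces when a path is appended to it:

```shell
#!/bin/sh
# Assumes GNU find; demo files are hypothetical.
dir=$(mktemp -d)
touch "$dir/example.sh" "$dir/bogus.py"
cd "$dir" || exit 1

# Options (-L), then paths (.), then -regextype, then the -regex using it.
ok=$(find -L . -regextype posix-extended -regex '.*\.(sh|md)$')
printf '%s\n' "$ok"

# A path placed after a predicate is rejected by GNU find
# ("paths must precede expression"), so find exits non-zero.
if find -regextype posix-extended . -regex '.*\.(sh|md)$' 2>/dev/null; then
  bad=accepted
else
  bad=rejected
fi
printf '%s\n' "$bad"

cd / && rm -rf "$dir"
```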

So, when defining a wrapper function or script, you need to take that into account: skip all the options and file names and insert the -regextype posix-extended right after them.

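efind() (
  # Rebuild "$@" in place, inserting -regextype posix-extended
  # just before the first predicate (after options and file names).
  found_predicate=false
  for arg do
    "$found_predicate" || case $arg in
      (-[LPDd]|-[OD]*) ;;  # skip options
      (-*|['()!'])         # first predicate: insert -regextype here
        set -- "$@" -regextype posix-extended
        found_predicate=true;;
    esac
    set -- "$@" "$arg"     # append the current argument...
    shift                  # ...and drop it from the front
  done
  exec find "$@"
)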

A couple other notes:

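  • your first command printed bogus.py not because BREs were used, but because you wrote regex instead of -regex. regex was taken as a file name, not a predicate (and so was the pattern after it), so find just listed everything under the current directory.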
  • find . | egrep ... is not reliable because file paths may span more than one line (file names can contain newline characters). With GNU tools or compatible ones, you can use find . -print0 | grep -zE ... to work with NUL-delimited records (and pipe to tr '\0' '\n' if it's for display).
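Here is a sketch contrasting the two pipelines on a file whose name contains a newline. It assumes GNU find, grep and tr (-print0 and grep -z are not POSIX) plus mktemp -d; the file names are hypothetical:

```shell
#!/bin/sh
# Assumes GNU find, grep and tr. Demo file names are hypothetical;
# one of them contains a newline character.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch example.sh bogus.py 'x
y.sh'

# Line-based: the newline splits "./x<newline>y.sh" into two "lines",
# so grep reports "y.sh", a path that does not exist.
by_line=$(find . -type f | grep -E '\.(sh|md)$' | sort)

# NUL-delimited: each path is one record; count the matching records.
matches=$(find . -type f -print0 | grep -zE '\.(sh|md)$' | tr -cd '\0' | wc -c)

# For display only, turn the NULs back into newlines.
find . -type f -print0 | grep -zE '\.(sh|md)$' | tr '\0' '\n'

cd / && rm -rf "$dir"
```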
find . -type f \( -name '*.sh' -o -name '*.md' \)

This would work with all implementations of find since it doesn't require support for regular expression matching.

To make this more flexible:

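suffixfind () (
    dir=$1
    shift

    # Rotate the suffixes into "-o -name *.suf" triples:
    # append the new words, then drop the suffix from the front.
    for suf do
        set -- "$@" -o -name "*.$suf"
        shift
    done
    shift    # drop the leading -o

    find "$dir" -type f \( "$@" \)
)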

This helper shell function (which would work in any sh-like shell) would pick out the first command line argument and put it in the variable dir. Then it would construct a list of -name "*.<suf1>" -o -name "*.<suf2>" (etc.) with all the filename suffixes on the function's command line before calling find with that list to find files in or under $dir.

You would use it like

suffixfind /usr sh md txt

to find all regular files with names ending in .sh, .md or .txt in or under the path /usr.

A slightly more verbose variation of the above using bash arrays and bash local variables:

suffixfind () {
    local dir=$1
    shift

    local names

    names=( -name "*.$1" )
    shift
    for suf do
        names+=( -o -name "*.$suf" )
    done

    find "$dir" -type f \( "${names[@]}" \)
}
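Since the question is about practicing regexes, here is a hypothetical regex-based counterpart, suffixrefind (the name is made up), with the same interface as suffixfind. It is a sketch assuming GNU find, mktemp -d, and suffixes that contain no regex metacharacters:

```shell
#!/bin/sh
# Hypothetical helper: same interface as suffixfind, but it matches
# with one ERE instead of a chain of -name tests. Assumes GNU find
# and plain suffixes such as sh or md (no regex metacharacters).
suffixrefind () (
    dir=$1
    shift
    alt=
    for suf do
        alt="${alt:+$alt|}$suf"    # join the suffixes with '|'
    done
    # -regex matches the whole path, so no ^ or $ anchors are needed.
    find "$dir" -regextype posix-extended -type f -regex ".*\.($alt)"
)

# Usage on a scratch directory with made-up file names.
demo=$(mktemp -d)
touch "$demo/example.sh" "$demo/notes.md" "$demo/bogus.py"
found=$(suffixrefind "$demo" sh md | sort)
printf '%s\n' "$found"
rm -rf "$demo"
```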

Regarding your mention of GNU tools and portability: note that GNU tools are sometimes available on non-Linux systems, but with a g prefix on the tool names. GNU find would therefore be available as gfind to distinguish it from the native find implementation on the system.

Your "GNU portable" approach would therefore have to test whether gfind was available before testing whether find is in fact GNU find. Not until you've done that (possibly by testing the return status and output of find --version) can you be comfortable in knowing that you are dealing with GNU find.
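One way to do that check is sketched below. The function name gnu_find_cmd is hypothetical, and it relies on GNU find's --version output mentioning "GNU findutils"; a find that rejects --version simply fails the test:

```shell
#!/bin/sh
# Hypothetical helper: print the name of a GNU find binary,
# preferring gfind where it exists. Relies on GNU find's
# --version output containing "GNU findutils".
gnu_find_cmd () {
    for cmd in gfind find; do
        if command -v "$cmd" >/dev/null 2>&1 &&
           "$cmd" --version 2>/dev/null | grep -q 'GNU findutils'; then
            printf '%s\n' "$cmd"
            return 0
        fi
    done
    return 1
}

if FIND=$(gnu_find_cmd); then
    printf 'using %s\n' "$FIND"
else
    echo 'GNU find not found' >&2
fi
```

Scripts can then call "$FIND" everywhere instead of a bare find.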

Kusalananda
  • And a big portability problem is that you cannot, e.g., access GNU find under the name gfind if you are on Linux. So you are right: the portable way is to use POSIX features only, especially if you can do the job with POSIX features. – schily Nov 02 '18 at 08:50
  • I'm afraid that the OP explicitly stated they want to use a regex match wherever possible (for learning purposes). Although your answer would otherwise be useful, it doesn't satisfy this requirement. – TooTea Nov 02 '18 at 10:20
    @TooTea There's a time and place for regular expressions. Finding files using them is neither. Also, the purpose seems to be to create a simple way to search for certain filenames, which is the issue I solved. – Kusalananda Nov 02 '18 at 10:40
  • @Kusalananda I agree completely with your reasoning, but OP explicitly wants to abuse REs everywhere to exercise their regex-fu. When they want to learn how to tighten screws with a hammer, giving them a screwdriver guide does not really help. – TooTea Nov 02 '18 at 10:59