I want to find all files whose names start with a certain character, e.g.

find . -maxdepth 1 \( -name "^m*" -a ! -name "g$" \) -print

But what if someone created a file with special characters in its name? For example:

touch "
marst"

This won't be found, although it meets the criteria. How should I alter the code so that it finds even files that start with a space?

Also, \( -name "^m*" -a ! -name "g$" \) will not work because find sees the files not as "marr" but as "./marr", which means this would find nothing. How can I alter the code to match the start of the word too?

Jeff Schaller
trolkura

5 Answers

3

-name always matches just the name, i.e. without the path; and it matches the whole name. Its value is a pattern, not a regular expression, so filenames starting with m could be found with

-name 'm*'

and names ending in g with

-name '*g'

To use regular expressions, see the -regex option.
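
To see the two points above in action (only the base name is tested, and the pattern must cover the whole name), here is a minimal sketch. The scratch directory and the sample names marr, mug and amr are made up for illustration, and -maxdepth is assumed to be available, as it is in GNU and BSD find:

```shell
# Scratch directory with hypothetical sample names.
dir=$(mktemp -d)
touch "$dir/marr" "$dir/mug" "$dir/amr"

# -name is given "marr", "mug" and "amr" (no "./" prefix), and the
# glob has to match the whole name: 'm*' means "starts with m",
# '*g' means "ends in g".
matches=$(cd "$dir" && find . -maxdepth 1 -name 'm*' ! -name '*g')
printf '%s\n' "$matches"    # prints ./marr

rm -rf "$dir"
```

mug is rejected by ! -name '*g' and amr by -name 'm*', so only ./marr is left.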

choroba
3

If you want to match file names in which an m comes at the start or right after a newline character, that would be:

NL='
'
find . \( -name 'm*' -o -name "*${NL}m*" \) -print

Note that, at least with GNU find, * won't match a byte sequence that doesn't form a valid character sequence. You'd probably be better off using the C locale if that's a potential issue.

LC_ALL=C find . \( -name 'm*' -o -name "*${NL}m*" \) -print

Example:

$ touch mom $'two\nminutes' $'mad\x80'
$ find . -name 'm*'
./mom
$ find . \( -name 'm*' -o -name "*${NL}m*" \) -print
./two?minutes
./mom
$ LC_ALL=C find . \( -name 'm*' -o -name "*${NL}m*" \) -print
./mad?
./two?minutes
./mom

For file names that have a line starting with m and no line ending with g:

LC_ALL=C find . \( -name 'm*' -o -name "*${NL}m*" \) ! \( \
  -name '*g' -o -name "*g${NL}*" \) -print

Some find implementations have non-standard options to match the file path (usually not the name) using regular expressions, but the behaviour varies between implementations, and those are not needed here.

Where you would need regular expressions is, for instance, to find files whose name has lines starting with m, none of which end in g (like $'cat\nman\ndog' but not $'plate\nmug\ncup' nor $'cat\nman\nmug').

With GNU find:

LC_ALL=C find . -regextype posix-extended -regex \
  ".*/(([^m$NL/][^/$NL]*|m[^/$NL]*[^$NL/g]|m|)($NL|\$))*"

Or files whose name has at least one line starting with m and not ending in g (like $'mad\nmug' but not $'ming\nmong'):

LC_ALL=C find . -regextype posix-extended -regex \
  ".*/([^/]*$NL)?m([^$NL/]*[^g$NL/])?(\$|${NL}[^/]*)"
0

You could use the -regex flag to find if you need more sophisticated matching than globs provide. It matches against the whole path though, so if you want to match just the filename part you could do something like

find . -maxdepth 1 -regex '.*/[ 
]?m[^/]*[^g]$' -print

Note that per this answer you can't use \n to match a newline, so we put a literal newline in the character class along with a space, since you had asked for that.

Eric Renouf
0

The file created with...

touch "
marst"

... doesn't match either of the two criteria in the question, because it doesn't start with an m; it starts with a newline. What you are looking for may be something like this:

find . -maxdepth 1 -regex ".*/\s*m[^/]*[^g]"

The -regex test matches the whole path of the file. .*/ matches anything up to the last slash, which separates the directory from the file name. Then \s* matches zero or more whitespace characters (space, newline, tab). After that, the m matches the "beginning" of the file name (ignoring leading whitespace, of course). [^/]* matches any run of characters that are not a slash. And the final [^g] matches the last character of the file name, which must not be a g.

This will now match:

./?marst
./ marst
./  marst
./marst

The ? indicates where the newline is.


Note: When you process that output further, use the -print0 flag of find:

find . -maxdepth 1 -regex ".*/\s*m[^/]*[^g]" -print0 | xargs -0 ...

That way you can process the file list further, even with such special file names. find delimits the list of file names with null bytes, so the next utility must also read null-delimited input, for example xargs with the -0 flag. Of course, it depends on what you want to do with those files.
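
To illustrate why the delimiter matters, here is a rough sketch that counts the same three files twice, once by lines and once by null-terminated records. The scratch directory and file names are made up for the demonstration, it uses -type f instead of the regex above to stay portable, and it assumes a find and xargs that support -print0 and -0 (GNU and BSD both do):

```shell
NL='
'
# Scratch directory with three hypothetical files, one of which has
# a newline at the start of its name.
dir=$(mktemp -d)
touch "$dir/marst" "$dir/ marst" "$dir/${NL}marst"

# Newline-delimited output: the embedded newline splits one name in two.
line_count=$(find "$dir" -maxdepth 1 -type f -print | wc -l | tr -d ' ')

# Null-delimited output: xargs -0 hands each name over intact, and the
# inner sh simply reports how many arguments it received.
safe_count=$(find "$dir" -maxdepth 1 -type f -print0 |
  xargs -0 sh -c 'echo "$#"' sh)

echo "lines: $line_count, files: $safe_count"    # lines: 4, files: 3

rm -rf "$dir"
```

Anything that reads the first form line by line would see a bogus fourth "file".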

chaos
0

You do not need the ^ or the $ for simple names in find.
find uses patterns for names. A pattern will:

  • Match the whole name, from start to end. Always.
  • Be applied after find has stripped the path from any file found.
  • Treat only * ? and [ ] as special characters (not ^ or $).

So, to match files that start with an m and do not end with a g:

 find . -maxdepth 1 -name 'm*[!g]' -o -name 'm'

The 'm' covers the case where the file name consists of only that one character.
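
A quick way to check that pattern pair is to run it against a handful of made-up names. This is only a sketch: the scratch directory and the names m, mg, mug, mad and dam are hypothetical, and the parentheses are added just to make the grouping explicit:

```shell
# Scratch directory with hypothetical sample names.
dir=$(mktemp -d)
touch "$dir/m" "$dir/mg" "$dir/mug" "$dir/mad" "$dir/dam"

# 'm*[!g]' needs at least two characters (an m, then a final non-g),
# so the bare name 'm' has to be listed as a second alternative.
matches=$(cd "$dir" &&
  find . -maxdepth 1 \( -name 'm*[!g]' -o -name 'm' \) | LC_ALL=C sort)
printf '%s\n' "$matches"
# prints:
# ./m
# ./mad

rm -rf "$dir"
```

mg and mug are rejected by the final [!g], and dam never matches because the pattern is anchored at the start of the name.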

However, the file you created with touch $'\nmarst' (yes, a newline can be written like that in bash) does not start with an m; it starts with a newline, $'\n'. Simple patterns have no alternation, but you could use the OR (-o) operator of find:

find . -maxdepth 1 \( -name 'm*' -o -name $'\n'"m*" \) -a ! -name '*g'

That will become difficult with longer requirements.
For really complex strings, there is the -regex option in find.