95

How can I get a list of the subdirectories which contain a file whose name matches a particular pattern?

More specifically, I am looking for directories which contain a file with the letter 'f' occurring somewhere in the file name.

Ideally, the list would not have duplicates and only contain the path without the filename.

Muhd
  • 1,064

9 Answers

92
find . -type f -name '*f*' | sed -r 's|/[^/]+$||' | sort | uniq

The above finds all files below the current directory (.) that are regular files (-type f) and have f somewhere in their name (-name '*f*'). Next, sed removes the file name, leaving just the directory name. Then, the list of directories is sorted (sort) and duplicates removed (uniq).

The sed command consists of a single substitution. It looks for matches to the regular expression /[^/]+$ and replaces anything matching it with nothing. The dollar sign means the end of the line. [^/]+ means one or more characters that are not slashes. Thus, /[^/]+$ means all characters from the final slash to the end of the line. In other words, this matches the file name at the end of the full path. Thus, the sed command removes the file name, leaving unchanged the name of the directory that the file was in.
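To see just the substitution at work, here is a minimal sketch run against a single made-up path (the path is purely illustrative):

$ printf '%s\n' './some/dir/an-f-file.txt' | sed -r 's|/[^/]+$||'
./some/dir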

Simplifications

Many modern sort commands support a -u flag which makes uniq unnecessary. For GNU sed:

find . -type f -name '*f*' | sed -r 's|/[^/]+$||' | sort -u

And, for macOS sed:

find . -type f -name '*f*' | sed -E 's|/[^/]+$||' | sort -u

Also, if your find command supports it, it is possible to have find print the directory names directly. This avoids the need for sed:

find . -type f -name '*f*' -printf '%h\n' | sort -u

More robust version (Requires GNU tools)

The above versions will be confused by file names that include newlines. A more robust solution is to do the sorting on NUL-terminated strings:

find . -type f -name '*f*' -printf '%h\0' | sort -zu | sed -z 's/$/\n/'
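If the list is going to be consumed by another program rather than read by a human, it can be safer to keep the NUL delimiters all the way to the consumer. As a sketch, this hands the unique directories to ls -d without ever converting the delimiters:

find . -type f -name '*f*' -printf '%h\0' | sort -zu | xargs -0 ls -d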

Simplified using dirname

Imagine needing this command in a script where the whole command will be wrapped in single quotes; escaping the sed command there is painful and less than ideal, so replace it with dirname.

Issues regarding special characters and newlines are also moot if you do not need to sort, or if the directory names are not affected.

find . -type f -name "*f*" -exec dirname "{}" \; | sort -u

To take care of the newline issue:

find . -type f -name "*f*" -exec dirname -z "{}" \; | sort -zu | sed -z 's/$/\n/'
John1024
  • 74,655
  • I have a lot of files, which makes sorting them all too costly. Throwing uniq into the mix helps a lot by removing the repeated lines that are already right next to each other. find . -type f -name '*f*' -printf '%h\0' | uniq -z | sort -zu | tr '\0' '\n'. Or if your tools are a little older, then uniq may not have the -z option. find . -type f -name '*f*' -printf '%h\n' | uniq | sort -u – jbo5112 Jun 30 '17 at 18:06
  • 1
    macOS users: The sed flag is not -r. For some reason it's -E – David Apr 01 '18 at 03:50
  • @David Very true. Answer updated to show -E for macOS. – John1024 Apr 01 '18 at 05:17
  • RE: file names that include newlines - bonus points for diligence and taking extra care; don't think I have ever encountered those, though - in what context would those ever occur ? you aren't talking about whitespace as in spaces or tabs, are you ? – ssc Jun 28 '20 at 08:00
  • @ssc newlines in file names are rare. Good programmers, however, don't want to write code that works merely most of the time. The goal is to have code that works all of the time. – John1024 Jun 28 '20 at 21:46
43

Why not try this:

find / -name '*f*' -printf "%h\n" | sort -u
slm
  • 369,824
8

There are essentially 2 methods you can use to do this. One will parse the string while the other will operate on each file. Parsing the string with a tool such as grep, sed, or awk is obviously going to be faster, but here's an example showing both, as well as how you can "profile" the 2 methods.

Sample data

For the examples below we'll use the following data

$ mkdir -p dir{1..3}/dir{100..112}
$ touch dir{1..3}/dir{100..112}/file{1..5}
$ touch dir{1..3}/dir{100..112}/nile{1..5}
$ touch dir{1..3}/dir{100..112}/knife{1..5}

Delete some of the *f* files from dir1/*:

$ rm dir1/dir10{0..2}/*f*

Approach #1 - Parsing via strings

Here we're going to use the following tools: find, grep, and sort.

$ find . -type f -name '*f*' | grep -o "\(.*\)/" | sort -u | head -5
./dir1/dir103/
./dir1/dir104/
./dir1/dir105/
./dir1/dir106/
./dir1/dir107/

Approach #2 - Parsing using files

Same tool chain as before, except this time we'll be using dirname instead of grep.

$ find . -type f -name '*f*' -exec dirname {} \; | sort -u | head -5
./dir1/dir103
./dir1/dir104
./dir1/dir105
./dir1/dir106
./dir1/dir107

NOTE: The above examples use head -5 merely to limit the amount of output we're dealing with here. You'd normally remove it to get the full listing!

Comparing the results

We can use time to take a look at the 2 approaches.

dirname

real        0m0.372s
user        0m0.028s
sys         0m0.106s

grep

real        0m0.012s
user        0m0.009s
sys         0m0.007s
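For reference, timings like these can be produced by wrapping each pipeline in time. This is only a sketch; the figures will of course vary with your data and hardware:

$ time (find . -type f -name '*f*' -exec dirname {} \; | sort -u > /dev/null)
$ time (find . -type f -name '*f*' | grep -o "\(.*\)/" | sort -u > /dev/null)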

So it's always best to deal with the strings if possible.

Alternative string parsing methods

grep & PCRE

$ find . -type f -name '*f*' | grep  -oP '^.*(?=/)' | sort -u

sed

$ find . -type f -name '*f*' | sed 's#/[^/]*$##' | sort -u

awk

$ find . -type f -name '*f*' | awk -F'/[^/]*$' '{print $1}' | sort -u
slm
  • 369,824
3

Here's one I find useful:

find . -type f -name "*somefile*" | xargs dirname | sort | uniq
2

You can use the -exec switch to run dirname and get the directory name instead of the file name. This has the added benefit of being POSIX compatible.

find . -name "*file*" -exec dirname {} \;
Snowbldr
  • 121
1

This answer is shamelessly based on slm's answer. It was an interesting approach, but it has a limitation if the file and/or directory names have special characters (spaces, semicolons, ...). A good habit is to use find /somewhere -print0 | xargs -0 someprogram.

Sample data

For the examples below we'll use the following data

mkdir -p dir{1..3}/dir\ {100..112}
touch dir{1..3}/dir\ {100..112}/nile{1..5}
touch dir{1..3}/dir\ {100..112}/file{1..5}
touch dir{1..3}/dir\ {100..112}/kni\ fe{1..5}

Delete some of the *f* files from dir1/*/:

rm dir1/dir\ 10{0..2}/*f*

Approach #1 - Parsing using files

$ find -type f -name '*f*' -print0 | sed -e 's#/[^/]*\x00#\x00#g' | sort -zu | xargs -0 -n1 echo | head -n5
./dir1/dir 103
./dir1/dir 104
./dir1/dir 105
./dir1/dir 106
./dir1/dir 107

NOTE: The above example uses head -n5 merely to limit the amount of output we're dealing with here. You'd normally remove it to get the full listing! Also, replace the echo with whatever command you want to use.

1

With zsh:

typeset -aU dirs # array with unique values
dirs=( **/*f*(ND:h) )

print -rC1 -- $dirs

Here, the N glob qualifier enables nullglob (no error if nothing matches), D includes hidden files, and the :h modifier takes the "head" of each match, i.e. its directory, much like dirname. typeset -aU declares an array that keeps only unique values, and print -rC1 prints the elements raw, in one column. This makes no assumptions as to what characters or non-characters the file names otherwise contain.

1

I've found this variation, which doesn't use sort or uniq, useful:

find . -type d -print0 | xargs -0 -I{} find {} -maxdepth 1 -iname '*.log' -print -quit

The advantage is that you don't have to wait for the whole tree to be traversed before output appears, as you would with sort.

  1. Find all directories: find . -type d -print0

  2. For each directory (| xargs -0 -I{}), find a file within that directory (-maxdepth 1) that matches the pattern (-iname '*.log', case-insensitive). If one is found, print its path (-print) and stop searching that directory (-quit), as illustrated by the expansion below.
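In effect, for a tree containing just the hypothetical directories ./a and ./b, xargs runs one inner find per directory produced by the outer one (including . itself), roughly:

find . -maxdepth 1 -iname '*.log' -print -quit
find ./a -maxdepth 1 -iname '*.log' -print -quit
find ./b -maxdepth 1 -iname '*.log' -print -quit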

Alternatively

find . -type d -print0 | xargs -0 -I// find // -maxdepth 1 -iname '*.log' -exec dirname {} \; -quit

which just prints the parent directory's name instead, as inspired by Snowbldr's answer.

CervEd
  • 174
  • Nice idea, but sadly slower. I compared time find "$src_path" -mindepth 2 -maxdepth 2 -type d -path "$src_path/4???_?/archive" -print0 | xargs -0 -I{} find {} -maxdepth 1 -type f -printf "%h\n" -quit (1.5s) with time find "$src_path" -mindepth 3 -maxdepth 3 -type f -path "$src_path/4???_?/archive/*" -printf "%h\0" | sort -zu (0.5s). In my case I have ~1000 dirs without any files and ~20 dirs with files. – mgutt Jan 11 '24 at 14:07
  • @mgutt why the different depth options? – CervEd Jan 11 '24 at 14:18
  • I wouldn't be surprised if this solution is slower. One of the main reasons for this approach was not waiting for the entire tree traversal, required by sort. I would probably opt for one of the top rated answers if I was trying to do this today, but probably just find / -name '*f*' -printf "%h\n" | uniq. Sorting shouldn't be necessary – CervEd Jan 11 '24 at 14:23
  • The different depths are (hopefully) performance tweaks to avoid traversal into unnecessary (sub)directories. For example "depth 2" targets directly the parent directory of the files without going deeper. Regarding uniq: yes should work as find does not return the files in a random order. – mgutt Jan 11 '24 at 15:01
0

As long as you don't have more than one matching file per directory (otherwise you will have duplicates), in bash you can leverage the for construct and variable expansion (as in the example here, where we remove from the path everything after the last /):

for i in $(find . -name "*<your_filename_string>*"); do echo "${i%/*}"; done
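The ${i%/*} expansion deletes the shortest suffix matching /*, i.e. the final path component. A quick illustration on a made-up path:

p=./dir1/dir103/file1
echo "${p%/*}"    # prints ./dir1/dir103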
Antonio
  • 101