
I want to find the subdirectories of the current directory that contain 2 or more regular files.

I am not interested in directories containing fewer than 2 files, nor in directories that contain only subdirectories.

αғsнιη
porton

4 Answers


Here is a completely different approach based on GNU find and uniq. It is much faster and much more CPU-friendly than answers that run a shell command to count the files in each directory found.

find . -type f -printf '%h\n' | sort | uniq -d

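The find command prints the directory (%h) of every regular file in the hierarchy, sort puts identical directory names next to each other (uniq only compares adjacent lines), and uniq -d then displays only the directories that appear at least twice.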
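A quick way to watch the pipeline work is to run it on a small throwaway tree. This is a sketch assuming GNU find (for -printf) and mktemp; the directory names are invented for illustration, and LC_ALL=C on sort only makes the output order predictable.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

mkdir -p one two/sub three
touch one/a one/b                      # two files in one/
touch two/x                            # only one file directly in two/
touch two/sub/p two/sub/q two/sub/r    # three files in two/sub/
                                       # three/ contains no files at all

# %h is the directory part of each file's path; sort makes identical
# directories adjacent; uniq -d prints one copy of each line seen 2+ times.
result=$(find . -type f -printf '%h\n' | LC_ALL=C sort | uniq -d)
printf '%s\n' "$result"

cd / && rm -rf "$demo"
```

This prints ./one and ./two/sub; ./two (one file) and ./three (no files) are left out.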

xhienne
    You shouldn't parse the output of find. In this case, the reason is that GNU find will mangle the names of directories that contain characters that are not printable in the current locale (like "ä" in the C locale). See also https://unix.stackexchange.com/questions/321697/why-is-looping-over-finds-output-bad-practice – Kusalananda Oct 09 '17 at 05:33
    @Kusalananda, not when the output doesn't go to a tty. Here, the only problem is with the newline characters, which you can fix by using -printf '%h\0' | sort -z | uniq -zd | xargs -r0 ... – Stéphane Chazelas Oct 09 '17 at 16:11
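A sketch of the NUL-delimited pipeline suggested in the comment above, assuming GNU find, sort, uniq and xargs (for -printf, -z and -r). The printf '[%s]\n' at the end is only a stand-in for whatever command should receive each directory, and the demo directory name is made up.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

# A directory whose name contains a newline. With '%h\n', "./odd" and "name"
# would each appear twice and be reported as two bogus directories.
nl_dir=$(printf 'odd\nname')
mkdir "$nl_dir"
touch "$nl_dir/f1" "$nl_dir/f2"

# Every directory name is terminated by \0, so embedded newlines survive
# sorting and de-duplication, and xargs -0 passes each name intact.
result=$(find . -type f -printf '%h\0' | sort -z | uniq -zd | xargs -r0 printf '[%s]\n')
printf '%s\n' "$result"

cd / && rm -rf "$demo"
```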

With the help of Gilles's answer on SU, its reverse, and some modifications, here is what you need.

find . -type d -exec sh -c 'set -- "$1"/*;X=0; 
    for args; do [ -f "$args" ] && X=$((X+1)) ;done; [ "$X" -gt 1 ] ' _ {} \; -print

Directory tree.

.
├── test
│   ├── dir1
│   │   ├── a
│   │   ├── b
│   │   └── c
│   ├── dir2
│   │   ├── dira
│   │   │   └── a file\012with\012multiple\012line
│   │   ├── dirb
│   │   │   ├── file-1
│   │   │   └── file-2
│   │   └── dirc
│   ├── diraa
│   ├── dirbb
│   ├── dircc
│   ├── x
│   ├── x1
│   └── x2
└── test2
    ├── dir3
    └── dir4

Result:

./test
./test/dir1
./test/dir2/dirb
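To reproduce the result above, the example tree can be rebuilt in a temporary directory. This sketch assumes that diraa, dirbb, dircc, dirc, dir3 and dir4 are empty directories and that x, x1 and x2 are regular files (the listing does not say which entries are files); LC_ALL=C sort is added only to make the output order predictable.

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

# Directories (assumed: diraa, dirbb, dircc, dirc, dir3 and dir4 are empty)
mkdir -p test/dir1 test/dir2/dira test/dir2/dirb test/dir2/dirc \
         test/diraa test/dirbb test/dircc test2/dir3 test2/dir4
# Regular files; one name contains embedded newlines, as in the listing
touch test/dir1/a test/dir1/b test/dir1/c
touch "test/dir2/dira/$(printf 'a file\nwith\nmultiple\nline')"
touch test/dir2/dirb/file-1 test/dir2/dirb/file-2
touch test/x test/x1 test/x2

# The command from this answer, with its output sorted
result=$(find . -type d -exec sh -c 'set -- "$1"/*;X=0;
    for args; do [ -f "$args" ] && X=$((X+1)) ;done; [ "$X" -gt 1 ] ' _ {} \; -print |
    LC_ALL=C sort)
printf '%s\n' "$result"

cd / && rm -rf "$demo"
```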
αғsнιη
  • I had this at first too, but you will have problems with directories containing multiple subdirectories and files. It also does not weed out directories containing only subdirectories. – Kusalananda Oct 08 '17 at 13:36
  • It doesn't really solve it. It finds both the test and the dir2 directories in my test setup (see my answer). – Kusalananda Oct 08 '17 at 13:40
  • Works for your example, but add test/x1 and test/x2 as files as well... $1 and $2 will be directories for test, and the directory will be missed. – Kusalananda Oct 08 '17 at 14:47
  • @Kusalananda I found no other way than what you answered. I tried to change parts of my command so it wouldn't be an exact duplicate of yours (I didn't exclude hidden files as you did); my apologies. – αғsнιη Oct 08 '17 at 16:04
    No worries whatsoever :-) – Kusalananda Oct 08 '17 at 16:10
find . -type d \
    -exec sh -c 'c=0; for n in "$1"/*; do [ -f "$n" ] && [ ! -h "$n" ] && c=$(( c + 1 )); done; [ "$c" -ge 2 ]' sh {} ';' \
    -print

This will find all names in or under the current directory and then filter out all names that are not names of directories.

The remaining directory names will be given to this short script:

c=0
for n in "$1"/*; do
    [ -f "$n" ] && [ ! -h "$n" ] && c=$(( c + 1 ))
done

[ "$c" -ge 2 ]

This script will count the number of regular files (skipping symbolic links) in the directory given as the first command line argument (from find). The last command in the script is a test to see if the count was 2 or greater. The result of this test is the return value (exit status) of the script.

If the test succeeded, -print will cause find to print out the path to the directory.
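The per-directory script can be tried on its own by wrapping it in a shell function and checking its exit status. The function name has_two_regular_files and the demo files are hypothetical; the demo shows that a symbolic link to a regular file is not counted.

```shell
# Hypothetical wrapper around the script from this answer
has_two_regular_files() {
    c=0
    for n in "$1"/*; do
        # -f is also true for symlinks to regular files, hence the ! -h test
        [ -f "$n" ] && [ ! -h "$n" ] && c=$(( c + 1 ))
    done
    [ "$c" -ge 2 ]    # exit status 0 only when two or more were counted
}

demo=$(mktemp -d)
touch "$demo/real"
ln -s real "$demo/link"           # symlink to a regular file: not counted

if has_two_regular_files "$demo"; then before=yes; else before=no; fi
touch "$demo/second"              # now two real regular files
if has_two_regular_files "$demo"; then after=yes; else after=no; fi

echo "before: $before, after: $after"
rm -rf "$demo"
```

This prints "before: no, after: yes".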

To also consider hidden files (files whose names begin with a dot), change the sh -c script from saying

for n in "$1"/*; do

to

for n in "$1"/* "$1"/.*; do
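A small check of the hidden-file variant, using a hypothetical helper count_regular that prints the count instead of testing it. Depending on the shell, "$1"/.* may also expand to "$1"/. and "$1"/.., but those are directories, so [ -f ] rejects them (the point discussed in the comments on this answer).

```shell
# Hypothetical helper: prints how many regular files (not symlinks) are in $1,
# hidden ones included
count_regular() {
    c=0
    for n in "$1"/* "$1"/.*; do
        [ -f "$n" ] && [ ! -h "$n" ] && c=$(( c + 1 ))
    done
    echo "$c"
}

demo=$(mktemp -d)
touch "$demo/visible" "$demo/.hidden"
mkdir "$demo/.hiddendir"          # hidden directory: not counted

n_files=$(count_regular "$demo")
echo "$n_files"                   # visible and .hidden are counted
rm -rf "$demo"
```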

Testing:

$ tree
.
`-- test
    |-- a
    |-- dir1
    |   |-- a
    |   |-- b
    |   `-- c
    `-- dir2
        |-- dira
        |-- dirb
        |   |-- file-1
        |   `-- file-2
        `-- dirc

6 directories, 6 files

$ find . -type d -exec sh -c 'c=0; for n in "$1"/*; do [ -f "$n" ] && [ ! -h "$n" ] && c=$(( c + 1 )); done; [ "$c" -ge 2 ]' sh {} ';' -print
./test/dir1
./test/dir2/dirb
Kusalananda
  • Your solution doesn't count files with a name starting with a dot. You should also initialize c=0 in order to avoid error messages with directories that do not contain any file. – xhienne Oct 08 '17 at 14:20
  • @xhienne I considered hidden files and will add a note about it. There is no error if there are no regular files in a directory since [ "" -ge 2 ] is a valid test. – Kusalananda Oct 08 '17 at 14:25
  • Not sure how you define "valid". POSIX requires arg1 to be an integer value. dash, bash --posix and test all display an error message and exit with 2 (i.e. "An error occurred") – xhienne Oct 08 '17 at 14:36
  • @xhienne Ah, I was testing on a system that has ksh running as sh. Will amend immediately. Thanks for poking at me! :-) – Kusalananda Oct 08 '17 at 15:00
  • Also, [ -f ... ] dereferences symbolic links. You should add a test to eliminate them since the question specifies that only regular files should be counted. – xhienne Oct 08 '17 at 15:49
  • @xhienne Sorted. – Kusalananda Oct 08 '17 at 15:52
  • Actually, the change for dot-files should be to for n in "$1"/* "$1"/.[!.]* "$1"/..?*; do –  Oct 08 '17 at 23:35
  • @Arrow Why? Look at how the pattern is used. – Kusalananda Oct 09 '17 at 05:23
  • @Arrow Or rather, how $n is used. – Kusalananda Oct 09 '17 at 05:36
  • Ahh, no dir called .. will be processed, clear now, thanks. –  Oct 09 '17 at 06:37

Another find + wc approach:

find path/currdir -maxdepth 1 -type d ! -empty ! -path "path/currdir" \
-exec sh -c 'count=$(find "$1" -maxdepth 1 -type f | wc -l); [ $count -ge 2 ]' _ {} \; -print

  • path/currdir - path to your current directory

  • -maxdepth 1 - consider only direct child subfolders

  • ! -empty - ignore empty subfolders

  • ! -path "path/currdir" - ignore the current directory path

  • count=$(find "$1" -maxdepth 1 -type f | wc -l) - count is assigned the number of regular files in each subfolder found (wc -l counts lines, so a file name containing a newline is counted more than once)

  • [ $count -ge 2 ] ... -print - print subfolder name/path containing 2 or more regular files
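A possible variant of this approach, assuming GNU find: -mindepth 1 drops the starting directory without repeating its path in ! -path, and -printf . | wc -c emits and counts one byte per file, so a file name containing a newline is counted once. The helper name list_dirs_with_two_files and the demo layout are hypothetical.

```shell
# Hypothetical helper: list direct subdirectories of $1 with 2+ regular files
list_dirs_with_two_files() {
    find "$1" -mindepth 1 -maxdepth 1 -type d ! -empty -exec sh -c '
        count=$(find "$1" -maxdepth 1 -type f -printf . | wc -c)
        [ "$count" -ge 2 ]' _ {} \; -print
}

demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir pair single empty
touch pair/a pair/b                      # two files: listed
touch "single/$(printf 'one\nfile')"     # one file with a two-line name
touch top1 top2                          # files in the start directory itself

result=$(list_dirs_with_two_files .)
printf '%s\n' "$result"                  # only ./pair
cd / && rm -rf "$demo"
```

With wc -l instead, ./single would also be listed, because its single file name spans two lines.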