Supposing this file system structure:

ROOT
    DIR1A
        FILE
        DIR2A
        DIR2B
            DIR3A
    DIR1B
        DIR2C
        DIR2D
            DIR3B
    DIR1C
        DIR2E
            FILE

Starting from an arbitrary directory, how can I list only the shallowest of its child directories that contain either a) nothing or b) only empty directories all the way down, without listing those empty children themselves?

That is, in the case above, if I started at ROOT:

  1. DIR1A would NOT be listed, because it contains a file.
  2. DIR2A WOULD be listed, because it contains nothing.
  3. DIR2B WOULD be listed, because it contains only empty directories.
  4. DIR3A would NOT be listed, because it is within a shallower directory that's already been listed.
  5. DIR1B WOULD be listed, because it contains only empty directories.
  6. Children of DIR1B would NOT be listed, because they are within a shallower directory that's already been listed.
  7. Both DIR1C and DIR2E would NOT be listed, because there's a file nested in there.

I'm confident there's a more efficient way to say this. Perhaps "I want to list only the highest-order directories which contain either nothing or solely empty directories, all the way down"?

EDIT: I attempted to clarify some of the language above.

bland328
  • Can you elaborate on this: "Children of DIR1B would NOT be listed, because they are within a listed parent"? – RomanPerekhrest Mar 23 '18 at 15:41
  • Yes--since DIR1B contains no FILES (in this case, some empty directories, yes, but no files) all the way down, IT is listed. But its empty-dir children are not listed because they are contained by a listed dir. Perhaps I should reword this as "Starting at an arbitrary directory, I want to recursively list every directory that is either empty or contains only empty directories, but without listing any of the children of those directories." – bland328 Mar 24 '18 at 20:15

2 Answers

1

To avoid traversing the directory tree more than once and to minimize the number of commands you run, you could do (assuming GNU find and sort, and an awk like GNU's that supports NUL as the record separator):

find . -type d -print0 -o -printf 'f/%h\0' |
  LC_ALL=C sort -zru |
  LC_ALL=C awk -F/ -vRS='\0' '
    function parent(path) {
      sub("/[^/]*$", "", path)
      return path
    }
    $1 == "f" {
      sep = path = ""
      for (i = 2; i <= NF; i++) {
        black[path = path sep $i]
        sep = FS
      }
      next
    }
    ! ($0 in black) && ($0 == "." || parent($0) in black)'

Here we paint black every directory that has a non-directory file anywhere below it, and then print the non-black dirs that have a black parent (or no parent, for the special case of .).
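To see why the sort runs in reverse, it can help to look at the records awk receives. The following is a minimal sketch, assuming GNU find and sort and a throwaway tree that mirrors the question's layout; the temporary directory and the `records` variable are illustrative and not part of the command above.

```shell
# Build a scratch copy of the question's tree (hypothetical location).
tmp=$(mktemp -d)
mkdir -p "$tmp"/DIR1A/DIR2A "$tmp"/DIR1A/DIR2B/DIR3A \
         "$tmp"/DIR1B/DIR2C "$tmp"/DIR1B/DIR2D/DIR3B \
         "$tmp"/DIR1C/DIR2E
touch "$tmp"/DIR1A/FILE "$tmp"/DIR1C/DIR2E/FILE

# Same find and sort stages as above, with NULs turned into newlines for display.
# Directories are emitted as-is; each non-directory emits "f/" plus its parent dir.
records=$(cd "$tmp" &&
  find . -type d -print0 -o -printf 'f/%h\0' |
  LC_ALL=C sort -zru |
  tr '\0' '\n')
printf '%s\n' "$records"

rm -rf "$tmp"
```

In the C locale "f" sorts after ".", so the reverse sort puts every f/ record ahead of the directory records. That way awk has finished painting before it tests the first directory, and -u collapses the duplicate f/ records of directories that hold several files.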

Note that if the aim is to delete those directories, you could just do:

find . -depth -type d -empty -delete

-delete implies -depth, but I still add it here for clarity (as the GNU find manual recommends). -delete would only delete empty directories anyway; with -empty we avoid the error message when it fails to delete non-empty dirs. By working depth first, we end up deleting whole structures that don't contain non-directory files, deleting the leaves before the branches they're on.

-delete and -empty are non-standard extensions (-delete from BSD, -empty from GNU find), but both are fairly common these days. If your find doesn't have them, you can always replace both with -exec rmdir {} + (and maybe discard error messages with 2> /dev/null, though you'd then miss all error messages from both find and rmdir).
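As a rough illustration of that rmdir fallback, here is a sketch that runs it against a scratch copy of the question's tree. The temporary directory and the `remaining` variable are assumptions made for the demonstration only.

```shell
# Build a scratch copy of the question's tree (hypothetical location).
tmp=$(mktemp -d)
mkdir -p "$tmp"/DIR1A/DIR2A "$tmp"/DIR1A/DIR2B/DIR3A \
         "$tmp"/DIR1B/DIR2C "$tmp"/DIR1B/DIR2D/DIR3B \
         "$tmp"/DIR1C/DIR2E
touch "$tmp"/DIR1A/FILE "$tmp"/DIR1C/DIR2E/FILE

# -depth lists children before their parents, so rmdir sees the leaves first.
# rmdir refuses non-empty directories (and "."); those errors are discarded.
(cd "$tmp" && find . -depth -type d -exec rmdir {} + 2>/dev/null)

# Whatever is left is a directory that has a file somewhere below it.
remaining=$(cd "$tmp" && find . -type d | LC_ALL=C sort)
printf '%s\n' "$remaining"

rm -rf "$tmp"
```

Only ., ./DIR1A, ./DIR1C and ./DIR1C/DIR2E should survive, since each of them has a file somewhere beneath it. The find command itself exits non-zero here because some rmdir calls fail, which is expected.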

0

Belatedly, here you go:

find -type d -exec sh -c '[ -z "$(find "$@" -type f -print -quit)" ]' _ {} \; -print -prune

Example

# Set up the example tree (brace expansion needs bash or a similar shell)
mkdir -p root/{dir1a/{dir2a,dir2b/dir3a},dir1b/{dir2c,dir2d/dir3b},dir1c/dir2e}
touch root/{dir1a,dir1c/dir2e}/file

Run the finder

find root -type d -exec sh -c '[ -z "$(find "$@" -type f -print -quit)" ]' _ {} \; -print -prune

Output

root/dir1b
root/dir1a/dir2b
root/dir1a/dir2a

Explanation

The shell started by -exec is called for each directory in turn, starting from the top level and working downwards, so every directory is tested before its children (find walks the tree depth-first, not width-first). It searches for regular files (-type f) from the current point, returning true iff there are none. The main find takes the exit status from its -exec, and if it was successful it prints the current directory and, because of -prune, does not descend into that subtree.
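One thing worth noting is that -type f matches regular files only, so a directory holding just a symlink, FIFO or socket still counts as empty here. If every non-directory should count as content, a possible variant (a sketch, not the answer's original command) swaps in ! -type d. The scratch tree and the `regular` and `anykind` variables below are hypothetical, and -quit assumes GNU find or another find that supports it.

```shell
# Scratch tree: root/a/empty is truly empty, root/b holds only a dangling symlink.
tmp=$(mktemp -d)
mkdir -p "$tmp/root/a/empty" "$tmp/root/b"
ln -s /nonexistent "$tmp/root/b/link"

# Original test: only regular files count as content, so root itself qualifies.
regular=$(cd "$tmp" &&
  find root -type d -exec sh -c '[ -z "$(find "$1" -type f -print -quit)" ]' _ {} \; -print -prune)

# Variant: any non-directory (here the symlink) counts as content.
anykind=$(cd "$tmp" &&
  find root -type d -exec sh -c '[ -z "$(find "$1" ! -type d -print -quit)" ]' _ {} \; -print -prune)

printf 'regular files only: %s\n' "$regular"
printf 'any non-directory:  %s\n' "$anykind"

rm -rf "$tmp"
```

With the original test the whole of root is reported, while the variant reports only root/a, because the symlink under root/b now disqualifies both root/b and root.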

Chris Davies