How to delete all files with a specific extension in specifically named folders in a large tree?

Question

I have a large tree with many PDF files in it. I want to delete PDF files in this tree, but only those in subfolders named rules/. There are other types of files inside rules/. The rules/ subfolders have no subfolders of their own.

For example, I have this tree. Everything is below 'source':

  source/
         A/
            rules/*.pdf, *.txt, *.c,etc..
            etc/
         B/
            keep_this.pdf                
            rules/*.pdf
            whatever/
         C/ 
            D/
               rules/*.pdf
               something/

and so on. There are pdf files all over the place, but I only want to delete all the pdf files which are in folders called rules/ and no other place.

I think I need to use

  cd source
  find  / -type d -name "rules"  -print0 | xargs -0 <<<rm *.pdf?? now what?>>>

But I am not sure what to do after getting the list of all subfolders named rules/.

Any help is appreciated.

I am on Linux Mint.

score 9 · Accepted Answer · answered Mar 16 '16 at 00:03

I would execute a find inside another find. For example, I would execute this command line in order to list the files that would be removed:

$ find /path/to/source -type d -name 'rules' -exec find '{}' -mindepth 1 -maxdepth 1 -type f -iname '*.pdf' -print ';'

Then, after checking the list, I would execute:

$ find /path/to/source -type d -name 'rules' -exec find '{}' -mindepth 1 -maxdepth 1 -type f -iname '*.pdf' -print -delete ';'
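
To try both commands safely before pointing them at real data, something like the following sketch could be used. The scratch tree and all file names in it are made up for the demo, and GNU find is assumed (for -iname, -mindepth, -maxdepth and -delete).

```shell
# Build a throwaway tree; every name below is hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/source/A/rules" "$demo/source/B/rules" "$demo/source/C/D/rules"
touch "$demo/source/A/rules/a.pdf" "$demo/source/A/rules/notes.txt" \
      "$demo/source/B/keep_this.pdf" "$demo/source/B/rules/b one.pdf" \
      "$demo/source/C/D/rules/c.PDF"

# Dry run: only list the files that would be removed.
find "$demo/source" -type d -name 'rules' \
  -exec find '{}' -mindepth 1 -maxdepth 1 -type f -iname '*.pdf' -print ';'

# Real run: delete them.
find "$demo/source" -type d -name 'rules' \
  -exec find '{}' -mindepth 1 -maxdepth 1 -type f -iname '*.pdf' -delete ';'

# Show what survived.
find "$demo/source" -type f | sort
```

Only notes.txt and keep_this.pdf should be left afterwards; the scratch tree can be removed with rm -rf "$demo" when done.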

score 5 · Answer 2 · answered Mar 16 '16 at 01:48

With a shell that supports recursive (**) globs and null globs, e.g. zsh:

for d in ./**/rules/
do
    set -- ${d}*.pdf(N)
    (( $# > 0 )) && printf %s\\n $@
done

or bash:

shopt -s globstar
shopt -s nullglob
for d in ./**/rules/
do
    set -- "${d}"*.pdf
    (( $# > 0 )) && printf %s\\n "$@"
done

Replace printf %s\\n with rm once you're happy with the result.

Since you are on GNU/Linux, you could also run:

find . -type f -regextype posix-basic -regex '.*/rules/[^/]*\.pdf' -delete

Remove -delete if you want to perform a dry run.
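
One detail worth checking in a -regex pattern like this is whether the dot before pdf is escaped: in a POSIX basic regex a bare . matches any character. The sketch below compares the two spellings on a scratch directory. The file names are invented for the demo, GNU find is assumed, and nothing is deleted.

```shell
# Scratch directory with one real PDF and one look-alike (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/A/rules"
touch "$demo/A/rules/a.pdf" "$demo/A/rules/list_of_pdf"

# Unescaped dot: "." matches any character, so list_of_pdf is matched too.
loose=$(find "$demo" -type f -regextype posix-basic -regex '.*/rules/[^/]*.pdf' | sort)

# Escaped dot: only names that really end in ".pdf" are matched.
strict=$(find "$demo" -type f -regextype posix-basic -regex '.*/rules/[^/]*\.pdf' | sort)

printf 'unescaped dot:\n%s\n\n' "$loose"
printf 'escaped dot:\n%s\n' "$strict"
```

With -delete added, the unescaped form would also remove a file such as list_of_pdf.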

score 1 · Answer 3 · answered Mar 16 '16 at 16:38

Easiest would be

find source -name '*.pdf' -path '*/rules/*.pdf' -exec rm '{}' +

Why the first -name? Because the search will be a bit faster this way. Also, + instead of ; runs one rm with many arguments instead of many rm processes with one argument each, so fewer processes are spawned. In bash you can get away without quoting {}.
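
The difference between ; and + can be observed without deleting anything by swapping rm for a tiny sh -c helper that prints how many arguments it received. This is only a sketch: the scratch tree is made up and the helper merely stands in for rm.

```shell
# Scratch tree with three PDFs in rules/ folders (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/source/A/rules" "$demo/source/B/rules"
touch "$demo/source/A/rules/1.pdf" "$demo/source/A/rules/2.pdf" \
      "$demo/source/B/rules/3.pdf"

# With ';' the command runs once per file: three lines, each saying 1.
per_file=$(find "$demo/source" -name '*.pdf' -path '*/rules/*.pdf' \
             -exec sh -c 'echo "$#"' sh '{}' ';')

# With '+' the files are batched into one invocation: a single line saying 3.
batched=$(find "$demo/source" -name '*.pdf' -path '*/rules/*.pdf' \
            -exec sh -c 'echo "$#"' sh '{}' +)

printf 'argument counts with ";": %s\n' "$(printf '%s ' $per_file)"
printf 'argument counts with "+": %s\n' "$batched"
```

For very long file lists find may still split a + batch into several invocations, but there are far fewer of them than one per file.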

answered Mar 16 '16 at 16:38

Torinthiel


Yes, I had this in mind but avoided it because of the initial requirements. The question was edited, and this works now because "the rules/ subfolders have no other subfolders." Otherwise, with subdirectories, a file like ./somedir/rules/noway/somefile.pdf would be deleted even though it is not in rules/ itself but in one of its children. In that case you would probably want something like find . -path '*/rules/*/*' -prune -o -path '*/rules/*.pdf' -delete. Anyway, you get my vote. As for "in bash you can get away without quoting {}": only a couple of shells need that quoting. – don_crissti Mar 16 '16 at 18:57
Why -exec rm with all the parsing pitfalls and race conditions waiting to happen instead of -delete? – Caleb Mar 16 '16 at 19:55
@Caleb - would you mind elaborating a bit on the first part ? What "parsing pitfalls" are we talking about when using find with -exec rm {} ? – don_crissti Mar 16 '16 at 21:22

raf · Answer 4 · 2023-11-17T07:02:37.977

1

Disclaimer: I am the current author of the rawhide (rh) program used in this answer (see https://github.com/raforg/rawhide).

rh -UUU '"*/rules/*.pdf".path'

The '"*/rules/*.pdf".path' argument searches for pdf files inside/under directories named "rules".

The -UUU argument unlinks/deletes/removes the matching files.

edited Nov 17 '23 at 07:02

answered Apr 24 '23 at 08:28

raf


dma1324 · Answer 5 · 2016-03-16T00:34:02.727

-1

You can use a bash script to do it (not the best way):

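#!/bin/bash

# Don't screw us up with spaces!
# Split only on newlines and disable globbing while $DIRS is expanded.
IFS=$'\n'; set -f

DIRS=$(find . -type d -name "rules")

for i in $DIRS; do
  # Re-enable globbing so that *.pdf expands; quoting "$i" keeps any
  # wildcard characters in the directory name literal.
  set +f
  rm -- "$i"/*.pdf
done
set +f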

This iterates through the directories found by the find command and removes the PDFs under each directory.

The line IFS=$'\n' is to cope with spaces in file names, and set -f is to cope with wildcard characters. Of course, this is assuming you don't have newlines in any of your filenames. If you do, the solution will become a lot more complicated.

edited Mar 16 '16 at 00:34

answered Mar 15 '16 at 23:49

dma1324


What if any of those dirs has funky chars in its name (path) ? (hint: IFS) – don_crissti Mar 15 '16 at 23:57
@don_crissti The script would fail. Don't put funky chars in your filenames, children! Although, any whitespace would also fail the script. That's a good point. – dma1324 Mar 16 '16 at 00:01
That's still pretty bad. Avoid the for f in $(find...) construct. find is a robust tool and you can do this find-only in a reliable manner. – don_crissti Mar 16 '16 at 00:15
Better than fiddling with IFS, you should just avoid the split+glob operator. There are robust ways to do it, why insist on fragile ways? – Gilles 'SO- stop being evil' Mar 16 '16 at 00:25
It's just one way to do it. There are other ways that I haven't done before. In fact, this is the way I would have done it before I learned about these things in find. – dma1324 Mar 16 '16 at 00:27
Have find execute your shell code e.g. find . -type d -name rules -exec sh -c 'set -- "$0"/*.pdf; printf %s\\n "$@"' {} \; (here instead of removing it prints the names of pdf's if any) that way it will always work with any kind of file names. – don_crissti Mar 16 '16 at 01:04
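
The last comment's idea of letting find run a small piece of shell code per directory could be turned into a deleting variant roughly as follows. This is a sketch rather than the commenter's exact command: the scratch tree, including a directory with a newline in its name, is invented for the demo, and the [ -f ] test is an added guard for rules/ folders that contain no PDFs.

```shell
# Scratch tree with awkward names (all hypothetical).
demo=$(mktemp -d)
nl='
'
mkdir -p "$demo/source/A/rules" "$demo/source/odd${nl}name/rules" "$demo/source/B/rules"
touch "$demo/source/A/rules/a.pdf" "$demo/source/A/rules/notes.txt" \
      "$demo/source/odd${nl}name/rules/b c.pdf" "$demo/source/B/keep_this.pdf"

# find passes each rules/ directory to sh as $1; the glob is expanded by that
# inner shell, so spaces and newlines in paths never get word-split.
find "$demo/source" -type d -name rules -exec sh -c '
  for f in "$1"/*.pdf; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    if [ -f "$f" ]; then rm -- "$f"; fi
  done
' sh {} \;

# List the PDFs that are left.
left=$(find "$demo/source" -type f -name "*.pdf")
printf 'PDFs left: %s\n' "$left"
```

Only keep_this.pdf should remain among the PDFs. Note that the *.pdf glob is case-sensitive, unlike the -iname test in the accepted answer.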
