3

I want to move files bigger than 300 MB out of a directory tree into a second tree, keeping each file in the same subfolder it was in.

Example: I have a directory structure:

dirA/
dirA/file1
dirA/x/
dirA/x/file2
dirA/y/
dirA/y/file3

Here is the expected result: a copy of the directory tree in which each file has been moved into the matching subfolder:

dirB/            #  normal directory
dirB/file1       #  moved from dirA/file1
dirB/x/          #  normal directory
dirB/x/file2     #  moved from dirA/x/file2
dirB/y/          #  normal directory
dirB/y/file3     #  moved from dirA/y/file3

I can find the files with find /path/ -type f -size +300M, but then what? Unfortunately, some of the file names contain every sort of character you can find on a keyboard.

I have been looking at this thread where someone is talking about cpio, but I don't know that program...

PS: I have GNU Parallel installed, if that could speed things up.

Joakim
  • 31

4 Answers

4

The easy way is with zsh. You can use glob qualifiers to match files according to criteria such as their type and size. The wildcard pattern **/ matches any level of subdirectories. The history modifiers h and t are easy ways of extracting the directory and the base part of a filename. Call mkdir -p to create the directories when needed.

cd dirA
for x in **/*(.Lm+300); do
  mkdir -p ../dirB/$x:h &&
    mv -- $x ../dirB/$x
done

The portable way is with find. Use -exec to invoke a shell snippet for every file.

cd dirA
find . -type f -size +300000k -exec sh -c 'for x do
  mkdir -p "../dirB/${x%/*}"
  mv "$x" "../dirB/$x"
done' sh {} +
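
To try this approach without 300 MB files at hand, the snippet can be wrapped in a function with an adjustable threshold. This is only a sketch: the name move_large, its three arguments and the scratch tree are assumptions made for illustration, and the size test relies on the same non-POSIX k suffix used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the answer): move_large SRC DST MIN_KIB
# moves regular files larger than MIN_KIB kibibytes from SRC to DST,
# recreating the subdirectory layout under DST.
move_large() {
  src=$1 dst=$2 min_kib=$3
  mkdir -p "$dst" || return
  # Resolve DST to an absolute path so it stays valid after cd.
  dst_abs=$(cd "$dst" && pwd) || return
  (
    cd "$src" || exit
    find . -type f -size "+${min_kib}k" -exec sh -c '
      dst=$1; shift
      for x do
        mkdir -p "$dst/${x%/*}" && mv -- "$x" "$dst/$x"
      done' sh "$dst_abs" {} +
  )
}

# Scratch demo with a 1 KiB threshold instead of 300 MB.
demo=$(mktemp -d)
mkdir -p "$demo/dirA/x y"
dd if=/dev/zero of="$demo/dirA/x y/big file" bs=1024 count=4 2>/dev/null
printf 'tiny' > "$demo/dirA/small"
move_large "$demo/dirA" "$demo/dirB" 1
```

For the case in the question the call would be something like move_large dirA dirB 300000. Passing the destination as the first argument of the inner sh keeps it out of the quoted script text, so odd characters in the path need no escaping.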

Parallelization is rarely useful for input/output: it lets you take advantage of multiple CPUs but the CPU is rarely a bottleneck in I/O.

  • Thx for sharing. I tried your first suggestion, but it gave the error syntax error near unexpected token `(', so I tried your second suggestion and that one worked beautifully = +1 from me – Joakim Nov 08 '16 at 20:35
  • @Joakim The first command needs to be run in zsh, not in (guessing) bash. – Gilles 'SO- stop being evil' Nov 08 '16 at 20:40
  • How would I run it in zsh? And PS: I just noticed a huge mistake in the second suggestion: dirA/5/8/0/2/6/$filename became dirB/5/8/0/2/6/$filename/$filename ... – Joakim Nov 08 '16 at 21:01
  • @Joakim Zsh is a shell, i.e. a command interpreter. You just start it (install it first of course, it comes with macOS but on Linux you often need to install the package explicitly) and type the command. Or put the command in a script starting with #!/bin/zsh. I've fixed the bug in the second snippet, thanks. – Gilles 'SO- stop being evil' Nov 08 '16 at 21:14
  • Hi @Gilles, you've done a great job here :) I've tested both and they work nicely :) thx – Joakim Nov 11 '16 at 16:43
  • Note that here, unless dirB is on a different file system, there will be little I/O and parallelisation may very well help. zmodload zsh/files as well – Stéphane Chazelas Nov 15 '16 at 18:20
2

Perl rename is the obvious choice. It may be installed as ren, rename, or pren:

find dirA -type f -size +300M | ren 's:^dirA/:dirB/:'

It does, however, not work if the files are moved to a different mount point, and will fail if the dirs are not there.
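
If the missing directories are the only obstacle, one way around it is to recreate the directory skeleton under dirB first. The following is a rough sketch on a scratch tree (the paths are made up for the demo); note that it creates every directory of dirA, including ones that will receive no large file.

```shell
#!/bin/sh
# Scratch tree standing in for dirA (hypothetical demo paths).
demo=$(mktemp -d)
mkdir -p "$demo/dirA/x/deep dir" "$demo/dirA/y"

# Recreate every directory of dirA under dirB; files are not touched.
cd "$demo/dirA" || exit 1
mkdir -p ../dirB
find . -type d -exec sh -c 'for d do
  mkdir -p "../dirB/$d"
done' sh {} +
```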

GNU Parallel will be slower:

cd dirA
find . -type f -size +300M | parallel mkdir -p ../dirB/{//}
find . -type f -size +300M | parallel mv {} ../dirB/{}

but will work even if it needs to do the copy-then-remove routine to get the files onto a different file system.
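
For file names that may contain newlines, the names have to be passed NUL-delimited. The sketch below shows that idea with xargs -0 rather than GNU Parallel (a deliberate substitution so that it runs without parallel installed); -print0 and -0 are widely available in GNU and BSD tools but were not in older POSIX. The scratch tree and the 1 KiB threshold are assumptions for the demo.

```shell
#!/bin/sh
# Scratch tree; one file name contains a newline, the case that breaks
# line-based pipelines.
demo=$(mktemp -d)
mkdir -p "$demo/dirA/sub" "$demo/dirB"
nl_name=$(printf 'two\nlines')
dd if=/dev/zero of="$demo/dirA/sub/$nl_name" bs=1024 count=4 2>/dev/null

cd "$demo/dirA" || exit 1
# 1 KiB threshold for the demo; the question's case would use +300M.
# -print0 and -0 delimit names with NUL, a byte that cannot appear
# in a file name.
find . -type f -size +1k -print0 |
  xargs -0 sh -c 'for x do
    mkdir -p "../dirB/${x%/*}" && mv -- "$x" "../dirB/$x"
  done' sh
```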

Miati
  • 3,150
Ole Tange
  • 35,514
  • With parallel, might work better since it can support null characters with -print0 in find, -0 in parallel per OP: "and unfortunately some of the files have all sorts of characters you can find on your keyboard." – Miati Nov 08 '16 at 18:50
  • Tried this one too, but I must have got something wrong :( parallel returned Warning: Did you mean to use the --null option? and added an extra dot in the file location. But thx for your reply – Joakim Nov 08 '16 at 20:36
  • GNU Parallel deals correctly with weird characters. Only if you have a newline in the filename do you need 'find ... -print0 | parallel --null ...'. – Ole Tange Nov 08 '16 at 22:51
  • Hi @Ole-Tange, I tested this one again and it worked fine, but I'm a bit sad that every time I use parallel I see this output rather than what is happening: Academic tradition requires you to cite works you base your article on. When using programs that use GNU Parallel to process data for publication please cite:

    How do I get it to stop doing that?

    – Joakim Nov 11 '16 at 16:42
  • @Joakim It tells you: 'To silence the citation notice: run 'parallel --citation'.' – Ole Tange Nov 11 '16 at 19:08
0

In short:

find dirA -type f -size +300M -printf "mv %p dirB/%P\n" | sh

However, all the subdirectories in dirB must exist before you start. For this reason I suggest generating a mkdir alongside each mv:

cd dirA
find . -type f -size +300M -printf "mkdir -p ../dirB/%h\nmv %p ../dirB/%P\n" | sh

Regarding cpio (it handles the subdirectory problem, but note that it copies the files rather than moving them):

(cd dirA && find . -type f -size +300M | cpio -pmd ../dirB)

(Regarding the cp(1) approach in the thread you mention: it does not fit your case, because it copies all the files and creates a subdirectory named dirA under dirB. The -T flag resolves the latter problem.)

Udi
  • 75
-1

This ought to cover it.

find /path -type f -size +300M | while read A ; do DEST=${A/dirA/dirB} ; echo mkdir -p $(dirname $DEST) 2>/dev/null; echo mv $A $DEST ; done

Run it as-is first, sanity check, and if happy with the proposed commands, rerun it without the echo elements.

For your example file structure, this would generate the following commands:

mkdir -p ./dirB
mv ./dirA/file1 ./dirB/file1
mkdir -p ./dirB/x
mv ./dirA/x/file2 ./dirB/x/file2
mkdir -p ./dirB/y
mv ./dirA/y/file3 ./dirB/y/file3
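
A quoted variant of the same dry-run idea could look like the sketch below. The function name plan_moves and its arguments are hypothetical, the prefix is stripped with the portable ${var#prefix} expansion instead of the bash-only ${A/dirA/dirB}, and the printed lines are meant for reading, not for piping into sh (file names containing double quotes would not be escaped). Like any line-based loop, it still mishandles file names containing newlines.

```shell
#!/bin/sh
# plan_moves SRC DST MIN_KIB: print (do not run) the mkdir/mv commands.
# The function name and its arguments are hypothetical.
plan_moves() {
  src=${1%/} dst=${2%/} min_kib=$3
  find "$src" -type f -size "+${min_kib}k" | while IFS= read -r path; do
    rel=${path#"$src"/}          # strip the leading "SRC/" portably
    dest=$dst/$rel
    printf 'mkdir -p "%s"\n' "${dest%/*}"
    printf 'mv "%s" "%s"\n' "$path" "$dest"
  done
}

# Scratch demo with a 1 KiB threshold instead of 300 MB.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p dirA/x
dd if=/dev/zero of=dirA/x/file2 bs=1024 count=4 2>/dev/null
plan=$(plan_moves dirA dirB 1)
printf '%s\n' "$plan"
```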
steve
  • 21,892
  • 1
    Your code breaks with file names containing special characters such as spaces. If you add the missing double quotes, your code only breaks on backslashes, trailing spaces and newlines. You should explain how to make ${A/dirA/dirB} work if dirA contains slashes and that it's bash-specific, or better don't use it and use the portable dirB/${A#dirA/} instead. – Gilles 'SO- stop being evil' Nov 07 '16 at 00:10