To be able to handle arbitrary directory names, on a GNU system, you could do:
```
du --block-size=1 -l0d1 |
  perl -0 -lne '
    print $2 if /^(\d+)\t(.*)/s && $2 ne "." && $1 < 50000 * 1024' |
  xargs -r0 mv -vt /destination/directory --
```
Beware that `du`'s default block size is 512, but GNU `du` decided to change that to 1024 for some reason¹. That changes back when `$POSIXLY_CORRECT` is in the environment, and can also be changed with the `DU_BLOCK_SIZE`, `BLOCK_SIZE` and `BLOCKSIZE` environment variables. So it's best to specify it explicitly on the command line. With a block size of 1, you get the disk usage with byte precision, which makes it easier to post-process².
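A quick way to see the effect of specifying the block size explicitly (GNU `du` assumed; the test file is made up):

```shell
# Demonstrate that an explicit --block-size fixes the reported unit:
# the byte figure is exactly 512 times the 512-byte-block figure.
d=$(mktemp -d)
head -c 100000 /dev/zero > "$d/f"        # a ~100 kB test file

bytes=$(du --block-size=1 -s -- "$d" | cut -f1)    # exact allocated bytes
blocks=$(du --block-size=512 -s -- "$d" | cut -f1) # 512-byte units

echo "$bytes bytes = $blocks 512-byte blocks"

rm -rf -- "$d"
```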
With `-d1`, same as `--max-depth=1`, we get a disk usage figure for each subdirectory of the current working directory and for the current working directory itself (there's no `--min-depth` unfortunately). That avoids running one `du` per directory, but it also means the size of the files in the current directory is counted, which we don't need and have to discard with `$2 ne "."` in `perl`. Compared to doing `du -- */`, this has the advantage of not skipping hidden directories, and of not processing symlinks to directories.
Using `-d1` that way does mean however that we need `-l` aka `--count-links` to count every hard link to files found in the recursive descent of those directories: otherwise, if a hard link to a given file was found in one of the subdirectories, other hard links to it would not be counted in any of the other directories. Beware that this also means that hard links to a given file within the same subdirectory are all counted, so you could end up with a different value than when running `du -s` on a single directory.

So if your directory tree may contain hard links, it may be better to call one `du` per directory to make sure they're counted independently of each other:
```
find . ! -name . -prune -type d -printf '%P\0' |
  xargs -r0 -n1 du --block-size=1 -0s -- |
  perl -0 -lne '
    print $2 if /^(\d+)\t(.*)/s && $2 ne "." && $1 < 50000 * 1024' |
  xargs -r0 mv -vt /destination/directory --
```
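A sketch of the hard-link pitfall described above (the directory names `a` and `b` are invented):

```shell
# A file hard-linked into two sibling directories: one du run over both
# counts the data once, but per-directory runs each count their own link.
top=$(mktemp -d)
mkdir -- "$top/a" "$top/b"
head -c 1048576 /dev/zero > "$top/a/file"   # 1 MiB file
ln -- "$top/a/file" "$top/b/file"           # second hard link, in b

both=$(du --block-size=1 -s -- "$top" | cut -f1)   # counted once
a=$(du --block-size=1 -s -- "$top/a" | cut -f1)    # counted here...
b=$(du --block-size=1 -s -- "$top/b" | cut -f1)    # ...and here again

echo "$both vs $((a + b))"   # a + b comes out about 1 MiB larger

rm -rf -- "$top"
```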
To do something similar portably (ignoring toybox `du`¹), one could do:
```
for dir in .* *; do
  [ "$dir" = . ] || [ "$dir" = .. ] && continue
  [ -d "$dir" ] || continue
  [ -L "$dir" ] && continue
  du_output=$(
    LC_ALL=C POSIXLY_CORRECT=1 BLOCK_SIZE=512 BLOCKSIZE=512 DU_BLOCK_SIZE=512 du -s -- "$dir"
  ) || continue # don't take any decision if du fails for that dir
  LC_ALL=C awk -- '
    BEGIN {$0 = ARGV[1]; exit !($1 * 512 < 50000 * 1024)}
  ' "$du_output" || continue
  mv -- "$dir" /destination/directory
done
```
(untested).
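The `awk` size test from that loop can be exercised on its own; it exits 0 when the first field (a count of 512-byte blocks) is under the 50000 KiB threshold. The sample `du` lines below are made up:

```shell
# Wrapper around the awk test used in the portable loop above.
under_threshold() {
  LC_ALL=C awk -- '
    BEGIN {$0 = ARGV[1]; exit !($1 * 512 < 50000 * 1024)}
  ' "$1"
}

under_threshold "$(printf '100\tsomedir')"    && echo 'would move'  # 51200 bytes
under_threshold "$(printf '200000\tsomedir')" || echo 'would keep'  # 102400000 bytes
# prints: would move
#         would keep
```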
Compared to the first approach, which runs 4 commands in total (more if `mv` needs to be run several times because there's a very large number of directories), this one runs up to 7 commands per directory, so it is going to be a lot less efficient.
¹ And some other implementations targeting Linux primarily, such as busybox and toybox, have chosen to align with GNU `du` instead of historical practice and the standard in that regard. The worst is toybox's (as found on Android for instance), which honours none of `POSIXLY_CORRECT`, `BLOCKSIZE`, `BLOCK_SIZE` nor `DU_BLOCK_SIZE`, and has `-b` (byte), `-k` (kibibyte, default), `-K` (512), `-m` (mebibyte) options instead, which means it's impossible to invoke `du` in a portable fashion if that implementation needs to be considered.
² Though it increases the risk of integer overflow. 512 is generally the minimum space allocation unit, so it makes sense as a default unit for disk usage.