So, I am assuming that all the sub-directories that you want to group are exactly one level below your parent directory. We'll let zip recurse into the sub-sub-directories.
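To make that concrete: the selection every version below relies on is just this find invocation (directories only, at depth exactly one), and anything deeper gets picked up later by zip -r:

find /your/parent/directory -mindepth 1 -maxdepth 1 -type d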
EDIT: Thanks to people's suggestions, this new version now works with all kinds of file names, including names containing spaces, newlines, and special characters. An excellent write-up on the matter can be found here:
https://unix.stackexchange.com/a/321757/439686
#!/bin/bash
export rootdir=${1:-/your/parent/directory}
export N=10 # group size
export stamp=$(date +%s)

# Hand the first-level sub-directories to an inline bash that zips them N at a time.
find "$rootdir" -mindepth 1 -maxdepth 1 -type d -exec bash -c '
    count=0 # group number
    while [ $# -gt 0 ]; do
        ((count++))
        zip -r "$rootdir/group.${stamp}.${count}.zip" "${@:1:N}"
        shift $N || set -- # drop this batch, or clear the list on the final short batch
    done
' "" {} +
Result:
group.1615512971.1.zip
group.1615512971.2.zip
group.1615512971.3.zip
group.1615512971.4.zip
...
And here is a slightly different version, which also loops over the positional parameters, but without spawning a separate bash process through find -exec. (This one runs faster than the previous version.)
#!/bin/bash
rootdir=/your/parent/directory
N=10 # group size
stamp=$(date +%s)

# Read the NUL-delimited directory list into an array, then into the positional parameters.
readarray -td '' ARRAY < <(find "$rootdir" -mindepth 1 -maxdepth 1 -type d -print0)
set -- "${ARRAY[@]}"

count=0
while [ $# -gt 0 ]; do
    ((count++))
    zip -r "$rootdir/group.${stamp}.${count}.zip" "${@:1:N}"
    shift $N || set --
done
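The batching trick in both loops above is the slice "${@:1:N}" (the first N positional parameters) combined with shift $N || set -- (drop that batch, or clear the list when fewer than N parameters remain). A toy run with made-up arguments and N=3 shows the grouping:

set -- a b c d e f g
N=3
while [ $# -gt 0 ]; do
    echo "batch: ${@:1:N}"
    shift $N || set --
done
# prints: batch: a b c
#         batch: d e f
#         batch: g

Note that readarray -d was only added in bash 4.4, so the second version needs a reasonably recent bash.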
EDIT #2: Parallelism and memory usage
After reading this post (https://unix.stackexchange.com/a/321765/439686), it occurred to me that my previous two versions can run into serious trouble if we are dealing with a huge number of directories. Besides putting serious strain on memory, they are also inefficient, because they wait for find to produce the whole list of directories before the first zip command even starts. It would be much nicer to run things in parallel, through pipes, and then it won't matter how many directories there are. That leaves us with the only possible correct solution: do it with find ... -print0 | xargs -0 command. Why xargs? Because it can start commands with N arguments at a time instead of waiting for the whole list, and because xargs can deal with the zero-delimited strings that -print0 pipes to it. And we absolutely must use zero as the delimiter, because filenames are allowed to contain any other character, including newlines. As an added bonus, with xargs we can even start multiple processes at the same time, to better utilize a multicore system.
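Before the full script, here is a toy pipeline (the directory names are made up; two of them contain spaces) showing how xargs slices a NUL-delimited stream into groups of at most two arguments per command:

printf '%s\0' 'dir one' 'dir two' 'dir3' |
    xargs -r0 --max-args=2 echo would-zip:
# would-zip: dir one dir two
# would-zip: dir3

So, here it is: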
#!/bin/bash
rootdir=${1:-/your/parent/directory}
N=10 # group size
mktemp --version >/dev/null || exit 1 # unique archive names depend on mktemp
stamp=$(date +%Y%m%d%H%M)
cores=$(nproc) || cores=1
export rootdir N stamp cores

find "$rootdir" -mindepth 1 -maxdepth 1 -type d -print0 |
    xargs -r0 --max-args=$N --max-procs=$cores bash -c '
        zip -r "$(mktemp -u -p "$rootdir" group.$stamp.XXXXXX.zip)" "$@" ' ""
Result:
group.202103140805.7H1Don.zip
group.202103140805.akqmgX.zip
group.202103140805.fzBsUZ.zip
group.202103140805.iTfmj8.zip
...