
I've read this. But I am trying to achieve something slightly different.

I have a directory with many sub-directories. I want to create zip files from those sub-directories, but instead of creating a separate zip file for each sub-directory, I would like to group them -- let's say 10 sub-directories per zip file.

EDIT: all sub-directories are at depth one!

Many thanks.

user2324712
  • Welcome to U&L! Are the subdirectories nested? Or is your directory tree exactly two levels deep? How should the 10 subdirectories be grouped? E.g. based on alphabetical sorting? – fra-san Mar 11 '21 at 17:25
  • Help us help you: compose a minimal, complete example; otherwise you will get multiple answers, each with a different interpretation of your question. – Quasímodo Mar 11 '21 at 17:30

2 Answers


You can call zip -r zipfile files_or_dirs multiple times for the same zipfile and do this in a loop.

The script below will recursively add 10 subdirectories of the current directory (with all files and subdirectories in them) to a ZIP file and then switch to the next ZIP file. It ignores files in the current directory. The size of the ZIP files depends on the data in the subdirectories. The last ZIP file may contain fewer than 10 subdirectories.

Since the question refers to an answer that uses for i in */; do zip -r "${i%/}.zip" "$i"; done and adds only the requirement that, e.g., 10 subdirectories be stored per ZIP file instead of one ZIP file per subdirectory, I assume it is not required to archive directories that start with a dot.

#!/bin/bash
zipnum=0
i=0
for dir in ./*/
do
    zip -r "archive$zipnum.zip" "$dir" # recursively add this dir to the archive
    ((i++))            # count directories
    if [[ $i -ge 10 ]] # maximum number of directories per ZIP file
    then
        i=0            # reset directory counter
        ((zipnum++))   # next ZIP file number
    fi
done

Note that the assignment of directories to ZIP files may change if you later change the set of subdirectories, so you may get different (or unexpected) results on repeated execution of the script.

As the script counts 0, 1, ..., 9, 10, 11, ..., you may get ZIP file names with different numbers of digits, which may result in unexpected (lexicographical) sorting like

archive0.zip
archive1.zip
archive10.zip
archive11.zip
archive2.zip
archive3.zip
...
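One way to avoid that (a sketch, not part of the original answer) is to zero-pad the counter with printf, so all archive names have the same number of digits and sort correctly:

```shell
#!/bin/bash
# Variant of the loop above with zero-padded archive names, so that
# archive000.zip, archive001.zip, ..., archive010.zip sort lexicographically.
zipnum=0
i=0
for dir in ./*/
do
    printf -v name 'archive%03d.zip' "$zipnum"  # e.g. 7 -> archive007.zip
    zip -r "$name" "$dir"
    ((i++))
    if [[ $i -ge 10 ]]
    then
        i=0
        ((zipnum++))
    fi
done
```

With %03d the names stay in order for up to 1000 archives; widen the field width if you expect more.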
Bodo

So, I am assuming that all the sub-directories that you want to group are at exactly one depth level under your parent directory. We'll let zip recurse into the sub-sub-directories.

EDIT: Thanks to people's suggestions, this new version now works with all kinds of file names, including names containing spaces, new lines, and special chars. An excellent writeup on the matter can be found here: https://unix.stackexchange.com/a/321757/439686

#!/bin/bash
export rootdir=${1:-/your/parent/directory}
export N=10 # group size
export stamp=$(date +%s)

find "$rootdir" -mindepth 1 -maxdepth 1 -type d -exec bash -c '
    count=0                # group number
    while [ $# -gt 0 ]; do
        ((count++))
        zip -r "$rootdir/group.${stamp}.${count}.zip" "${@:1:N}"
        shift $N || set -- # shift fails on the last, short batch, so clear the list
    done
' "" {} +

Result:

group.1615512971.1.zip
group.1615512971.2.zip
group.1615512971.3.zip
group.1615512971.4.zip
...

And here is a slightly different version, which also loops through the positional parameters, but without spawning a subshell. (This one runs faster than the previous version.)

#!/bin/bash
rootdir=/your/parent/directory
N=10 # group size
stamp=$(date +%s)

readarray -td '' ARRAY < <(find "$rootdir" -mindepth 1 -maxdepth 1 -type d -print0)
set -- "${ARRAY[@]}"

count=0
while [ $# -gt 0 ]; do
    ((count++))
    zip -r "$rootdir/group.${stamp}.${count}.zip" "${@:1:N}"
    shift $N || set --
done

EDIT #2: Parallelism and memory usage

After reading this post: https://unix.stackexchange.com/a/321765/439686 it occurred to me that my previous two versions can run into serious trouble if we are dealing with a huge number of directories. Besides putting serious strain on memory, they are also inefficient, as they wait for find to produce the whole list of directories before we even start the first zip command.

It would be much nicer to run things in parallel -- through pipes -- so that it won't matter how many files there are. That leaves us with the only correct approach -- do it with find ... -print0 | xargs -0 command. Why xargs? Because it can start commands with N arguments at a time, instead of waiting for the whole list, and because xargs can deal with the zero-delimited strings that -print0 pipes to it. And we absolutely must use zero as the delimiter, because file names are allowed to contain any other character, including newlines. As an added bonus, with xargs we can even start multiple processes at the same time, to better utilize a multicore system. So, here it is:

#!/bin/bash
rootdir=${1:-/your/parent/directory}
N=10 # group size
mktemp --version >/dev/null || exit 1
stamp=$(date +%Y%m%d%H%M)
cores=$(nproc) || cores=1
export rootdir N stamp cores

find "$rootdir" -mindepth 1 -maxdepth 1 -type d -print0 |
    xargs -r0 --max-args=$N --max-procs=$cores bash -c '
        zip -r "$(mktemp -u -p "$rootdir" group.$stamp.XXXXXX.zip)" "$@"
    ' ""

Result:

group.202103140805.7H1Don.zip
group.202103140805.akqmgX.zip
group.202103140805.fzBsUZ.zip
group.202103140805.iTfmj8.zip
...
Pourko