You could write a script that looks at the sizes of the files and distributes them into bins, taking care not to exceed the maximum size.
Finding the truly optimal packing is hard (it's the bin-packing problem), but a simple greedy algorithm should do well enough.
A minor problem would be taking into account the bookkeeping space taken by tar in addition to the file contents. (Also, how to deal with directories and special files?)
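For the overhead part, a rough per-file estimate could be made under the assumption of the plain ustar layout, i.e. one 512-byte header block per file plus contents padded up to a 512-byte boundary (long-name extensions and end-of-archive padding are ignored, so treat the numbers as approximate):

# 512-byte header + contents rounded up to whole 512-byte blocks,
# which approximates each file's footprint inside the tar archive.
find dir/ -type f -printf "%s\t%p\n" |
    awk '{ printf "%d\t%s\n", 512 + int(($1 + 511) / 512) * 512, $2 }'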
A bigger problem appears if you want to compress the archives. Since the usual idiom is to put the files together with tar and compress the tar file with a separate utility, it's not that simple to split the resulting archive along file boundaries: you'd pretty much need to know the compressed sizes of the files in advance. If you compress the files before tarring them together, you know the sizes, but lose the space advantage of compressing all the files in one go.
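To illustrate that second route, one could pre-compress everything so the packed sizes are exact (a sketch assuming GNU gzip, whose -k option keeps the original files around):

# Compress each regular file, keeping the originals (GNU gzip's -k option).
find dir/ -type f ! -name '*.gz' -exec gzip -k {} +
# The compressed sizes are now known exactly and can be used for packing instead.
find dir/ -type f -name '*.gz' -printf "%s\t%p\n" | sort -n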
Actually, I made a simple awk script to do just that at some point. Code below; use it with
find dir/ -type f -printf "%s\t%p\n" | sort -n | awk -v max="$maxsizeinbytes" -f pack.awk
(Output goes to bins.list.NNN. No warranty; it won't work with filenames containing whitespace, and there are probably other bugs too.)
#!/usr/bin/awk -f
# pack.awk -- greedy first-fit packing of "size<TAB>filename" lines into bins
{
    # A single file larger than the bin size can never fit anywhere.
    if ($1 > max) {
        printf "too big (%d, max %d): %s\n", $1, max, $2 > "/dev/stderr";
        err = 1;
        exit 1;
    }
    # Put the file into the first existing bin that still has room for it.
    for (x in bins) {
        if (free[x] >= $1) {
            bins[x] = bins[x] "\n" $2;
            count[x]++; free[x] -= $1;
            next
        }
    }
    # No existing bin had room: open a new one.
    bins[++i] = $2; free[i] = max - $1; count[i] = 1;
}
END {
    # exit in the main rule still runs END, so bail out without writing lists.
    if (err)
        exit 1;
    for (i in bins) {
        printf "bin %d: entries: %d size: %d\n", i, count[i], max - free[i];
        print bins[i] > ("bins.list." i)
    }
}
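Given the lists, each bin could then be turned into its own compressed archive, for example with GNU tar's -T option. Note that the max limit was applied to the uncompressed sizes, so the compressed archives come out smaller than max:

for list in bins.list.*; do
    # -T reads the member names from the list file; one archive per bin.
    tar -czf "archive.${list##*.}.tar.gz" -T "$list"
done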