13

What command can I use to create zips with a file number limit? I have a folder (no subfolders) of, say, 5000 files, so I would want a command that could divide that number and create 10 individual zip archives, each consisting of no more than 500 files.

I also don't want the resulting 10 zip files to be connected with each other, so that I can open them individually and won't need to open all 10 at the same time.

don_crissti
  • 82,805
whitewings
  • 2,457

3 Answers

13

You can use GNU parallel to do that as it can limit the number of elements to a job as well as provide a job number (for a unique zip archive name):

$ touch $(seq 20)
$ find . ! -name "*.zip" -type f -print0 | parallel -0 -N 5 zip arch{#} {}
  adding: 1 (stored 0%)
  adding: 10 (stored 0%)
  adding: 11 (stored 0%)
  adding: 12 (stored 0%)
  adding: 13 (stored 0%)
  adding: 14 (stored 0%)
  adding: 15 (stored 0%)
  adding: 16 (stored 0%)
  adding: 17 (stored 0%)
  adding: 18 (stored 0%)
  adding: 19 (stored 0%)
  adding: 2 (stored 0%)
  adding: 20 (stored 0%)
  adding: 3 (stored 0%)
  adding: 4 (stored 0%)
  adding: 5 (stored 0%)
  adding: 6 (stored 0%)
  adding: 7 (stored 0%)
  adding: 8 (stored 0%)
  adding: 9 (stored 0%)
$ ls
1   11  13  15  17  19  20  4  6  8  arch1.zip  arch3.zip
10  12  14  16  18  2   3   5  7  9  arch2.zip  arch4.zip

The option -N 5 limits each job to 5 files, so no archive receives more than 5; those file names are passed to zip in place of {}.

The {#} (type it verbatim, do not substitute a number of your own) is replaced by parallel with the job number, resulting in arch1.zip, arch2.zip, etc.

The -print0 option to find and the -0 option to parallel work in tandem to make sure that filenames with special characters are handled correctly.
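The command above depends on GNU parallel specifically; as far as I know, the tool of the same name shipped in moreutils does not understand -N or {#}. A small check along these lines can tell the two apart before you run the real command. This is only a sketch: the function name is_gnu_parallel is made up for illustration, and it assumes that GNU parallel prints "GNU parallel" on the first line of its --version output.

```shell
#!/usr/bin/env bash
# is_gnu_parallel (hypothetical helper name): succeed only if the
# `parallel` found on PATH identifies itself as GNU parallel.
# Assumption: GNU parallel starts its --version output with "GNU parallel".
is_gnu_parallel() {
  parallel --version 2>/dev/null | head -n 1 | grep -q 'GNU parallel'
}

if is_gnu_parallel; then
  echo "GNU parallel found"
else
  echo "GNU parallel not found (missing, or a different 'parallel')"
fi
```

If the check fails, installing the distribution's parallel package or building from the GNU tarball should provide the right tool.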

Anthon
  • 79,293
  • I got this error: http://i.imgur.com/JoyPrfY.png From this command: find * ! -name "*.zip" -type f -print0 | parallel -0 -N 500 zip arch{13} {} – whitewings Nov 09 '14 at 16:07
  • @user8547 that is not GNU parallel, but the parallel included in moreutils, you best compile and install from source to get the latest security patches. http://ftp.gnu.org/gnu/parallel/parallel-latest.tar.bz2 – Anthon Nov 09 '14 at 16:15
  • @Anthon Is this the right program: http://i.imgur.com/iVv9uNq.png – whitewings Nov 09 '14 at 16:17
  • @user8547 No, either follow the link in my previous comment (http://ftp.gnu.org/gnu/parallel/parallel-latest.tar.bz2) or follow the guidelines in the link to the page I provided at the top of my answer. – Anthon Nov 09 '14 at 16:19
  • 2
    @user8547 no, just run sudo apt-get install parallel. – terdon Nov 09 '14 at 16:20
  • @Anthon It worked, but the resulting zip files have a jpg extension. When I rename the extension to zip they extract successfully. – whitewings Nov 09 '14 at 16:25
  • @user8547 I only get .jpg name ZIP files if I do: find * ! -name "*.zip" -type f -print0 | parallel -0 -N 5 zip {#}.jpg {}, i.e. if I provide an extension of .jpg in the first argument (== name of the zip file) – Anthon Nov 09 '14 at 16:29
  • @Anthon http://i.imgur.com/AKFpViq.png And the command I used: find * ! -name "*.zip" -type f -print0 | parallel -0 -N 500 zip arch{13} {} – whitewings Nov 09 '14 at 16:30
  • 2
    @user8547 why arch{13}? You really need to use the # character. What shell are you using? – Anthon Nov 09 '14 at 16:32
  • @Anthon Ah I see now. My mistake. I thought I had to input the number of zip files I wanted in place of the # sign. I just tried it again and it worked perfectly. It zipped 6273 files in 20 seconds. – whitewings Nov 09 '14 at 16:35
  • 2
    @user8547 No that is the way to tell parallel to put the job number there, glad it worked out. – Anthon Nov 09 '14 at 16:37
2

The accepted answer worked perfectly fine for me. :) BUT, in case you don't have access to parallel (who knows why), here's an alternative I had come up with before:

find . ! -name '*.zip' -type f | xargs -n 500 | awk '{system("zip myarch"NR".zip "$0)}'

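This creates myarch1.zip, myarch2.zip, myarch3.zip, etc. Note that awk hands each batch to the shell as plain text, so file names containing whitespace or shell metacharacters will break this pipeline; if you have weird filenames, the -print0/-0 approach Anthon suggested is the safer choice.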
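If you need to cope with awkward filenames and still cannot use parallel, one possible sketch is a small bash function that reads NUL-delimited names from find and zips them in fixed-size batches. It is bash-specific, and the function name zip_batches and the ZIP_CMD override (handy for a dry run with echo) are my own inventions for illustration, not standard tools.

```shell
#!/usr/bin/env bash
# zip_batches SIZE PREFIX (hypothetical helper): read NUL-delimited file
# names on stdin and pack every SIZE of them into PREFIX1.zip, PREFIX2.zip, ...
# ZIP_CMD is an assumed override for the archiver; it defaults to zip.
zip_batches() {
  local size=$1 prefix=$2 n=1 name
  local -a batch=()
  while IFS= read -r -d '' name; do
    batch+=("$name")
    if (( ${#batch[@]} == size )); then
      # batch is full: archive it and start a new one
      "${ZIP_CMD:-zip}" "${prefix}${n}.zip" "${batch[@]}" </dev/null
      batch=()
      n=$((n + 1))
    fi
  done
  # archive the final, smaller batch if any names are left over
  if (( ${#batch[@]} > 0 )); then
    "${ZIP_CMD:-zip}" "${prefix}${n}.zip" "${batch[@]}" </dev/null
  fi
}

# Example invocation (commented out so that sourcing this file has no side effects):
# find . ! -name '*.zip' -type f -print0 | zip_batches 500 myarch
```

Running it with ZIP_CMD=echo in front of zip_batches prints the zip command arguments for each batch instead of creating archives, which is a cheap way to check the grouping first.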

msb
  • 2,654
1

A shell-only alternative (it needs bash, ksh93 or zsh, since it relies on range expansion of the positional parameters): process batches of COUNT files via "${@:START:COUNT}" and shift COUNT, while incrementing a counter c to name the archives:

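COUNT=500   # maximum number of files per archive
set -- *    # load every name in the current directory into $1, $2, ...
c=1         # archive counter
while (($#)); do
  if [ "$#" -ge "$COUNT" ]; then
    # full batch: zip the first COUNT names, then drop them
    zip "${c}.zip" "${@:1:COUNT}"
    c=$((c+1))
    shift "$COUNT"
  else
    # final, smaller batch: zip whatever is left
    zip "${c}.zip" "$@"
    shift "$#"
  fi
done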
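To see how the range expansion behaves on its own, here is a tiny bash sketch; the helper name batch_slice is made up for the demonstration and is not part of the loop above.

```shell
#!/usr/bin/env bash
# batch_slice START COUNT ARGS... (hypothetical demo helper):
# print COUNT of ARGS, one per line, beginning at 1-based position START.
batch_slice() {
  local start=$1 count=$2
  shift 2
  # offset and length are arithmetic expressions, so bare variable names work
  printf '%s\n' "${@:start:count}"
}

batch_slice 2 3 a b c d e   # prints b, c and d on separate lines
```

Asking for more elements than remain simply yields what is left, so the last batch may be shorter than COUNT.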
don_crissti
  • 82,805