#!/usr/bin/perl
use strict;
use warnings;
use List::MoreUtils qw(natatime);
use Sort::Naturally;
# specify directory on command line, or default to .
my $dir = shift || '.';
# Find all the PDF files.
#
# NOTE: you could use perl's `File::Find` module instead of
# readdir() to do a recursive search like `find`.
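opendir(my $dh, $dir) || die "Can't open $dir: $!\n";
my @pdfs = nsort grep { /\.pdf$/i && -f "$dir/$_" } readdir($dh);
closedir($dh);
# readdir() returns bare filenames, so run tar from inside $dir
# (the archives are then created there too).
chdir($dir) || die "Can't chdir to $dir: $!\n";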
my $size=1000;
my $i=1;
my $iter = natatime $size, @pdfs;
while( my @tmp = $iter->() ){
  my $tarfile="archive_" . sprintf('%02i',$i++) . ".tar.gz";
  #print join(" ", ('tar','cfz',$tarfile, @tmp)),"\n";
  system('echo','tar','cfz',$tarfile, @tmp);
}
This uses the natatime() ("n-at-a-time") function in perl's List::MoreUtils library module to iterate over the list of PDF files 1000 at a time.
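If you'd rather not install List::MoreUtils, the same batching can be done with the built-in splice(). This is only a minimal sketch of the idea, not what the script above does; the batches() helper is a hypothetical name made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list into batches of at most $n items.
sub batches {
    my ($n, @items) = @_;
    my @out;
    # splice() removes and returns up to $n elements from the front each time
    push @out, [ splice(@items, 0, $n) ] while @items;
    return @out;
}

my @files   = map { "$_.pdf" } 1 .. 10;
my @batches = batches(3, @files);
print scalar(@batches), " batches\n";      # 4 batches
print join(' ', @$_), "\n" for @batches;   # last line is just "10.pdf"
```

Unlike natatime(), this builds all the batches up front instead of handing them out one at a time, which is fine for a few thousand filenames.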
It also uses the Sort::Naturally module to natural-sort the PDF filenames. Drop that (and the call to nsort on the my @pdfs = ... line) if you don't need or want that.
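To see why the natural sort matters, here is a small sketch comparing perl's plain sort with a core-only fallback. The fallback is my own stand-in, not Sort::Naturally's algorithm, and it assumes every filename starts with a number.

```perl
use strict;
use warnings;

my @names = qw(10.pdf 2.pdf 1.pdf 3.pdf);

# Plain sort compares strings character by character, so 10.pdf lands before 2.pdf
my @plain = sort @names;

# Core-only fallback (assumes names start with digits):
# compare the leading numbers, then fall back to a string comparison
my @numeric = map  { $_->[1] }
              sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] }
              map  { [ /^(\d+)/ ? $1 : 0, $_ ] } @names;

print "plain:   @plain\n";     # plain:   1.pdf 10.pdf 2.pdf 3.pdf
print "numeric: @numeric\n";   # numeric: 1.pdf 2.pdf 3.pdf 10.pdf
```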
The tar filenames have 2-digit zero-padded numbers in them so that they sort correctly. Change the '%02i' format to 3 or more digits (e.g. '%03i') if you have enough PDF files to fill more than 99 tar archives.
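Instead of hard-coding the width, it could be worked out from the number of files. A rough sketch, where pad_width() is a hypothetical helper and the file count is just an example figure:

```perl
use strict;
use warnings;

# Hypothetical helper: digits needed to number every archive, never fewer than 2.
sub pad_width {
    my ($file_count, $batch_size) = @_;
    # integer ceiling of $file_count / $batch_size
    my $archives = int(($file_count + $batch_size - 1) / $batch_size);
    my $digits   = length $archives;
    return $digits < 2 ? 2 : $digits;
}

my $width = pad_width(150_000, 1000);   # 150 archives, so 3 digits
# '%0*d' takes the field width from the argument list
my $name  = sprintf('archive_%0*d.tar.gz', $width, 7);
print "$name\n";                        # archive_007.tar.gz
```

In the script above that would mean calling something like pad_width(scalar(@pdfs), $size) once, before the loop.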
The code, as written, is a dry-run. Delete the 'echo', from the system() function call to make it actually tar up the batches of PDF files.
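Once it's running tar for real, it's probably worth checking whether each tar succeeded. system() returns the raw wait status, so the exit code is in the upper byte. A minimal sketch, with run_cmd() as a hypothetical helper and perl itself standing in for tar so that it runs anywhere:

```perl
use strict;
use warnings;

# Hypothetical helper: run a command in list form (no shell involved) and
# return its exit code, or -1 if it couldn't be started or died from a signal.
sub run_cmd {
    my @cmd = @_;
    my $status = system(@cmd);
    return -1 if $status == -1 or $status & 127;
    return $status >> 8;
}

# Stand-in commands; $^X is the path of the running perl binary.
my $ok   = run_cmd($^X, '-e', 'exit 0');
my $fail = run_cmd($^X, '-e', 'exit 2');
print "ok=$ok fail=$fail\n";   # ok=0 fail=2
```

In the loop that could become, e.g., run_cmd('tar','cfz',$tarfile, @tmp) == 0 or die "tar failed for $tarfile\n";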
For verbose output when running without echo, uncomment the print statement. BTW, it would be easy to make it print a timestamp too: seconds since the epoch with the perl built-in time(), or nicely formatted with the Date::Format module. For example, with time():
print join(" ", (time(),'tar','cfz',$tarfile, @tmp)),"\n";
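For a human-readable timestamp without installing anything, strftime() from the core POSIX module is another option (this is a swap-in for Date::Format, not the same module). A small sketch; the $tarfile and @tmp values here are example data only:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

my $tarfile = 'archive_01.tar.gz';   # example values only
my @tmp     = qw(1.pdf 2.pdf 3.pdf);

# Local time formatted as YYYY-MM-DD HH:MM:SS
my $stamp = strftime('%Y-%m-%d %H:%M:%S', localtime);
my $line  = join(' ', $stamp, 'tar', 'cfz', $tarfile, @tmp);
print "$line\n";
```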
Save it as, e.g., vibhu.pl and make it executable with chmod +x vibhu.pl. Here's a sample run (in a directory containing only 10 ".pdf" files):
$ touch {1..10}.pdf
$ ./vibhu.pl
tar cfz archive_01.tar.gz 1.pdf 2.pdf 3.pdf 4.pdf 5.pdf 6.pdf 7.pdf 8.pdf 9.pdf 10.pdf
If you change $size=1000 to, e.g., $size=3, you can see that it really does process the PDF files N at a time:
$ ./vibhu.pl
tar cfz archive_01.tar.gz 1.pdf 2.pdf 3.pdf
tar cfz archive_02.tar.gz 4.pdf 5.pdf 6.pdf
tar cfz archive_03.tar.gz 7.pdf 8.pdf 9.pdf
tar cfz archive_04.tar.gz 10.pdf
The List::MoreUtils and Sort::Naturally modules are available from CPAN. They may already be packaged for your distribution, e.g. on Debian:
sudo apt-get install liblist-moreutils-perl libsort-naturally-perl
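As for the recursive search mentioned in the script's NOTE comment, a File::Find version (it's a core module, nothing to install) might look roughly like this. It's only a sketch; find_pdfs() is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return every *.pdf file below $dir, at any depth.
sub find_pdfs {
    my ($dir) = @_;
    my @found;
    find(
        sub {
            # $_ is the bare filename, $File::Find::name the path including $dir
            push @found, $File::Find::name if /\.pdf$/i && -f $_;
        },
        $dir,
    );
    return @found;
}

my @pdfs = find_pdfs(shift || '.');
print "$_\n" for @pdfs;
```

The returned paths already include the starting directory, so they can be passed to tar as they are. They come back in directory order, so they still need sorting before being split into batches.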