#!/usr/bin/perl
use strict;
use List::MoreUtils qw(natatime);
use Sort::Naturally;
# specify directory on command line, or default to .
my $dir = shift || '.';
# Find all the PDF files.
#
# NOTE: you could use perl's `File::Find` module instead of
# readdir() to do a recursive search like `find`.
opendir(DIR, $dir) || die "Can't open $dir: $!\n";
my @pdfs = nsort grep { /\.pdf$/i && -f "$dir/$_" } readdir(DIR);
closedir(DIR);
my $size=1000;
my $i=1;
my $iter = natatime $size, @pdfs;
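while( my @tmp = $iter->() ){
    my $tarfile="archive_" . sprintf('%02i',$i++) . ".tar.gz";
    #print join(" ", ('tar','cfz',$tarfile, @tmp)),"\n";
    # NOTE: the names in @tmp are relative to $dir, so run this from
    # inside that directory when passing a directory other than '.'.
    system('echo','tar','cfz',$tarfile, @tmp);
}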
This uses the natatime() ("n-at-a-time") function in perl's List::MoreUtils library module to iterate over the list of PDF files 1000 at a time.
It also uses the Sort::Naturally module to natural-sort the PDF filenames. Drop that (and the call to nsort on the my @pdfs = ... line) if you don't need or want that.
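If you would rather not install List::MoreUtils, the same batching can be sketched with perl's built-in splice(). This is only an illustration, not what the script above does: batches() is a made-up helper name, and the @pdfs list here is a hypothetical stand-in for the real, sorted file list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the sorted @pdfs list built by the script.
my @pdfs = map { "$_.pdf" } 1 .. 10;
my $size = 3;

# batches() is an illustrative helper, not part of any module.
# It returns a list of array refs, each holding up to $n items.
sub batches {
    my ($n, @items) = @_;
    my @out;
    # splice() removes (and returns) the first $n items each time round;
    # the last batch simply gets whatever is left.
    push @out, [ splice(@items, 0, $n) ] while @items;
    return @out;
}

my @groups = batches($size, @pdfs);
print join(' ', @$_), "\n" for @groups;
```

The trade-off is that this builds all the batches in memory up front, whereas natatime() hands them out one at a time. For a list of filenames that is unlikely to matter.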
The tar filenames have 2-digit zero-padded numbers in them so that they sort correctly. Change the '%02i' format to '%03i' or wider if you have enough PDF files to fill more than 99 tar archives.
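If you don't want to guess the width, it can be worked out from the number of files. A rough sketch, assuming a made-up file count ($count) and using sprintf's '*' width, which takes the field width from the argument list:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $count = 123_456;   # hypothetical number of PDF files (scalar(@pdfs) in the script)
my $size  = 1000;      # files per archive, as in the script

# Number of archives needed, rounding up.
my $archives = int( ($count + $size - 1) / $size );

# Use as many digits as the largest archive number has, but never fewer than 2.
my $width = length($archives);
$width = 2 if $width < 2;

# '%0*d' means: zero-pad, with the width taken from the next argument.
my $name = sprintf('archive_%0*d.tar.gz', $width, 7);
print "$archives archives, e.g. $name\n";
```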
The code, as written, is a dry-run. Delete the 'echo', from the system() function call to make it actually tar up the batches of PDF files.
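Once it is running tar for real, you may want to notice when a batch fails. The script above ignores the return value of system(), so the following is an optional extra and not part of the original: run_checked() is a made-up helper name, and the two demo calls run the perl interpreter itself ($^X) rather than tar so that the sketch can be tried without creating any archives.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# run_checked() is an illustrative helper, not part of the original script.
# It runs a command via system() and returns 1 on success, 0 on failure.
sub run_checked {
    my @cmd = @_;
    return 1 if system(@cmd) == 0;
    if ($? == -1) {
        warn "failed to execute $cmd[0]: $!\n";
    } elsif ($? & 127) {
        warn sprintf("%s died with signal %d\n", $cmd[0], $? & 127);
    } else {
        warn sprintf("%s exited with status %d\n", $cmd[0], $? >> 8);
    }
    return 0;
}

# In the loop this would be something like:
#   run_checked('tar','cfz',$tarfile, @tmp) or die "giving up at $tarfile\n";
my $ok     = run_checked($^X, '-e', 'exit 0');   # succeeds
my $failed = run_checked($^X, '-e', 'exit 3');   # warns about exit status 3
```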
For verbose output while running without echo, uncomment the print statement. BTW, it would be easy to make it print a timestamp, e.g. seconds since the epoch with the perl built-in time(), or nicely formatted with the Date::Format module, e.g.:
print join(" ", (time(),'tar','cfz',$tarfile, @tmp)),"\n";
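For a human-readable timestamp, here is a small sketch. Note that it uses strftime() from the core POSIX module rather than Date::Format, purely so that it runs without installing anything extra. The $tarfile and @tmp values are hypothetical stand-ins for the ones inside the loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical stand-ins for the loop variables in the script.
my $tarfile = 'archive_01.tar.gz';
my @tmp     = ('1.pdf', '2.pdf');

# Local time formatted as YYYY-MM-DD HH:MM:SS.
my $stamp = strftime('%Y-%m-%d %H:%M:%S', localtime);

my $line = join(' ', $stamp, 'tar', 'cfz', $tarfile, @tmp);
print "$line\n";
```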
Save as, e.g., vibhu.pl, and make it executable with chmod +x vibhu.pl. Here's a sample run (in a directory with only 10 ".pdf" files):
$ touch {1..10}.pdf
$ ./vibhu.pl
tar cfz archive_01.tar.gz 1.pdf 2.pdf 3.pdf 4.pdf 5.pdf 6.pdf 7.pdf 8.pdf 9.pdf 10.pdf
If you change $size=1000 to, e.g., $size=3, you can see that it really does process the PDF files N at a time:
$ ./vibhu.pl
tar cfz archive_01.tar.gz 1.pdf 2.pdf 3.pdf
tar cfz archive_02.tar.gz 4.pdf 5.pdf 6.pdf
tar cfz archive_03.tar.gz 7.pdf 8.pdf 9.pdf
tar cfz archive_04.tar.gz 10.pdf
The List::MoreUtils and Sort::Naturally modules are available from CPAN. They may already be packaged for your distribution. e.g. on Debian:
sudo apt-get install liblist-moreutils-perl libsort-naturally-perl
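As for the comment in the script about doing a recursive search, here is a rough sketch of what that could look like with the core File::Find module. It is not part of the script above. It builds a small throwaway directory tree (the file names are made up) so that it can be run on its own, and it uses a plain sort where the script uses nsort.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a throwaway directory tree so the sketch is runnable on its own.
my $dir = tempdir(CLEANUP => 1);
mkdir "$dir/sub" or die "Can't mkdir $dir/sub: $!\n";
for my $f ('a.pdf', 'sub/b.PDF', 'sub/notes.txt') {
    open(my $fh, '>', "$dir/$f") or die "Can't create $dir/$f: $!\n";
    close($fh);
}

my @pdfs;
find(sub {
    # Inside the callback, $_ is the file's basename and
    # $File::Find::name is its path, including the leading $dir.
    push @pdfs, $File::Find::name if /\.pdf$/i && -f $_;
}, $dir);

@pdfs = sort @pdfs;
print "$_\n" for @pdfs;
```

Unlike the readdir() version, the names collected here include the directory part, so they can be handed to tar from anywhere.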