42

I have this list of pdf files in a directory:

c0.pdf   c12.pdf  c15.pdf  c18.pdf  c20.pdf  c4.pdf  c7.pdf
c10.pdf  c13.pdf  c16.pdf  c19.pdf  c2.pdf   c5.pdf  c8.pdf
c11.pdf  c14.pdf  c17.pdf  c1.pdf   c3.pdf   c6.pdf  c9.pdf

I want to concatenate these using ghostscript in numerical order (similar to this):

gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=out.pdf *.pdf

But the shell expands the glob in alphabetical order rather than in the natural order of the numbers:

$ for f in *.pdf; do echo "$f"; done
c0.pdf
c10.pdf
c11.pdf
c12.pdf
c13.pdf
c14.pdf
c15.pdf
c16.pdf
c17.pdf
c18.pdf
c19.pdf
c1.pdf
c20.pdf
c2.pdf
c3.pdf
c4.pdf
c5.pdf
c6.pdf
c7.pdf
c8.pdf
c9.pdf

How can I achieve the desired order in the expansion (if possible without manually adding 0-padding to the numbers in the file names)?

I've found suggestions to use ls | sort -V, but I couldn't get it to work for my specific use case.

moooeeeep
  • 1,313

5 Answers

41

Once more, zsh's glob qualifiers come to the rescue.

echo *.pdf(n)
22

Depending on your environment you can use ls -v with GNU coreutils, e.g.:

gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
   -sOutputFile=out.pdf $(ls -v)

Or if you are on recent versions of FreeBSD or OpenBSD:

gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
   -sOutputFile=out.pdf $(ls | sort -V)
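
As an illustration that is not part of the original answer, the effect of version sort can be checked without touching any files by feeding a few of the names from the question to sort on standard input. This is a minimal sketch and assumes a sort that supports the non-POSIX -V option (GNU coreutils, or the BSD versions mentioned above); the variable name sorted is only for this example.

```shell
# Assumes sort supports the non-POSIX -V (version sort) option.
# The names are passed on stdin, so no files need to exist.
sorted=$(printf '%s\n' c0.pdf c10.pdf c11.pdf c1.pdf c20.pdf c2.pdf c9.pdf | sort -V)

# Print the names one per line, in natural numeric order.
printf '%s\n' "$sorted"
```

Note that the $(...) forms above still split the output on whitespace, so they are only suitable when the file names contain no spaces, newlines, or glob characters.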
Thor
  • 17,182
  • 1
    ls -v will natural sort of (version) numbers within text so that can be used as well... – Sundeep Oct 03 '16 at 12:21
  • @Sundeep: Indeed, but this seems to be a GNU coreutils only solution. – Thor Oct 03 '16 at 14:11
  • yeah, seems like GNU specific - http://pubs.opengroup.org/onlinepubs/9699919799/ – Sundeep Oct 03 '16 at 14:21
  • 1
    @Sundeep: The -V feature of sort is not specified by POSIX either. However, it seems to have spread farther, for example both FreeBSD and OpenBSD sort support it. – Thor Oct 03 '16 at 14:25
  • oh ok, can you add these details to answer as well? I came across this answer while searching for similar problem (glob in numerical order) and seeing ls used I checked out if it had option by itself instead of piping to sort :) – Sundeep Oct 03 '16 at 14:30
  • NEVER parse ls! Use stat -c "%n" * instead. – Peter Sep 05 '17 at 08:33
  • @Peter: In general I agree, but there are exceptions – Thor Sep 06 '17 at 11:50
  • 1
    and also I would change my comment above since stat with %n is not really best either due to whitespace being allowed in filenames...use printf '%s\0', and things like xargs -0 or while read... I wrote an answer that has that. – Peter Sep 06 '17 at 15:22
18

If all the files in question have the same prefix (i.e., the text before the number; c in this case), you can use

gs  …args…  c?.pdf c??.pdf

c?.pdf expands to c0.pdf c1.pdf … c9.pdf, and c??.pdf expands to c10.pdf c11.pdf … c20.pdf (and up to c99.pdf, as applicable).  While each command-line word containing pathname expansion character(s) is expanded to a list of filenames sorted (collated) in accordance with the LC_COLLATE variable, the lists resulting from the expansion of adjacent wildcards (globs) are not merged; they are simply concatenated.  (I seem to recall that the shell man page once stated this explicitly, but I can’t find it now.)

Of course if the files can go up to c999.pdf, you should use c?.pdf c??.pdf c???.pdf.  Admittedly, this can get tedious if you have a lot of digits.  You can abbreviate it a little; for example, for (up to) five digits, you can use c?{,?{,?{,?{,?}}}}.pdf.  If your list of filenames is sparse (e.g., there’s a c0.pdf and a c12345.pdf, but not necessarily every number in between), you should probably set the nullglob option.  Otherwise, if (for example) you have no files with two-digit numbers, you would get a literal c??.pdf argument passed to your program.
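
The sparse case can be sketched as follows, assuming bash (nullglob is a bash shopt option). The temporary directory and the three file names are hypothetical and exist only so the example is self-contained; in practice the array would be handed to gs.

```shell
#!/usr/bin/env bash
# Hypothetical demo directory with a sparse set of names (no two-digit files).
demo_dir=$(mktemp -d)
touch "$demo_dir"/c0.pdf "$demo_dir"/c7.pdf "$demo_dir"/c123.pdf

shopt -s nullglob    # a pattern with no match expands to nothing
files=("$demo_dir"/c?.pdf "$demo_dir"/c??.pdf "$demo_dir"/c???.pdf)
shopt -u nullglob    # restore the default behaviour

# Strip the directory part for display; gs would receive "${files[@]}".
names=("${files[@]##*/}")
printf '%s\n' "${names[@]}"

rm -rf "$demo_dir"
```

Without nullglob, the unmatched middle pattern would stay in the array as a literal string ending in c??.pdf.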

If you have multiple prefixes (e.g., a<number>.pdf, b<number>.pdf, and c<number>.pdf, with numbers of one or two digits), you can use the obvious, brute force approach:

a?.pdf a??.pdf b?.pdf b??.pdf c?.pdf c??.pdf

or collapse it to {a,b,c}?{,?}.pdf.
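
To see what the collapsed forms turn into before any file names are matched, pathname expansion can be switched off temporarily so that only brace expansion is applied. This is a small sketch assuming bash; the variable names one_prefix and multi_prefix are made up for the example.

```shell
#!/usr/bin/env bash
set -f   # disable pathname expansion; brace expansion still happens
one_prefix=$(echo c?{,?{,?{,?{,?}}}}.pdf)
multi_prefix=$(echo {a,b,c}?{,?}.pdf)
set +f   # re-enable pathname expansion

echo "$one_prefix"     # c?.pdf c??.pdf c???.pdf c????.pdf c?????.pdf
echo "$multi_prefix"   # a?.pdf a??.pdf b?.pdf b??.pdf c?.pdf c??.pdf
```

Each resulting word is then globbed separately, which is why the shorter numbers come first within each prefix.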

  • 2
    This is the best answer because it's beyond any claims of sketchy use of ls, stat, or anything else; and also works in bash as requested. – Kyle Aug 13 '19 at 19:36
5

If there are no gaps, the following could prove helpful (albeit sketchy and not robust concerning edge-cases and generality) -- just to get an idea:

FILES="c0.pdf"
for i in $(seq 1 20); do FILES="${FILES} c${i}.pdf"; done
gs [...args...] $FILES

If there may be gaps, some [ -f c${i}.pdf ] check could be added.
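
A possible shape for that check, as a sketch only: it assumes bash (for arrays), the range 0 to 20 from the question, and a hypothetical temporary directory with a few files so the example runs on its own. An array is used instead of a space-separated string so that the names survive quoting.

```shell
#!/usr/bin/env bash
# Hypothetical demo directory; c1.pdf and most other numbers are missing.
demo_dir=$(mktemp -d)
touch "$demo_dir"/c0.pdf "$demo_dir"/c2.pdf "$demo_dir"/c10.pdf

files=()
for i in $(seq 0 20); do
    f="$demo_dir/c${i}.pdf"
    # Keep only the names that exist as regular files.
    if [ -f "$f" ]; then
        files+=("$f")
    fi
done

# Strip the directory part for display.
names=("${files[@]##*/}")
printf '%s\n' "${names[@]}"
# The real call would then be: gs [...args...] "${files[@]}"

rm -rf "$demo_dir"
```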

Edit: also see this answer, according to which you could (using Bash) use brace expansion, starting at 0 so that c0.pdf is included:

gs [..args..] c{0..20}.pdf
sr_
  • 15,384
2

Just quoting and fixing Thor's answer... NEVER parse ls!

You can use sort -V (a non-POSIX extension to sort):

printf '%s\0' ./* | sort -zV \
    | xargs -0 gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH \
        -sDEVICE=pdfwrite -sOutputFile=out.pdf

(for some commands, and gs is apparently one of them, you need "./*" instead of "*"; if one doesn't work, try the other)
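
If the sorted names are needed in a variable rather than piped straight to xargs, the readarray form suggested in the comments on this answer can be used. The sketch below assumes bash 4.4 or later (for readarray -d '') and a sort that supports -z and -V; the temporary directory and its file names, including one with a space, are hypothetical and only there to make the example self-contained.

```shell
#!/usr/bin/env bash
# Hypothetical demo directory, including a name that contains a space.
demo_dir=$(mktemp -d)
touch "$demo_dir"/c0.pdf "$demo_dir"/c1.pdf "$demo_dir"/c2.pdf \
      "$demo_dir"/c10.pdf "$demo_dir/c3 draft.pdf"

# NUL-delimited names are version-sorted, then split on NUL into an array.
readarray -td '' files < <(printf '%s\0' "$demo_dir"/*.pdf | sort -zV)

# Strip the directory part for display.
names=("${files[@]##*/}")
printf '%s\n' "${names[@]}"
# The real call would then be: gs [...args...] "${files[@]}"

rm -rf "$demo_dir"
```

Because the names never pass through word splitting, spaces and newlines in file names are preserved.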

Peter
  • 1,247
  • 1
    The don't parse ls output is because ls displays the file names newline-separated while newline is as valid as any in a file name, but here you're doing the same thing with stat but adding several other issues (like problems with filenames starting with -, problem if there are too many files, stat being a non-portable command). And because you used the split+glob operator without adjusting IFS or disabling globs, you'll still have issues with filenames with space or tab or wildcard characters. – Stéphane Chazelas Sep 05 '17 at 08:44
  • To use GNU sort -V reliably, you'd need ${(z)"$(printf '%s\0' * | sort -zV)"} in zsh (though zsh has (n) for numerical sort already) or readarray -td '' files < <(printf '%s\0' * | sort -zV) in bash4.4+. – Stéphane Chazelas Sep 05 '17 at 08:47
  • @StéphaneChazelas thanks, and you are right that newline can be a concern, but that isn't the only reason not to parse ls. And yeah I was lazy and didn't add -- either. But I should have used printf...I'll change that. – Peter Sep 05 '17 at 10:06
  • for ls alone (that is without -l), what are those other concerns? Note that -- wouldn't help for a file called -. – Stéphane Chazelas Sep 05 '17 at 10:08
  • @StéphaneChazelas there are other differences between versions... like some print "total 0" on there, and the newest ls versions even stick quotes around things where you don't want them... touch \"test\"; ls -1 for example shows '"test"' on my ls. It's simply not meant to be parsed... it's a user interface, not a scripting command. – Peter Sep 05 '17 at 10:11
  • the total x is only for ls -l/n.... The quoting is only for output to a terminal (not a pipe like here). For a POSIX compliant ls, the only problem would be the newlines. But -v is not a POSIX option anyway. Now, I've just realised that busybox ls now also supports ls -v and busybox ls is one of those implementations that are not POSIX compliant as it does some mangling even when stdout is not a terminal. – Stéphane Chazelas Sep 05 '17 at 10:17
  • * -> ./* to avoid problems with some file names with gs. – Stéphane Chazelas Sep 05 '17 at 10:19
  • Note also the OP's comment "I've found suggestions to use ls | sort -V, but I couldn't get it to work for my specific use case." – Jeff Schaller Sep 05 '17 at 12:59