49

For scripting I need to get the page dimensions of a PDF file (in mm).

pdfinfo just prints it in 'pts', e.g.:

Page size:      624 x 312 pts

What should I use?

Or what unit is 'pts' anyway - in case I want to convert them ...

maxschlepzig

5 Answers

49

The 'pts' unit used by pdfinfo denotes a PostScript point. A PostScript point is defined in terms of an inch and a resolution of 72 dots per inch:

In the late 1980s to the 1990s, the traditional point was supplanted by the desktop publishing point (also called the PostScript point), which was defined as 72 points to the inch (1 point = 1⁄72 inch = 25.4⁄72 mm = 0.352777… mm ≈ 0.3528 mm).

The manual to gv contains a list of common paper formats specified in PostScript points.
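
Since the conversion is just a multiplication by 25.4/72, it can be sketched as a small shell helper. The function name pt_to_mm and the two-decimal rounding are my own choices for illustration, not something pdfinfo provides:

```shell
# Convert PostScript points to millimetres (1 pt = 25.4/72 mm).
# pt_to_mm is a hypothetical helper name; output is rounded to two decimals.
pt_to_mm() {
    awk -v pt="$1" 'BEGIN { printf "%.2f\n", pt * 25.4 / 72 }'
}

pt_to_mm 624   # width from the question's example
pt_to_mm 312   # height from the question's example
```

For the 624 x 312 pts page from the question this gives 220.13 and 110.07, i.e. roughly 220 x 110 mm.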

maxschlepzig
21

Not the easiest way, but given ImageMagick and units you could also use:

$ identify -verbose some.pdf | grep "Print size" 
Print size: 8.26389x11.6944

to find the page size in inches (this may yield several results if the PDF uses different dimensions) and then convert the numbers like this:

$ units -t '8.26389 inch' 'mm'
  209.90281

Meaning that 8.26 inches are 209.9 mm (I used an A4 PDF for this).

Axel Knauf
7

I came across the same problem and arrived at the following solution. I didn't dig into the documentation of how PDF files are constructed; I just compared two empty PDF files with different page sizes.

It looks like PDFs have all kinds of attributes embedded between "<<" and ">>". I found that the page size info is there in plain text and can be found with a simple regex search.

This may or may not be true for all PDFs, but it worked on all I could find from different sources.

The relevant part can look like any of these for a size A4 page:

/MediaBox [0 0 595 842]
/MediaBox[0 0 595 842]
/MediaBox[ 0 0 595.32 841.92]

The four numbers are [x0 y0 x1 y1] in points, so with the origin at 0 0 they read as [0 0 width height]. Here is my super lame but working solution to extract this:

grep -aEo "/MediaBox ?\[ ?[0-9]+ [0-9]+ [0-9]+(\.[0-9]+)? [0-9]+(\.[0-9]+)?\]" test.pdf | head -1

Just change test.pdf to your file.
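
To get from the matched text to millimetres, something like the following could work. This is only a sketch: mediabox_mm is a made-up helper name, and it assumes the box holds four plain non-negative numbers in points, as in the examples above.

```shell
# Turn a "/MediaBox [x0 y0 x1 y1]" line (read from stdin) into a size in mm.
# mediabox_mm is a hypothetical helper; it assumes four plain, non-negative
# numbers in points (1 pt = 25.4/72 mm) and uses only the first matching line.
mediabox_mm() {
    # Replace everything except digits, dots and newlines with spaces,
    # so awk sees the four numbers as fields $1..$4.
    tr -c '0-9.\n' ' ' |
        awk 'NF >= 4 { printf "%.1f x %.1f mm\n", ($3 - $1) * 25.4 / 72, ($4 - $2) * 25.4 / 72; exit }'
}

echo '/MediaBox[ 0 0 595.32 841.92]' | mediabox_mm
```

The output of the grep command above can be piped straight into mediabox_mm in place of the echo.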

3

I used maxschlepzig's answer to calculate the mm directly:

$ pdfinfo test.pdf | grep "Page size" | grep -Eo '[-+]?[0-9]*\.?[0-9]+' | awk -v x=0.3528 '{print $1*x}'

This also works with Axel Knauf's answer, but identify takes much longer than pdfinfo and requires ImageMagick. The upside, though, is that you can use it for multiple files (i.e. by cd'ing into a directory and using *.pdf):

$ identify -verbose some.pdf | grep "Print size" | grep -Eo '[-+]?[0-9]*\.?[0-9]+' | awk -v x=25.4 '{print $1*x}'

The second grep command extracts the two point/inch values. I'm fairly sure you can skip the grep regex and do it directly with awk, but I couldn't figure it out.
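
An awk-only variant might look like the sketch below. The helper names pdfinfo_size_mm and identify_size_mm are invented for illustration, and the line formats are assumed to be exactly the ones quoted in this thread.

```shell
# Both helpers read the tool's output on stdin; the names are hypothetical.

# pdfinfo line is assumed to look like: "Page size:      624 x 312 pts"
pdfinfo_size_mm() {
    awk '/^Page size:/ { printf "%.1f x %.1f mm\n", $3 * 25.4 / 72, $5 * 25.4 / 72 }'
}

# identify line is assumed to look like: "Print size: 8.26389x11.6944" (inches)
identify_size_mm() {
    awk '/Print size:/ { split($3, d, "x"); printf "%.1f x %.1f mm\n", d[1] * 25.4, d[2] * 25.4 }'
}

echo 'Page size:      624 x 312 pts' | pdfinfo_size_mm
echo 'Print size: 8.26389x11.6944' | identify_size_mm
```

In real use the echo would be replaced by pdfinfo test.pdf or identify -verbose some.pdf. The pattern match in awk takes over the job of both grep calls.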

defuzed
1

By default, pdfinfo reports the size of the first page only. We can use mutool to get the sizes of all (or some chosen) pages, and then use awk to show these page sizes in millimeters.

mutool info -M file.pdf \
 | awk '/\[ [ .[:digit:]]+ \]/ { printf "Page %02d: %9s x %-9s\n",  $1, $8*25.4/72 "mm", $9*25.4/72 "mm" }'
Page 01:     841mm x 1189mm   
Page 02:     594mm x 841mm    
Page 03:     420mm x 594mm    
Page 04:     297mm x 420mm    
Page 05:     210mm x 297mm    
Page 06:     148mm x 210mm    
Page 07:     105mm x 148mm    
Page 08:    1000mm x 1414mm   
Page 09:     707mm x 1000mm   
Page 10:     500mm x 707mm    
Page 11:     353mm x 500mm    
Page 12:     250mm x 353mm    
Page 13:     176mm x 250mm    
Page 14:     125mm x 176mm    
Page 15:  184.15mm x 266.7mm  
Page 16:   215.9mm x 355.6mm  
Page 17:   215.9mm x 279.4mm

mutool returns page sizes in pts, defined as 1⁄72 of an international inch. Note that 1 in = 25.4 mm.

Creating a PDF file with pages of different sizes

file.pdf was created using mutool and pdfjam with the following bash script:

#!/bin/bash

# Create a PDF file with a single empty A4 (595 x 842 pts) page.
mutool create -o empty.pdf /dev/null

# Use empty.pdf as a template to create 17 different page sizes.
for PAPERSIZE in {letter,legal,executive,{a,b}{0..6}}paper; do
    pdfjam -q --paper "${PAPERSIZE}" -o "${PAPERSIZE}.pdf" empty.pdf
done

# Merge everything into file.pdf.
mutool merge -o file.pdf *paper.pdf

  • In what units are the dimensions specified in a list such as [ 0 0 2383.937 3370.394 ]? – maxschlepzig Dec 18 '21 at 10:24
  • pdfinfo can get you page sizes of all the pages: pdfinfo -f 1 -l $(pdfinfo "$file" | awk '/Pages/ {print $2}') "$file" | grep "Page.*size" – Łukasz Rajchel Jun 05 '23 at 12:41
  • This doesn't work if multiple pages in the PDF use the same mediabox. For example one PDF I tested has 5 pages, but only 3 mediaboxes are returned by mutool – Milan Simek Mar 15 '24 at 23:10
  • @MilanSimek Yes, mutool shows each page where the mediabox size changes. This is not a bug, but rather a concise way to provide information on all pages. – lezambranof Mar 16 '24 at 15:38
  • @lezambranof It's definitely not a bug but a feature, I agree :) But it does mean the info provided by your awk command isn't correct in this case. For example with these page dimensions: p1 100x100, p2 100x100, p3 200x200, the command will return Page 01 100x100, Page 02 200x200 which is obviously not correct – Milan Simek Mar 17 '24 at 17:20
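
Building on the pdfinfo -f/-l suggestion in the comments, the per-page lines could be converted to millimetres with a sketch like this. It assumes each line has the form "Page    1 size: 595 x 842 pts (A4)", which is what I would expect from pdfinfo with a page range but have not checked across versions; pages_mm is a hypothetical helper name.

```shell
# Convert per-page "Page N size: W x H pts" lines (stdin) to millimetres.
# pages_mm is a hypothetical helper; the input line format is an assumption.
pages_mm() {
    awk '$1 == "Page" && $3 == "size:" {
        printf "Page %02d: %.1f x %.1f mm\n", $2, $4 * 25.4 / 72, $6 * 25.4 / 72
    }'
}

# Sample input standing in for real pdfinfo output.
printf 'Page    1 size: 595 x 842 pts (A4)\nPage    2 size: 612 x 792 pts (letter)\n' | pages_mm
```

Because this prints one line per page, consecutive pages that share a size are not collapsed.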