For scripting I need to get the page dimensions of a PDF file (in mm).
pdfinfo just prints it in 'pts', e.g.:
Page size: 624 x 312 pts
What should I use?
Or what unit is 'pts' anyway, in case I want to convert them?
The 'pts' unit used by pdfinfo denotes a PostScript point. A PostScript point is defined in terms of an inch and a resolution of 72 dots per inch:
In the late 1980s to the 1990s, the traditional point was supplanted by the desktop publishing point (also called the PostScript point), which was defined as 72 points to the inch (1 point = 1⁄72 inch = 25.4⁄72 mm = 0.352777… mm ≈ 0.3528 mm).
The manual to gv contains a list of common paper formats specified in PostScript points.
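As a quick sanity check of that definition, here is a minimal sketch that converts a value in points to millimetres. The function name pt2mm is made up for this example, and it only assumes that awk is available:

```shell
#!/bin/sh
# pt2mm is a hypothetical helper name, not part of pdfinfo or any other tool.
# It applies the definition above: 1 pt = 1/72 inch and 1 inch = 25.4 mm.
pt2mm() {
    LC_ALL=C awk -v pt="$1" 'BEGIN { printf "%.2f\n", pt * 25.4 / 72 }'
}

# The page size from the question, 624 x 312 pts:
pt2mm 624   # prints 220.13
pt2mm 312   # prints 110.07
```

So the 624 x 312 pts page from the question is roughly 220 x 110 mm.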
pdfinfo sometimes gives me the paper format (like Page size: 595.28 x 841.89 pts (A4)). I wonder if it does that for a list of page sizes it knows about?
– njsg
May 27 '12 at 20:45
pdfinfo foo.pdf -f 3 -l 3 | grep Page
– Digger
Jun 15 '20 at 01:53
If pdfinfo's Page rot contains 90, the Page size values need to be exchanged. With identify this isn't needed, and in addition it returns the page size of each page separately.
– mgutt
Apr 24 '23 at 18:56
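A rough sketch of the swap described in the comment above, assuming pdfinfo's default output with one "Page size:" line and one "Page rot:" line. The function name rotated_size_mm is hypothetical, and treating 270 the same way as 90 is my own assumption rather than something stated in the comment:

```shell
#!/bin/sh
# rotated_size_mm is a hypothetical helper name.
# It reads pdfinfo's default output on stdin and prints "width x height mm",
# exchanging the two values when the rotation is 90 (or, by assumption, 270).
rotated_size_mm() {
    LC_ALL=C awk '
        /^Page size:/ { w = $3; h = $5 }   # e.g. "Page size: 595.28 x 841.89 pts (A4)"
        /^Page rot:/  { rot = $3 }         # e.g. "Page rot: 90"
        END {
            if (rot % 180 == 90) { t = w; w = h; h = t }
            printf "%.2f x %.2f mm\n", w * 25.4 / 72, h * 25.4 / 72
        }'
}

# Usage (needs pdfinfo from poppler-utils):
#   pdfinfo test.pdf | rotated_size_mm
```

This only looks at the first page, since that is all pdfinfo reports without -f and -l.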
Not the easiest way, but given ImageMagick and units you could also use
$ identify -verbose some.pdf | grep "Print size"
Print size: 8.26389x11.6944
to find the page size in inches (this may yield several results if the PDF uses different dimensions) and then convert the numbers like this:
$ units -t '8.26389 inch' 'mm'
209.90281
Meaning that 8.26 inches are 209.9 mm (I used an A4 PDF for this).
I came across the same problem and arrived at the following solution. I didn't read the documentation on how PDF files are constructed; I just compared two empty PDF files with different page sizes.
It looks like PDFs have all kinds of attributes embedded between "<<" and ">>". I found that the page size info is there in plain text and can be found with a simple regex search.
This may or may not hold for all PDFs, but it worked on all I could find from different sources.
The relevant part can look like any of these for a size A4 page:
/MediaBox [0 0 595 842]
/MediaBox[0 0 595 842]
/MediaBox[ 0 0 595.32 841.92]
The four numbers are the lower-left and upper-right corners of the page, so with the usual origin of 0 0 it means [0 0 width height] in points. Here is my super lame but working solution to extract this:
grep -aEo "/MediaBox ?\[ ?[0-9]+ [0-9]+ [0-9]+(\.[0-9]+)? [0-9]+(\.[0-9]+)?\]" test.pdf | head -1
Just change test.pdf to your file.
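To go from that match to millimetres, something like the following sketch could work. The function name mediabox_mm is made up here, and the regex is a slightly loosened variant of the one above (it also accepts negative or fractional corner values). It subtracts the lower-left corner from the upper-right one, so it does not rely on the origin being 0 0. The same caveats apply: it only sees a MediaBox stored as plain text, and it only reports the first one it finds.

```shell
#!/bin/sh
# mediabox_mm is a hypothetical helper name.
# It reads raw PDF bytes on stdin, takes the first plain-text /MediaBox entry
# and prints its width and height in mm (1 pt = 25.4/72 mm).
mediabox_mm() {
    LC_ALL=C grep -aEo '/MediaBox ?\[ ?-?[0-9.]+ -?[0-9.]+ -?[0-9.]+ -?[0-9.]+ ?\]' |
        head -n 1 |
        sed 's/[][]/ /g' |    # turn the brackets into spaces so awk sees 5 fields
        LC_ALL=C awk '{ printf "%.2f x %.2f mm\n", ($4 - $2) * 25.4 / 72, ($5 - $3) * 25.4 / 72 }'
}

# Usage:
#   mediabox_mm < test.pdf
```

If the PDF keeps its page objects in compressed streams, there is nothing in plain text to match and the function prints nothing.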
I used maxschlepzig's answer to calculate the mm directly:
$ pdfinfo test.pdf | grep "Page size" | grep -Eo '[-+]?[0-9]*\.?[0-9]+' | awk -v x=0.3528 '{print $1*x}'
This also works with Alex Knauf's answer, but identify takes much longer than pdfinfo and requires ImageMagick. The upside is that you can use it for multiple files (i.e. by cd'ing into a directory and using *.pdf):
$ identify -verbose some.pdf | grep "Print size" | grep -Eo '[-+]?[0-9]*\.?[0-9]+' | awk -v x=25.4 '{print $1*x}'
The second grep command extracts the two point/inch values. I'm fairly sure you can skip the grep regex and do it directly with awk, but I couldn't figure it out.
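For what it's worth, here is one way the awk-only variant might look. This is a sketch, not a tested replacement for the pipelines above: the function name page_sizes_mm is hypothetical, and it assumes pdfinfo prints either "Page size: W x H pts" (default) or "Page N size: W x H pts" (when called with -f and -l).

```shell
#!/bin/sh
# page_sizes_mm is a hypothetical helper name.
# It reads pdfinfo output on stdin and prints one "W x H mm" line per
# page-size line, locating the numbers relative to the "size:" field.
page_sizes_mm() {
    LC_ALL=C awk '
        /^Page .*size:/ {
            for (i = 1; i <= NF; i++)
                if ($i == "size:") {
                    # $(i+1) is the width, $(i+2) is "x", $(i+3) is the height
                    printf "%.2f x %.2f mm\n", $(i + 1) * 25.4 / 72, $(i + 3) * 25.4 / 72
                    break
                }
        }'
}

# Usage (needs pdfinfo from poppler-utils):
#   pdfinfo test.pdf | page_sizes_mm
#   pdfinfo -f 1 -l 3 test.pdf | page_sizes_mm
```

Using 25.4/72 instead of the rounded 0.3528 avoids a small rounding error on large pages.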
Unfortunately, pdfinfo gives the size of the first page only by default. We can use mutool to get sizes from all (or some chosen) pages, then use awk to show these page sizes in millimeters.
mutool info -M file.pdf \
| awk '/\[ [ .[:digit:]]+ \]/ { printf "Page %02d: %9s x %-9s\n", $1, $8*25.4/72 "mm", $9*25.4/72 "mm" }'
Page 01: 841mm x 1189mm
Page 02: 594mm x 841mm
Page 03: 420mm x 594mm
Page 04: 297mm x 420mm
Page 05: 210mm x 297mm
Page 06: 148mm x 210mm
Page 07: 105mm x 148mm
Page 08: 1000mm x 1414mm
Page 09: 707mm x 1000mm
Page 10: 500mm x 707mm
Page 11: 353mm x 500mm
Page 12: 250mm x 353mm
Page 13: 176mm x 250mm
Page 14: 125mm x 176mm
Page 15: 184.15mm x 266.7mm
Page 16: 215.9mm x 355.6mm
Page 17: 215.9mm x 279.4mm
mutool returns page sizes in pts, defined as 1⁄72 of an international inch. Note that 1 in = 25.4 mm.
file.pdf was created using mutool and pdfjam with the following bash script:
#!/bin/bash
# Create a PDF file with an empty A4 (595 x 842 pts) page.
mutool create -o empty.pdf /dev/null
# Use empty.pdf as a template to create 17 different page sizes.
for PAPERSIZE in {letter,legal,executive,{a,b}{0..6}}paper; do
pdfjam -q --paper "${PAPERSIZE}" -o "${PAPERSIZE}.pdf" empty.pdf
done
# Merge everything into file.pdf.
mutool merge -o file.pdf *paper.pdf
[ 0 0 2383.937 3370.394 ]
?
– maxschlepzig
Dec 18 '21 at 10:24
pdfinfo can get you page sizes of all the pages: pdfinfo -f 1 -l $(pdfinfo "$file" | awk '/Pages/ {print $2}') "$file" | grep "Page.*size"
– Łukasz Rajchel
Jun 05 '23 at 12:41
mutool shows each page where the mediabox size changes. This is not a bug, but rather a concise way to provide information on all pages.
– lezambranof
Mar 16 '24 at 15:38
awk command isn't correct in this case. For example, with these page dimensions: p1 100x100, p2 100x100, p3 200x200, the command will return Page 01 100x100, Page 02 200x200, which is obviously not correct.
– Milan Simek
Mar 17 '24 at 17:20