
I have several PDF files (chapter1.pdf, chapter2.pdf, etc.), each of which is one chapter of a book. I know how to merge them into a single PDF (I use the pdfunite command from poppler), but the output file is large, so it is hard to find a chapter without a table of contents. How can I create an embedded table of contents in which each merged chapter is an entry?

Note that I do not want to add a page to the output file that lists the chapters and their page numbers. I want the table of contents (outline/bookmarks) metadata of the PDF file, which can be browsed in any PDF reader (or ebook device) that supports this feature.

Seninha

2 Answers


Non-destructive version of @bu5hman's answer (the input PDFs are left unmodified):

#!/bin/bash

out_file="combined.pdf"
bookmarks_file="/tmp/bookmarks.txt"
bookmarks_fmt="BookmarkBegin
BookmarkTitle: %s
BookmarkLevel: 1
BookmarkPageNumber: %d
"

rm -f "$bookmarks_file" "$out_file"

declare -a files=(*.pdf)
page_counter=1

# Generate bookmarks file.
for f in "${files[@]}"; do
    title="${f%.*}"
    printf "$bookmarks_fmt" "$title" "$page_counter" >> "$bookmarks_file"
    num_pages="$(pdftk "$f" dump_data | grep NumberOfPages | awk '{print $2}')"
    page_counter=$((page_counter + num_pages))
done

# Combine PDFs and embed the generated bookmarks file.
pdftk "${files[@]}" cat output - |
    pdftk - update_info "$bookmarks_file" output "$out_file"

It works by:

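  1. Generating /tmp/bookmarks.txt, where each chapter's start page is 1 plus the page counts of all the files before it.
  2. Merging the PDFs, in glob order, into a single stream.
  3. Writing combined.pdf from that stream, with the bookmarks from /tmp/bookmarks.txt embedded.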
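The only non-obvious part of the loop is the page arithmetic: chapter k starts on page 1 plus the total page count of chapters 1 to k-1. A minimal sketch that isolates that step, so it can be checked without pdftk, could look like this. `make_bookmarks` is a hypothetical helper name, and it assumes titles contain no tab characters.

```shell
#!/bin/bash
# Sketch: turn "title<TAB>page-count" lines on stdin into pdftk bookmark
# records. make_bookmarks is a hypothetical helper, not part of pdftk.
# Each chapter starts on page 1 + (sum of the page counts before it).
make_bookmarks() {
    local title pages start=1
    while IFS=$'\t' read -r title pages; do
        printf 'BookmarkBegin\nBookmarkTitle: %s\nBookmarkLevel: 1\nBookmarkPageNumber: %d\n' \
            "$title" "$start"
        start=$((start + pages))
    done
}

# Possible usage (needs pdftk):
#   for f in *.pdf; do
#       printf '%s\t%s\n' "${f%.*}" \
#           "$(pdftk "$f" dump_data | awk '/^NumberOfPages:/ {print $2}')"
#   done | make_bookmarks > /tmp/bookmarks.txt
```

Keeping the arithmetic separate from the pdftk calls means it can be tested with made-up page counts, e.g. three chapters of 12, 30 and 7 pages get bookmarks on pages 1, 13 and 43.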
  • How would one go about doing this recursively through several folders, and building the bookmarks from folder structure, i.e. sections/subsections depending on folder depth? – Cpt Reynolds Dec 27 '22 at 17:47
  • @CptReynolds For that, I would probably just write the "Generate bookmarks file" section in Python. – Mateen Ulhaq Jan 02 '23 at 22:41
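For the nested layout asked about in the comments, the same idea can be sketched in plain bash: each folder becomes a bookmark at its depth, pointing at the page where its first PDF will land, and each PDF becomes a bookmark one level below its folder. This is an assumption about how one might do it, not a tested recipe: `walk_tree`, `page_count` and `emit` are hypothetical names, bash's sorted glob order decides both the bookmark order and the merge order, and a folder with no PDFs gets a bookmark pointing at the next PDF's page (or past the end of the document if it is the last entry).

```shell
#!/bin/bash
# Sketch: nested pdftk bookmarks built from a folder tree.
# walk_tree, page_count, emit and the globals below are hypothetical names.
shopt -s nullglob   # empty folders expand to nothing instead of "dir/*"

page=1      # page on which the next PDF will start in the merged file
files=()    # PDFs in merge order, for the final "pdftk ... cat" step

# Page count of one PDF, taken from pdftk's dump_data report.
page_count() {
    pdftk "$1" dump_data | awk '/^NumberOfPages:/ {print $2}'
}

# emit TITLE LEVEL PAGE: print one bookmark record in pdftk's format.
emit() {
    printf 'BookmarkBegin\nBookmarkTitle: %s\nBookmarkLevel: %d\nBookmarkPageNumber: %d\n' \
        "$1" "$2" "$3"
}

# walk_tree DIR LEVEL: bookmark each subfolder at LEVEL and recurse into it
# at LEVEL+1; bookmark each PDF at LEVEL and advance the page counter.
walk_tree() {
    local dir=$1 level=$2 entry n
    for entry in "$dir"/*; do
        if [[ -d $entry ]]; then
            emit "${entry##*/}" "$level" "$page"
            walk_tree "$entry" $((level + 1))
        elif [[ $entry == *.pdf ]]; then
            emit "$(basename "$entry" .pdf)" "$level" "$page"
            files+=("$entry")
            n=$(page_count "$entry")
            page=$((page + n))
        fi
    done
}

# Possible usage (needs pdftk; "book" is an example folder). Call walk_tree
# directly, not inside $( ), so that "files" and "page" are updated:
#   walk_tree book 1 > /tmp/bookmarks.txt
#   pdftk "${files[@]}" cat output - |
#       pdftk - update_info /tmp/bookmarks.txt output combined.pdf
```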

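A loop I use all the time to do exactly this. It overwrites each input file with a bookmarked copy, so work on copies if you want to keep the originals. Just make sure the PDFs sort in the right order in the glob expansion (e.g. name them chapter01.pdf, chapter02.pdf, ..., chapter10.pdf).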

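tp="/tmp/tmp.pdf"    # temporary output file
td="/tmp/data"       # bookmark data for the current file
for i in *.pdf; do
    echo "Bookmarking $i"
    # One level-1 bookmark on page 1, titled with the file name minus its extension
    printf "BookmarkBegin\nBookmarkTitle: %s\nBookmarkLevel: 1\nBookmarkPageNumber: 1\n" "${i%.*}" > "$td"
    pdftk "$i" update_info "$td" output "$tp"
    mv "$tp" "$i"    # replace the original file with the bookmarked copy
done
# Concatenate; each file's bookmark is carried into the combined file
pdftk *.pdf cat output myBook.pdf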
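On the ordering caveat: a plain `*.pdf` glob typically sorts chapter10.pdf before chapter2.pdf. One way around it without renaming, assuming GNU coreutils (`sort -V` is a GNU extension) and file names without embedded newlines, is to build the list in natural order and use it for both the loop and the final `cat`. `sorted_pdfs` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: print the given file names in natural ("version") order, so that
# chapter2.pdf comes before chapter10.pdf. Requires GNU sort for -V.
sorted_pdfs() {
    (( $# )) || return 0          # no arguments: print nothing
    printf '%s\n' "$@" | sort -V
}

# Possible usage with the loop above (needs pdftk, bash 4+ for mapfile):
#   mapfile -t files < <(sorted_pdfs *.pdf)
#   pdftk "${files[@]}" cat output myBook.pdf
```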
Chris Davies
bu5hman