extract TOC from epub

Question

I'm trying to learn shell scripting and found the post Extract TOC from epub file, which gives me part of the solution I need, but when I tested it I got an error: error: Extra content at the end of the document.

A little background: I have two epub files, 1.epub and 2.epub. Testing each one separately worked fine (I got the TOC from each epub), but when I tried to process both files in a for ... do loop, I got the above error.

I'm still learning scripting and am not sure whether I made a mistake somewhere. Can anyone point out my mistake?

PS: my script:

#! /usr/bin/bash
EPUB_LIST="1.epub 2.epub"
for f in "$EPUB_LIST"
do
    echo "$f:"
    unzip -p "$f" OEBPS/toc.ncx |
        xml2 |
        sed -n -e 's:^/ncx/navMap/navPoint/navLabel/text=:  :p'
    echo
done

cas · Answer 1 · 2022-08-27T13:13:38.580

The way your script is written, the for loop only has one thing to iterate over, a single filename (which probably doesn't exist) called "1.epub 2.epub". That's not a list of two filenames, it's a single string.

EPUB_LIST should be an array, e.g.:

#!/bin/bash
EPUB_LIST=(1.epub 2.epub)
for f in "${EPUB_LIST[@]}"; do
  echo "$f:"
  unzip -p "$f" OEBPS/toc.ncx |
    xml2 |
    sed -n -e 's:^/ncx/navMap/navPoint/navLabel/text=: :p'
  echo
done
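
To see the difference between the two forms, here is a minimal sketch that only counts loop iterations. The variable names (list_string, list_array, count_string, count_array) are made up for illustration and are not part of the original script.

```shell
#!/bin/bash
# Illustrative demo (hypothetical variable names): how many times does the
# loop body run for a quoted string compared with an array?

list_string="1.epub 2.epub"   # a single string that happens to contain a space
list_array=(1.epub 2.epub)    # an array with two separate elements

count_string=0
for f in "$list_string"; do   # the quotes keep the string as ONE word
  count_string=$((count_string + 1))
done

count_array=0
for f in "${list_array[@]}"; do   # expands to one word per array element
  count_array=$((count_array + 1))
done

echo "quoted string: $count_string iteration(s)"
echo "array: $count_array iteration(s)"
```

The quoted string gives one iteration, so unzip is asked to open a file literally named "1.epub 2.epub"; the array gives two.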

edited Aug 27 '22 at 13:13

answered Aug 27 '22 at 13:05

cas

Or you could just run the script in my answer that you linked to (I only just noticed what you linked to) and give it both filenames as arguments. The script already handles multiple files. – cas Aug 27 '22 at 13:12
Thanks so much for your feedback. The reason I did not use your previous solution is that my list has more than 10 files (and I can learn new things by scripting), so I'm thinking of using the ls command to retrieve the list automatically. I used EPUB_LIST=$(ls my*.epub), but when I ran it I got an error: error: Extra content at the end of the document. My guess is that ls is putting an EOL at the end of each file name. Is there an easy way to remove it? Thanks for your help. – michaelbr Aug 28 '22 at 08:15
Bad idea. See Why not parse ls (and what to do instead)? for reasons why. EPUB_LIST=(my*.epub) works better, and without the risks and problems of parsing the output of ls. Or you could use mapfile -d '' -t EPUB_LIST < <(find . -type f -name '*.epub' -print0) - this version is especially useful if you have a directory tree of .epub files to process...and find has a lot of options for refining exactly which files get selected. – cas Aug 28 '22 at 08:24
The error: Extra content at the end of the document message is coming from xml2. The most probable cause is that you tried to pipe a non-existent or empty file (i.e. the .epub didn't contain a file called OEBPS/toc.ncx). You'll want to examine the books where that happens and see if they have another TOC file (one of the other answers to the linked question has some solutions to that). There are several versions of epub files, and the TOC file isn't always named toc.ncx. – cas Aug 28 '22 at 08:27
Thanks so much for your explanation. I did not know that you could do (my*.epub) (learning something new every day). And thanks for the tips about the TOC; I will take another look at my epub files. – michaelbr Aug 28 '22 at 08:43
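
For anyone following along, the find-based approach from the comments can be wrapped up roughly as below. This is only a sketch: the function name collect_epubs is made up for illustration, and mapfile -d needs bash 4.4 or newer.

```shell
#!/bin/bash
# Sketch: collect .epub filenames into an array without parsing ls.
# collect_epubs is a hypothetical helper name, not from the original answer.

# collect_epubs DIR: fill the global array EPUB_LIST with every .epub under DIR.
collect_epubs() {
  local dir=${1:-.}
  EPUB_LIST=()
  # find -print0 and mapfile -d '' both use NUL as the separator, so
  # filenames containing spaces or newlines stay intact.
  mapfile -d '' -t EPUB_LIST < <(find "$dir" -type f -name '*.epub' -print0)
}

collect_epubs .
echo "found ${#EPUB_LIST[@]} epub file(s)"
for f in "${EPUB_LIST[@]}"; do
  printf '%s\n' "$f"
done
```

If all the files sit in one directory, the plain glob EPUB_LIST=(my*.epub) is shorter; the find version is mainly worth it for directory trees or when you need find's filtering options.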
