
I received a PDF in an email and I want to crop and rotate it because it has two pages per sheet. When I tried the solutions in "Split pages in pdf", I got an "AssertionError" from pyPdf and "Warning: stream operator not terminated by valid EOL." from ImageMagick. pdftk seems to get stuck in an endless loop and never finishes processing the file.

Here's the pyPDF error:

Traceback (most recent call last):
  File "./un2up.py", line 48, in <module>
    split_pages(sys.argv[1],sys.argv[2])
  File "./un2up.py", line 14, in split_pages
    for i in range(input.getNumPages()):
  File "/usr/lib64/python2.7/site-packages/pyPdf/pdf.py", line 431, in getNumPages
    self._flatten()
  File "/usr/lib64/python2.7/site-packages/pyPdf/pdf.py", line 596, in _flatten
    catalog = self.trailer["/Root"].getObject()
  File "/usr/lib64/python2.7/site-packages/pyPdf/generic.py", line 480, in __getitem__
    return dict.__getitem__(self, key).getObject()
  File "/usr/lib64/python2.7/site-packages/pyPdf/generic.py", line 165, in getObject
    return self.pdf.getObject(self).getObject()
  File "/usr/lib64/python2.7/site-packages/pyPdf/pdf.py", line 647, in getObject
    assert idnum == indirectReference.idnum
AssertionError

I tried opening it in Adobe Reader and saving a copy, but the file ended up the same.

The file opens fine for viewing in evince, Adobe Reader and Google Drive.

Any idea how to fix the file so it can be read by pyPdf?

  • Perhaps the file was damaged or truncated in transit by the mail system. What program produced this PDF? If the PDF is not private, publishing it would help with bug-hunting. – jofel Aug 22 '13 at 16:31
  • The PDF came from a professor who sent scanned book pages to the class, so I don't believe the PDF is publishable, but another file, sent by the same person in the same email.

    Transfer corruption did occur to me, but the file opens fine in all the PDF viewers I tested. That does not rule out corruption, but it would be REALLY bad luck if the only bits flipped were the ones that make pyPdf go crazy.

    – Elton Carvalho Aug 22 '13 at 22:06

1 Answer


Use PyPDF2 and open the file with strict mode turned off (strict=False).

Joseph R.
  • 39,549
Jirka
  • 11
  • I can try that, but I'm really curious why this was downvoted. – Elton Carvalho Aug 22 '13 at 22:08
  • Indeed, I can confirm that using https://github.com/mstamy2/PyPDF2/tree/master/PyPDF2 with strict=False solves the described problem. Given that pyPDF has not been supported since 2010, upgrading is probably a good idea in any case. Not sure why poor Jirka was ever downvoted. – Klaas van Schelven Apr 08 '14 at 11:15
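
For anyone landing here later, the strict=False suggestion might look roughly like the sketch below. It is not taken from the answer itself: the function name count_pages_lenient is hypothetical, and the import fallback is an assumption about which PyPDF2 release is installed (newer releases expose PdfReader, older ones PdfFileReader). It only opens the file leniently and counts pages, which is the step where the traceback in the question failed.

```python
def count_pages_lenient(path):
    """Open `path` with strict parsing disabled and return its page count.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so this module still loads where PyPDF2 is not installed.
    try:
        from PyPDF2 import PdfReader  # class name in newer PyPDF2 releases
    except ImportError:
        from PyPDF2 import PdfFileReader as PdfReader  # older releases

    # strict=False asks the parser to tolerate recoverable structural
    # problems instead of raising on them.
    reader = PdfReader(path, strict=False)
    return len(reader.pages)
```

Calling count_pages_lenient("scan.pdf") (with a placeholder file name) should return the number of sheets if the lenient parser can recover the file. Whether it does depends on what is actually wrong with the PDF, so treat this as a first thing to try rather than a guaranteed fix.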