
I expanded a PDF with qpdf using:

qpdf --qdf --object-streams=disable orig.pdf expanded.pdf

This follows the suggestion in https://unix.stackexchange.com/a/109177/306249

I can see all the objects in a text editor, but I cannot find the text of the PDF.

I tested with a PDF containing the text "Hello world", but I can't find this text after decompressing.

How can I find the text among the objects?

1 Answer


Even after all objects have been expanded, text (strings and single characters) does not need to be represented in plain ASCII; it may be hex-encoded.
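To illustrate what "hex-encoded" means here, a minimal Python sketch of decoding a PDF hex string (the `<...>` form) into raw bytes. The function name `decode_pdf_hex_string` is made up for this example. Note that the resulting bytes are only readable as ASCII if the font uses a simple ASCII-compatible encoding; with other fonts they may be glyph codes that need the font's encoding or ToUnicode map to interpret.

```python
import re

def decode_pdf_hex_string(token):
    """Decode a PDF hex string such as '<48656C6C6F>' into raw bytes."""
    # Drop the enclosing '<' and '>' and any whitespace between digits.
    body = re.sub(r"\s+", "", token.strip()[1:-1])
    # An odd number of digits means the final digit is assumed to be 0.
    if len(body) % 2:
        body += "0"
    return bytes.fromhex(body)

print(decode_pdf_hex_string("<48656C6C6F>"))  # b'Hello'
```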

To find the text, proceed as follows:

  1. In your expanded PDF, look for all keys named /Contents. It may look like:

    /Contents 8 0 R
    

    This tells you that the contents of the respective page are in object number 8.

  2. Go to object number 8. You can find it by searching for the string '8 0 obj'.

  3. In the following lines, bracketed by the lines stream ... endstream, if you see at the end of a line either one of...

    ... TJ, Tj, ' or "

    you'll have a text-showing operator at work.

  4. The part of the line preceding the operator holds the text, but it may look like:

    [(H)0.0976563(e)0.0976563(l)-599.902(l)0.0976563(o)0.0976563(W)0.0976563(o)-599.902(r)0.0976563(l)0.0976563(d)0.0976563(!)]TJ
    

    Hey, you were lucky! Can you decipher the "Hello World!" string here? The intermediate numbers are only to control the placement of the individual characters....

  5. ...and now I'll stop teaching PDF. You can read all the details in the official PDF format specification :-)

    Just one more hint: if you search for
    my other PDF-related answers on StackOverflow,

    you may discover quite a few examples which go into more detail about how to read PDF code.
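The manual steps above can be roughly sketched in Python. This is only a sketch under several assumptions: the input is the text of a file already expanded with `qpdf --qdf`, each `/Contents` entry is a single reference (not an array), the strings are literal `(...)` strings without nested parentheses, and the font encoding is ASCII-compatible. The names `text_from_tj_line` and `page_text_lines` are invented for this example and are not part of qpdf or any library.

```python
import re

# Literal strings "( ... )" with backslash escapes; nested parentheses are NOT handled.
LITERAL = re.compile(r"\((?:\\.|[^\\()])*\)")
# Step 1: a page's /Contents key pointing at a single object, e.g. "/Contents 8 0 R".
CONTENTS_REF = re.compile(r"/Contents\s+(\d+)\s+(\d+)\s+R")
# Step 3: a line ending in one of the text-showing operators TJ, Tj, ' or ".
SHOW_OP = re.compile(r"(?:TJ|Tj|'|\")\s*$")

def text_from_tj_line(line):
    """Join the literal strings on one line, skipping the positioning numbers."""
    parts = []
    for match in LITERAL.finditer(line):
        body = match.group(0)[1:-1]
        # Undo the escapes \( \) and \\ ; octal escapes are left untouched.
        parts.append(re.sub(r"\\([()\\])", r"\1", body))
    return "".join(parts)

def page_text_lines(expanded_pdf):
    """Follow each /Contents reference and decode the text-showing lines."""
    results = []
    for num, gen in CONTENTS_REF.findall(expanded_pdf):
        # Step 2: locate the object by its "N G obj" header line.
        obj = re.search(rf"^{num} {gen} obj\b(.*?)^endobj", expanded_pdf, re.S | re.M)
        if not obj:
            continue
        stream = re.search(r"stream\r?\n(.*?)endstream", obj.group(1), re.S)
        if not stream:
            continue
        for line in stream.group(1).splitlines():
            if SHOW_OP.search(line):
                results.append(text_from_tj_line(line))  # step 4
    return results

line = ("[(H)0.0976563(e)0.0976563(l)-599.902(l)0.0976563(o)0.0976563(W)"
        "0.0976563(o)-599.902(r)0.0976563(l)0.0976563(d)0.0976563(!)]TJ")
print(text_from_tj_line(line))  # HelloWorld!

# A tiny hand-written stand-in for an expanded PDF, not real qpdf output.
sample = """3 0 obj
<< /Type /Page /Contents 8 0 R >>
endobj
8 0 obj
<< /Length 9 0 R >>
stream
BT
/F1 12 Tf
[(H)0.1(i)]TJ
ET
endstream
endobj
"""
print(page_text_lines(sample))  # ['Hi']
```

For a real file you would read expanded.pdf in a byte-preserving way first, for example with `open("expanded.pdf", encoding="latin-1").read()`. Hex-encoded strings and fonts with custom encodings are not covered by this sketch.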

Kurt Pfeifle