How to run pdftotext ... | grep on many documents?

Question

Code that works with a single document, but not with many:

pdftotext *.pdf - | grep therapy

I know I can use find, as described in the thread "How can I grep in PDF files?", but I would like to understand why the above command does not work.

An alternative attempt with pdfgrep, which may add some benefit but is still early in development:

pdftotext *.pdf - | pdfgrep therapy
# Wrong syntax, so this fails with:
# Usage: pdfgrep [OPTION]... PATTERN FILE...
# Syntax Warning: Invalid Font Weight
# Syntax Warning: Invalid Font Weight

I would then like a fast way to jump to the specific PDF page when there is a good match. However, I have not found any evidence that such a feature exists.

OS: Debian 8.5
Linux kernel: 4.6 backports
Hardware: Asus Zenbook UX303UA
Poppler-utils: pdftotext

Stephen Kitt · Accepted Answer · 2016-10-20T08:37:22.017

4

Just use pdfgrep directly:

pdfgrep -n therapy *.pdf

The -n option will display the page number of each match.
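As for why the original pipeline fails: pdftotext takes a single input PDF (plus an optional output file) per invocation, so when the shell expands *.pdf to several names, the extra arguments do not fit its usage. If you would rather stay with pdftotext and grep, a loop that converts one file at a time is one way to do it. The following is only a minimal sketch; the function name grep_pdfs is made up for illustration, and it assumes pdftotext from poppler-utils is on your PATH.

```shell
# Hypothetical helper (name is an assumption): grep_pdfs PATTERN FILE...
# Runs pdftotext once per file, because pdftotext accepts only one input PDF.
grep_pdfs() {
    pattern=$1
    shift
    for f in "$@"; do
        # "-" as the output file sends the extracted text to stdout.
        # stderr is discarded to hide font warnings; drop 2>/dev/null
        # if you want to see conversion errors.
        pdftotext "$f" - 2>/dev/null |
            grep -- "$pattern" |
            while IFS= read -r line; do
                # Prefix every matching line with the file it came from.
                printf '%s:%s\n' "$f" "$line"
            done
    done
}

# Usage (quotes around "$f" keep names like "test (copy).pdf" intact):
#   grep_pdfs therapy *.pdf
```

This does not report page numbers, which is one reason pdfgrep -n is the more direct tool here.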

edited Oct 20 '16 at 08:37

answered Oct 20 '16 at 08:31


score 1 · Answer 2 · answered Oct 20 '16 at 08:32

You could try this:

pdfgrep therapy *.pdf

or

find /tmp -name '*.pdf' -exec pdfgrep test {} +

For example:

user@host $ pdfgrep test *.pdf 
1.pdf:test1
1.pdf:test2
1.pdf:test3
2.pdf:test1
2.pdf:test2
2.pdf:test3
test (copy).pdf:test1
test (copy).pdf:test2
test (copy).pdf:test3


user@host $ find /tmp -name '*.pdf' -exec pdfgrep test {} +
/tmp/test (copy).pdf:test1
/tmp/test (copy).pdf:test2
/tmp/test (copy).pdf:test3
/tmp/1.pdf:test1
/tmp/1.pdf:test2
/tmp/1.pdf:test3
/tmp/2.pdf:test1
/tmp/2.pdf:test2
/tmp/2.pdf:test3
