I want to search for a specific keyword in a PDF file from the Linux shell. How can I use the grep command to do that?
This should help: http://unix.stackexchange.com/questions/6704/how-can-i-grep-in-pdf-files – userABC123 Sep 10 '15 at 13:14
1 Answer
2
You won't. PDF is a binary format, so you need to convert it to text first. Grep can search through the raw data, but there's no reason to assume that a PDF which shows the string foo when opened in a PDF viewer will actually contain foo in the original binary data. It may be encoded very differently in the source.
A simple solution is to install pdftotext and use that. It should be available in your distribution's repositories. On Debian-based systems, it is part of the poppler-utils package, which you can install with:
sudo apt-get install poppler-utils
Then, you can search through your PDF file with:
pdftotext foo.pdf - | grep keyword
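If the plain pipeline prints nothing, a case mismatch is one possible cause. The following is a minimal sketch rather than part of the original answer: it wraps the same pipeline in a hypothetical function named pdf_grep and adds grep's -i (ignore case) and -n (line numbers) options, plus pdftotext's -layout option. It assumes pdftotext from poppler-utils is installed.

```shell
# Hypothetical helper (not from the answer): search the extracted text of one PDF.
# Usage: pdf_grep FILE KEYWORD
pdf_grep() {
    # -layout asks pdftotext to keep the physical layout of the page;
    # the trailing "-" sends the extracted text to standard output.
    # grep -i ignores case; -n prefixes each match with its line number
    # in the extracted text (not the PDF page number).
    pdftotext -layout "$1" - | grep -i -n -- "$2"
}
```

For example, pdf_grep foo.pdf keyword would print every line of the extracted text that contains "keyword" in any capitalization.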

terdon
What will the output of this command be? I ran it but no output is shown. What do I have to do to display the keyword that was found? @terdon♦ – Sep 10 '15 at 13:41
-
@user3435851 it will be displayed if it was found, just like any other grep. If you get no output, the keyword was not present in the PDF. – terdon Sep 10 '15 at 13:42
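To tell "keyword not present" apart from "text extraction failed", one option is to check exit statuses explicitly. This is only a sketch under stated assumptions: pdf_has_keyword is a hypothetical name, and it relies on grep's documented exit status (0 when a line matched, 1 when none did).

```shell
# Hypothetical helper (not from the answer): report whether KEYWORD occurs
# in the text extracted from a PDF.
# Usage: pdf_has_keyword FILE KEYWORD
# Returns 0 if found, 1 if not found, 2 if the text could not be extracted.
pdf_has_keyword() {
    pdf=$1
    keyword=$2
    # Extract first, so that a pdftotext failure is not mistaken for "no match".
    if ! text=$(pdftotext "$pdf" - 2>/dev/null); then
        echo "could not extract text from $pdf" >&2
        return 2
    fi
    # grep -q prints nothing and only sets the exit status.
    if printf '%s\n' "$text" | grep -q -- "$keyword"; then
        echo "found: $keyword"
    else
        echo "not found: $keyword"
        return 1
    fi
}
```

Calling pdf_has_keyword foo.pdf keyword then always prints one line, so an empty terminal no longer has to be interpreted.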
@user3435851 use a loop:
for file in *.pdf; do pdftotext "$file" - | grep pattern; done
. Please ask a separate question if you are having trouble with this, but make sure to search the site first. All of this has been covered before. – terdon Sep 10 '15 at 13:50
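With several PDFs, the plain loop does not show which file a match came from. The sketch below is one way to add that, under stated assumptions: grep_pdfs is a hypothetical function name, the optional second argument is a directory that defaults to the current one, and pdftotext is installed.

```shell
# Hypothetical helper (not from the comment): search every PDF in a directory
# and prefix each matching line with the name of the file it came from.
# Usage: grep_pdfs PATTERN [DIRECTORY]
grep_pdfs() {
    pattern=$1
    dir=${2:-.}
    for file in "$dir"/*.pdf; do
        # If no PDF exists, the glob stays literal; skip that case.
        [ -e "$file" ] || continue
        # "-" makes pdftotext write to stdout instead of creating a .txt file.
        pdftotext "$file" - 2>/dev/null | grep -- "$pattern" |
            while IFS= read -r line; do
                printf '%s: %s\n' "$file" "$line"
            done
    done
}
```

For instance, grep_pdfs keyword ~/papers would list matches as "path/to/file.pdf: matching line". With GNU grep, the options -H --label="$file" should give a similar prefix without the inner while loop.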
-
What command? What error? Please post a new question for this. This is a Q&A site, not a discussion forum and we want to avoid long discussions in the comments. Please take the [tour] to understand how the site works. – terdon Sep 10 '15 at 14:08