I want to search for a specific keyword in a PDF file from the Linux shell. How can I use the grep command to do that?

1 Answer

You won't, at least not directly. PDF is a binary format, so you need to convert it to text first. Grep can search through the raw data, but there is no reason to assume that a PDF which shows the string foo when opened in a PDF viewer actually contains foo in its underlying binary data. The text may be stored very differently in the source.

A simple solution is to install pdftotext and use that. It is part of the poppler-utils package, which should be available in your distribution's repositories. On Debian-based systems, you can install it with:

sudo apt-get install poppler-utils

Then, you can search through your PDF file with:

pdftotext foo.pdf - | grep keyword
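
If the plain pipeline prints nothing, it can help to ignore case and show where each match sits in the extracted text. The following is only a minimal sketch: `pdf_search` is a hypothetical helper name, not something shipped with poppler-utils, and the line numbers it prints refer to lines of the converted text, not to PDF pages.

```shell
# Hypothetical helper: search one PDF, ignoring case and printing line numbers.
# Usage: pdf_search FILE PATTERN
pdf_search() {
    # "-" makes pdftotext write the extracted text to stdout
    # instead of creating a .txt file next to the PDF.
    # grep -i ignores case, -n prefixes each match with its line number,
    # and "--" protects patterns that start with a dash.
    pdftotext "$1" - | grep -i -n -- "$2"
}
```

For example, `pdf_search foo.pdf keyword` prints every matching line of the converted text, or nothing at all if the keyword is absent.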
terdon
  • What will the output of this command be? I have used it, but no output is shown. What do I have to do to display the keyword that was found? @terdon♦ –  Sep 10 '15 at 13:41
  • @user3435851 it will be displayed if it was found, just like any other grep. If you get no output, the keyword was not present in the PDF. – terdon Sep 10 '15 at 13:42
  • Thanks. Another question: how do I search in multiple PDF files? @terdon♦ –  Sep 10 '15 at 13:48
  • @user3435851 use a loop: for file in *.pdf; do pdftotext "$file" - | grep pattern; done. Please ask a separate question if you are having trouble with this, but make sure to search the site first. All of this has been covered before. – terdon Sep 10 '15 at 13:50
  • The command is showing an error. Will you please check it? @terdon –  Sep 10 '15 at 14:07
  • What command? What error? Please post a new question for this. This is a Q&A site, not a discussion forum and we want to avoid long discussions in the comments. Please take the [tour] to understand how the site works. – terdon Sep 10 '15 at 14:08
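
The loop suggested in the comments can be wrapped in a small function that also reports which file each match came from. This is a sketch under stated assumptions, not a definitive implementation: `pdfgrep_all` is a hypothetical name, and it only looks at files ending in `.pdf` in the current directory.

```shell
# Hypothetical helper: search every PDF in the current directory for a pattern.
# Usage: pdfgrep_all PATTERN
pdfgrep_all() {
    pattern=$1
    for file in *.pdf; do
        # If no PDF exists, the glob stays literal ("*.pdf"); skip it.
        [ -e "$file" ] || continue
        # "-" sends the extracted text to stdout so grep can read it.
        # The while loop prefixes every matching line with the file name.
        pdftotext "$file" - | grep -- "$pattern" |
            while IFS= read -r line; do
                printf '%s: %s\n' "$file" "$line"
            done
    done
}
```

Running `pdfgrep_all keyword` in a directory of PDFs prints one `name.pdf: matching line` entry per hit. Without the trailing `-`, pdftotext writes a `.txt` file next to each PDF instead of feeding grep.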