
Suppose I have a directory Note_De_cours containing 8 other directories, i.e.

Semaine_1  Semaine_3  Semaine_5  Semaine_7
Semaine_2  Semaine_4  Semaine_6  Semaine_8

Each of those directories contains some PDF files. Is there a command-line tool to search for a word or set of words in all of those PDFs at the same time? It is annoying to open each PDF, press Ctrl+F and search for the word. I have thought of using grep, but I am really not an expert. Maybe there are other, more efficient ways to do that.

I would like to stay in Note_De_Cours and apply pdfgrep to search all the PDFs at the same time. I would like the command to tell me which files contain the word or set of words I want. How can I do that?

EDIT

Can I run this command in a loop over elem: find elem -iname '*.pdf' -exec pdfgrep "baysien optimal" {} + ? Something like for elem in ...; do find "$elem" -iname '*.pdf' -exec pdfgrep "baysien optimal" {} +; done

I tried for i in 1 2 3 4 5 6 7 8; do find Semaine_$i -iname '*.pdf' -exec pdfgrep "taux" {} +; done but it does not print the name of the file each match comes from.

David

2 Answers


Using grep directly on the PDF files will usually return nothing. grep, awk and similar tools work on the raw bytes of a file, and the text inside a PDF is typically stored in compressed or encoded streams rather than as plain text.

There are many tools for special file formats such as PDF. One option is to install open-source repository software such as DSpace, which lets you search and catalog all your PDFs in the browser and can be extended with add-on modules. Another is to use a command-line tool that converts PDF files to plain text, such as pdftotext. Example search command with pdftotext:

pdftotext /file/semaine.pdf - | grep -n -i "Semaine"

-n: Prints the line number of each match (within the extracted text). -i: Makes the search case-insensitive.

By adding | wc -l at the end of the command, you can find out how many lines contain the term you're looking for (a line with several hits is counted only once).
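To count every occurrence rather than every matching line, grep's -o option can be combined with wc -l. The function below is only a sketch: the name count_in_pdf is made up for illustration, and it assumes pdftotext is installed.

```shell
# Hypothetical helper: count every occurrence of a pattern in one PDF.
# Usage: count_in_pdf PATTERN FILE
count_in_pdf() {
  # grep -o prints each match on its own line, so wc -l counts matches
  # instead of matching lines; -i ignores case.
  pdftotext "$2" - | grep -o -i -e "$1" | wc -l
}
```

For example, count_in_pdf taux Semaine_1/cours.pdf would print the number of times "taux" appears in that file (the file name here is just a placeholder).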

You can refine the results further by piping the output to awk or similar tools.

As I mentioned above, there is more than one way. I can suggest these two different alternatives.
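The pdftotext command above handles a single file. One possible way to apply it to every PDF under several directories and print only the names of the files that match is sketched below. The function name pdf_search and the PDF_PATTERN variable are invented for this example, and the sketch assumes pdftotext is installed.

```shell
# Hypothetical helper: list every PDF under the given directories whose
# extracted text contains the pattern (case-insensitive).
# Usage: pdf_search PATTERN DIR...
pdf_search() {
  pattern=$1
  shift
  # find hands the PDF paths to a small inline sh script in batches;
  # the pattern is passed through the environment to avoid quoting problems.
  PDF_PATTERN=$pattern find "$@" -type f -iname '*.pdf' -exec sh -c '
    for f do
      # grep -q prints nothing and only reports through its exit status
      if pdftotext "$f" - 2>/dev/null | grep -q -i -e "$PDF_PATTERN"; then
        printf "%s\n" "$f"
      fi
    done
  ' sh {} +
}
```

From inside Note_De_Cours this could be called as pdf_search taux Semaine_[1-8].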

menderes

Instead of

for i in 1 2 3 4 5 6 7 8; do  find Semaine_$i -iname '*.pdf' -exec pdfgrep "taux" {} +; done

if you want to print the name of the file, use -print on find (to print the name after the matches) or -l on pdfgrep (to print the name instead of the matches):

find Semaine_[1-8] -iname '*.pdf' -exec pdfgrep "taux" {} \; -print

or

find Semaine_[1-8] -iname '*.pdf' -exec pdfgrep -l "taux" {} \;

Also, pdfgrep has built-in recursive search via the -r flag, so you could simply do:

pdfgrep -r -l "taux" Semaine_[1-8]
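If you want to see the matching lines together with the file name and the page they are on, the recursive form can be wrapped in a small function. This is only a sketch: pdf_hits is a made-up name, while -r, -i, -n and -H are existing pdfgrep options.

```shell
# Hypothetical wrapper around pdfgrep for searching the course notes.
# Usage: pdf_hits PATTERN DIR...
pdf_hits() {
  pattern=$1
  shift
  # -r: recurse into the given directories
  # -i: ignore case
  # -n: prefix each match with its page number
  # -H: always print the file name, even when only one file is searched
  pdfgrep -r -i -n -H "$pattern" "$@"
}
```

For example: pdf_hits "taux" Semaine_[1-8]. As far as I know, pdfgrep treats the pattern as an extended regular expression by default, so something like pdf_hits 'taux|optimal' Semaine_[1-8] should search for either word; check man pdfgrep on your system to confirm.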