A small grep command that I used to use very often no longer works as it did before. Below is an example with a dataset of 7 rows and 3 columns, named animal.txt:

Animal Habitat Family
Bear   forest  Ursidae
Dog    house   Canidae
Cat    house   Fenidae
Wolf   mountain Canidae
Eagle  mountain Accipitridae
Lion   sabana  Fenidae

I have a list with the names of 3 animals, and I want to extract the lines that contain those names. The list is named animal3.txt.

Dog
Cat
Bear

Both the dataset and the list are tab delimited files. The code I am using is:

grep -w -F -f ./animal3.txt ./animal.txt > ./output.txt

The output only contains the line for Bear. I have searched several forums and haven't found anything similar. I really don't know what is going on or what I am doing wrong.

Rui F Ribeiro
Fersal
  • Works fine for me. However, I'm assuming animal.hmp.txt is a typo, since you earlier stated the filename was animal.txt? Also, all three ./ are unnecessary. This means the file in the current directory, which is already implicit. – Sparhawk Mar 17 '18 at 00:42
  • The first thing I would check for is non-printing characters in the file(s) - in particular, DOS line endings - e.g. cat -et animal3.txt – steeldriver Mar 17 '18 at 00:42
  • Is it possible that your file animal3.txt does not have the style of line ending it used to have, i.e. now it's CRLF? – Ulrich Schwarz Mar 17 '18 at 00:43
  • Yes, I already checked that, and all the animal names have those ^M$ terminators except the last one. So that is very probably the reason grep is not working well. I didn't know that. So, how can I remove them? – Fersal Mar 17 '18 at 00:59
  • You can use any of the methods described here: Remove ^M character from log files – steeldriver Mar 17 '18 at 01:11
  • Thank you, guys! Your comments were very useful. I have already solved my issue. – Fersal Mar 17 '18 at 12:06
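
The check suggested in the comments can be tried on a throwaway file. This is a minimal sketch: the file contents are a hypothetical reconstruction of what the asker described (CRLF on every line except the last, which has no terminator), and cat -et is not specified by POSIX, although GNU and BSD cat both accept it. od -c is a portable alternative.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical pattern list: CRLF after Dog and Cat, nothing after Bear.
printf 'Dog\r\nCat\r\nBear' >animal3.txt

# cat -et marks each carriage return as ^M and each line feed as $.
shown=$(cat -et animal3.txt)
printf '%s\n' "$shown"

# od -c prints every byte, so \r and \n show up explicitly.
od -c animal3.txt
```

Lines that end in ^M$ carry a carriage return before the line feed; a line that ends in a plain $ has Unix line endings.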

1 Answer

From the comments I gather that the animal3.txt file has carriage returns at the end of at least some lines. These characters become part of the patterns that grep uses, which in turn keeps those patterns from matching in the animal.txt file.

If the file is not supposed to have any carriage returns, then you may use

tr -d '\r' <animal3.txt >animal3-new.txt

to delete them. You may then replace the original animal3.txt file with the corrected animal3-new.txt file.
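
The whole sequence can be rehearsed in a scratch directory before touching the real files. The sketch below is only an illustration: the file contents are shortened, hypothetical stand-ins for the ones in the question, and the variables before and after exist only to compare the two grep runs.

```shell
#!/bin/sh
# Work in a scratch directory so the real files are left alone.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Tab-delimited data file with Unix line endings (shortened sample).
printf 'Animal\tHabitat\tFamily\nBear\tforest\tUrsidae\nDog\thouse\tCanidae\nCat\thouse\tFenidae\n' >animal.txt

# Pattern list with CRLF on all lines but the last, as described in the comments.
printf 'Dog\r\nCat\r\nBear' >animal3.txt

# Before the fix, "Dog\r" and "Cat\r" are the patterns, so only Bear matches.
before=$(grep -w -F -f animal3.txt animal.txt)

# Delete the carriage returns, then replace the original list.
tr -d '\r' <animal3.txt >animal3-new.txt
mv animal3-new.txt animal3.txt

# After the fix, all three names match.
after=$(grep -w -F -f animal3.txt animal.txt)

printf 'before:\n%s\n\nafter:\n%s\n' "$before" "$after"
```

Writing to a new file first and moving it into place afterwards matters: redirecting the output of tr straight back onto animal3.txt would truncate the file before tr reads it.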

Kusalananda
  • The OP mentions the last line has no ^M which would indicate it's a MS-DOS type file also missing the last line delimiter. dos2unix would fix all those problems. Those types of files could also have UTF-8 BOMs or other idiosyncrasies. – Stéphane Chazelas Mar 17 '18 at 08:17
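
Where dos2unix is not installed, the three problems named in that comment (carriage returns, a missing final newline, a UTF-8 BOM) can be handled with awk. This is a rough sketch under assumptions, not a replacement for dos2unix: the sample file is hypothetical, and LC_ALL=C is set so that awk treats the BOM as the three raw bytes EF BB BF.

```shell
#!/bin/sh
# Work in a scratch directory so the real files are left alone.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Hypothetical pattern list: UTF-8 BOM, CRLF endings, no final newline.
printf '\357\273\277Dog\r\nCat\r\nBear' >animal3.txt

# Process bytes, not characters, so the BOM is matched literally.
LC_ALL=C awk '
    NR == 1 { sub(/^\357\273\277/, "") }  # drop a leading BOM if present
    { sub(/\r$/, ""); print }             # drop a trailing CR; print ends each line with LF
' animal3.txt >animal3-new.txt

# Inspect the cleaned file byte by byte.
od -c animal3-new.txt
```

Because print terminates every record with a line feed, the last line gains the newline it was missing.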