
I would like to search for text with accents in files. I know that I can use grep for searching regular text:

grep -rnw './' -e 'KORONA'

...but it doesn't work for words with accent characters, like KORONAVÍRUS, obmedzená.

Any recommendation?

user66638
    for specific accented letters you could try equivalence classes ex. KORONAV[[=I=]]RUS – steeldriver May 12 '20 at 21:01
  • @steeldriver No it doesn't work for files with mixed encoding. The equivalence class will be created with the encoding applied when executing the command, but a file could have a different encoding. Even a . will fail, as the "any character" has to be a character valid in the encoding being used to run the command. –  May 12 '20 at 21:54

1 Answer


If all the files share the same encoding, you just need to write the search string in that encoding. That leaves two cases:

  • If the encoding used on the command line (usually set by one of the LC_* locale variables) is the same as the encoding of all the files, just grep as usual:

    grep -rn 'KORONAVÍRUS, obmedzená.'
    

Add the -w option only if you want to match whole words; use -x instead to match whole lines.

  • If all the files share an encoding that differs from the one used on the command line, convert the search string to the files' encoding:

    $ echo 'KORONAVÍRUS, obmedzená.' >orig
    $ grep -ran "$(cat orig | iconv -t CP1252)"
    

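    Here, the -a option makes grep treat every file as text, so it also searches files whose encoding would otherwise cause them to be detected as binary. CP1252 is only an example; substitute the actual encoding of your files.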
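The second case can be tried end to end with a throwaway file. This is only a minimal sketch: the temporary directory and the name sample.txt are made up for the demo, the script itself is assumed to be saved as UTF-8, and the -F option (match the pattern as a fixed string, so the trailing "." is not treated as a regex wildcard) and LC_ALL=C (compare raw bytes) are additions that the commands above do not use.

```shell
#!/bin/sh
# Hypothetical scratch directory and file name, used only for this demo.
dir=$(mktemp -d)

# Create a sample file encoded in CP1252 (this script is assumed to be UTF-8).
printf 'KORONAVÍRUS, obmedzená.\n' | iconv -f UTF-8 -t CP1252 > "$dir/sample.txt"

# Convert the search string to the encoding of the files.
pattern=$(printf '%s' 'KORONAVÍRUS, obmedzená.' | iconv -f UTF-8 -t CP1252)

# -r recurse, -a treat files as text, -F fixed string, -n show line numbers.
# LC_ALL=C makes grep compare bytes, whatever the current locale is.
matches=$(LC_ALL=C grep -raFn -e "$pattern" "$dir")

# Convert the matching lines back to UTF-8 for display.
printf '%s\n' "$matches" | iconv -f CP1252 -t UTF-8

rm -rf "$dir"
```

Passing -f explicitly to iconv avoids depending on the locale of the shell that runs the script.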

If the files may use different encodings, there is no general solution, because a file's encoding cannot be reliably auto-detected.

It is not possible to search reliably inside a list of files unless they all share the same encoding.
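When the set of possible encodings is known in advance, a partial workaround is to convert the search string to each candidate and run grep once per encoding. The sketch below does not detect anything: the candidate list (UTF-8 and CP1250), the file names, and the use of -F and LC_ALL=C are all assumptions made for the demo, and a byte sequence can match by coincidence in a file that uses a different encoding.

```shell
#!/bin/sh
# Hypothetical scratch directory with one file per encoding, demo only.
dir=$(mktemp -d)
text='KORONAVÍRUS'
printf '%s\n' "$text" > "$dir/utf8.txt"
printf '%s\n' "$text" | iconv -f UTF-8 -t CP1250 > "$dir/cp1250.txt"

# Try each candidate encoding in turn and collect the names of matching files.
found=$(
  for enc in UTF-8 CP1250; do
    # Skip encodings that cannot represent the search string.
    pattern=$(printf '%s' "$text" | iconv -f UTF-8 -t "$enc") || continue
    # -l prints only file names; LC_ALL=C compares raw bytes.
    LC_ALL=C grep -raFl -e "$pattern" "$dir"
  done | sort -u
)
printf '%s\n' "$found"

rm -rf "$dir"
```

Each file is matched only by the pattern converted to its own encoding, so the result lists both files exactly once.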

Related:
How to use grep/ack with files in arbitrary encoding?

  • Nice one! No need to use an intermediate file though: grep -a "$(echo 'KORONAVÍRUS, obmedzená.' | iconv -t CP1252)" also works :) – Milan Simek Jun 02 '23 at 16:49