Search text in files with accent characters

Question

I would like to search for text with accents in files. I know that I can use grep for searching regular text:

grep -rnw './' -e 'KORONA'

...but it doesn't work for words with accent characters, like KORONAVÍRUS, obmedzená.

Any recommendation?

For specific accented letters you could try equivalence classes, e.g. KORONAV[[=I=]]RUS — steeldriver, May 12 '20 at 21:01
@steeldriver No, that doesn't work for files with mixed encodings. The equivalence class is built using the encoding in effect when the command is executed, but a file could have a different encoding. Even a . will fail, because "any character" has to be a character that is valid in the encoding used to run the command. — , May 12 '20 at 21:54

score 2 · Accepted Answer · answered May 12 '20 at 22:16

If all the files share the same encoding, you just need to write the search string in that encoding. That brings up two possible conditions:

The encoding on the command line (that is, wherever the command is executed, probably set by one of the LC_* locale variables) is the same as the encoding of all the files. In that case, just grep as usual:
```
grep -rn 'KORONAVÍRUS, obmedzená.'
```

Use the -w option only if you want to match whole words (to match whole lines, use -x instead).
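
The encoding on the command line differs from the encoding of the files. In that case the search string can be converted to the files' encoding before it is handed to grep. The following is only a minimal sketch, not a definitive recipe: `search_encoded` is a hypothetical helper name, the shell is assumed to use UTF-8, and CP1252 is just an example target encoding.

```shell
# Hypothetical helper: search DIR recursively for TEXT after converting TEXT
# from the shell's encoding (assumed here to be UTF-8) to the files' encoding.
# Usage: search_encoded TEXT ENCODING DIR
search_encoded() (
    text=$1
    enc=$2
    dir=$3
    # Re-encode the search string into the encoding used by the files.
    pattern=$(printf '%s' "$text" | iconv -f UTF-8 -t "$enc") || exit 2
    # LC_ALL=C makes grep compare raw bytes, -F treats the pattern as a
    # fixed string, and -a stops grep from treating the files as binary.
    LC_ALL=C grep -rna -F -e "$pattern" "$dir"
)
```

For example, `search_encoded 'KORONAVÍRUS' CP1252 ./` would look for the CP1252 byte sequence of that word under the current directory. Matching lines are printed in the files' encoding, so accented characters may look garbled on a UTF-8 terminal.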

If the files may use different encodings, there is no general solution, because a file's encoding cannot be reliably auto-detected.

It is not possible to search across a list of files if they don't share a uniform encoding.

Nice one! No need to use an intermediate file though: grep -a "$(echo 'KORONAVÍRUS, obmedzená.' | iconv -t CP1252)" also works :) — Milan Simek, Jun 02 '23 at 16:49
