I have a large UTF-8 text file which I frequently search with `grep`. Recently `grep` began reporting that it was a binary file. I can continue to search it with `grep -a`, but I was wondering what change made it decide that the file was now binary.
I have a copy from last month in which the file is not detected as binary, but it's not practical to `diff` them since they differ on more than 20,000 lines.
`file` identifies my file as `UTF-8 Unicode English text, with very long lines`.
How can I find the characters/lines/etc. in my file which are triggering this change?
The similar, non-duplicate question 19907 covers the possibility of NUL, but `grep -Pc '[\x00-\x1F]'` says that I don't have NUL or any other ANSI control characters.

[...] `nul` and some `Esc`s. I tried grepping for them. I could find the `esc`s (`\x1B`), but the `nul` never showed up. The test given above showed 1, for the line containing `Esc`s, but nothing for any range that didn't contain `\x1B`. I wouldn't trust that test. Try `grep -zc .` instead (should be one more than the number of `nul`s in your file). (Also, you might be better off using `[[:cntrl:]]`.) – muru Sep 17 '15 at 22:10

[...] `sed -z 's/.*\(....\)$/\1/' foo | od -c` to see a few characters before the `NUL` (if there is one), which might lead you to the problem. – muru Sep 17 '15 at 22:17

[...] `sed` doesn't have a `-z` option: `sed: invalid option -- 'z'`. – Charles Sep 17 '15 at 22:19

[...] `grep`? What did `grep -zc .` say? – muru Sep 17 '15 at 22:26

[...] `grep -C 2 -aoP '\0' file | od -c`, that should show the NUL (if present) and the surrounding lines. – terdon Sep 17 '15 at 22:41

[...] `grep -f`), or by using `\0` or the like in Perl syntax as suggested by terdon. – Gilles 'SO- stop being evil' Sep 18 '15 at 00:03

[...] `[\x00]` with `grep -P`). Or did I? – muru Sep 18 '15 at 00:05
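
Since the comments disagree on whether a `grep -P` pattern can be trusted to match NUL, here is a rough sketch that avoids regex matching entirely and works on raw bytes with `tr`, `od`, `awk` and `iconv`. As far as I know, GNU grep treats a file as binary when it sees a NUL byte or, in newer releases, bytes that are not validly encoded for the current locale, so the sketch checks both. The function names `count_nul`, `nul_lines` and `is_valid_utf8` are made up for illustration, and the last one assumes an `iconv` command is installed.

```shell
#!/bin/sh
# Hypothetical helpers; "$1" is always the path of the file to inspect.

# Count NUL bytes: delete every byte that is not NUL, count what remains.
count_nul() {
    LC_ALL=C tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Print the 1-based line numbers of lines that contain a NUL byte.
# od dumps each byte as a decimal number; awk counts newlines (10)
# to track the current line and reports it whenever it sees a 0.
nul_lines() {
    LC_ALL=C od -An -v -tu1 < "$1" |
        awk 'BEGIN { line = 1 }
             { for (i = 1; i <= NF; i++) {
                   if ($i == 0) print line
                   else if ($i == 10) line++
               } }' |
        uniq
}

# Succeed only if the whole file decodes as UTF-8.
# Assumption: iconv exits non-zero on the first invalid sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 < "$1" > /dev/null 2>&1
}
```

After sourcing the snippet, `count_nul file` and `nul_lines file` answer the NUL question directly, and `is_valid_utf8 file || echo "not valid UTF-8"` covers the encoding case. The `od | awk` pipeline is slow on very large files, so it may be worth running `count_nul` first and only listing lines if the count is non-zero.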