7

As discussed in other questions, grep might stop processing files that it considers binary.

While this is normally not a big problem when searching text files, it has turned out that some text files are sometimes "dirty", i.e. they contain some binary data.

My specific case motivating this question is that some binary data (or whatever grep considers binary; see the questions cited) somehow made its way into my .bash_history file, and I would like to remove it.

How can I remove binary data from a text file?

Beyond just removing the binary data, I would also like to be able to look at what the offending (to grep) binary data is, so as to avoid removing something needed or important.

2 Answers

9
cat -v .bash_history > newbashhistory

Look at newbashhistory, where cat -v has made the non-printing characters visible, and decide whether you like it.
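To get a feel for what cat -v does before pointing it at the real history file, here is a small sketch on a throwaway file. The sample contents and the temporary file are assumptions made up for illustration; they are not from the question.

```shell
# Hypothetical stand-in for .bash_history with two control bytes (0x01, 0x02)
sample=$(mktemp)
printf 'ls -l\necho \001\002hi\ncd /tmp\n' > "$sample"

# cat -v shows control characters in caret notation (0x01 becomes ^A, and so on)
shown=$(cat -v "$sample")
printf '%s\n' "$shown"
```

Keep in mind that cat -v also rewrites bytes above 127 in M- notation, so any non-ASCII text in the file (UTF-8, for example) would be changed as well.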

Jeff Schaller
icarus
4

One way to view the lines containing non-text data is:

perl -nle 'print if m/[^ -~\t\r]/' .bash_history | hexdump -C

Basically, this prints a line if it contains a character that is not (^) in the space-to-tilde range (the printable characters, per ascii(7)) and is also not one of the other not-text-but-okay characters, here tab and carriage return (newlines are handled by the -l flag, which strips them from each line before matching).

If the binary contents of those lines look okay to destroy, then you can delete them with something like:

perl -i.whoopsie -ple 's/[^ -~\t\r]//g' .bash_history

And then perhaps use

cmp -l .bash_history.whoopsie .bash_history

to verify that the correct binary data has been destroyed.

thrig