
INPUT:

[user@notebook test]$ cat a.txt

music
musicsheet
sheetmusic

[user@notebook test]$ cat a.txt | cat -vte -
$
^[[1mmusic^[[22m$
^[[1mmusicsheet^[[22m$
^[[1msheetmusic^[[22m$
^[[4m^[[24m$
[user@notebook test]$ 

NEEDED OUTPUT (after removing these interesting characters):

[user@notebook test]$ cat a.txt 
music
musicsheet
sheetmusic
[user@notebook test]$ cat a.txt | cat -vte -
music$
musicsheet$
sheetmusic$
[user@notebook test]$ 

Question: How can I remove the interesting/unknown characters:

^[[1m
^[[22m
^[[4m
^[[24m

What are these characters? Could there be more like them?

Using tr to remove non-printable characters just makes these characters visible and also removes the newlines, both of which are bad:

[user@notebook test]$ cat a.txt | tr -cd '[:print:]'
[1mmusic[22m[1mmusicsheet[22m[1msheetmusic[22m[4m[24m[user@notebook test]$ 
pepite
  • Those are color codes. How was this file generated? Did you maybe do something like ls > a.txt? It would be much simpler to just regenerate the file. And do you really need the $? Or do you just want a newline character at the end of each line? Please [edit] your question and clarify. – terdon Jan 23 '17 at 11:53

2 Answers


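You want to remove CSI ... m sequences (ANSI SGR codes, which set text attributes such as bold and underline). Each one is the escape character followed by [, a list of numbers separated by ;, and a final m, so you can use sed to replace every occurrence with an empty string: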

esc=$'\e'
sed "s/$esc\[[0-9;]*m//g" a.txt

I'm using Bash syntax ($'\e') to write the escape character above.
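
To try the substitution without touching the real file, here is a minimal sketch that rebuilds part of the sample input with printf and runs the same sed expression on it. The function name strip_sgr is only an illustrative name, and the escape character is produced with printf '\033' instead of $'\e' on the assumption that the script may also be run by a plain POSIX sh:

```shell
#!/bin/sh
# strip_sgr: hypothetical helper wrapping the sed command from this answer.
# Reads stdin, deletes every ESC [ <digits and ;> m sequence, writes stdout.
strip_sgr() {
    # printf '\033' emits the ESC byte (octal 033); \[ is a literal [ for sed.
    sed "s/$(printf '\033')\[[0-9;]*m//g"
}

# Recreate lines like those in a.txt: 1 = bold on, 22 = bold off.
printf '\033[1mmusic\033[22m\n\033[1mmusicsheet\033[22m\n' | strip_sgr
```

A line that held nothing but escape sequences, such as the ^[[4m^[[24m line in the question, becomes an empty line after this substitution; adding a second sed expression such as -e '/^$/d' would drop empty lines as well, if that is wanted.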

Toby Speight

@Toby Speight's solution is fine. Some extra information:

Normally those are ANSI escape sequences, used to produce color and other text effects, position the cursor, and so on in the terminal.

For example, grep --color=always '[a-z]*music[a-z]*' files > output will write sequences like these into the output file.

To remove other CSI sequences as well, not only those ending in m:

sed -r "s/\x1B\[[0-9;]*[a-zA-Z]//g"
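
As a rough illustration of what the broader [a-zA-Z] ending buys, the sketch below feeds a line containing an erase-line sequence (ESC [ 2 K) and a colour sequence through the same pattern. The function name strip_csi is hypothetical, and the escape byte is generated with printf '\033' on the assumption that \x1B may not be understood by every sed (it is a GNU sed extension):

```shell
#!/bin/sh
# strip_csi: hypothetical helper using the broader pattern from this answer.
# Deletes ESC [ <digits and ;> <one letter>, so it covers more than the m codes.
strip_csi() {
    sed "s/$(printf '\033')\[[0-9;]*[a-zA-Z]//g"
}

# ESC[2K erases the line, ESC[1;31m selects bold red, ESC[0m resets attributes.
printf '\033[2K\033[1;31mmusic\033[0m\n' | strip_csi
```

This still only handles CSI sequences whose parameters are digits and semicolons; other escape sequences would pass through unchanged.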

Suggestion: check whether you have the deprecated GREP_OPTIONS variable set to --color=always or similar...

JJoao