I have a program that would output something like this:

^[0;33m"2015-02-09 11:42:36 +0700 114.125.x.x access"^[0m

Is there a built-in Linux program that could clean that output up into something like this?

"2015-02-09 11:42:36 +0700 114.125.x.x access"
Kokizzu

2 Answers

Those are ANSI control sequences. There are no built-in programs that remove those codes, at least none that I am aware of. A simple sed script, however, will do the job for you:

sed -r 's/\x1b_[^\x1b]*\x1b[\]//g; s/\x1B\[[^m]*m//g'

Using the above with your sample input:

$ echo $'\e[0;33m"2015-02-09 11:42:36 +0700 114.125.x.x access"\e[0m' | sed -r 's/\x1b_[^\x1b]*\x1b[\]//g; s/\x1B\[[^m]*m//g'
"2015-02-09 11:42:36 +0700 114.125.x.x access"

OSX or other BSD system

With OSX (BSD) sed, commands cannot be chained together with semicolons. Try, instead:

sed -e 's/\x1b_[^\x1b]*\x1b[\]//g' -e 's/\x1B\[[^m]*m//g'
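The `\x1b` escape is not part of POSIX sed, so some sed implementations may treat it as literal text rather than as the ESC byte. A minimal sketch that avoids relying on it builds the ESC character with printf and passes the same two expressions with -e. The function name strip_ansi_portable is hypothetical, chosen here for illustration only:

```shell
# strip_ansi_portable is a hypothetical helper name, not part of the answer above.
strip_ansi_portable() {
    esc=$(printf '\033')    # literal ESC byte, instead of relying on \x1b in sed
    # First expression:  ESC _ ... ESC \   (same intent as the first chunk above)
    # Second expression: ESC [ ... m       (same as the second chunk above)
    sed -e "s/${esc}_[^${esc}]*${esc}\\\\//g" \
        -e "s/${esc}\[[^m]*m//g"
}

# The sample line from the question, generated with printf
printf '\033[0;33m"2015-02-09 11:42:36 +0700 114.125.x.x access"\033[0m\n' | strip_ansi_portable
```

Inside double quotes the shell reduces `\\\\` to `\\`, which sed then reads as one literal backslash.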
John1024

The idea of a sed script is okay (and there are several scripts available for this purpose), but the script suggested could be improved:

  • this chunk, s/\x1b_[^\x1b]*\x1b[\]//g, might be intended to filter out application program commands (APC). However, the reader is unlikely to find these in use, as noted in the xterm documentation:

APC Pt ST
None. xterm implements no APC functions; Pt is ignored. Pt need not be printable characters.

  • if the underscore were changed to a right square bracket ], then that would match some operating system controls. Again, however, the main use of those in terminals is the variant ending with \007 (ASCII BEL) for xterm-style titles (often used in bash/zsh prompt strings).

Given those considerations, a better first chunk might be a non-greedy match that ends at either ESC \ (the string terminator) or BEL. But sed only makes greedy matches. Rather than get complicated, just

s/\x1b\][^\x07]*\x07//g

should suffice.
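As a rough illustration of that expression, here is a sketch applied to an xterm-style title sequence of the kind a prompt string might emit. The helper name strip_titles and the sample title text are made up for this example, and ESC and BEL are generated with printf so the sketch does not depend on sed understanding `\x1b` or `\x07`:

```shell
# strip_titles is a hypothetical helper name used only for this illustration.
strip_titles() {
    esc=$(printf '\033')    # ESC
    bel=$(printf '\007')    # BEL, which ends an xterm-style title sequence
    # Remove ESC ] ... BEL, the same pattern as s/\x1b\][^\x07]*\x07//g
    sed "s/${esc}][^${bel}]*${bel}//g"
}

# A made-up prompt that sets the window title and then shows "$ ls"
printf '\033]0;user@host: ~\007$ ls\n' | strip_titles
# prints: $ ls
```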

The other chunk has a problem as well. It is too greedy from the start:

s/\x1B\[[^m]*m//g

because it assumes that after getting rid of application program (or operating system) commands, the only remaining escape sequences in the shell's output are going to be those that change video modes, i.e., SGR (ending with m). With a slight change, the expression would filter out only SGR sequences, and not get carried away and remove everything beginning with ESC [, e.g.,

s/\x1B\[[;0-9]*m//g
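To see the difference between the two expressions, the sketch below runs both over a made-up sample that starts with a clear-screen sequence (ESC [ 2 J) followed by colored text. The sample string and the variable names are assumptions for illustration, and cat -v is used only to make any leftover ESC visible as ^[:

```shell
esc=$(printf '\033')    # literal ESC byte

# Made-up sample: clear screen, the word "some", then a red "message"
sample=$(printf '\033[2Jsome \033[1;31mmessage\033[0m')

# Original expression: runs from ESC [ to the next "m", whatever lies between
loose=$(printf '%s\n' "$sample" | sed "s/${esc}\[[^m]*m//g")

# Restricted expression: only digits and semicolons may precede the "m"
strict=$(printf '%s\n' "$sample" | sed "s/${esc}\[[;0-9]*m//g")

printf 'loose:  %s\n' "$loose"  | cat -v    # loose:  e message
printf 'strict: %s\n' "$strict" | cat -v    # strict: ^[[2Jsome message
```

The loose pattern swallows the visible text "som" because the first m after ESC [ 2 J is inside the word "some". The restricted pattern leaves the non-SGR sequence alone and removes only the color changes.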

Filtering out color escape sequences can be considered a special case in filtering terminal output to plain text, as answered in Can I programmatically “burn in” ANSI control codes to a file using unix utils? a week before this question was asked.

Thomas Dickey