97

I can use the "script" command to record an interactive session at the command line. However, this includes all control characters and colour codes. I can remove control characters (like backspace) with "col -b", but I can't find a simple way to remove the colour codes.

Note that I want to use the command line in the normal way, so I don't want to disable colours there - I just want to remove them from the script output. Also, I know I can play around and try to find a regexp to fix things up, but I am hoping there is a simpler (and more reliable - what if there's a code I don't know about when I develop the regexp?) solution.

To show the problem:

spl62 tmp: script
Script started, file is typescript
spl62 lepl: ls
add-licence.sed  build-example.sh  commit-test         push-docs.sh
add-licence.sh   build.sh          delete-licence.sed  setup.py
asn              build-test.sh     delete-licence.sh   src
build-doc.sh     clean             doc-src             test.ini
spl62 lepl: exit
Script done, file is typescript
spl62 tmp: cat -v typescript
Script started on Thu 09 Jun 2011 09:47:27 AM CLT
spl62 lepl: ls^M
^[[0m^[[00madd-licence.sed^[[0m  ^[[00;32mbuild-example.sh^[[0m  ^[[00mcommit-test^[[0m         ^[[00;32mpush-docs.sh^[[0m^M
^[[00;32madd-licence.sh^[[0m   ^[[00;32mbuild.sh^[[0m          ^[[00mdelete-licence.sed^[[0m  ^[[00msetup.py^[[0m^M
^[[01;34masn^[[0m              ^[[00;32mbuild-test.sh^[[0m     ^[[00;32mdelete-licence.sh^[[0m   ^[[01;34msrc^[[0m^M
^[[00;32mbuild-doc.sh^[[0m     ^[[00;32mclean^[[0m             ^[[01;34mdoc-src^[[0m             ^[[00mtest.ini^[[0m^M
spl62 lepl: exit^M

Script done on Thu 09 Jun 2011 09:47:29 AM CLT
spl62 tmp: col -b < typescript 
Script started on Thu 09 Jun 2011 09:47:27 AM CLT
spl62 lepl: ls
0m00madd-licence.sed0m  00;32mbuild-example.sh0m  00mcommit-test0m         00;32mpush-docs.sh0m
00;32madd-licence.sh0m   00;32mbuild.sh0m          00mdelete-licence.sed0m  00msetup.py0m
01;34masn0m              00;32mbuild-test.sh0m     00;32mdelete-licence.sh0m   01;34msrc0m
00;32mbuild-doc.sh0m     00;32mclean0m             01;34mdoc-src0m             00mtest.ini0m
spl62 lepl: exit

Script done on Thu 09 Jun 2011 09:47:29 AM CLT
andrew cooke

15 Answers

73

The following script should filter out all ANSI/VT100/xterm control sequences (based on ctlseqs). Minimally tested; please report any under- or over-match.

#!/usr/bin/env perl
## uncolor — remove terminal escape sequences such as color changes
while (<>) {
    s/ \e[ #%()*+\-.\/]. |
       \e\[ [ -?]* [@-~] | # CSI ... Cmd
       \e\] .*? (?:\e\\|[\a\x9c]) | # OSC ... (ST|BEL)
       \e[P^_] .*? (?:\e\\|\x9c) | # (DCS|PM|APC) ... ST
       \e. //xg;
    print;
}

Known issues:

  • Doesn't complain about malformed sequences. That's not what this script is for.
  • Multi-line string arguments to DCS/PM/APC/OSC are not supported.
  • Bytes in the range 128–159 may be parsed as control characters, though this is rarely used. Here's a version which parses non-ASCII control characters (this will mangle non-ASCII text in some encodings including UTF-8).
#!/usr/bin/env perl
## uncolor — remove terminal escape sequences such as color changes
while (<>) {
    s/ \e[ #%()*+\-.\/]. |
       (?:\e\[|\x9b) [ -?]* [@-~] | # CSI ... Cmd
       (?:\e\]|\x9d) .*? (?:\e\\|[\a\x9c]) | # OSC ... (ST|BEL)
       (?:\e[P^_]|[\x90\x9e\x9f]) .*? (?:\e\\|\x9c) | # (DCS|PM|APC) ... ST
       \e.|[\x80-\x9f] //xg;
    print;
}
  • thanks to both answers. i felt i should mark something as a good answer, although both give regexps, which i wanted to avoid. chose this one as it gives a reference for the format. – andrew cooke Jun 16 '11 at 12:49
  • @andrew: My regexp is flexible enough that I expect it to work with pretty much any now-existing terminal, and probably with any tomorrow-existing terminal as well. I haven't tested it much, so there might be bugs, but the approach is sound as control sequences follow a few general patterns. – Gilles 'SO- stop being evil' Jun 16 '11 at 13:53
  • please provide how to use this script. does it require pipe input? or positional arguments? – Trevor Boyd Smith Jan 05 '18 at 17:14
  • 1
    @TrevorBoydSmith Either will work for input, and the output is always on standard output, like typical text utilities. – Gilles 'SO- stop being evil' Jan 05 '18 at 20:31
  • This mangles multibyte characters such as ☺ (\xe2 \x98 \xba). The [\x80-\x9f] clause strips the middle byte. – Jeffrey Jun 09 '18 at 04:52
  • @Jeffrey You're right. Many VTxxx terminals historically interpreted bytes in the range 128–159 as control characters but most terminal emulators these days don't and I've never seen them used for markup in files meant to be displayed on a terminal. I've edited my answer to show a version that lets non-ASCII characters through unchanged. – Gilles 'SO- stop being evil' Jun 10 '18 at 00:08
  • This does not seem to preserve line breaks in applications that output to vty (like less / vim). – Chris Stryczynski Jun 10 '18 at 13:28
  • @ChrisStryczynski It does if they use line feeds. It doesn't if they use cursor motion commands, which they typically would when repainting the screen. Removing terminal commands is the whole point of the script. – Gilles 'SO- stop being evil' Jun 11 '18 at 08:53
  • That makes sense. I used the following sed command to insert line breaks. sed -r 's/^(.{0,98})(.*)/\1\n\2\n/' where 98 is the column width of the terminal (tput cols). – Chris Stryczynski Jun 11 '18 at 09:07
  • FINALLY! Utúvienyes! I have been scouring Teh Intartubez for something that would strip ANSI color escapes (easy) and also the tput sgr0 terminal sequences (largely ignored by SO). Couldn't find or evolve a tr sequence that would catch the latter without destroying all other input. Thank you muchly! – Ti Strga May 30 '22 at 17:08
42

Updating Gilles' answer to also remove carriage returns and do backspace-erasing of previous characters, which were both important to me for a typescript generated on Cygwin:

#!/usr/bin/perl

while (<>) {
  s/ \e[ #%()*+\-.\/]. |
    \r | # Remove extra carriage returns also
    (?:\e\[|\x9b) [ -?]* [@-~] | # CSI ... Cmd
    (?:\e\]|\x9d) .*? (?:\e\\|[\a\x9c]) | # OSC ... (ST|BEL)
    (?:\e[P^_]|[\x90\x9e\x9f]) .*? (?:\e\\|\x9c) | # (DCS|PM|APC) ... ST
    \e.|[\x80-\x9f] //xg;
    1 while s/[^\b][\b]//g;  # remove all non-backspace followed by backspace
  print;
}
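
The backspace step in the script above can also be sketched in plain shell. This is only an illustration under assumptions: erase_backspaces is a made-up helper name, and the sed loop mirrors the "1 while s/[^\b][\b]//g" line only; it does not replace the rest of the Perl filter.

```shell
# erase_backspaces: hypothetical helper, not part of the answer above.
# Repeatedly deletes "one non-backspace character followed by a backspace",
# which is the same rule the Perl script applies.
erase_backspaces() {
    bs=$(printf '\010')   # a literal backspace byte (^H)
    # :a defines a label; ta jumps back to it whenever a substitution was made
    sed -e ':a' -e "s/[^${bs}]${bs}//" -e 'ta' "$@"
}

# Usage: erase_backspaces typescript > typescript.clean
```

A backspace with nothing before it on the line is left in place, just as in the Perl version.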
Pablo A
dewtell
  • +1 I was already typing a post with the same question as the OP when I found this message with your script and that of @Gilles. +1 for both of you – miracle173 Apr 29 '12 at 22:02
  • The time I spent trying various scripts I could have spent just learning how to do the regex on my own... But this script was the first to actually work, thank you very much – Kyle Jul 18 '22 at 17:40
24

There's an ansi2txt command in the colorized-logs package on Ubuntu. It removes ANSI color codes nicely, but it doesn't deal with things like progress bars produced by emitting ^H or ^M characters to overwrite text in place. col -b can deal with those, so for best results you can combine the two:

cat typescript | ansi2txt | col -b
  • 3
    Just note that ansi2txt actually has a [-w WIDTH] - and if you do not set it, it is 120 characters - and it will break any lines that have more characters than this. – sdbbs Sep 15 '20 at 13:25
22

I would use sed in this case:

cat typescript | sed -e "s/\x1b\[.\{1,5\}m//g"

sed -e "s/search/replace/g" is standard stuff. The regex is explained as below:

  • \x1b matches the Escape preceding the color code
  • \[ matches the first open bracket
  • .\{1,5\} matches 1 to 5 of any single character. The curly braces need a \ because sed uses basic regular expressions by default, where only \{ and \} mark a repetition count.
  • m last character in regex - usually trails the color code.
  • // empty string for what to replace everything with.
  • g match it multiple times per line.
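
A tighter variant of the pattern explained above, as a sketch only. Instead of "any 1 to 5 characters", it accepts just digits and semicolons between ESC [ and m, so it should neither miss longer colour codes nor swallow ordinary text that happens to contain an m. strip_sgr is a made-up name, and the ESC byte is produced with printf on the assumption that \x1b inside a sed pattern is not understood by every sed.

```shell
# strip_sgr: hypothetical helper, not part of the answer above.
# Removes SGR (colour/attribute) sequences of the form ESC [ <digits and ;> m.
strip_sgr() {
    esc=$(printf '\033')   # a literal ESC byte
    sed "s/${esc}\[[0-9;]*m//g" "$@"
}

# Usage: strip_sgr typescript > typescript.nocolor
```

This only handles colour and attribute codes; cursor movement, title-bar sequences and the like are left untouched.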
Pablo A
Glorytoad
10
cat typescript | perl -pe 's/\e([^\[\]]|\[.*?[a-zA-Z]|\].*?\a)//g' | col -b > typescript-processed
Myer
7
# The "sed -r" trick does not work on every Linux, I still dunno why:
# (END is assumed to hold the escape character, e.g. END=$(printf '\033'))
DECOLORIZE='eval sed "s,${END}\[[0-9;]*[mK],,g"'

=> howto use:

<commands that type colored output> | ${DECOLORIZE}

tested on: - AIX 5.x / 6.1 / 7.1 - Linux Mandrake / Mandriva / SLES / Fedora - SunOS

scavenger
4

I solved the problem by running scriptreplay in a screen and then dumping the scrollback buffer to a file.

The following expect script does this for you.

It has been tested for logfiles with up to 250.000 lines. In the working directory you need your scriptlog, a file called "time" with 10.000.000 times the line "1 10" in it, and the script. It needs the name of your scriptfile as command line argument, like ./name_of_script name_of_scriptlog.

#!/usr/bin/expect -f 

set logfile [lindex $argv 0]

if {$logfile == ""} {puts "Usage: ./script_to_readable.exp \$logfile."; exit}

set timestamp [clock format [clock sec] -format %Y-%m-%d,%H:%M:%S]
set pwd [exec pwd]
if {! [file exists ${pwd}/time]} {puts "ERROR: time file not found.\nYou need a file named time with 10.000.000 times the line \"1 10\" in the working directory for this script to work. Please provide it."; exit}
set wc [exec cat ${pwd}/$logfile | wc -l]
set height [ expr "$wc" + "100" ]
system cp $logfile ${logfile}.tmp
system echo $timestamp >> ${logfile}.tmp
set timeout -1
spawn screen -h $height -S $timestamp 
send "scriptreplay -t time -s ${logfile}.tmp 100000 2>/dev/null\r"
expect ${timestamp} 
send "\x01:hardcopy -h readablelog.${timestamp}\r"

send "exit\r"

system sed '/^$/d' readablelog.$timestamp >> readablelog2.$timestamp
system head -n-2 readablelog2.$timestamp >> ${logfile}.readable.$timestamp
system rm -f readablelog.$timestamp readablelog2.$timestamp ${logfile}.tmp

The time file can be generated by

for i in $(seq 1 10000000); do echo "1 10" >> time; done
Michael Mrozek
hnkchnsk
  • The command for generating time file generated 100% CPU usage for a few minutes and after it finished my memory usage was 100% and running command resulted in "fork: cannot allocate memory". And it didn't really work as expected. – barteks2x May 27 '16 at 17:29
  • 1
    There's a far easier way to generate the timing file. The fields are "delay blocksize", so there's no reason not to just make it "0 <entirefile>" and dump the whole thing with no delay. You can do that by taking the size of the script minus the first line (tail -n +2 typescript|wc -c), and create the timing file with echo "0 "`tail -n +2 typescript|wc -c` > timing. That'll be basically instant, and scriptreplay will replay the entire script at fastest possible speed. – FeRD Jul 11 '18 at 05:27
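
The timing-file shortcut from the comment above, written out as a sketch. It assumes the two-field timing format (delay, then byte count) that the comment describes; make_timing and its arguments are placeholders, not something from the original answer.

```shell
# make_timing: hypothetical helper.
#   $1 = typescript file recorded by script
#   $2 = timing file to create
make_timing() {
    # Count the bytes after the "Script started ..." header line.
    # tr removes the padding that some wc implementations print.
    size=$(tail -n +2 "$1" | wc -c | tr -d ' ')
    # One entry: zero delay, then the whole remaining file as a single block.
    echo "0 ${size}" > "$2"
}

# Usage: make_timing typescript timing
```

The resulting file has a single line, so it is created almost instantly no matter how long the recorded session is.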
4

I use

cat file | ansifilter

See https://gitlab.com/saalen/ansifilter.

m1027
  • 2
    ansifilter looks to be interesting, but seems overkill for this. There is no need for cat – icarus Jun 18 '21 at 09:11
2

I prefer specialized tools, which are actively maintained and well tested, over a custom regexp for converting script output into plain text. This did the job for me:

$ cat typescript | ansi2txt | col -bp > typescript.txt.bp    
$ cat -v typescript.txt.bp

The script command captures the session into the typescript file. ansi2txt converts ANSI codes with escapes (colour codes, backspaces, etc.) into regular text; however, I found that a couple of escapes were still left. col -bp removed them completely.

I’ve tested this on the latest Ubuntu (Disco), and it works.

2

The solutions given so far work nicely to remove the control sequences, but they also remove the formatting codes. The result is that the tables in the output are squished together. My requirement was just to be able to view and search in the session log files collected from the terminal. The solution that worked best for me was using less -r.

less -r session.log
AliA
  • Thanks, this was the best answer for me. Preserved colours, which was helpful, but removed control chars/sequences that appear to have come from copying and pasting in commands in an SSH session. Seems to have successfully stripped out all the 'bracketed paste' control stuff, which is normally a pain. Thanks. – Chris Jun 06 '23 at 05:36
1

Found this question while looking for a solution to the same problem. A little more digging and I found this script over at LiveJournal (link below). It worked perfectly for me. It's also a very good write-up about this problem and how the solution works. Definitely worth a read. http://jdimpson.livejournal.com/7040.html

#!/usr/bin/perl -wp

# clean up control characters and other non-text detritus that shows up 
# when you run the "script" command.

BEGIN {
# xterm titlebar escape sequence
$xtermesc = "\x1b\x5d\x30\x3b";

# the occurrence of a backspace event (e.g. ctrl H, ctrl W, or ctrl U)
$backspaceevent = "\x1b\\\x5b\x4b"; # note escaping of third character

# ANSI color escape sequence
$ansiesc = qr/\x1b\[[\d;]*?m/;

# technically, this is arrow-right. For some reason, being used against
# very long backspace jobs. I don't fully understand this, as evidenced
# by the fact that it is off by one sometimes.
$bizarrebs = qr/\x1b\[C/;

# used as part of the xterm titlebar mechanism, or when
# a bell sounds, which might happen when you backspace too much.
$bell = "\x07"; # could use \a

$cr = "\x0d"; # could use \r

$backspace = "\x08"; # could use \b
}

s/$xtermesc.+?$bell//g;
s/[$cr$bell]//g;
s/${backspaceevent}//g;
s/$ansiesc//g;
while (s/(.)(?=$backspace)//) { s/$backspace//; } # frickin' sweet 
# For every ^H delete the character immediately left of it, then delete the ^H.
# Perl's RE's aren't R, so I wonder if I could do this in one expression.
while (s/(..)(?=$bizarrebs)//) { s/$bizarrebs//; }
SammerV
1

I found that just using cat was all I needed to view the output of script in the terminal. This doesn't help when redirecting the output to another file, but does make the result readable, unlike cat -v, col -b, or a text editor.

To eliminate colors or save the results to a file, manually copy and paste the output from cat into a text editor, or into another cat command, i.e.:

cat > endResult << END
<paste_copied_text_here>
END
  • 1
    did your script run include output with color codes attached, as in the OP's case? – Jeff Schaller Feb 07 '19 at 15:34
  • Using cat presents the original colors, which can be removed by manual copy-and-paste. The OP used cat -v and col -b, both of which present codes rather than a properly-formatted end result. I have edited my answer. – Roger Dueck Feb 07 '19 at 15:53
0

I use

env TERM=dumb SHELL=/bin/bash script

This leaves you with a typescript file with a bunch of ^M characters, but I just remove those

  • with vi

    :1,$s/^M//g
    

    remembering to escape the ^M with ^V

  • or just use

    dos2unix typescript
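
The same carriage-return cleanup can also be done non-interactively with tr. A minimal sketch; strip_cr is a made-up name, and it removes every carriage return in the input, not only those at the end of a line.

```shell
# strip_cr: hypothetical helper, not part of the answer above.
# Deletes every carriage return (^M) from standard input.
strip_cr() {
    tr -d '\r'
}

# Usage: strip_cr < typescript > typescript.clean
```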
    
AdminBee
johnam
-2

Following up on the last answer which uses tr and :cntrl: could we maybe do

sed "/^[[:cntrl:]]/d" output.txt

This seems to work for me because all lines generated by vi start with a control character. It happens to also strip out blank lines and lines that start with a tab, although that works for what I'm doing. Maybe there is a way to match any control character except for \n \m \t.

Maybe we can search for the particular control character, and it looks like all junk lines generated by vi start with what looks like ^[. hexdump tells me the first character is 1b, so this seems to work too

sed "/^\x1b/d" output.txt

This looks similar to an answer posted above, but it does not work properly because after running the command, some junk chars are already added to the command line as if the user had typed them.

snaran
  • 1
    There is no "last answer" as the answers can and do change order. You should use the "share" button underneath the answer you want to reference, and include that as a link in your answer. Assuming your answer is sufficient to be more than a comment, of course. Right now I can't identify which of the several answers you're referencing. – Chris Davies Jun 23 '17 at 21:22
  • 1
    “could we maybe do …”  Yes, we *could* do that — but it would delete every line* that begins with a control character.  On the output of, for example, ls --color (as shown in the question), your solution will delete* almost every line that contains information.  Not good.  But thanks for leaving out the useless use of cat.    :-)    ⁠ – G-Man Says 'Reinstate Monica' Jun 23 '17 at 21:23
  • Is there a way to create a character class that is :iscntrl: but not :isspace:? Maybe some syntax like ^[[:iscntrl:]-[:isspace]] – snaran Jun 23 '17 at 22:09
-3

tr - translate or delete characters

cat typescript | tr -d '[:cntrl:]'
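
Deleting the whole cntrl class also deletes newlines and tabs, so the output collapses into a single line. Here is a sketch that spares the whitespace controls (tab, newline, vertical tab, form feed, carriage return) by listing the remaining control bytes as octal ranges. strip_nonspace_cntrl is a made-up name. Note that this drops only the control bytes themselves: the printable tail of an escape sequence, such as [0m, stays in the output.

```shell
# strip_nonspace_cntrl: hypothetical helper, not part of the answer above.
# Deletes bytes 000-010, 016-037 and 177 (octal), i.e. the ASCII control
# characters except tab, newline, vertical tab, form feed and carriage return.
strip_nonspace_cntrl() {
    tr -d '\000-\010\016-\037\177'
}

# Usage: strip_nonspace_cntrl < typescript > typescript.clean
```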
Chunk