7

I have a text file that contains (ANSI ?) escape sequences:

[Screenshot: raw file]

When I cat the file I get formatted output:

[Screenshot: formatted output]

How do I save / pipe the output of the text file to a new file so that the control codes are removed but the formatting is preserved?

In other words, I want to convert / export the contents of the file to a new file while retaining the intended display format (e.g. like one would get when using less -r <file>).

I need an automated way to do this so that the output can be logged and sent as an email notification.

Here is the base64 output of the file in the above screenshots (use base64 -d to decode it):

G1tIG1syShtbMTsxSA0KG1s0OzRIICAgICAgICAgICAgICAgICAgICAgICAgICBDT05GSURFTlRJ
QUwgSU5GT1JNQVRJT04bW0sbWzU7NEgbW0sbWzY7NEgbW0sbWzc7NEhUaGlzIHdvcmsgY29udGFp
bnMgdmFsdWFibGUsIGNvbmZpZGVudGlhbCwgYW5kIHByb3ByaWV0YXJ5IGluZm9ybWF0aW9uLhtb
SxtbODs0SERpc2Nsb3N1cmUsIHVzZSwgb3IgcmVwcm9kdWN0aW9uIGlzIGdvdmVybmVkIGJ5IHlv
dXIgTGljZW5zZSBBZ3JlZW1lbnQuG1tLG1sxMDs0SBtbSxtbMTE7NEhUaGlzIHVucHVibGlzaGVk
IHdvcmsgaXMgcHJvdGVjdGVkIGJ5IHRoZSBsYXdzIG9mIHRoZSBVbml0ZWQgU3RhdGVzIGFuZBtb
SxtbMTI7NEhvdGhlciBDb3VudHJpZXMuICBUaGUgd29yayB3YXMgY3JlYXRlZCBpbiAxOTg4IGFu
ZCByZXZpc2VkIGluIDE5OTQuICAbW0sbWzEzOzRISWYgcHVibGljYXRpb24gb2NjdXJzLCB0aGUg
Zm9sbG93aW5nIG5vdGljZSBzaGFsbCBhcHBseTobW0sbWzE0OzRIG1tLG1sxNTs0SBtbSxtbMTY7
NEggICBDb3B5cmlnaHQgMTk4OCwxOTk0LiBBbGwgcmlnaHRzIHJlc2VydmVkLhtbSxtbMTc7NEgb
W0sbWzE4OzRIG1tLG1sxOTs0SFRoaXMgQ29weXJpZ2h0IG5vdGljZSBhbmQgb3RoZXIgY29weXJp
Z2h0IG5vdGljZXMgaW5jbHVkZWQgaW4gdGhlIG1hY2hpbmUbW0sbWzIwOzRIcmVhZGFibGUgY29w
aWVzIG11c3QgYmUgcmVwcm9kdWNlZCBvbiBhbGwgYXV0aG9yaXplZCBjb3BpZXMuG1tLG1syMTs0
SBtbSxtbMjI7NEhUaGlzIGlzIGEgcmVnaXN0ZXJlZCB0cmFkZW1hcmsuG1tLDQo=
Jeff Schaller
mvanle
  • 1
    How can "pure" text have formatting? – muru Jan 13 '21 at 04:31
  • cat it in a terminal emulator, select the text, copy it, and then paste it into another file. Alternatively, you can write a script which interprets just the escapes present in the file (clear, cup, el), but it's not worth the trouble (just like doing research & evaluation for some already existing program doing that). –  Jan 13 '21 at 06:13
  • e.g. in xterm you press Shift-PageUp to scroll back until the beginning of the text, Left-Click at the start of selection, scroll back down with Shift-PageDown, Right-Click to the end of selection, then enter cat > outfile and press Shift-Insert and Control-D to save it into another file. –  Jan 13 '21 at 06:19
  • How do I save / pipe the output of the text file to a new file so that the control codes are removed but the formatting is preserved? - it's possible only if by formatting you mean indentation, paragraphs, whitespace etc. – Arkadiusz Drabczyk Jan 13 '21 at 21:49
  • 1
    With xterm, you can dump the screen to html or svg. Using screen or tmux, you can dump the screen to text in a different way. But "formatting" is meaningless, since there are many different ways to represent the screen contents. – Thomas Dickey Jan 21 '21 at 01:47

3 Answers

7

The best approach here is to have a terminal emulator interpret those sequences, and then tell it to dump the resulting text.

screen is one of those terminal emulators that you can easily interact with in a script. You could do:

INPUT=file.txt OUTPUT=output.txt screen -Dmc /dev/null sh -c '
  screen -X scrollback 100000
  cat < "$INPUT"
  screen -X hardcopy -h "$OUTPUT"'

That starts a new (-m) Detached screen session with an empty config file (/dev/null). In there we run that inline sh script in a screen window, where we increase the scrollback size (though here the output would fit on one screen), dump the input file in the screen window, then call hardcopy -h to dump the contents of the screen, including scrollback, into the output file.

2

The file contains a number of escape sequences (the character sequence ^[ stands for the escape character):

  • ^[[2J, clears the screen.

  • ^[[H, moves the cursor to the top of the screen.

  • ^[[x;yH, moves the cursor to row x, column y (both counted from 1).

    This means the file basically uses absolute positioning of where the text should go on the screen.

  • ^[[K clears to the end of the line.

The file also contains a couple of carriage returns that we may want to remove.

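We may choose to remove most of these, but to replace the "move to (x,y)" sequences with a single newline for simplicity. Luckily, the positioning sequences appear in increasing row order, and almost all of them just move the cursor down one line, to column 4 (there are no "out of sequence" lines of text), so a newline followed by four spaces of indentation is a close approximation. In the two places where rows are skipped (between rows 1 and 4, and between rows 8 and 10), a blank line is lost.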

sed -e $'s/\e\\[2J//g' \
    -e $'s/\e\\[H//g' \
    -e $'s/\e\\[[[:digit:]]*;[[:digit:]]*H/@    /g' -e 'y/@/\n/' \
    -e $'s/\e\\[K//g' \
    -e $'s/\r//g' file

The substitutions here are done using C-strings in the shell ($'...', supported by e.g. bash and zsh) to encode the escape characters.

Each ^[[x;yH sequence is replaced by a newline and four spaces of indentation in two steps: first by replacing it with @ followed by four spaces (@ being any character not otherwise present in the data), and then by replacing each @ with a newline using y///. This is because \n in the replacement part of s/// is not portable (GNU sed accepts it; standard sed needs a backslash followed by a literal newline there).
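For shells without $'...' quoting, the same substitutions can be sketched in plain POSIX sh by generating the escape and carriage return characters with printf. The function name strip_positioning is made up for this illustration, and the digit classes are written as [0-9] for brevity; the sed expressions are otherwise the ones described above.

```shell
# strip_positioning is a hypothetical helper name used for illustration.
# It reads the named files (or standard input) and writes plain text.
strip_positioning() {
    esc=$(printf '\033')    # literal ESC character
    cr=$(printf '\r')       # literal carriage return
    sed -e "s/${esc}\[2J//g" \
        -e "s/${esc}\[H//g" \
        -e "s/${esc}\[[0-9]*;[0-9]*H/@    /g" -e 'y/@/\n/' \
        -e "s/${esc}\[K//g" \
        -e "s/${cr}//g" "$@"
}
```

It would be used as strip_positioning file > output.txt. As with the original command, this assumes that @ does not occur in the data.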

We can also choose to work with the output of running cat -v on the data. This makes some control characters visible in a non-ambiguous way.

$ cat -v file
^[[H^[[2J^[[1;1H^M
^[[4;4H                          CONFIDENTIAL INFORMATION^[[K^[[5;4H^[[K^[[6;4H^[[K^[[7;4HThis work contains valuable, confidential, and proprietary information.^[[K^[[8;4HDisclosure, use, or reproduction is governed by your License Agreement.^[[K^[[10;4H^[[K^[[11;4HThis unpublished work is protected by the laws of the United States and^[[K^[[12;4Hother Countries.  The work was created in 1988 and revised in 1994.  ^[[K^[[13;4HIf publication occurs, the following notice shall apply:^[[K^[[14;4H^[[K^[[15;4H^[[K^[[16;4H   Copyright 1988,1994. All rights reserved.^[[K^[[17;4H^[[K^[[18;4H^[[K^[[19;4HThis Copyright notice and other copyright notices included in the machine^[[K^[[20;4Hreadable copies must be reproduced on all authorized copies.^[[K^[[21;4H^[[K^[[22;4HThis is a registered trademark.^[[K^M

Here, we can use the following sed command:

cat -v file |
sed -e 's/\^\[\[2J//g' \
    -e 's/\^\[\[H//g' \
    -e 's/\^\[\[[[:digit:]]*;[[:digit:]]*H/@    /g' -e 'y/@/\n/' \
    -e 's/\^\[\[K//g' \
    -e 's/\^M//g'

Note that the -v option to cat is not standard, but the output of the cat implementations that support it seems to be consistent.

On some systems, the vis utility may be used in place of cat -v, but it generates other visible representations of control characters.
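Instead of deleting the sequences, a script can interpret them, as suggested in the comments on the question. The following is only a rough sketch of that idea, not a terminal emulator: it assumes the input uses nothing beyond the four sequences listed above plus carriage returns and newlines, it treats a newline as "go to the start of the next row" (which is what the terminal driver normally makes of it), and it ignores screen size, wrapping and scrolling. The name render_ansi is made up for this illustration.

```shell
# render_ansi is a hypothetical helper name. It understands only
# ESC[2J, ESC[H, ESC[row;colH and ESC[K, plus CR and LF.
render_ansi() {
    awk '
    function put(r, c, ch,    s) {    # store one character at row r, column c
        s = line[r]
        while (length(s) < c - 1) s = s " "
        line[r] = substr(s, 1, c - 1) ch substr(s, c + 1)
        if (r > maxrow) maxrow = r
    }
    BEGIN { ESC = sprintf("%c", 27); row = 1; col = 1; maxrow = 0 }
    { data = data $0 "\n" }           # collect the whole input first
    END {
        n = length(data)
        for (i = 1; i <= n; i++) {
            ch = substr(data, i, 1)
            if (ch == ESC && match(substr(data, i + 1, 16), /^\[[0-9;?]*[A-Za-z]/)) {
                params = substr(data, i + 2, RLENGTH - 2)
                final = substr(data, i + RLENGTH, 1)
                if (final == "H") {                    # cursor position
                    split(params, p, ";")
                    row = (p[1] == "") ? 1 : p[1] + 0
                    col = (p[2] == "") ? 1 : p[2] + 0
                } else if (final == "J" && params == "2") {   # clear screen
                    for (r = 1; r <= maxrow; r++) delete line[r]
                    maxrow = 0
                } else if (final == "K") {             # erase to end of line
                    line[row] = substr(line[row], 1, col - 1)
                }
                i += RLENGTH                           # skip the rest of the sequence
            } else if (ch == "\r") {
                col = 1
            } else if (ch == "\n") {
                row++; col = 1
            } else if (ch != ESC) {                    # stray ESC characters are dropped
                put(row, col, ch); col++
            }
        }
        for (r = 1; r <= maxrow; r++) {
            s = line[r]
            sub(/ +$/, "", s)                          # trim trailing blanks
            print s
        }
    }' "$@"
}
```

It would be used as render_ansi file > output.txt. Unlike the sed approach, this keeps skipped rows as blank lines and honours the column given in each positioning sequence. Any other CSI sequence of the same shape is skipped without effect.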

Kusalananda
-2

An easy way is to use the truncate command with the delete option. \033 represents the escape character.

$cat your_filename|tr -d \033 > save_filename

Use the following command to remove all the control characters:

$cat your_filename|tr -d [:cntrl:] > save_filename



Hope that helps

  • 1
    I'm pretty sure tr is short for "translation" or "transliteration", not "truncation", which would imply cutting the file short. Also, the way you've written those commands, the first would remove all zeroes and threes (because the unquoted backslash is just removed by the shell), and while the second would remove the ESC, it would leave the rest of the escape sequences intact, so you'd be left with stuff like [K[5;4H etc. in the middle of the text. However, [:cntrl:] is an unquoted glob, so if you have files named c or n etc., it will get expanded to those. – ilkkachu Jan 28 '21 at 19:32
  • BTW, you may want to take a look at the editing help, esp. the first part about code formatting. – ilkkachu Jan 28 '21 at 19:32
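
To illustrate the quoting points raised in the comment above, here is a small sketch on a one-line sample. The helper name strip_csi is made up for this illustration, and its pattern only covers simple sequences of the form ESC [ digits-and-semicolons letter, such as the ones in the question's file. Note that it merely deletes the sequences, so cursor positioning is not turned into line breaks.

```shell
# strip_csi is a hypothetical helper name used for illustration.
strip_csi() {
    esc=$(printf '\033')                     # literal ESC character
    sed "s/${esc}\[[0-9;]*[A-Za-z]//g" "$@"
}

# Quoted so that tr, not the shell, sees the backslash: only ESC is
# deleted, and the rest of the sequence stays behind.
printf 'A\033[KB\n' | tr -d '\033'           # prints A[KB

# Quoted so that the shell cannot expand it as a glob. Newline is a
# control character too, so the line ending is deleted as well.
printf 'A\033[KB\n' | tr -d '[:cntrl:]'      # prints A[KB without a newline

# Deleting each sequence as a whole leaves only the text.
printf 'A\033[KB\n' | strip_csi              # prints AB
```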