
I set up a new, basic Linux server (CentOS in this case) just for testing purposes. On it, with no firewall or anything like that, I'm running a Python web application. Basically, I type python run.py and keep the application in the foreground. It listens on a port such as 8080, so in my web browser I just type my_public_ip_addr:8080 and it works fine. All of this happens over SSH from my laptop.

Now, I left my laptop open for a while, and when I came back, the shell was displaying something like this:

83.20.238.86 - - [13/Mar/2019 08:54:43] "GET / HTTP/1.1" 200 -
87.122.83.97 - - [13/Mar/2019 11:55:30] "GET / HTTP/1.1" 200 -
176.32.33.145 - - [13/Mar/2019 12:08:36] "GET / HTTP/1.1" 200 -
176.32.33.145 - - [13/Mar/2019 12:08:36] code 400, message Bad request syntax ('\x16\x03\x01\x00\xfc\x01\x00\x00\xf8\x03\x03(\xd3FM\xf5\x0eLo\x17\xa3|\x1f8\xca~#\x07\xc1\x1f&&\x14\x19\x11\x10:\x824\xd23nA\x00\x00\x8c\xc00\xc0,\xc02\xc0.\xc0/\xc0+\xc01\xc0-\x00\xa5\x00\xa3\x00\xa1\x00\x9f\x00\xa4\x00\xa2\x00\xa0\x00\x9e\xc0(\xc0$\xc0\x14\xc0')
176.32.33.145 - - [13/Mar/2019 12:08:36] "��(M�L⎺�≠8#�&&:�4┼A��▮�←�2�↓�/�→�1�↑���������(�$��" 4▮▮ ↑
176↓32↓33↓145 ↑ ↑ [13/M▒⎼/2▮19 14:55:55] "GET / HTTP/1↓1" 2▮▮ ↑
176↓32↓33↓145 ↑ ↑ [13/M▒⎼/2▮19 14:55:55] c⎺de 4▮▮← └e⎽⎽▒±e B▒d ⎼e─┤e⎽├ ⎽≤┼├▒│ ('\│16\│▮3\│▮1\│▮▮\│°c\│▮1\│▮▮\│▮▮\│°8\│▮3\│▮3\│92\│8e\│°7\│9e\│1▒\│▒2\│1e\│°8\│°bb^\│1b\│d1\│▒1\│1e\│d2\│d1^\│1e/└\│96_(\│beU\│▮4\│8d≥\│d7⎻\│°e\│▮▮\│▮▮\│8c\│c▮▮\│c▮←\│c▮2\│c▮↓\│c▮/\│c▮→\│c▮1\│c▮↑\│▮▮\│▒5\│▮▮\│▒3\│▮▮\│▒1\│▮▮\│9°\│▮▮\│▒4\│▮▮\│▒2\│▮▮\│▒▮\│▮▮\│9e\│c▮(\│c▮$\│c▮\│14\│c▮')
176↓32↓33↓145 ↑ ↑ [13/M▒⎼/2▮19 14:55:55] "���������b^[⎺⎼▒┼±e@ce┼├⎺⎽↑⎺⎼▒┼±e ⎺⎼▒┼±e_±c]$ ^C
[⎺⎼▒┼±e@ce┼├⎺⎽↑⎺⎼▒┼±e ⎺⎼▒┼±e_±c]$ ┌⎺±⎺┤├
C⎺┼┼ec├☃⎺┼ ├⎺ 1▮4↓248↓36↓8 c┌⎺⎽ed↓
▒d▒└@±⎽:·$ 
▒d▒└@±⎽:·$ 
▒d▒└@±⎽:·$ 
▒d▒└@±⎽:·$ ec▒⎺ '▒e┌┌⎺ ⎽├▒c┐ ⎺┴e⎼°┌⎺┬'
▒e┌┌⎺ ⎽├▒c┐ ⎺┴e⎼°┌⎺┬
▒d▒└@±⎽:·$ 

You can see the last 3 "normal" GET requests to /, but then it begins. I know it can be fixed (link1 or link2) and that these were some scanning bots, but my question is:

How is it possible for an incoming request to break my terminal?

adamczi
  • Can you run cat -vet on that log file and show us the lines from 12:08:36 and 14:55:55? There might be a Ctrl-N in there. https://en.wikipedia.org/wiki/Shift_Out_and_Shift_In_characters – Mark Plotnick Mar 13 '19 at 17:09
  • It's possible that someone can guess the byte sequence that was printed, but it might help diagnose the precise byte sequence if you post a hex dump of the bytes that were sent to the terminal, e.g. python run.py | tee run.log followed by hexdump -C run.log – Philip Couling Mar 13 '19 at 17:10
  • Hi @MarkPlotnick, thanks for your comment. The thing is that I don't have those strings in any logfile; I just copy-pasted the content straight from my terminal output – adamczi Mar 13 '19 at 18:50
  • @PhilipCouling thanks for your suggestion, but as I wrote above, it has already happened, so I can only run it again and wait for the next try. – adamczi Mar 13 '19 at 18:52
  • While I think JdeBP has a very good point in their answer, I'm voting to close this as non-reproducible, because, as the previous comments say, answering the question about the "bad character" would require seeing the actual data, and you said you don't have that. – ilkkachu Mar 13 '19 at 19:24
  • @ilkkachu I removed that part of the question about specific character and left the "how does it work" one – adamczi Mar 14 '19 at 10:55

1 Answer


Let this be a lesson in security. Your program dumps network-supplied input directly to its log, as-is. You dumped the log output directly to a user terminal. You gave attackers on the Internet at large the ability to control output on your terminal.

Pipe your logs through cyclog and multilog or similar, as I explained at https://unix.stackexchange.com/a/505854/5132, so that they go to a set of strictly size-capped, automatically rotated, log files rather than to a terminal. Then read those log files using tools that will sanitize control characters.
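To illustrate what "sanitize control characters" means in practice, here is a minimal Python sketch, roughly in the spirit of cat -v. The function names sanitize and sanitize_bytes are hypothetical, not part of any tool mentioned above, and the choice of \xNN as the visible form is an assumption made for the example.

```python
def sanitize(text: str) -> str:
    """Replace C0/C1 control characters and DEL with visible \\xNN escapes.

    Hypothetical helper for illustration only. It escapes every control
    character, including tab and newline, so feed it one log line at a time.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20 or 0x7F <= cp <= 0x9F:
            # Control character: emit a harmless printable escape instead.
            out.append(f"\\x{cp:02x}")
        else:
            out.append(ch)
    return "".join(out)


def sanitize_bytes(raw: bytes) -> str:
    """Decode untrusted bytes leniently, then neutralize control characters."""
    # Bytes that are not valid UTF-8 become \xNN text instead of raising.
    text = raw.decode("utf-8", errors="backslashreplace")
    return sanitize(text)
```

For example, sanitize_bytes(b"\xf5\x0eLo") returns the printable text \xf5\x0eLo, so the Shift Out byte never reaches the terminal as a control character.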

The "bad characters" here are well known, and are standardized by ECMA-35 (a.k.a. ISO/IEC 2022) in conjunction with a large registry of character sets. Your terminal emulator implements two switchable portions of the 8-bit character set, known as "GL" and "GR". Various standard control characters and escape sequences switch these two amongst four designated character sets, known as "G0", "G1", "G2", and "G3". These four are in turn mapped to actual character sets by further escape sequences.

The set of byte sequences that can mess up your output is rather large. There are more than just Shift Out and Shift In, as the question comments would lead you to believe. There are four possible shifts of two shiftable areas, and both locking and single shifts. The C1 control characters for shifting have two representations. Then there are just under two hundred possible mapped character sets for each of the four shifts, each with its own escape sequence.
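As a toy model of the simplest case, the following Python sketch tracks only the locking shifts SO (0x0E) and SI (0x0F) plus four designation sequences for G0 and G1. Everything in it is an assumption made for illustration: the names render, DEC_GRAPHICS and DESIGNATE are hypothetical, G1 is assumed to start out designated to the DEC Special Graphics set, and the table is only a subset of that set. Real terminal emulators differ in their defaults and in how they draw the remaining characters.

```python
# Subset of the DEC Special Graphics set (illustrative, not complete).
DEC_GRAPHICS = str.maketrans({
    "a": "▒", "f": "°", "g": "±", "j": "┘", "k": "┐",
    "l": "┌", "m": "└", "n": "┼", "o": "⎺", "p": "⎻",
    "q": "─", "r": "⎼", "s": "⎽", "t": "├", "u": "┤",
    "v": "┴", "w": "┬", "x": "│", "y": "≤", "z": "≥",
})

SO, SI, ESC = 0x0E, 0x0F, 0x1B

# Two-byte tails of the designation escape sequences handled here.
DESIGNATE = {
    b"(0": ("G0", "graphics"), b"(B": ("G0", "ascii"),
    b")0": ("G1", "graphics"), b")B": ("G1", "ascii"),
}


def render(data: bytes) -> str:
    """Toy model of how a terminal maps 7-bit input through GL."""
    sets = {"G0": "ascii", "G1": "graphics"}  # assumed initial designations
    gl = "G0"                                 # set currently shifted into GL
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == SO:            # locking shift: G1 into GL
            gl = "G1"
        elif b == SI:          # locking shift: G0 back into GL
            gl = "G0"
        elif b == ESC and data[i + 1:i + 3] in DESIGNATE:
            slot, charset = DESIGNATE[data[i + 1:i + 3]]
            sets[slot] = charset
            i += 2             # skip the two bytes after ESC
        elif 0x20 <= b < 0x7F:
            ch = chr(b)
            if sets[gl] == "graphics":
                ch = ch.translate(DEC_GRAPHICS)
            out.append(ch)
        # All other bytes are silently dropped in this toy model.
        i += 1
    return "".join(out)
```

With these assumptions, render(b"ok\x0e stack overflow") returns "ok ⎽├▒c┐ ⎺┴e⎼°┌⎺┬": a single stray 0x0E byte in the logged request is enough to leave every later line, including the shell prompt, drawn in the wrong character set until something shifts it back.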

It's quite a complex system, and if you are at this point thinking "Surely it's better to just use Unicode?" you will not be the first. The inventors of mosh made it a selling point that their terminal emulator does not implement any of this character set switching. Neither does my console-terminal-emulator. Our terminal emulators simply will not get into these difficulties. Markus Kuhn has been encouraging dropping ISO 2022 character set switching since 1999.


JdeBP