
Why is it that when you open a text file created in Windows Notepad on Unix, you see ^M where there should be a newline?

My understanding is that in Windows every line ends with \r\n, i.e. the bytes 0x0D 0x0A in ASCII, while the two characters ^M have the ASCII values 0x5E 0x4D. I can't see how one relates to the other.

Caleb
Tim

1 Answer


You're right that the line endings are the key. Both OSes end a line with "\n", but Windows also puts a "\r" in front of it. Unix programs don't expect that extra "\r", so each program displays it in its own way.
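A minimal Python sketch of that mechanism, assuming the file uses CRLF endings (as Notepad writes by default). The helper `split_unix_style` is hypothetical: it simply splits on "\n" only, the way Unix tools find line boundaries, so every Windows line keeps a trailing "\r" byte.

```python
# Two lines as a CRLF (Windows) file stores them, and the same lines as LF (Unix).
windows_text = b"first line\r\nsecond line\r\n"
unix_text = b"first line\nsecond line\n"

def split_unix_style(data):
    """Split on LF only, the way a Unix tool treats line boundaries."""
    return data.split(b"\n")

print(list(b"\r\n"))                   # [13, 10], i.e. 0x0D 0x0A
print(split_unix_style(windows_text))  # [b'first line\r', b'second line\r', b'']
print(split_unix_style(unix_text))     # [b'first line', b'second line', b'']
```

The leftover b"\r" at the end of each Windows line is the byte that shows up as ^M.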

The file doesn't actually contain the two characters "^" and "M"; that's just a common way to represent unprintable characters. Such programs print "^" followed by a letter corresponding to the byte's value, starting with A for 1. M is the 13th letter, and '\r' is ASCII code 13 (0xD, as you said), so you see "^M".
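The "letter for the byte's value" rule amounts to adding 0x40 to the control code: 0x0D + 0x40 = 0x4D, which is "M", and that is where the 0x4D in the question comes from. Below is a minimal Python sketch of this caret notation. The function names are hypothetical; `show_like_cat_v` only approximates GNU `cat -v`, which is documented as using "^ and M- notation, except for LFD and TAB". That exception is also why "\n" is not shown as ^J: it is treated as the line break itself.

```python
def caret(byte):
    """Caret notation for one 7-bit byte: control codes become ^X, others print as-is."""
    if byte < 0x20:
        return "^" + chr(byte + 0x40)  # 1 -> ^A, 13 (\r) -> ^M, 0 -> ^@
    if byte == 0x7F:
        return "^?"                    # DEL is conventionally shown as ^?
    return chr(byte)

def show_like_cat_v(data):
    """Roughly mimic `cat -v`: keep LF and TAB, use ^ and M- notation for the rest."""
    out = []
    for b in data:
        if b in (0x0A, 0x09):          # newline and tab are left alone
            out.append(chr(b))
        elif b < 0x80:
            out.append(caret(b))
        else:
            out.append("M-" + caret(b - 0x80))  # high bit set: "meta" prefix
    return "".join(out)

print(show_like_cat_v(b"hello\r\nworld\r\n"), end="")
# hello^M
# world^M
```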

Michael Mrozek
  • That's also shorthand for Ctrl-M, which is how you type that character on the keyboard. – Steven Pritchard Jul 30 '11 at 02:31
  • You can use the command-line utilities dos2unix and unix2dos to convert text files between the two formats. – Chris Nava Jul 30 '11 at 05:14
  • @Chris True, but that doesn't really have anything to do with the question. – Michael Mrozek Jul 30 '11 at 05:17
  • Thanks! (1) Which programs output unprintable characters that way? For example, do all text editors/viewers work like that? (2) Which unprintable characters are treated that way, and which are not? For example, why don't text viewers output ^J for \n? – Tim Jul 30 '11 at 12:59
  • (3) I also remember that some unprintable characters are sometimes shown as a square with their octal number inside. When is an unprintable character displayed that way, and when as Ctrl plus some character? – Tim Jul 30 '11 at 13:46
  • The ^M representation predates GUI text editors by years, if not decades. – Chris Nava Aug 01 '11 at 18:57
  • @Tim, there are several conventions. Some programs print ? for nonprintable characters (or ? in inverse video, or some such), some print ^X for Ctrl-X, and some print \xx for the character with hexadecimal code xx. And @ChrisNava is right that this predates GUIs by decades. – vonbrand Jan 26 '13 at 02:41
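Chris Nava's comment mentions dos2unix and unix2dos for converting between the two formats. The core of such a conversion can be sketched in a few lines of Python; this is an illustration under simple assumptions, not how those tools are implemented. The function names are hypothetical, and the sketch ignores details the real tools handle, such as encodings, lone CRs from classic Mac files, and binary-file detection.

```python
from pathlib import Path

def crlf_to_lf(data):
    """Replace every CR LF pair with a lone LF; a lone CR is left untouched."""
    return data.replace(b"\r\n", b"\n")

def lf_to_crlf(data):
    """Turn every LF into CR LF without doubling CRs that are already there."""
    # Normalize first so an existing CR LF doesn't become CR CR LF.
    return crlf_to_lf(data).replace(b"\n", b"\r\n")

def convert_file_to_unix(path):
    """Hypothetical helper: rewrite a file in place with Unix (LF) endings."""
    p = Path(path)
    p.write_bytes(crlf_to_lf(p.read_bytes()))
```

Working on bytes rather than decoded text keeps the sketch independent of the file's character encoding, at least for ASCII-compatible encodings such as UTF-8.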