
If I do the following commands:

$ cat picture.jpg > copy1.jpg

and

$ cat -v picture.jpg > copy2.jpg

copy1.jpg is a perfect copy of picture.jpg, but copy2.jpg is a lot bigger than picture.jpg.

I assume this is because copy2.jpg has had each of what cat thought were its line endings replaced by a ^M, and each ^M is bigger in size than a line ending. Is this correct?

If I then do cat copy2.jpg, I find that there are no instances of ^M in copy2.jpg.

What's going on here? And can cat be relied upon for joining files perfectly using >, if its output can be different from its input?

slm
EmmaV

2 Answers


It's not just ^M. Under cat -v, every non-printable byte (whatever that means in your current locale) is expanded to a printable equivalent that takes up several bytes, which is why copy2.jpg ends up bigger than the original.
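To see the expansion on a small scale, here is a minimal sketch. The four sample bytes and the variable names are made up for illustration, and the byte counts assume the usual ^ and M- notation used by GNU and BSD cat:

```shell
# Write four bytes: CR (0x0d), NUL (0x00), 0xff, and a plain 'A'.
tmp=$(mktemp)
printf '\r\000\377A' > "$tmp"

# Size of the raw file (tr strips the padding some wc versions add).
orig_size=$(wc -c < "$tmp" | tr -d '[:space:]')

# What cat -v prints, and how many bytes that output takes.
shown=$(cat -v "$tmp")
shown_size=$(cat -v "$tmp" | wc -c | tr -d '[:space:]')

echo "$orig_size bytes in, $shown_size bytes out: $shown"
rm -f "$tmp"
```

With those four bytes, CR becomes ^M, NUL becomes ^@, and 0xff becomes M-^?, so 4 bytes of input turn into 9 bytes of output. A JPEG is full of such bytes, hence the growth.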

If you're using cat to join files, you need to avoid every option that modifies the output: -b and -n (number lines), -E (mark line endings with $), -s (suppress repeated empty lines), and -v and -T (display non-printable characters using printable characters).
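If you want to convince yourself that plain cat joins files byte for byte, something like the following sketch should do. The file names part1, part2 and joined are hypothetical, and head -c is assumed to be available (it is in GNU and BSD userlands):

```shell
# Two small "binary" parts containing CR, NUL and high-bit bytes.
workdir=$(mktemp -d)
printf 'first\r\n\000\377' > "$workdir/part1"
printf '\001\002second\n'  > "$workdir/part2"

# Join with plain cat: no options at all.
cat "$workdir/part1" "$workdir/part2" > "$workdir/joined"

size1=$(wc -c < "$workdir/part1" | tr -d '[:space:]')
size2=$(wc -c < "$workdir/part2" | tr -d '[:space:]')
joined_size=$(wc -c < "$workdir/joined" | tr -d '[:space:]')

# The joined file must be exactly part1 followed by part2.
if [ "$joined_size" -eq $((size1 + size2)) ] &&
   head -c "$size1" "$workdir/joined" | cmp -s - "$workdir/part1" &&
   tail -c "$size2" "$workdir/joined" | cmp -s - "$workdir/part2"
then
    join_ok=yes
else
    join_ok=no
fi

echo "join intact: $join_ok"
rm -rf "$workdir"
```

cmp compares bytes rather than lines, so it is a better check here than diff or eyeballing the output in a terminal.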

Mark

Your analysis sounds correct to me. I would use cat to join files, since that's its primary function. Just do so without the -v switch, or any switches for that matter.

Using cat -v on the file has essentially trashed it. Did you try to open it in an image viewer? I tried your method and that's exactly what happened to mine.

You can see the evidence of this using the file command too:

$ file copy*
copy1.png: PNG image data, 1440 x 847, 8-bit/color RGB, non-interlaced
copy2.png: ASCII text, with very long lines

cat's info page sheds a bit more light on the subject:

'-v'
'--show-nonprinting'
     Display control characters except for LFD and TAB using '^'
     notation and precede characters that have the high bit set with
     'M-'.

On systems like MS-DOS that distinguish between text and binary
files, 'cat' normally reads and writes in binary mode.  However, 'cat'
reads in text mode if one of the options '-bensAE' is used or if 'cat'
is reading from standard input and standard input is a terminal.
Similarly, 'cat' writes in text mode if one of the options '-bensAE' is
used or if standard output is a terminal.

So where are the ^M's?

If you open your copy2.jpg file in, say, vim, you'll see that it's littered with them, for example:

[Screenshot ss#1: copy2.jpg open in vim]
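If you'd rather count them than look for them by eye, a rough sketch along these lines may help. The sample input is made up, and grep -o is assumed to be available (it is in GNU and BSD grep). Note that in the output of cat -v, ^M is two ordinary characters, a caret followed by a capital M, and not a carriage return:

```shell
# A small sample with two carriage returns in it.
sample=$(mktemp)
printf 'a\rb\rc\n' > "$sample"

# Count the literal two-character sequence ^M in the cat -v output.
# -F makes grep treat ^ as a plain character, not a line anchor.
cr_count=$(cat -v "$sample" | grep -oF '^M' | wc -l | tr -d '[:space:]')

echo "literal ^M sequences: $cr_count"
rm -f "$sample"
```

On a real image this count would also include any M-^M sequences (bytes with value 0x8d), since they contain the same two characters.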

slm