Result of diff on two files with switched lines says the same line is missing twice

Question

I am trying to understand the Linux diff command on two files whose lines are just permutations of each other, but I am not able to grok the output that it generates. Consider the three commands below:

[myPrompt]$ cat file1
apples
oranges
[myPrompt]$ cat file2 
oranges
apples
[myPrompt]$ diff file1 file2
1d0
< apples
2a2
> apples

Can someone explain the above cryptic output from diff?

Why is there no mention of "oranges" at all in the output?
What do 1d0 and 2a2 mean?

I understand from this answer that:

"<" means the line is missing in file2 and ">" means the line is missing in file1

BUT that doesn't explain why oranges is missing in the output.

Because oranges is the largest common part between the two files, so what you obtain is the shortest way to express the differences between the two. — Stéphane Chazelas, Jul 25 '14 at 10:30
And if you want more readable output, just use diff -u file1 file2 instead. That's called "unified diff" format. The original diff format was meant to be very compact, but unified diffs are meant to be much more readable. — godlygeek, Jul 25 '14 at 11:43

score 28 · Accepted Answer · edited Jul 25 '14 at 13:37

To understand the report, remember that diff is prescriptive, describing what changes need to be made to the first file (file1) to make it the same as the second file (file2).

Specifically, the d in 1d0 means delete and the a in 2a2 means add.

Thus:

1d0 means line 1 of file1 (apples) must be deleted. The 0 in 1d0 is the line of the second file (file2) after which the deleted line would have appeared had it not been deleted. Read backwards (changing file2 into file1), it means: append line 1 of file1 after line 0 of file2.
2a2 means append line 2 of file2 (apples) after line 2 of file1 (oranges). Both numbers are original line numbers: oranges is still counted as line 2 of file1, even though deleting apples has moved it up to line 1.
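
To make the "prescriptive" reading concrete, here is a minimal Python sketch that applies normal-format diff output to the lines of file1. It is only an illustration under assumptions: apply_normal_diff is a hypothetical helper, not part of diff (the real tool for this job is patch), and it expects well-formed input.

```python
import re

# Change command such as "1d0", "2a2" or "5,7c8,10".
HEADER = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")

def apply_normal_diff(old_lines, diff_text):
    """Apply normal-format diff output to old_lines and return the new lines.

    Line numbers in the change commands refer to the ORIGINAL old_lines,
    so the result is built in a separate list while walking old_lines once.
    """
    new_lines = []
    position = 0  # how many original lines have been copied or dropped so far
    for line in diff_text.splitlines():
        header = HEADER.match(line)
        if header:
            start, action = int(header[1]), header[3]
            end = int(header[2]) if header[2] else start
            # "a" keeps the named line; "d" and "c" drop the whole range.
            keep_until = start if action == "a" else start - 1
            new_lines.extend(old_lines[position:keep_until])
            position = keep_until if action == "a" else end
        elif line.startswith("> "):
            new_lines.append(line[2:])  # a line taken from the second file
        # "< " lines and the "---" separator add nothing to the result.
    new_lines.extend(old_lines[position:])
    return new_lines

file1 = ["apples", "oranges"]
script = "1d0\n< apples\n2a2\n> apples\n"
print(apply_normal_diff(file1, script))  # ['oranges', 'apples']
```

Note that oranges is never touched by the script: it is simply copied over, which is why it does not show up in the diff output.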

edited Jul 25 '14 at 13:37 by Giulio Muscarello
answered Jul 25 '14 at 10:34 by chaos

what is 0 in 1d0? – Geek Jul 25 '14 at 10:40
@Geek see my edit – chaos Jul 25 '14 at 10:45
@Geek But be careful, that can make knots in the brain =) – chaos Jul 25 '14 at 10:50
that has, indeed started making knots :-) – Geek Jul 25 '14 at 10:51

polym · Answer 2 · 2014-07-25T13:41:41.560

Consider these files:

file1:

# cat file1
apples
pears
oranges
peaches

file2:

# cat file2
oranges
apples
peaches
ananas
banana

How diff works, given it is order-based:

diff reads the first block of lines of file1 and file2, and tries to find equal lines:

  file1        file2        differences on left (<) or right side (>)
  apples                   <apples
  pears                    <pears 
  -------------------------------
->oranges    ->oranges
  peaches      apples
               peaches
               ananas
               banana

Now it will skip all lines that are equal in both files, which is just oranges in this case:

  file1        file2        differences on left (<) or right side (>)
  apples                   <apples
  pears                    <pears 
  oranges      oranges
  -------------------------------
->peaches    ->apples
               peaches
               ananas
               banana

Now find another set of similar lines and print out differences:

  file1        file2        differences on left (<) or right side (>)
  apples                   <apples
  pears                    <pears 
  oranges      oranges
               apples      >apples
  -------------------------------
->peaches    ->peaches
               ananas
               banana

Skip the similar lines:

  file1        file2        differences on left (<) or right side (>)
  apples                   <apples
  pears                    <pears 
  oranges      oranges
               apples      >apples
  peaches      peaches
  -------------------------------
->           ->ananas
               banana

Find identical lines, if possible, and print differences:

line_file1    file1    line_file2    file2        differences on left (<) or right side (>)
         1    apples                              <apples 
         2    pears                               <pears 
         3    oranges           1    oranges
                                2    apples       >apples
         4    peaches           3    peaches
                                4    ananas       >ananas
                                5    banana       >banana
         -----------------------------------------------

Now if I do diff file1 file2:

# diff file1 file2
1,2d0
< apples
< pears
3a2
> apples
4a4,5
> ananas
> banana

Now it is simple to explain what diff's output means:

To make file1 equal to file2:

1,2d0: Delete (d) lines 1-2 from file1; the 0 means they would have appeared after line 0 of file2, i.e. before its first line
3a2: Append (a) line 2 of file2 after line 3 of file1
4a4,5: Append lines 4-5 of file2 after line 4 of file1
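
The same reading can be produced mechanically. The Python snippet below is a rough sketch, not how diff itself is implemented; parse_command, span and describe are made-up helper names used only for this illustration.

```python
import re

# Change command such as "1,2d0", "3a2" or "5,7c8,10".
COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")

def parse_command(command):
    """Split a change command into (file1 range, action, file2 range)."""
    match = COMMAND.match(command)
    if match is None:
        raise ValueError(f"not a normal-format change command: {command!r}")
    start1, end1, action, start2, end2 = match.groups()
    range1 = (int(start1), int(end1 or start1))
    range2 = (int(start2), int(end2 or start2))
    return range1, action, range2

def span(start, end):
    """Format a range as 'line 3' or 'lines 4-5'."""
    return f"line {start}" if start == end else f"lines {start}-{end}"

def describe(command):
    """Return a one-line English reading of a change command."""
    (s1, e1), action, (s2, e2) = parse_command(command)
    if action == "d":
        return f"delete {span(s1, e1)} of file1 (they would follow line {s2} of file2)"
    if action == "a":
        return f"after line {s1} of file1, add {span(s2, e2)} of file2"
    return f"replace {span(s1, e1)} of file1 with {span(s2, e2)} of file2"

for command in ("1,2d0", "3a2", "4a4,5"):
    print(command, "->", describe(command))
```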

diff compares file1 with file2 line by line and collects the differences in temporary memory. It first reports the lines of file1 that come before the first line that also occurs in file2; the lines that are then equal in both files, up until the next difference, are not mentioned at all (in the tables above, the --- row marks how far the comparison has got). In the first step there is only one such equal line, which is oranges. Note that I said file1 is made equal to file2, so file1 is viewed relative to file2 and not the other way around.

The output is in relation to the first file given, in this case file1.
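
One way to see why apples is still reported although it occurs in both files: lines can only go unmentioned if they appear in the same order in both files. The sketch below counts how many lines can be kept that way, using a plain dynamic-programming longest common subsequence. That choice is an assumption made for illustration; it is not necessarily the algorithm diff uses, and when several alignments are equally long, diff may pick a different one than another tool would. lcs_length and reported_lines are hypothetical names.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two lists of lines."""
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, line_a in enumerate(a, start=1):
        for j, line_b in enumerate(b, start=1):
            if line_a == line_b:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

def reported_lines(a, b):
    """Number of '<' plus '>' lines in a shortest possible edit script."""
    common = lcs_length(a, b)
    return (len(a) - common) + (len(b) - common)

file1 = ["apples", "pears", "oranges", "peaches"]
file2 = ["oranges", "apples", "peaches", "ananas", "banana"]
print(lcs_length(file1, file2))      # 2 lines can stay unmentioned
print(reported_lines(file1, file2))  # 5 lines must be reported
```

Only two lines can be kept here (oranges and peaches in the output above), which leaves five lines to report: two with < and three with >, matching the diff output shown earlier.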

I don't like the initial explanation: apples occurs in both files just as well. — O. R. Mapper, Jul 25 '14 at 10:57
@O.R.Mapper I changed the explanation. Does it sound more clear/better now :)? — polym, Jul 25 '14 at 12:06
Not quite, for now you wrote "there is only one similar line, which is oranges". Wrong: There are actually two lines, which are not only similar, but absolutely identical. One of them reads oranges, the other one reads apples. Also, your explanation (purely order-based) is in contradiction to Stéphane's comment on the question (length-based) - who is correct? — O. R. Mapper, Jul 25 '14 at 12:09
@O.R.Mapper You forgot "In this case" and the lines before that. I meant it in this step there is only one similar line. I will just add an example to my answer so that it can be understood better. — polym, Jul 25 '14 at 12:30
@O.R.Mapper Also can you give me an example that shows that the length-based answer is correct? — polym, Jul 25 '14 at 12:52
I read your answer several times, and I always instinctively understood "in this case" as "in this case, where there are two identical lines in the two files, apples and oranges". As for the length-based answer, where do I claim that it is correct? I am merely pointing out that the length-based explanation (from a comment that was upvoted six times and not disputed a single time) is in contrast to your explanation, so I asked which one is correct. — O. R. Mapper, Jul 25 '14 at 12:56
@O.R.Mapper With my example, is it now clear? If it is still unclear, please just edit my answer to make it clear. — polym, Jul 25 '14 at 13:37

score 8 · Answer 3 · answered Jul 25 '14 at 11:06

There they are:

$ diff file1 file2
1d0
< apples
2a2
> apples
$ diff file2 file1
1d0
< oranges
2a2
> oranges

answered Jul 25 '14 at 11:06 by user78677

score 8 · Answer 4 · edited Apr 13 '17 at 12:36

The standard (old) output format displays only the areas where the files differ, without any surrounding text.

For example: 1d0 < (delete) means that apples needs to be removed from the 1st line of file1, and 2a2 > (append) means that apples, which is line 2 of file2, needs to be added after the 2nd line of file1, so that both files match.

The documentation available at info diff explains it further:

Showing Differences Without Context

The "normal" diff output format shows each hunk of differences without any surrounding context. Sometimes such output is the clearest way to see how lines have changed, without the clutter of nearby unchanged lines (although you can get similar results with the context or unified formats by using 0 lines of context). However, this format is no longer widely used for sending out patches; for that purpose, the context format and the unified format are superior. Normal format is the default for compatibility with older versions of diff and the POSIX standard. Use the --normal option to select this output format explicitly.

Detailed Description of Normal Format

The normal output format consists of one or more hunks of differences; each hunk shows one area where the files differ. Normal format hunks look like this:

 CHANGE-COMMAND
 < FROM-FILE-LINE
 < FROM-FILE-LINE...
 ---
 > TO-FILE-LINE
 > TO-FILE-LINE...

There are three types of change commands. Each consists of a line number or comma-separated range of lines in the first file, a single character indicating the kind of change to make, and a line number or comma-separated range of lines in the second file. All line numbers are the original line numbers in each file. The types of change commands are:

LaR Add the lines in range R of the second file after line L of the first file. For example, 8a12,15 means append lines 12-15 of file 2 after line 8 of file 1; or, if changing file 2 into file 1, delete lines 12-15 of file 2.

FcT Replace the lines in range F of the first file with lines in range T of the second file. This is like a combined add and delete, but more compact. For example, 5,7c8,10 means change lines 5-7 of file 1 to read as lines 8-10 of file 2; or, if changing file 2 into file 1, change lines 8-10 of file 2 to read as lines 5-7 of file 1.

RdL Delete the lines in range R from the first file; line L is where they would have appeared in the second file had they not been deleted. For example, 5,7d3 means delete lines 5-7 of file 1; or, if changing file 2 into file 1, append lines 5-7 of file 1 after line 3 of file 2.
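
The "if changing file 2 into file 1" readings in the quoted text follow a simple pattern: the two ranges swap sides, add and delete trade places, and change stays change. A small Python sketch of that rule follows; reverse_command is a hypothetical helper, and it rewrites only the command line (in a full hunk the < and > lines would also have to be swapped). The result is a valid reverse script, but not necessarily the one diff prints when run with the arguments swapped.

```python
import re

# A range is a line number or "first,last"; the action is a, c or d.
COMMAND = re.compile(r"^(\d+(?:,\d+)?)([acd])(\d+(?:,\d+)?)$")

# Read backwards, an add becomes a delete and vice versa.
OPPOSITE = {"a": "d", "d": "a", "c": "c"}

def reverse_command(command):
    """Rewrite a change command so it turns file2 into file1."""
    match = COMMAND.match(command)
    if match is None:
        raise ValueError(f"not a normal-format change command: {command!r}")
    first, action, second = match.groups()
    return f"{second}{OPPOSITE[action]}{first}"

for command in ("8a12,15", "5,7c8,10", "5,7d3"):
    print(command, "->", reverse_command(command))
```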
