31

I am trying to understand the Linux diff command on two files whose lines are just permutations of each other, but I am not able to grok the output it generates. Consider the three commands below:

[myPrompt]$ cat file1
apples
oranges
[myPrompt]$ cat file2 
oranges
apples
[myPrompt]$ diff file1 file2
1d0
< apples
2a2
> apples

Can someone explain the above cryptic output from diff?

  1. Why is there no mention of "oranges" at all in the output?
  2. What do 1d0 and 2a2 mean?

I understand from this answer that:

"<" means the line is missing in file2 and ">" means the line is missing in file1

BUT that doesn't explain why oranges is missing in the output.

Geek
  • 6,688
  • 13
    Because oranges is the largest common part between the two files, so what you obtain is the shortest way to express the differences between the two. – Stéphane Chazelas Jul 25 '14 at 10:30
  • 11
    And if you want more readable output, just use diff -u file1 file2 instead. That's called "unified diff" format. The original diff format was meant to be very compact, but unified diffs are meant to be much more readable. – godlygeek Jul 25 '14 at 11:43
  • 5
    @godlygeek Or diff -y file1 file2 – user80551 Jul 25 '14 at 12:44

4 Answers

28

To understand the report, remember that diff is prescriptive, describing what changes need to be made to the first file (file1) to make it the same as the second file (file2).

Specifically, the d in 1d0 means delete and the a in 2a2 means add.

Thus:

  • 1d0 means line 1 of file1 (apples) must be deleted. The 0 in 1d0 means line 0 is where the deleted line would have appeared in the second file (file2) had it not been deleted. Read backwards, when changing file2 into file1, it means: append line 1 of file1 after line 0 of file2.
  • 2a2 means append line 2 of file2 (apples) after line 2 of file1 (oranges). Both numbers are original line numbers, so they do not shift after line 1 has been deleted.
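To make the "prescriptive" reading concrete, here is a minimal Python sketch that follows such a script to turn file1 into file2. The function name apply_normal_diff is made up for this illustration, and the sketch only handles the plain a, d and c commands of the normal format; it is not how patch or ed are implemented.

```python
import re

# Matches change commands such as "1d0", "2a2" or "5,7c8,10".
COMMAND = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


def apply_normal_diff(old_lines, script):
    """Apply a normal-format diff script to old_lines and return the new lines."""
    result = []
    cursor = 0  # number of lines of the original file1 consumed so far
    for line in script.splitlines():
        match = COMMAND.match(line)
        if match:
            start, end, action, _, _ = match.groups()
            start = int(start)
            end = int(end) if end else start
            if action == "a":
                # Keep everything up to and including line `start` of file1.
                result.extend(old_lines[cursor:start])
                cursor = start
            else:
                # "d" or "c": keep the lines before the range, then skip the range.
                result.extend(old_lines[cursor:start - 1])
                cursor = end
        elif line.startswith("> "):
            # Lines marked ">" come from file2 and are inserted as-is.
            result.append(line[2:])
        # Lines marked "<" and the "---" separator carry no new text.
    result.extend(old_lines[cursor:])
    return result


script = "1d0\n< apples\n2a2\n> apples"
print(apply_normal_diff(["apples", "oranges"], script))
```

Note that the script never has to mention oranges: the line is simply carried over from file1 because no command touches it.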
chaos
  • 48,171
14

Consider these files:

file1:

# cat file1
apples
pears
oranges
peaches

file2:

# cat file2
oranges
apples
peaches
ananas
banana

How diff works, given it is order-based:

  1. diff reads the first block of lines of file1 and file2, and tries to find equal lines:

      file1        file2        differences on left (<) or right side (>)
      apples                   <apples
      pears                    <pears 
      -------------------------------
    ->oranges    ->oranges
      peaches      apples
                   peaches
                   ananas
                   banana
    
  2. Now it will skip all lines that are equal in both files, which is just oranges in this case:

      file1        file2        differences on left (<) or right side (>)
      apples                   <apples
      pears                    <pears 
      oranges      oranges
      -------------------------------
    ->peaches    ->apples
                   peaches
                   ananas
                   banana
    
  3. Now find another set of similar lines and print out differences:

      file1        file2        differences on left (<) or right side (>)
      apples                   <apples
      pears                    <pears 
      oranges      oranges
                   apples      >apples
      -------------------------------
    ->peaches    ->peaches
                   ananas
                   banana
    
  4. Skip the similar lines

      file1        file2        differences on left (<) or right side (>)
      apples                   <apples
      pears                    <pears 
      oranges      oranges
                   apples      >apples
      peaches      peaches
      -------------------------------
    ->           ->ananas
                   banana
    
  5. Find identical lines, if possible, and print differences:

    line_file1    file1    line_file2    file2        differences on left (<) or right side (>)
             1    apples                              <apples 
             2    pears                               <pears 
             3    oranges           1    oranges
                                    2    apples       >apples
             4    peaches           3    peaches
                                    4    ananas       >ananas
                                    5    banana       >banana
             -----------------------------------------------
    

Now if I do diff file1 file2:

# diff file1 file2
1,2d0
< apples
< pears
3a2
> apples
4a4,5
> ananas
> banana

Now it is simple to explain what diff's output means:

To make file1 equal to file2:

  • 1,2d0: Delete (d) lines 1-2 of file1; line 0 is where they would have appeared in file2 (that is, before its first line)
  • 3a2: After line 3 of file1, append (a) line 2 of file2
  • 4a4,5: After line 4 of file1, append lines 4-5 of file2

diff compares file1 with file2 line by line and keeps track of the differences in memory. It reports only the lines that have to be deleted from or added to file1; lines that are equal in both files are skipped and not mentioned in the output. (The --- line seen in normal-format output only separates the old lines from the new lines inside a change hunk.) In this case there is only one similar line, which is oranges. Note that I said file1 equal to file2, so file1 is viewed relative to file2 and not the other way around.

The output is in relation to the first file given, in this case file1.
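The walk-through above can be approximated with a textbook longest-common-subsequence (LCS) table. This is only a sketch: GNU diff uses a different, faster algorithm, and the names lcs_table and edit_script as well as the tie-breaking rule are choices made for this illustration.

```python
def lcs_table(a, b):
    """table[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    return table


def edit_script(a, b):
    """Return (marker, line) pairs: '<' only in a, '>' only in b, ' ' in both."""
    table = lcs_table(a, b)
    i = j = 0
    ops = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            ops.append((" ", a[i]))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            # On a tie, delete from a first (an arbitrary choice of this sketch).
            ops.append(("<", a[i]))
            i += 1
        else:
            ops.append((">", b[j]))
            j += 1
    ops.extend(("<", line) for line in a[i:])
    ops.extend((">", line) for line in b[j:])
    return ops


file1 = ["apples", "pears", "oranges", "peaches"]
file2 = ["oranges", "apples", "peaches", "ananas", "banana"]
for marker, line in edit_script(file1, file2):
    print(marker, line)
```

With this tie-breaking rule the kept lines are oranges and peaches, as in the diff output above. Keeping apples and peaches instead would be an equally long common subsequence, so which identical line ends up "unmentioned" depends on how the implementation resolves ties.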

polym
  • 10,852
  • 2
    I don't like the initial explanation: apples occurs in both files just as well. – O. R. Mapper Jul 25 '14 at 10:57
  • 1
    @O.R.Mapper I changed the explanation. Does it sound more clear/better now :)? – polym Jul 25 '14 at 12:06
  • Not quite, for now you wrote "there is only one similar line, which is oranges". Wrong: There are actually two lines, which are not only similar, but absolutely identical. One of them reads oranges, the other one reads apples. Also, your explanation (purely order-based) is in contradiction to Stéphane's comment on the question (length-based) - who is correct? – O. R. Mapper Jul 25 '14 at 12:09
  • @O.R.Mapper You forgot "In this case" and the lines before that. I meant it in this step there is only one similar line. I will just add an example to my answer so that it can be understood better. – polym Jul 25 '14 at 12:30
  • 1
    @O.R.Mapper Also can you give me an example that shows that the length-based answer is correct? – polym Jul 25 '14 at 12:52
  • I read your answer several times, and I always instinctively understood in this case as "in this case, where there are two identical lines in the two files, apples and oranges". As for the length-based answer, where do I claim that it is correct? I am merely pointing out that the length-based explanation (from a comment that was upvoted six times and not disputed a single time) is in contrast to your explanation, so I asked which one is correct. – O. R. Mapper Jul 25 '14 at 12:56
  • @O.R.Mapper With my example, is it now clear? If it is still unclear, please just edit my answer to make it clear. – polym Jul 25 '14 at 13:37
8

There they are:

$ diff file1 file2
1d0
< apples
2a2
> apples
$ diff file2 file1
1d0
< oranges
2a2
> oranges
8

The standard (old) output format shows only the areas where the files differ, without any surrounding context.

For example: 1d0 followed by < (delete) means apples must be removed from line 1 of file1, and 2a2 followed by > (append) means apples must be appended after line 2 of file1, where it is line 2 of file2, so that both files match.

The documentation available via info diff explains it further:

Showing Differences Without Context

The "normal" diff output format shows each hunk of differences without any surrounding context. Sometimes such output is the clearest way to see how lines have changed, without the clutter of nearby unchanged lines (although you can get similar results with the context or unified formats by using 0 lines of context). However, this format is no longer widely used for sending out patches; for that purpose, the context format and the unified format are superior. Normal format is the default for compatibility with older versions of diff and the POSIX standard. Use the --normal option to select this output format explicitly.

Detailed Description of Normal Format

The normal output format consists of one or more hunks of differences; each hunk shows one area where the files differ. Normal format hunks look like this:

 CHANGE-COMMAND
 < FROM-FILE-LINE
 < FROM-FILE-LINE...
 ---
 > TO-FILE-LINE
 > TO-FILE-LINE...

There are three types of change commands. Each consists of a line number or comma-separated range of lines in the first file, a single character indicating the kind of change to make, and a line number or comma-separated range of lines in the second file. All line numbers are the original line numbers in each file. The types of change commands are:

LaR Add the lines in range R of the second file after line L of the first file. For example, 8a12,15 means append lines 12-15 of file 2 after line 8 of file 1; or, if changing file 2 into file 1, delete lines 12-15 of file 2.

FcT Replace the lines in range F of the first file with lines in range T of the second file. This is like a combined add and delete, but more compact. For example, 5,7c8,10 means change lines 5-7 of file 1 to read as lines 8-10 of file 2; or, if changing file 2 into file 1, change lines 8-10 of file 2 to read as lines 5-7 of file 1.

RdL Delete the lines in range R from the first file; line L is where they would have appeared in the second file had they not been deleted. For example, 5,7d3 means delete lines 5-7 of file 1; or, if changing file 2 into file 1, append lines 5-7 of file 1 after line 3 of file 2.
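Each description above also gives the reverse reading ("if changing file 2 into file 1"). As a small Python sketch, and assuming only the three command forms quoted here, reversing a command amounts to swapping the two ranges and exchanging a with d. The name reverse_command is hypothetical.

```python
import re

# A range is either "N" or "N,M"; the letter is the kind of change.
COMMAND = re.compile(r"^(\d+(?:,\d+)?)([acd])(\d+(?:,\d+)?)$")
SWAP = {"a": "d", "d": "a", "c": "c"}


def reverse_command(command):
    """Rewrite a change command so it reads file2 -> file1 instead of file1 -> file2."""
    match = COMMAND.match(command)
    if match is None:
        raise ValueError(f"not a normal-format change command: {command!r}")
    left, action, right = match.groups()
    return f"{right}{SWAP[action]}{left}"


for command in ("8a12,15", "5,7c8,10", "5,7d3"):
    print(command, "->", reverse_command(command))
```

In a full hunk the < and > markers would swap as well. This describes a valid reverse script, but diff file2 file1 is free to pick different common lines, so its actual output need not be the mechanical reverse.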

So to see oranges in the output, you have to use either the side-by-side format or the unified format.

For example:

$ diff -y file1 file2
apples                                <
oranges                             oranges
                                  > apples

$ diff -u file1 file2
@@ -1,2 +1,2 @@
-apples
 oranges
+apples
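If diff is not at hand, a unified diff can also be produced with Python's standard difflib module. This is a sketch with the two files hard-coded as lists. difflib uses its own matching heuristic, so it may keep apples rather than oranges as the unchanged context line; both results are valid one-deletion, one-insertion scripts.

```python
import difflib

# The two files from the question, one list item per line.
file1 = ["apples\n", "oranges\n"]
file2 = ["oranges\n", "apples\n"]

# unified_diff yields the header lines, the hunk header and the hunk body.
unified = list(difflib.unified_diff(file1, file2, fromfile="file1", tofile="file2"))
print("".join(unified), end="")
```

Either way, the unchanged line shows up as context (prefixed with a space), which is exactly what the normal format leaves out.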
kenorb
  • 20,988