I have an assignment for school. One part of it is to check a file for changes and write these changes to a log file. So far I've found the diff command which could be helpful in my opinion. Let's say I have two files with content like this:

file1

orange
apple

file2

orange
apple
strawberry

If I use diff -c file1 file2 in this case, the output of the command is

*** file1   2016-11-24 08:31:19.424712242 +0100
--- file2   2016-11-24 08:25:24.604681751 +0100
***************
*** 1,2 ****
--- 1,3 ----
  orange
  apple
+ strawberry

which I think says that the line with the '+' sign needs to be added to file1 for the two files to be the same(?).

Now let's say I change file1 to this:

orange
apple
peach

The output of diff -c file1 file2 is:

*** file1   2016-11-24 08:34:50.647128312 +0100
--- file2   2016-11-24 08:25:24.604681751 +0100
***************
*** 1,3 ****
  orange
  apple
! peach
--- 1,3 ----
  orange
  apple
! strawberry

And here I'm lost, because I don't understand what these exclamation marks mean. Suddenly, the diff command doesn't seem so helpful. I've tried looking at the man page for the diff command, but can't find anything (maybe I just don't see it).

psmears
Denco

3 Answers

diff -u may be what you need for your assignment.

Taking your example and using diff -u:

michael@x071:[/home/michael]diff -u file?
--- file1       2016-11-24 07:48:41 +0000
+++ file2       2016-11-24 07:48:57 +0000
@@ -1,3 +1,3 @@
 orange
 apple
-peach
+strawberry
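
For the assignment itself (check a file for changes and write them to a log), the unified output can simply be appended to a log file. What follows is only a minimal sketch: the function name log_changes, the snapshot file, and the log layout are my own assumptions, not anything diff prescribes. It relies on diff's exit status (0 for identical files, 1 for different files, 2 for trouble).

```shell
#!/bin/sh
# Hypothetical helper: compare a saved snapshot with the current file and
# append any differences to a log file in unified format.
log_changes() {
    snapshot=$1   # previous copy of the file (assumed to exist)
    current=$2    # file being watched
    logfile=$3    # log file that changes are appended to

    # diff exits 0 when the files are identical, 1 when they differ,
    # and 2 on trouble (for example, a missing file).
    changes=$(diff -u "$snapshot" "$current") && status=0 || status=$?

    case $status in
        0)  ;;  # nothing changed, nothing to log
        1)  {
                printf '=== %s changed at %s ===\n' "$current" "$(date)"
                printf '%s\n' "$changes"
            } >> "$logfile"
            # Refresh the snapshot so the next run reports only new changes.
            cp "$current" "$snapshot"
            ;;
        *)  return "$status" ;;
    esac
}

# Demo with the fruit lists from the question, in a scratch directory.
workdir=$(mktemp -d)
printf 'orange\napple\npeach\n'      > "$workdir/snapshot"
printf 'orange\napple\nstrawberry\n' > "$workdir/watched"
log_changes "$workdir/snapshot" "$workdir/watched" "$workdir/changes.log"
cat "$workdir/changes.log"
```

Running log_changes a second time without touching the watched file adds nothing to the log, because the snapshot was refreshed after the first run.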

A word of advice: RTM (Read The Manual). There are often other options. FYI: the historic options of diff (and diff3 when comparing three files) were meant to assist with creating "program input" that would change file1 into file2 (or file2 back into file1). This has been the basis of all "version control" software.

The diff options I remember from long ago:

  • -e : Produces output in a form suitable for use with the ed editor to convert File1 to File2.
  • -f : Produces output in a form not suitable for use with the ed editor, showing the modifications necessary to convert File1 to File2 in the reverse order of that produced under the -e flag.
  • -n : Produces output similar to that of the -e flag, but in the opposite order and with a count of changed lines on each insert or delete command. This is the form used by the revision control system (RCS).

The last option I will highlight is a "new" one, relatively speaking (it is also several years old, but was often missing from POSIX implementations). Rather than creating output suitable for 'ed' or 'RCS', this is suitable for patch:

  • -u : Produces a diff command comparison with three lines of unified context. The output is similar to that of the -c flag, except that the context lines are not repeated; instead, the context, deleted, and added lines are shown together, interleaved.
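
To see how the script-style formats above differ, it can help to run them on the same pair of files. This is only a sketch, and it assumes a diff that accepts all three of -e, -f and -n (GNU diff does; other implementations may not). The scratch directory and file names are just for illustration.

```shell
#!/bin/sh
# Sketch: print the same one-line change in the three script-style formats.
# Assumption: the installed diff accepts -e, -f and -n (GNU diff does).
dir=$(mktemp -d)
printf 'orange\napple\npeach\n'      > "$dir/file1"
printf 'orange\napple\nstrawberry\n' > "$dir/file2"

for flag in -e -f -n; do
    printf '## diff %s\n' "$flag"
    # diff exits 1 when the files differ; that is the expected case here.
    diff "$flag" "$dir/file1" "$dir/file2" || [ "$?" -eq 1 ]
done
```

For this pair, the -e output is the three lines "3c", "strawberry" and ".", which is an ed command that replaces line 3 with the new text.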

IMHO: the key value of diff -c is as an improvement over the command cmp, for when you want to know more than ONLY whether two files differ. I had never paid attention to it (maybe it is a "new" option as well), but shall think about it when my question is a recursive search for files that differ between two directory trees.

Toby Speight
Michael Felt
  • Seems good too. I'll check how I could process that output. Thanks. – Denco Nov 24 '16 at 08:32
  • When I started, -c was widely available, and -u was gradually making inroads. I remember when I first got access to unified diffs; it felt like a great privilege was bestowed upon me! – Toby Speight Nov 18 '22 at 19:04

Your question is answered in diff's Info file, node Detailed Context:

The lines of context around the lines that differ start with two space characters. The lines that differ between the two files start with one of the following indicator characters, followed by a space character:

  • !

    A line that is part of a group of one or more lines that changed between the two files. There is a corresponding group of lines marked with ! in the part of this hunk for the other file.

  • +

    An "inserted" line in the second file that corresponds to nothing in the first file.

  • -

    A "deleted" line in the first file that corresponds to nothing in the second file.

The Info file has plenty of information about output formats, including header lines. I recommend you read through it again.
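
If the goal is to process diff -c output for a log, the indicator characters described above can be translated mechanically. The following is a rough sketch, not a robust parser: the function name describe_context_diff and the wording of its output are my own invention, and it assumes the header and range lines look like the ones shown in the question.

```shell
#!/bin/sh
# Hypothetical helper: turn the two-character indicators of "diff -c"
# output into words. Context lines (two leading spaces) are dropped.
describe_context_diff() {
    # diff exits 1 when the files differ; treat that as success.
    { diff -c "$1" "$2" || [ "$?" -eq 1 ]; } | awk '
        # Skip file headers, hunk separators and line-range lines.
        /^\*\*\* / || /^--- / || /^\*+$/ { next }
        substr($0, 1, 2) == "! " { print "changed:  " substr($0, 3); next }
        substr($0, 1, 2) == "+ " { print "inserted: " substr($0, 3); next }
        substr($0, 1, 2) == "- " { print "deleted:  " substr($0, 3); next }
    '
}

# Demo: one deleted line, one changed line, one inserted line.
dir=$(mktemp -d)
printf 'banana\norange\napple\npeach\ntangerine\n'               > "$dir/file1"
printf 'orange\napple\nstrawberry\ntangerine\nand other fruit\n' > "$dir/file2"
describe_context_diff "$dir/file1" "$dir/file2"
```

A changed group is reported twice, once from each file's half of the hunk, so "peach" and "strawberry" both show up as changed.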

Toby Speight

The output of diff is formed of chunks, each chunk corresponding to a set of changes. The *************** line marks the start of such a chunk.

Each chunk gives you the context in the files. *** 1,3 **** means that what follows are lines 1 to 3 in the first file, while --- 1,3 ---- means that what follows are lines 1 to 3 in the second file.

A minus sign - in the first column denotes lines that have been deleted, and a plus sign + marks lines that have been added. An exclamation point ! marks lines that have changed.

In your case, peach in the first file has been changed to strawberry in the second.

Stephen Kitt
Satō Katsura
  • Oh, I see. How about this: in the same case, I put one more line into file2, let's say "and other fruit". diff puts an exclamation mark on this line. Why? file2 now has one more line (so 3 lines in file1 and 4 lines in file2). Why is there not a + sign? – Denco Nov 24 '16 at 08:30
  • @Denco That depends on how diff computes the differences. The representation of changes is not unique; in general there are several ways to describe the differences between files. Most diff implementations have more than one algorithm for it, and they can produce different results. – Satō Katsura Nov 24 '16 at 08:37
  • Please read the man page - snip: on -c: The lines removed from File1 are marked with a - (minus sign) and those added to File2 are marked with a + (plus sign). Lines changed from one file to the other are marked in both files with an ! (exclamation point). – Michael Felt Nov 24 '16 at 08:39
  • @SatoKatsura So I guess I can't count on -c option. But as MichaelFelt suggested, -u did well on the mentioned example. – Denco Nov 24 '16 at 08:51
  • @Denco You can count on the -c option once you come to understand it. It's useful, many projects prefer it to -u, presumably because the maintainers find the result more readable. – Satō Katsura Nov 24 '16 at 10:41
  • @Denco If you call diff without any options (diff file1 file2), the first line of output will be 3c3,4, which means change [c] line 3 in the first file to lines 3 to [,] 4 in the second file. That's why there's no + in the -c version. It seems like diff groups consecutive differing lines together as one change. If you insert "tangerine" at the end of file1 and in between "strawberry" and "and other fruit" in file2, you'll see the + on the "and other fruit" line, because there's now a matching line between the changed line and "and other fruit". – Levi Uzodike Jan 06 '20 at 17:12
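
The grouping described in the last comment can be reproduced with a short script. This is a sketch only: the helper name change_commands is made up here, and it simply filters the default diff output down to its change-command lines (the ones that start with a line number).

```shell
#!/bin/sh
# Hypothetical helper: show only the change commands (such as 3c3,4) from
# the default diff output, dropping the "<", ">" and "---" lines.
change_commands() {
    # diff exits 1 when the files differ; treat that as success.
    { diff "$1" "$2" || [ "$?" -eq 1 ]; } | grep '^[0-9]'
}

dir=$(mktemp -d)

# Case 1: the extra line directly follows the changed line, so both are
# reported as a single change.
printf 'orange\napple\npeach\n'                       > "$dir/file1"
printf 'orange\napple\nstrawberry\nand other fruit\n' > "$dir/file2"
change_commands "$dir/file1" "$dir/file2"    # prints 3c3,4

# Case 2: a matching line (tangerine) separates the two differences, so
# they are reported as one change and one addition.
printf 'orange\napple\npeach\ntangerine\n'                       > "$dir/file1"
printf 'orange\napple\nstrawberry\ntangerine\nand other fruit\n' > "$dir/file2"
change_commands "$dir/file1" "$dir/file2"    # prints 3c3 then 4a5
```

In the second case, diff -c marks "and other fruit" with a + instead of a !, because it no longer belongs to the changed group.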