59

I have two log files with thousands of lines. After pre-processing, only some lines differ. These remaining lines are either real differences, or shuffled groups of lines.

Unified diffs let me see the detailed differences, but they make manual comparison by eye hard. Side-by-side diffs seem more useful for comparison, but they also add thousands of unchanged lines. Is there a way to get the best of both worlds?

Note, these log files are generated by xscope which is a program that monitors Xorg protocol data. I am looking for general-purpose tools that can be applied to situations similar to the above, not specialized webserver access log analysis tools for example.


Two example log files are available at http://lekensteyn.nl/files/qemu-sdl-debug/ (log13 and log14). A pre-processor command can be found in the xscope-filter file which removes timestamps and other minor details.

AdminBee
  • 22,803
Lekensteyn
  • 20,830
  • 8
    Does your diff have --suppress-common-lines option? http://pastebin.com/KZrVCNFR – manatwork Jun 12 '13 at 10:46
  • 1
    @manatwork Nice, it does. Any way to add more context (e.g. line numbers)? – Lekensteyn Jun 12 '13 at 10:50
  • 5
    Then maybe vimdiff (from the vim package) would serve your needs better: parallel display, colorized, common lines folded. Line numbers can be turned on with :set number. – manatwork Jun 12 '13 at 11:07
  • I think you should put vimdiff up as an answer :) – Kotte Jun 12 '13 at 14:03
  • Are GUI tools in the running? I love KDE's kompare for this purpose. – depquid Jun 12 '13 at 14:18
  • 1
    CLI tools are preferred, but GUI tools are also allowed if they are tiny enough. I have tried kdiff3, but it still produced too much detail. Ideally, I would not see any unnecessary detail. I'll attach two data sets. – Lekensteyn Jun 12 '13 at 15:30

6 Answers

47

The 2 diff tools I use the most would be meld and sdiff.

meld

Meld is a GUI tool, but it does a great job of showing diffs between files. It is geared more toward software development, with features such as moving changes from one side to the other when merging, but it also works as a plain side-by-side diffing tool.

    ss of meld

    ss of meld code highlighting

sdiff

I've used this tool for years. I generally run it with the following switches:

$ sdiff -bBWs file1 file2
  • -b Ignore changes in the amount of white space.
  • -W Ignore all white space.
  • -B Ignore changes whose lines are all blank.
  • -s Do not output common lines.

With log files you will often need wider columns; use -w <num> to set the total output width in columns.
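
As a minimal sketch of those switches, assuming GNU sdiff: the sample files, their contents, and the helper name changed_only below are made up for illustration. Line 3 really differs, while line 4 differs only in spacing and is therefore hidden by -b together with -s.

```shell
#!/bin/sh
# Hypothetical sample logs: line 3 really differs, line 4 differs only in spacing.
tmp=$(mktemp -d)
printf 'start\nopen window\nmap window 1\nsend  event\nstop\n' > "$tmp/a.log"
printf 'start\nopen window\nmap window 2\nsend event\nstop\n' > "$tmp/b.log"

# changed_only WIDTH FILE1 FILE2 (hypothetical helper name)
#   -b  ignore changes in the amount of white space
#   -s  do not output common lines
#   -w  total output width in columns
changed_only() {
    sdiff -b -s -w "$1" "$2" "$3"
}

# sdiff exits with status 1 when the files differ; that is not an error here.
changed_only 60 "$tmp/a.log" "$tmp/b.log" || [ $? -eq 1 ]
```

Only the "map window" row should be printed, with a | marker in the gutter between the two columns.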

Other tools that I use off and on

diffc

Diffc is a Python script that colorizes unified diff output.

$ diffc [OPTION] FILE1 FILE2

             ss of diffc

vimdiff

Vimdiff is probably as good as meld, if not better, and it can be run from a terminal. I always forget to use it, though, which to me is a good indicator that I find the tool just a little too tough to use day to day. But YMMV.

                                    ss of vimdiff

slm
  • 369,824
  • 1
    One great feature of Meld, unfortunately not visible on your screenshot, is syntax highlighting of source code files. – manatwork Jun 12 '13 at 14:35
  • Yes. I used to use vimdiff all the time, I've since moved to using meld, I find it easier to use and it's just easier to see what it's telling me vs. vimdiff. – slm Jun 12 '13 at 14:42
  • @manatwork - added your link to the answer, thanks for the feedback! – slm Jun 12 '13 at 14:51
  • 1
    Looks great for source code, but not so much for comparing log files. I often use colordiff from http://colordiff.org/ for source files. To my understanding, sdiff is similar to diff -y, with no differences in output but slightly different options. +1 for showing some good alternatives to plain diff. – Lekensteyn Jun 12 '13 at 15:45
  • I've not ever used colordiff, I'll have to check it out. You're correct on the diff -y. The addition of that switch to diff seems to have happened at some point, or I never noticed it. Additionally here's a link to the gnu diff tools resource page. Good stuff for using this suite of tools. – slm Jun 12 '13 at 16:15
24

Currently I am using a side-by-side diff and filtering for the differing lines with grep:

diff -y -W250 log.txt log2.txt | expand | \
    grep -E -C3 '^.{123} [|<>]( |$)' | colordiff | less -rS
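  • Option -W250 widens the output to 250 columns so that I can see more data.
  • expand converts tabs to spaces, which is necessary because the regex below counts characters up to a fixed column.
  • -C3 adds 3 lines of context around each match in the grep output.
  • ^.{123} skips the first 123 characters, i.e. the left-hand column, so that the pattern only matches the side-by-side diff markers (|, < or >) in the gutter.
  • colordiff colorizes the output, which makes it easier to follow.
  • less -rS allows ANSI colors to be interpreted (-r) and prevents long lines from wrapping (-S).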

This is a hack; alternatives are welcome.
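
GNU diff also has a -t option that expands tabs itself, so the separate expand step can be dropped. The sketch below assumes GNU diff and grep; the sample files and the helper name diff_context are made up for illustration. With -t and -W250 the gutter marker should still sit after 123 characters plus one space, but that offset depends on the width and on the diff implementation, so check it against your own output before relying on it.

```shell
#!/bin/sh
# Hypothetical sample data: ten numbered lines, line 5 differs between the files.
tmp=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8 9 10; do echo "request $i"; done > "$tmp/log.txt"
sed 's/^request 5$/request five/' "$tmp/log.txt" > "$tmp/log2.txt"

# diff_context CONTEXT FILE1 FILE2 (hypothetical helper name)
#   -t        expand tabs inside diff, replacing the separate expand step
#   -y -W250  side-by-side output, 250 columns wide
#   grep      keeps rows whose gutter holds |, < or >, plus CONTEXT rows around them
diff_context() {
    diff -t -y -W250 "$2" "$3" | grep -E -C"$1" '^.{123} [|<>]( |$)'
}

diff_context 1 "$tmp/log.txt" "$tmp/log2.txt"
```

The result can still be piped through colordiff and less -rS as before.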

Lekensteyn
  • 20,830
  • 1
    Nice idea. Unfortunately the grep regex is too slow. Also diff has a -t option to expand tabs. – Timmmm Jul 27 '18 at 13:48
23

Nobody mentioned icdiff yet? It's great! Pic speaks for itself: icdiff

11
diff -y --suppress-common-lines "$file1" "$file2"

would be the first thing to try. It is simple, and diff is a common Unix tool.

-y makes it display side by side, and --suppress-common-lines filters out identical lines.
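
Timestamps and similar noise have to be stripped first, otherwise nearly every line counts as changed. A minimal sketch, assuming each line starts with an HH:MM:SS.mmm timestamp: the sed expression is only a stand-in for the real pre-processor, and the sample contents written to log13 and log14 here are made up.

```shell
#!/bin/sh
# Hypothetical sample data; only the file names are borrowed from the question.
tmp=$(mktemp -d)
printf '10:00:01.100 CreateWindow\n10:00:01.200 MapWindow id=1\n10:00:01.300 Close\n' > "$tmp/log13"
printf '10:05:07.900 CreateWindow\n10:05:08.000 MapWindow id=2\n10:05:08.100 Close\n' > "$tmp/log14"

# strip_noise FILE: hypothetical stand-in for the real pre-processing filter.
strip_noise() {
    sed -E 's/^[0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]+ //' "$1"
}

strip_noise "$tmp/log13" > "$tmp/log13.filtered"
strip_noise "$tmp/log14" > "$tmp/log14.filtered"

# Exit status 1 only means that differences were found.
diff -y --suppress-common-lines "$tmp/log13.filtered" "$tmp/log14.filtered" || [ $? -eq 1 ]
```

Only the MapWindow row should remain, since the other lines are identical once the timestamps are gone.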

Anton
  • 211
5

The Linux "sdiff" command shows side-by-side differences. By default it includes all lines, but various options restrict the output to the differences:

sdiff -tWBs -w "$COLUMNS" config.xml config.xml.original

where

-t: translate tabs to spaces

-W: ignore all whitespace

-B: ignore blank lines

-s: suppress lines that are the same

-w $COLUMNS: use full width of screen

The lines shown will be divided by | (changed), < (only in the first file), or > (only in the second file) -- see documentation, or just try it.
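
Note that $COLUMNS is normally set only in interactive shells, so inside a script it may be empty. A minimal sketch of a fallback, with made-up sample files; the default of 130 is simply GNU diff's own default width:

```shell
#!/bin/sh
# Hypothetical sample files that differ on one line.
tmp=$(mktemp -d)
printf '<a>1</a>\n<b>2</b>\n' > "$tmp/config.xml"
printf '<a>1</a>\n<b>3</b>\n' > "$tmp/config.xml.original"

# Use $COLUMNS if it is set, otherwise ask tput, otherwise fall back to 130.
width=${COLUMNS:-$(tput cols 2>/dev/null || echo 130)}

# Exit status 1 only means that differences were found.
sdiff -t -s -w "$width" "$tmp/config.xml" "$tmp/config.xml.original" || [ $? -eq 1 ]
```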

0

Maybe you can try difftastic. I usually use it with the --display side-by-side-show-both option.

Yun Wu
  • 1