32

How can I find the matching data in two files with a shell script and store the duplicate data in another file?

#!/bin/bash

file1="/home/vekomy/santhosh/bigfiles.txt" file2="/home/vekomy/santhosh/bigfile2.txt"

while read -r $file1; do while read -r $file2 ;do if [$file1==$file2] ; then echo "two files are same" else echo "two files content different" fi done done

I wrote this code, but it doesn't work. How should I write it?

3 Answers

62

To just test whether two files are the same, use cmp -s:

#!/bin/bash

file1="/home/vekomy/santhosh/bigfiles.txt"
file2="/home/vekomy/santhosh/bigfile2.txt"

if cmp -s "$file1" "$file2"; then
    printf 'The file "%s" is the same as "%s"\n' "$file1" "$file2"
else
    printf 'The file "%s" is different from "%s"\n' "$file1" "$file2"
fi

The -s flag to cmp will make the utility "silent". The exit status of cmp will be zero when comparing two files that are identical. This is used in the code above to print out a message about whether the two files are identical or not.
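
cmp also distinguishes "the files differ" (exit status 1) from "something went wrong", such as a missing or unreadable file (exit status greater than 1). The following is a minimal sketch of how a script might tell the three cases apart; the function name compare_files is only an illustration and is not part of the script above.

```shell
#!/bin/sh
# Hypothetical helper: classify two files using the exit status of cmp.
#   0  -> identical
#   1  -> different
#   >1 -> an error occurred (for example, a file does not exist)
compare_files() {
    status=0
    cmp -s "$1" "$2" || status=$?
    case $status in
        0) echo "same" ;;
        1) echo "different" ;;
        *) echo "error" ;;
    esac
}
```

Calling compare_files "$file1" "$file2" prints one of the three words, which the caller can then act on.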


If your two input files contain lists of pathnames of files that you wish to compare, then use a double loop like so:

#!/bin/bash

filelist1="/home/vekomy/santhosh/bigfiles.txt"
filelist2="/home/vekomy/santhosh/bigfile2.txt"

mapfile -t files1 <"$filelist1"

while IFS= read -r file2; do
    for file1 in "${files1[@]}"; do
        if cmp -s "$file1" "$file2"; then
            printf 'The file "%s" is the same as "%s"\n' "$file1" "$file2"
        fi
    done
done <"$filelist2" | tee file-comparison.out

Here, the result is written both to the terminal and to the file file-comparison.out.

It is assumed that no pathname in the two input files contains any embedded newlines.

The code first reads all pathnames from one of the files into an array, files1, using mapfile. I do this to avoid having to read that file more than once, as we will have to go through all those pathnames for each pathname in the other file. You will notice that instead of reading from $filelist1 in the inner loop, I just iterate over the names in the files1 array.
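
A third reading of the question title is that the two files contain plain lines of data, and that the lines occurring in both should be saved to another file. Under that assumption, a rough sketch using standard grep options could look like the following. The function name common_lines and the output name duplicates.txt are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write every line that occurs in both $1 and $2 to $3.
#   -F  treat the patterns as fixed strings, not regular expressions
#   -x  only match whole lines
#   -f  read the patterns (one per line) from the given file
# sort -u removes repeated matches from the result.
common_lines() {
    grep -Fxf "$1" "$2" | sort -u >"$3"
}

# Example call (paths are placeholders):
# common_lines bigfiles.txt bigfile2.txt duplicates.txt
```

For very large inputs, sorting both files and comparing them with comm -12 may scale better than loading one file as a pattern list.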

Kusalananda
  • 333,661
26

The easiest way is to use the command diff.

Example:

Let's suppose the first file is file1.txt and it contains:

I need to buy apples.
I need to run the laundry.
I need to wash the dog.
I need to get the car detailed.

and the second file, file2.txt, contains:

I need to buy apples.
I need to do the laundry.
I need to wash the car.
I need to get the dog detailed.

then we can use diff to automatically display for us which lines differ between the two files with this command:

diff file1.txt file2.txt

and the output will be:

2,4c2,4
< I need to run the laundry.
< I need to wash the dog.
< I need to get the car detailed.
---
> I need to do the laundry.
> I need to wash the car.
> I need to get the dog detailed.

Let's take a look at what this output means. The important thing to remember is that when diff is describing these differences to you, it's doing so in a prescriptive context: it's telling you how to change the first file to make it match the second file. The first line of the diff output will contain:

  • line numbers corresponding to the first file,
  • a letter (a for add, c for change, or d for delete),
  • line numbers corresponding to the second file.

In our output above, "2,4c2,4" means: "Lines 2 through 4 in the first file need to be changed to match lines 2 through 4 in the second file." It then tells us what those lines are in each file:

  • Lines preceded by a < are lines from the first file;
  • lines preceded by > are lines from the second file.
  • The three dashes ("---") merely separate the lines of file 1 and file 2.
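
To make the structure of a change line such as "2,4c2,4" concrete, here is a small sketch that splits it into the three parts listed above. The function name describe_hunk is hypothetical, and the sketch only handles the default ("normal") diff output format discussed here.

```shell
#!/bin/sh
# Hypothetical helper: split a diff change line such as "2,4c2,4" into
# the first-file range, the action letter, and the second-file range.
# Note: plain sh has no "local", so the variables below are global.
describe_hunk() {
    left=${1%%[acd]*}     # everything before the letter, e.g. "2,4"
    right=${1##*[acd]}    # everything after the letter, e.g. "2,4"
    op=${1#"$left"}       # drop the left range ...
    op=${op%"$right"}     # ... and the right range, leaving the letter
    case $op in
        a) action="add" ;;
        c) action="change" ;;
        d) action="delete" ;;
        *) echo "not a diff change line: $1" >&2; return 1 ;;
    esac
    printf 'first file: %s; action: %s; second file: %s\n' \
        "$left" "$action" "$right"
}
```

For the output above, describe_hunk 2,4c2,4 prints "first file: 2,4; action: change; second file: 2,4".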

Source

Kingofkech
  • 1,028
-2

Here is a pure bash shell script to compare files:

#!/usr/bin/env bash

# @(#) s1       Demonstrate rudimentary diff using shell only.

# Infrastructure details, environment, debug commands for forum posts.
# Uncomment export command to run as external user: not context, pass-fail.
# export PATH="/usr/local/bin:/usr/bin:/bin"
set +o nounset
LC_ALL=C ; LANG=C ; export LC_ALL LANG
pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
pl() { pe;pe "-----" ;pe "$*"; }
db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
db() { : ; }
C=$HOME/bin/context && [ -f "$C" ] && $C
set -o nounset

FILE1=${1-data1}
shift
FILE2=${1-data2}

# Display samples of data files.
pl " Data files:"
head "$FILE1" "$FILE2"

# Set file descriptors.
exec 3<"$FILE1"
exec 4<"$FILE2"

# Code based on:
# http://www.linuxjournal.com/content/reading-multiple-files-bash

# Section 2, solution.
pl " Results:"

eof1=0
eof2=0
count1=0
count2=0
while [[ $eof1 -eq 0 || $eof2 -eq 0 ]]
do
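  # IFS= and -r keep leading/trailing whitespace and backslashes intact,
  # so lines that differ only in those characters are still reported.
  if IFS= read -r a <&3; then
    count1=$((count1 + 1))
    # printf "%s, line %d: %s\n" "$FILE1" "$count1" "$a"
  else
    eof1=1
  fi
  if IFS= read -r b <&4; then
    count2=$((count2 + 1))
    # printf "%s, line %d: %s\n" "$FILE2" "$count2" "$b"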
  else
    eof2=1
  fi
  if [ "$a" != "$b" ]
  then
    echo " File $FILE1 and $FILE2 differ at lines $count1, $count2:"
    pe "$a"
    pe "$b"
    # exit 1
  fi
done

exit 0

producing:

$ ./s1

Environment: LC_ALL = C, LANG = C
(Versions displayed with local utility "version")
OS, ker|rel, machine: Linux, 3.16.0-4-amd64, x86_64
Distribution        : Debian 8.9 (jessie) 
bash GNU bash 4.3.30

-----
 Data files:
==> data1 <==
I need to buy apples.
I need to run the laundry.
I need to wash the dog.
I need to get the car detailed.

==> data2 <==
I need to buy apples.
I need to do the laundry.
I need to wash the car.
I need to get the dog detailed.

-----
 Results:
 File data1 and data2 differ at lines 2, 2:
I need to run the laundry.
I need to do the laundry.
 File data1 and data2 differ at lines 3, 3:
I need to wash the dog.
I need to wash the car.
 File data1 and data2 differ at lines 4, 4:
I need to get the car detailed.
I need to get the dog detailed.

Uncomment the exit 1 line to stop at the first difference seen, and uncomment the printf lines if you want to see every line that is read.

See page at http://www.linuxjournal.com/content/reading-multiple-files-bash for details on file descriptors such as "&3".
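
As a smaller illustration of the same descriptor technique, the sketch below reads two files in lockstep on descriptors 3 and 4 and reports only the first line number at which they differ. The function name first_difference is invented for this example, the descriptors are attached to the loop rather than opened with exec, and the files are assumed to end with a newline.

```shell
#!/bin/sh
# Hypothetical helper: print the number of the first line at which the
# two files differ, or 0 if no difference is found.
# Descriptor 3 reads from the first file, descriptor 4 from the second.
first_difference() {
    n=0
    while :; do
        more1=1
        more2=1
        IFS= read -r a <&3 || more1=0
        IFS= read -r b <&4 || more2=0
        # Both files are exhausted: no difference was found.
        [ "$more1" -eq 0 ] && [ "$more2" -eq 0 ] && break
        n=$((n + 1))
        # A difference is either one file ending early or unequal lines.
        if [ "$more1" -ne "$more2" ] || [ "$a" != "$b" ]; then
            echo "$n"
            return 1
        fi
    done 3<"$1" 4<"$2"
    echo 0
}
```

With the two sample data files shown earlier, first_difference data1 data2 would print 2.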

Best wishes ... cheers, drl

drl
  • 838
  • 1
    head is an external utility, and what is $HOME/bin/context? And what do the comments mean at the top? – Kusalananda Oct 12 '17 at 16:36
  • Head displays the input. It does not play a part in the differencing. As with some other items, "context" is local, to show the environment context. By including that, we don't have to discuss whether versions of OSs and utilities differ. – drl Oct 12 '17 at 16:45
  • There was an export missing, thanks for noticing that. – drl Oct 12 '17 at 16:52
  • 1
    I still don't understand the comment. What's an "external user", and why would you want to set the path for a script that is pure bash? – Kusalananda Oct 12 '17 at 16:53
  • We write code for our shop, so the path settings may differ for external users. We add that if it appears necessary to omit our settings. This is a template that is modified to display the information about the environment in which the code was executed. If it were transformed into production code, say for clients, instead of demo code, we'd want to be sure that none of our local paths are used. The command line for context is designed so that if that file is not found nothing will happen, not even an error, but no versions would be listed. – drl Oct 12 '17 at 17:00