How to compare two files and if found equal ask the user to delete duplicate file by using shell script?

Question

I am learning Linux and was given this problem as homework, but I can't work out how to compare the contents of two files in the shell. (We can assume that both files contain text, for example a file created with $ cat > f1 and containing "this is file 1".)

$ cat duplicate_file.sh
echo "Enter file 1:"
read file1
echo "Enter file 2:"
read file2
cmp $file1 $file2 > newfile
x=` wc newfile | cut -d" " -f2 `
if [` $x -eq 0 `]
then
rm -i $file2
fi

I wrote this script, but it isn't working. Any suggestions?

score 6 · Answer 1 · edited Sep 21 '22 at 11:54

The immediate issue in your code is the syntax error on the line reading

if [` $x -eq 0 `]

A space character must separate the [ and ] from the arguments within. Also, the command substitution on this line, `$x -eq 0`, is nonsensical as it would try to run the value of $x as a command.

You also have issues with the non-quoting of your variable expansions, which disqualifies your script from working on filenames containing whitespace characters and filename globbing patterns.
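To see what the missing quotes do, here is a minimal sketch. The helper name count_args and the sample filename are made up for illustration; the behaviour shown assumes a POSIX-style shell such as sh or bash with the default IFS.

```shell
#!/bin/sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() {
    echo "$#"
}

file='my report.txt'    # a made-up filename containing a space

# Unquoted: the shell splits the value on whitespace into two words.
unquoted=$(count_args $file)

# Quoted: the value is passed as a single argument.
quoted=$(count_args "$file")

printf 'unquoted: %s argument(s), quoted: %s argument(s)\n' "$unquoted" "$quoted"
```

With the unquoted expansion, cmp and rm would see two pathnames (my and report.txt) instead of one.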

The script also unconditionally clobbers the file newfile needlessly (and would fail if newfile was the name of an existing directory) and lacks a #!-line.
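The scripts further down in this answer avoid the temporary file entirely. If a temporary file really were needed, a sketch along the following lines would at least avoid overwriting an existing file called newfile. It assumes that mktemp is available (it is common but not specified by POSIX), and the function name cmp_output_is_empty is made up.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if cmp printed nothing for the two files,
# 1 if it printed something (a difference or an error message),
# 2 if no temporary file could be created.
cmp_output_is_empty() {
    tmpfile=$(mktemp) || return 2
    cmp -- "$1" "$2" > "$tmpfile" 2>&1
    if [ -s "$tmpfile" ]; then
        status=1        # cmp reported a difference or an error
    else
        status=0        # no output: contents are identical
    fi
    rm -f -- "$tmpfile" # always clean up the temporary file
    return "$status"
}
```

Capturing standard error as well means that a missing file shows up as output rather than being mistaken for a match.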

There is no point in asking the user interactively for file paths. It would be better for the user to be able to make use of the shell's filename completion on the command line and provide the pathnames to the files there as two operands:

$ ./script.sh some/path/file1 some/other/path/file2

If running the script in this way, the two pathnames will be available inside the script as "$1" and "$2".

The cmp utility can be used in this script without creating a temporary file. Instead of redirecting its output, make it quiet by using its -s option (for "silent") and use its exit status to determine if the two files were identical or not.

The script would look like

#!/bin/sh
if cmp -s -- "$1" "$2"; then
    rm -i -- "$2"
fi

Or, shorter,

#!/bin/sh
cmp -s -- "$1" "$2" && rm -i -- "$2"

This would call rm -i on the second of the two given pathnames if it referred to a file with identical contents as the first pathname. The -- in the cmp and rm commands is necessary to avoid interpreting a filename starting with a dash as a set of options.
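As a throwaway demonstration of the -- point, the sketch below compares two files whose names start with a dash. The filenames -a and -b are made up, and the scratch directory assumes mktemp is available.

```shell
#!/bin/sh
# Work in a scratch directory so nothing of value is touched.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Two identical files whose (made-up) names start with a dash.
printf 'same\n' > ./-a
printf 'same\n' > ./-b

# Without "--", cmp would try to parse -a and -b as options.
# With "--", they are treated as pathnames.
if cmp -s -- -a -b; then
    result=identical
else
    result=different
fi
echo "$result"

# Leave and remove the scratch directory.
cd / && rm -rf -- "$dir"
```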

The issue with this script, as with your own script, is that if you give it the same pathname twice, i.e. you compare a file against itself, it will offer to remove it.

Therefore, we also need to ensure that the two pathnames refer to two different files.

You can do that by comparing the two pathname strings with each other:

#!/bin/sh
if [ "$1" != "$2" ] && cmp -s -- "$1" "$2"; then
    rm -i -- "$2"
fi

This may be enough for some applications but does not consider symbolic links or files specified using different paths (such as ./file vs file vs /path/to/file). In most shells, you can also use the non-standard (yet) -ef test ("equal file"), which tests whether two pathnames refer to the same file (same inode number and device, so also returns true for two hardlinks to the same file):

#!/bin/bash
if ! [ "$1" -ef "$2" ] && cmp -s -- "$1" "$2"; then
    rm -i -- "$2"
fi

or,

#!/bin/bash
! [ "$1" -ef "$2" ] && cmp -s -- "$1" "$2" && rm -i -- "$2"
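To illustrate what -ef considers "the same file", here is a small sketch that builds a hard link, a symbolic link and a plain copy in a scratch directory. The helper name same_file and the filenames are made up, mktemp is assumed to be available, and the scratch filesystem is assumed to support hard links.

```shell
#!/bin/sh
# Scratch directory for the demonstration.
dir=$(mktemp -d) || exit 1

printf 'data\n' > "$dir/original"
ln    "$dir/original" "$dir/hardlink"    # second name for the same inode
ln -s original "$dir/symlink"            # symbolic link pointing at original
cp    "$dir/original" "$dir/copy"        # separate file with identical contents

# same_file is a hypothetical helper wrapping the -ef test.
same_file() {
    if [ "$1" -ef "$2" ]; then echo yes; else echo no; fi
}

hard=$(same_file "$dir/original" "$dir/hardlink")
soft=$(same_file "$dir/original" "$dir/symlink")
copy=$(same_file "$dir/original" "$dir/copy")
printf 'hardlink: %s, symlink: %s, copy: %s\n' "$hard" "$soft" "$copy"

rm -rf -- "$dir"
```

Only the copy is a genuine duplicate that is safe to offer for deletion; the two links are the original file under another name.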

And with some sanity checks (also moving the -ef test to the sanity checks section):

#!/bin/bash
if [ "$#" -ne 2 ]; then
    # did not get exactly two arguments
    printf 'Usage:\n\t%s file1 file2\n' "$0" >&2
    exit 1
elif [ ! -f "$1" ] || [ ! -f "$2" ]; then
    echo 'One of the files does not exist (or is not a regular file)' >&2
    exit 1
elif [ "$1" -ef "$2" ]; then
    printf '%s and %s refer to the same file\n' "$1" "$2" >&2
    exit 1
fi
cmp -s -- "$1" "$2" && rm -i -- "$2"

Note that quoting the variable expansions is important since it's not uncommon for pathnames to contain spaces (on macOS, this is very common). Double quoting variable expansions also stops them from being interpreted as shell globbing patterns (your code would, for example, not work on a file called *). Also, note the use of an appropriate #!-line for the script.

If your homework assignment requires you to read the pathnames of the two files interactively, then do that with read -r and with IFS set to an empty string. This would allow you to read pathnames starting or ending with space or tab characters and containing \ characters (but you still won't be able to specify pathnames containing newline characters):

#!/bin/bash
IFS= read -p '1st pathname: ' -r p1
IFS= read -p '2nd pathname: ' -r p2
if [ ! -f "$p1" ] || [ ! -f "$p2" ]; then
    echo 'One of the files does not exist (or is not a regular file)' >&2
    exit 1
elif [ "$p1" -ef "$p2" ]; then
    printf '%s and %s refer to the same file\n' "$p1" "$p2" >&2
    exit 1
fi
cmp -s -- "$p1" "$p2" && rm -i -- "$p2"
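The difference between a plain read and IFS= read -r can be seen with a made-up input line that starts with spaces and contains a backslash. This is only a sketch; the input is fed through a pipe instead of being typed.

```shell
#!/bin/sh
# A made-up input line with two leading spaces and a backslash.
input='  dir\name.txt'

# Plain read: leading blanks are stripped and the backslash is consumed.
plain=$(printf '%s\n' "$input" | { read line; printf '%s' "$line"; })

# IFS= read -r: the line is stored exactly as given.
raw=$(printf '%s\n' "$input" | { IFS= read -r line; printf '%s' "$line"; })

printf 'plain: [%s]\nraw:   [%s]\n' "$plain" "$raw"
```

The plain variant ends up with dirname.txt, which is not the pathname that was entered.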

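If you need to check whether a file is empty, as your own code tries to do with wc, use the -s test instead:

if [ -s "$pathname" ]; then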
    printf '%s has non-zero size\n' "$pathname"
else
    printf '%s is empty (or does not exist)\n' "$pathname"
fi

See man test on your system, or refer to the POSIX standard for this utility.

@iBug Not really. To be able to help someone with coding issues, one would need to see the code. Then, it's a matter of picking it apart and suggesting better alternatives. I don't see any other way to do it. If you call it code review, then maybe it is a code review. I'm calling it answering a question. — Kusalananda, Nov 10 '18 at 14:07
Beware a - argument to cmp is interpreted as stdin instead of the file called -. To compare files called -, they need to be passed as ./- or any other path to that file that is not -. — Stéphane Chazelas, Sep 21 '22 at 11:56
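
One way to act on the comment above is to prefix every relative pathname with ./ before handing it to cmp. A minimal sketch, with the made-up function name same_content:

```shell
#!/bin/sh
# same_content is a hypothetical helper: it succeeds if the two files have
# identical contents. Relative pathnames are prefixed with ./ so that a file
# literally called "-" is passed to cmp as "./-" and not read as stdin.
same_content() {
    a=$1
    b=$2
    case $a in /*) ;; *) a=./$a ;; esac
    case $b in /*) ;; *) b=./$b ;; esac
    cmp -s -- "$a" "$b"
}
```

It could be used as same_content "$1" "$2" && rm -i -- "$2". With the ./ prefix in place the -- is no longer strictly needed for cmp, but it does no harm.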

score 3 · Answer 2 · edited Nov 10 '18 at 08:19

First, include a shebang (#!) line at the top, such as #!/bin/bash.

You have two errors:

Instead of

cmp $file1 $file2 > newfile,

it should be

cmp -- "$file1" "$file2" > newfile

because the values of these variables may contain spaces, tabs or newlines (the characters of $IFS) or wildcard characters (*, [, ?), or may start with -.

Second error:

Instead of

if [` $x -eq 0 `]

it should be

if [ "$x" -eq 0 ].

Otherwise you will get an error such as

bash: 0: command not found.

Also, if the file names contain whitespace or wildcards, the removal should be written as

rm -i -- "$file2"

otherwise it can delete multiple files.

score 0 · Answer 3 · edited Jul 12 '20 at 10:52

There are many ways to solve this, but I'll go with what you've started.

First, don't forget to lead off the script with the interpreter string ("shebang"):

#!/bin/bash
echo "Enter file 1:"
read file1
echo "Enter file 2:"
read file2
cmp -- "$file1" "$file2" > newfile

At this point you could test a couple of things:

Check whether newfile is empty. If it is not empty, the files differ; if it is empty, offer to delete the duplicate:

if [ ! -s newfile ]; then
  rm -i -- "$file2"
fi

Test the exit code of the cmp operation. This has to be done immediately after cmp runs, before any other command overwrites $?. If it is 0, the files match.
```
if [ "$?" -eq 0 ]; then
  rm -i -- "$file2"
fi
```
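
POSIX specifies three outcomes for cmp's exit status: 0 when the files are identical, 1 when they differ, and a value greater than 1 when an error occurred. A sketch that keeps the error case apart from a plain difference (the function name classify_cmp is made up):

```shell
#!/bin/sh
# classify_cmp is a hypothetical helper that turns cmp's exit status into a word.
classify_cmp() {
    cmp -s -- "$1" "$2"
    case $? in
        0) echo identical ;;
        1) echo different ;;
        *) echo error ;;      # e.g. a file is missing or unreadable
    esac
}
```

For example, [ "$(classify_cmp "$file1" "$file2")" = identical ] && rm -i -- "$file2" would only offer the deletion when the comparison actually succeeded.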

Also, your wc command isn't quite working. Try running it outside of the script. Do you get the result you're expecting?
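
One possible reason the wc pipeline is fragile: some wc implementations pad their output with leading spaces, in which case cut -d" " -f2 returns an empty field. If a size is really wanted, a sketch that avoids parsing columns could look like this (the function name byte_count is made up):

```shell
#!/bin/sh
# byte_count is a hypothetical helper: it prints the size of a file in bytes.
# Reading via redirection keeps the filename out of wc's output, and the
# arithmetic expansion drops any padding spaces around the number.
byte_count() {
    echo "$(( $(wc -c < "$1") ))"
}
```

It could stand in for the original pipeline as x=$(byte_count newfile), after which [ "$x" -eq 0 ] behaves the same way regardless of how wc pads its output.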

answered Nov 10 '18 at 07:32

kevlinux

I think wc command is working, when I tested it, the problem is in cmp $file1 $file2 > newfile command. – Prvt_Yadav Nov 10 '18 at 07:48
Why not test with if cmp -s -- "$file1" "$file2"; then? No point in the temporary file really... – Kusalananda Nov 10 '18 at 09:49

score 0 · Answer 4 · edited Jul 11 '20 at 21:04

Here is my implementation.

#! /bin/bash
echo -n "Enter file1: " 
read file1
echo -n "Enter file2: " 
read file2
if cmp -s -- "$file1" "$file2"
then
      echo same 
      rm -i -- "$file2"
else 
      echo different
fi

I have 3 files: var1.txt, var2.txt, var3.txt.

var1 is different than var2
var2 is the same as var3

Running the above script (com.sh) against those files results in:

$ bash com.sh
Enter file1: var1.txt 
Enter file2: var2.txt 
different
$ bash com.sh
Enter file1: var2.txt 
Enter file2: var3.txt 
same
rm: remove regular file 'var3.txt'?

score -1 · Answer 5 · answered May 02 '20 at 05:50

The following code checks whether the output of cmp is empty. If it is empty, the files are the same; otherwise they are different.

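# Capture cmp's output, including any error message, so that a missing
# file is not mistaken for a match. (-b is not a POSIX cmp option.)
output=$(cmp -b -- "$file1" "$file2" 2>&1)

if [[ -z $output ]]
then
    echo "Same"
    rm -i -- "$file2"
else
    echo "Diff"
fi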

Harshil Modi
