The diff command compares two files and shows any differences between them. Can the same be done for two zip files, i.e. to check whether there is any difference in the data (counts, etc.) in the individual files inside the archives?
5 Answers
You will have to unzip them (if only in memory) to compare the two. A neat way I have seen to do this with diff is:
diff -y <(unzip -l file1.zip) <(unzip -l file2.zip)
That will show you whether any files are contained in one archive and not the other.
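The listing only covers names, sizes and timestamps. To compare the file contents as well, one option is to extract both archives into scratch directories and run a recursive diff. This is a minimal sketch, assuming Info-ZIP unzip and mktemp are available; zip_tree_diff is a hypothetical helper name, not an existing tool:

```shell
# Sketch: extract both archives into a temporary directory and diff the trees.
# The function body runs in a subshell, so the cd and the trap stay local to it.
zip_tree_diff() (
    tmp=$(mktemp -d) || exit 1
    trap 'rm -rf "$tmp"' EXIT          # clean up the scratch directory on exit
    mkdir "$tmp/a" "$tmp/b" || exit 1
    unzip -qq "$1" -d "$tmp/a" || exit 1
    unzip -qq "$2" -d "$tmp/b" || exit 1
    cd "$tmp" || exit 1
    # -r recurses into subdirectories, -q only reports which files differ.
    # Drop -q to see the line-by-line differences for text files.
    diff -rq a b
)
```

Called as zip_tree_diff file1.zip file2.zip, it returns diff's exit status: 0 when the extracted trees match and 1 when they differ.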

I was thinking about the same, but adding -qql instead of -l to suppress some noise, and sorting by filename at the end with | sort -k4 – guido Jun 29 '18 at 17:29
This compares the number of files. But what about the content inside the files? – UnixPhile Jun 29 '18 at 17:42
I would unzip the zip files and create two arrays with the files. Then I would use something like cmp to compare the nth element of arr1 to the nth element of arr2 – Jaken551 Jun 29 '18 at 17:47
@UnixPhile
diff -y --suppress-common-lines -W 333 <(unzip -lqq file1.zip | sort -k 4) <(unzip -lqq file2.zip | sort -k 4)
will suppress identical entries and show missing/extra files as well as files that differ in size or timestamp. -W sets the output width and should be reasonably large for long/path/to/files. Comparing by content will take more effort; let me know if that's really required. – Tagwint Jun 29 '18 at 18:14
Yes, content comparison is also required; that would be a big lifesaver given the number of files to compare – UnixPhile Jun 29 '18 at 19:27
@UnixPhile see my answer here https://unix.stackexchange.com/a/578867/49642 – Ian Apr 09 '20 at 03:50
I posted the longer explanation at "diff files inside of zip without extracting it", but if you want to compare the contents of the files within the zip file and ignore all the metadata (timestamps in particular), then you should run the following, which keeps only each entry's uncompressed length, CRC-32 and name:
diff \
<(unzip -vqq file1.zip | awk '{$2=""; $3=""; $4=""; $5=""; $6=""; print}' | sort -k3) \
<(unzip -vqq file2.zip | awk '{$2=""; $3=""; $4=""; $5=""; $6=""; print}' | sort -k3)

One option for comparing ZIP files and directories is zipcmp, as mentioned in another post.
zipcmp works as long as the ZIP archives use the same directory structure.
If you need to compare two ZIP archives with the same files, but in one archive the files are contained in an additional subdirectory, zipcmp flags all files as modified, which can be a problem. So zipcmp can only be used to verify whether the contents are exactly the same.
I created folderdiff because there are use cases where you want to compare a backup of your web application with a trusted source.
For example, if you want to find modified files (e.g. possible backdoors), you can use folderdiff, which works with both ZIP archives and folders.
Example:
WordPress uses a ZIP archive in which the files are stored in a folder called wordpress. In this example the files were extracted to /var/www/ but without the wordpress subfolder.
folderdiff can ignore the wordpress/ folder from the installation archive with the --prefix argument, and lists only webshell.php and index.php as different files.
$ folderdiff wordpress-6.0.3-de_AT.zip backup.zip --prefix wordpress/
===================== Added ======================
+ webshell.php
==================== Modified ====================
* index.php
zipcmp is not able to compare the files, because of the different path used in the archive.

You could try the following:
- Create a repository with some version control system (e.g. Git)
- Unzip the first zip file
- Commit the current contents of the repository
- Empty the contents of the repository (except its metadata, e.g. the .git folder)
- Unzip the second zip file
- Run a diff (e.g. git diff)
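The steps above can be sketched as a small shell function. This is an illustration under assumptions, not a polished tool: git_zip_diff is a hypothetical name, the throwaway commit identity is made up, and the second archive is staged and compared with git diff --cached so that added and deleted files appear too (a plain git diff would not list untracked files).

```shell
# Sketch: use a throwaway Git repository to diff the contents of two archives.
# Usage: git_zip_diff first.zip second.zip [extra git-diff options]
git_zip_diff() (
    [ "$#" -ge 2 ] || { echo "usage: git_zip_diff A.zip B.zip [diff options]" >&2; exit 2; }
    first=$1 second=$2
    shift 2
    repo=$(mktemp -d) || exit 1
    trap 'rm -rf "$repo"' EXIT                      # remove the scratch repository on exit
    git -C "$repo" init -q || exit 1
    unzip -qq "$first" -d "$repo" || exit 1         # unzip the first archive
    git -C "$repo" add -A
    # Commit with a placeholder identity so no global Git config is needed.
    git -C "$repo" -c user.name=zipdiff -c user.email=zipdiff@example.invalid \
        commit -q --allow-empty -m "first archive" || exit 1
    # Empty the working tree but keep the .git metadata folder.
    find "$repo" -mindepth 1 -maxdepth 1 ! -name .git -exec rm -rf {} +
    unzip -qq "$second" -d "$repo" || exit 1        # unzip the second archive
    git -C "$repo" add -A
    git --no-pager -C "$repo" diff --cached "$@"    # compare against the commit
)
```

For a quick overview, git_zip_diff file1.zip file2.zip --name-status lists each path as added (A), deleted (D) or modified (M); without extra options the full patch is printed.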

Checksums are the proper way.
diff <(md5sum file1.zip | cut -f1 -d ' ') <(md5sum file2.zip | cut -f1 -d ' ')

This doesn't work: if you add the same files to two zip files but the file timestamps are different, the hashes won't match. – David Roussel Jun 08 '22 at 14:36
If file1.zip and file2.zip do not use the same compression ratio, or differ in other options, the result will be invalid. Comparing the contents of zip files requires uncompressing the files and running md5sum on the uncompressed contents. – Biapy Oct 26 '22 at 15:13
You might as well skip the MD5 hashsum computation and just run cmp directly on those files. – Cristian Ciupitu Feb 06 '24 at 20:04
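Following the comments above, a checksum comparison can be made insensitive to timestamps and compression settings by hashing the uncompressed members rather than the archives. This is a rough sketch, assuming Info-ZIP unzip and GNU md5sum are installed; zip_sums and zip_content_diff are hypothetical helper names:

```shell
# Sketch: print one "checksum  path" line per file stored in an archive.
# The body runs in a subshell, so the cd and the trap do not leak to the caller.
zip_sums() (
    tmp=$(mktemp -d) || exit 1
    trap 'rm -rf "$tmp"' EXIT                 # remove the extracted copy on exit
    unzip -qq "$1" -d "$tmp" || exit 1
    cd "$tmp" || exit 1
    # Hash every regular file and sort by path so both listings line up.
    find . -type f -exec md5sum {} + | sort -k2
)

# Sketch: diff the per-file checksum listings of two archives.
zip_content_diff() (
    a=$(mktemp) && b=$(mktemp) || exit 1
    trap 'rm -f "$a" "$b"' EXIT
    zip_sums "$1" > "$a" || exit 1
    zip_sums "$2" > "$b" || exit 1
    diff "$a" "$b"                            # exit status 0 means identical contents
)
```

zip_content_diff file1.zip file2.zip prints nothing and returns 0 when every member has the same path and checksum in both archives, regardless of how each archive was compressed.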