
As near as I can tell, the zip -T option only determines whether files can be extracted; it doesn't really test the archive for internal integrity. For example, I deliberately corrupted the local (not central directory) CRC for a file, and zip didn't care at all, reporting the archive as OK. Is there some other utility to do this?

There's a lot of internal redundancy in ZIP files, and it would be nice to have a way of checking it all. Of course, normally the central directory is all you need, but when repairing a corrupted archive, often all you have is a fragment, with the central directory clobbered or missing. I'd like to know whether archives I create are as recoverable as possible.
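For the specific redundancy mentioned here, the local and central CRCs can be compared directly. Below is a minimal Python sketch, not an existing utility; compare_local_and_central_crcs is a hypothetical name. It relies on the standard zipfile module to parse the central directory (so it needs one that is still readable) and uses struct to read the fixed 30-byte local header at each entry's header_offset. Entries with general-purpose flag bit 3 set keep their CRC in a data descriptor after the data, so their local header field is not compared.

```python
import struct
import zipfile

# Fixed 30-byte part of a local file header (little-endian): signature,
# version needed, flags, method, mod time, mod date, CRC-32, compressed size,
# uncompressed size, file name length, extra field length.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIG = 0x04034B50


def compare_local_and_central_crcs(fp):
    """Return (name, central_crc, local_crc, status) for every entry listed in
    the central directory of the seekable binary file object fp."""
    report = []
    with zipfile.ZipFile(fp) as zf:
        for info in zf.infolist():  # parsed from the central directory
            fp.seek(info.header_offset)
            fields = LOCAL_HEADER.unpack(fp.read(LOCAL_HEADER.size))
            sig, flags, local_crc = fields[0], fields[2], fields[6]
            if sig != LOCAL_SIG:
                status = "bad local header signature"
            elif flags & 0x08:
                # Bit 3 set: the CRC is stored in a data descriptor after the
                # compressed data, and the local header field is normally zero.
                status = "streamed (local header CRC not compared)"
            elif local_crc != info.CRC:
                status = "MISMATCH"
            else:
                status = "ok"
            report.append((info.filename, info.CRC, local_crc, status))
    return report
```

Pass it an archive opened with open("archive.zip", "rb"); any row whose status is not "ok" deserves a closer look.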

3 Answers


unzip -t

Test archive files.

This option extracts each specified file in memory and compares the CRC (cyclic redundancy check, an enhanced checksum) of the expanded file with the original's stored CRC value.

[ source: https://linux.die.net/man/1/unzip ]

Jeff Schaller
Theophrastus
  • There are 2 CRCs per file: local and central. unzip -t only tests the latter. – Marc Rochkind Apr 18 '15 at 20:33
  • I don't know what you mean by "local" versus "central" (central to what?), but when I run "unzip -t myzip_file.zip" I see a line of output commenting on the integrity of each and every zipped file, like (imagine better formatting): "testing: AARiseTransitSet.cpp OK testing: AARiseTransitSet.h OK testing: AASaturn.cpp OK testing: AASaturn.h OK ..." – Theophrastus Apr 18 '15 at 20:39
  • This is not the place to explain the internal structure of ZIP files; the Wikipedia article is pretty good on this. As I said, what you are seeing is a misleading report. – Marc Rochkind Apr 18 '15 at 20:55
  • If I go into a zip file with a hex editor and change one byte, then I see for one file: testing: AA_sphere.htm bad CRC 7952862e (should be 44c6f7f8), while the rest are listed as "OK". You'll continue to declare this as "misleading", but that's exactly what I expect for a file-by-file CRC check of a zip file. Now... good luck to you, Sir. – Theophrastus Apr 18 '15 at 21:02
  • I think you have changed the central directory CRC, at the end. Try changing the local one, before or after the file. – Marc Rochkind Apr 18 '15 at 21:21

Using Info-ZIP, attempting to fix an archive will compare the local and central CRCs, and combining that with archive tests will allow all the CRCs to be checked. If you run

unzip -t archive.zip

and

zip -F archive.zip --out archivefix.zip

and neither complains, that means the archive’s contents match both the central and local CRCs. (You can delete archivefix.zip afterwards.)

To verify this, starting with the Info-ZIP source code for zip 3.0, I created a file as follows:

zip -9 test.zip zip.txt zipup.c

I then corrupted the central directory CRC for zip.txt by changing the byte at offset 0xB137. I got the opposite behaviour to what you observed; unzip -v reported the altered CRC from the central directory, but unzip -t and zip -T reported that the file was OK (checking against the local CRC).

But running

zip -F test --out testfix

reported

Fix archive (-F) - assume mostly intact archive
Zip entry offsets do not need adjusting
 copying: zip.txt
        zip warning: Local Entry CRC does not match CD: zip.txt
 copying: zipup.c

The "corrected" file still listed the altered CRC for zip.txt.

Altering the local CRC for zip.txt at offset 0x10 caused both unzip -t and zip -T to report a CRC error, but zip -F didn't spot anything wrong.
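To repeat these experiments on another archive, the CRC fields first have to be located. This minimal Python sketch (crc_field_offsets is a hypothetical helper) walks the central directory and reports, for each entry, the absolute offsets of the 4-byte local and central CRC fields. It assumes a plain (non-ZIP64) archive with no data prepended to it, and an archive comment that does not itself contain the end-of-central-directory signature; file names are decoded as UTF-8 for display only.

```python
import struct

EOCD_SIG = b"PK\x05\x06"  # end of central directory record
CD_SIG = 0x02014B50       # central directory file header
# Fixed 46-byte part of a central directory file header (little-endian).
CD_HEADER = struct.Struct("<IHHHHHHIIIHHHHHII")


def crc_field_offsets(data):
    """Map each entry name to the absolute offsets of its local-header CRC
    and central-directory CRC fields within the archive bytes `data`."""
    eocd = data.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("end of central directory record not found")
    # Total entry count, central directory size and offset start at EOCD+10.
    count, _cd_size, cd_offset = struct.unpack_from("<HII", data, eocd + 10)
    offsets = {}
    pos = cd_offset
    for _ in range(count):
        fields = CD_HEADER.unpack_from(data, pos)
        if fields[0] != CD_SIG:
            raise ValueError(f"no central directory header at {pos:#x}")
        name_len, extra_len, comment_len = fields[10:13]
        local_header_offset = fields[16]
        start = pos + CD_HEADER.size
        name = data[start:start + name_len].decode("utf-8", "replace")
        # The CRC-32 sits 14 bytes into a local header, 16 into a central one.
        offsets[name] = (local_header_offset + 14, pos + 16)
        pos = start + name_len + extra_len + comment_len
    return offsets
```

Flipping any byte inside either 4-byte field (for example with a hex editor) reproduces the corresponding experiment; the first entry's local CRC always occupies offsets 0xE to 0x11, which is why offset 0x10 hits it.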

Thus from my experiments, mismatches between an archive entry's contents and its CRCs can be detected as follows:

  • local only: zip -T and unzip -t (zip -F does not notice)
  • local and central: zip -T and unzip -t
  • central only: zip -T and unzip -t will not complain, but zip -F will indicate a local-central mismatch

(Note that by default zip -T simply uses unzip -tqq, so zip -T and unzip -t really are equivalent. You can read the unzip source code to check that testing an archive really compares the local CRC, not the central one; look for extract_or_test_files(), extract_or_test_entrylist() and extract_or_test_member(), all in extract.c.)
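The three cases in the list can also be told apart in one pass by computing the CRC of each entry's data directly and comparing it with both stored values. This is a Python sketch under stated assumptions, not a description of Info-ZIP's behaviour: check_both_crcs is a hypothetical name, only stored and deflated entries are handled, encrypted entries are skipped, and the compressed size comes from zipfile's parse of the central directory (which should also let it cope with ZIP64 entries, though that case has not been tested here).

```python
import io
import struct
import zipfile
import zlib

LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")  # fixed 30-byte local header


def check_both_crcs(fp):
    """Return (name, finding) pairs comparing the CRC-32 of each entry's
    decompressed data with its local-header and central-directory CRCs."""
    findings = []
    with zipfile.ZipFile(fp) as zf:
        for info in zf.infolist():  # central directory view
            name = info.filename
            fp.seek(info.header_offset)
            (sig, _ver, flags, method, _time, _date, local_crc,
             _csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack(
                fp.read(LOCAL_HEADER.size))
            if sig != 0x04034B50:
                findings.append((name, "bad local header signature"))
                continue
            if flags & 0x01:
                findings.append((name, "encrypted, not checked"))
                continue
            fp.seek(name_len + extra_len, io.SEEK_CUR)  # skip to the data
            raw = fp.read(info.compress_size)  # size from the central directory
            try:
                if method == zipfile.ZIP_STORED:
                    data = raw
                elif method == zipfile.ZIP_DEFLATED:
                    data = zlib.decompress(raw, -15)  # raw deflate, no zlib header
                else:
                    findings.append((name, f"compression method {method} not handled"))
                    continue
            except zlib.error as exc:
                findings.append((name, f"cannot decompress: {exc}"))
                continue
            actual = zlib.crc32(data)
            if flags & 0x08:
                findings.append((name, "local CRC is in a data descriptor, not compared"))
            elif actual != local_crc:
                findings.append((name, f"local CRC {local_crc:08x} != data {actual:08x}"))
            if actual != info.CRC:
                findings.append((name, f"central CRC {info.CRC:08x} != data {actual:08x}"))
    return findings
```

An empty list means both CRCs of every checked entry match its data; a corrupted local CRC, a corrupted central CRC, or both show up as separate findings.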

Stephen Kitt
  • Complicated, and no doubt very dependent on which versions (GNU, BSD, etc.) are used. And CRC is only one of the numerous integrity checks that can be performed. – Marc Rochkind Apr 18 '15 at 22:55
  • There aren't many versions of zip and unzip available on Unix-like platforms; Info-ZIP is used pretty much everywhere... – Stephen Kitt Apr 18 '15 at 23:02
  • As for it being complicated, it takes just two commands; if both unzip -t and zip -F run without error, you're OK and both CRCs have been checked. – Stephen Kitt Apr 18 '15 at 23:04
  • Thanks! Will check this out. Also, forgot to mention: ZIP files are ZIP64. – Marc Rochkind Apr 18 '15 at 23:18
  • Answers like this should be the gold standard on this website: thorough research, including experiments and digging through source code, summed up in a concise manner. – ScumCoder Dec 01 '20 at 23:30
  • I have an archive where the process described here produces a slightly smaller zip file without reporting any issue, either in the exit code or in the output. So to check integrity this way, you might want to compare the size of the produced file with the original one. – karfau Mar 03 '21 at 06:34
  • @karfau that’s interesting — does the smaller archive still contain all the files present in the original? – Stephen Kitt Mar 03 '21 at 08:52
  • @StephenKitt yes, I'm not sure what went wrong, but the file was supposed to have even more files in it and was somehow not complete when being zipped or transferred. Sadly I cannot offer to share the file, but if you want me to run any commands on it and send you the output, let me know. – karfau Mar 03 '21 at 11:43
  • @karfau and unzip -t doesn’t complain? – Stephen Kitt Mar 03 '21 at 11:54
  • @StephenKitt yes, that one is not indicating any issue. – karfau Mar 03 '21 at 11:56
  • @karfau I suspect that your archive has unused data at the end — what does cmp origarchive.zip smallerarchive.zip say? – Stephen Kitt Mar 03 '21 at 13:07
  • @StephenKitt origarchive.zip smallerarchive.zip differ: byte 7, line 1 – karfau Mar 03 '21 at 14:32

You might want to have a look at zipdetails. From its man page:

Zipdetails displays information about the internal record structure of the zip file. It is not concerned with displaying any details of the compressed data stored in the zip file.

I do not know whether zipdetails will detect inconsistencies, but it should help you find and understand them. Here's a small sample of its output:

00000 LOCAL HEADER #1       04034B50
00004 Extract Zip Spec      14 '2.0'
00005 Extract OS            00 'MS-DOS'
00006 General Purpose Flag  0808
      [Bits 1-2]            0 'Normal Compression'
      [Bit  3]              1 'Streamed'
      [Bit 11]              1 'Language Encoding'
00008 Compression Method    0008 'Deflated'
0000A Last Mod Time         5352884C 'Mon Oct 18 17:02:24 2021'
0000E CRC                   00000000
00012 Compressed Length     00000000
00016 Uncompressed Length   00000000
0001A Filename Length       000B
0001C Extra Length          0000
0001E Filename              'graphic.svg'
00029 PAYLOAD

02947 STREAMING DATA HEADER 08074B50
0294B CRC                   C622C669
0294F Compressed Length     0000291E
02953 Uncompressed Length   0002F706

I can also confirm this bit from the man page:

Error handling is still a work in progress. If the program encounters a problem reading a zip file it is likely to terminate with an unhelpful error message.
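Since zipdetails shows the local CRC and, for streamed entries, the CRC in the data descriptor, a few lines of Python can produce a similar (much less complete) listing restricted to those fields. This is a hypothetical sketch, not part of zipdetails: the labels imitate the output above, offsets are absolute and in hex, and it relies on zipfile's parse of the central directory for each entry's offset and compressed size (needed to find the data descriptor, whose signature is optional). Each entry ends with the central directory CRC for comparison.

```python
import struct
import zipfile

LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")  # fixed 30-byte part of a local header
DESCRIPTOR_SIG = 0x08074B50                    # optional data-descriptor signature


def describe_local_records(fp):
    """Return zipdetails-style lines for each local header and, for streamed
    entries (general-purpose flag bit 3), the data descriptor after the payload."""
    lines = []
    with zipfile.ZipFile(fp) as zf:
        for info in zf.infolist():
            start = info.header_offset
            fp.seek(start)
            (sig, _ver, flags, method, _time, _date, crc,
             _csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack(
                fp.read(LOCAL_HEADER.size))
            name = fp.read(name_len).decode("utf-8", "replace")
            payload = start + LOCAL_HEADER.size + name_len + extra_len
            lines += [
                f"{start:05X} LOCAL HEADER          {sig:08X}",
                f"{start + 6:05X} General Purpose Flag  {flags:04X}",
                f"{start + 8:05X} Compression Method    {method:04X}",
                f"{start + 14:05X} CRC                   {crc:08X}",
                f"{start + 30:05X} Filename              '{name}'",
                f"{payload:05X} PAYLOAD",
            ]
            if flags & 0x08:
                pos = payload + info.compress_size  # size from the central directory
                fp.seek(pos)
                (word,) = struct.unpack("<I", fp.read(4))
                if word == DESCRIPTOR_SIG:
                    lines.append(f"{pos:05X} STREAMING DATA HEADER {word:08X}")
                    pos += 4
                    (word,) = struct.unpack("<I", fp.read(4))  # the CRC follows
                lines.append(f"{pos:05X} CRC                   {word:08X}")
            lines.append(f"      (central directory CRC {info.CRC:08X})")
    return lines
```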

  • zipdetails is indeed a useful tool; while it doesn’t flag inconsistencies, it does display both the local and central CRCs, and those values can be compared manually. – Stephen Kitt Dec 16 '21 at 16:32