I'm trying to decompress a big zip file containing more than a hundred files, 128 MB each. When the decompression is interrupted, I have to delete the last file that was being decompressed and start all over again with the option to skip existing files enabled, like so:

unzip -n my_compressed_file.zip -d destination

Is there a way to decompress zip files so that only those fully decompressed files appear in the destination directory?

rraallvv
  • How and why is decompression being interrupted? – JigglyNaga Nov 06 '18 at 12:17
  • @JigglyNaga The decompression is executed remotely on a VPS, and the hosting provider has shut down my server multiple times for high CPU usage. Until I can find a way to limit the CPU quota for some processes so that they don't shut down the server again, I can't be sure the decompression won't be interrupted. – rraallvv Nov 06 '18 at 12:29

1 Answer

You could write a wrapper script that extracts the files to a temporary location, and only moves them to their final destination when they are complete. Something like

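tempdir="incomplete/"

mkdir -p "$tempdir"
# zipinfo -1 prints one member name per line; IFS= and -r keep each name intact
zipinfo -1 compressed.zip | while IFS= read -r f ; do
        test -f "$f" && continue # skip anything extracted by a previous attempt
        printf "extracting %s..." "$f"
        # move the file into place only if unzip finished successfully
        if unzip -p compressed.zip "$f" > "$tempdir/$f" ; then
                mv "$tempdir/$f" "$f"
                printf "done!\n"
        else
                printf "failed!\n"
        fi
done
rm -r "$tempdir"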

If this is interrupted, you'll still have a partial file, but only in the temporary directory. When you run it again, it will skip the complete files (in their correct location) and immediately overwrite the partial one (in the temporary directory). When it finally reaches the end of the archive, it will remove the temporary directory entirely.

There are some limits to my example script. It assumes that the zip doesn't contain a directory structure of its own, and uses the temporary directory incomplete/ inside the working folder. If this is unacceptable, you'd have to

  • use another value for tempdir, that is somewhere on the same filesystem (to permit atomic mv) and is guaranteed not to be used by any other process, and
  • add an additional mkdir step, inside the loop, to reconstruct the extracted directory structure
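A rough sketch of how those two changes might look, offered as a starting point rather than a tested solution. The function name extract_complete and the .incomplete directory name are my own choices for illustration. It assumes nothing else uses .incomplete inside the destination, and that member names contain no newlines or unzip wildcard characters such as * or [.

```shell
#!/bin/sh
# Sketch: extract every member to a scratch file first, then rename it into
# place, so the destination only ever contains complete files.
# extract_complete is a hypothetical helper: extract_complete ARCHIVE DESTDIR
extract_complete() {
    archive=$1
    dest=$2
    # Assumption: keeping the scratch directory inside the destination puts
    # both on the same filesystem, so mv is a rename rather than a copy.
    tempdir="$dest/.incomplete"

    rm -rf "$tempdir"                 # discard leftovers from an interrupted run
    mkdir -p "$tempdir" || return 1

    zipinfo -1 "$archive" | while IFS= read -r f; do
        case $f in
            */) mkdir -p "$dest/$f"; continue ;;   # directory entry, nothing to extract
        esac
        [ -f "$dest/$f" ] && continue              # completed by an earlier run
        mkdir -p "$(dirname "$dest/$f")"           # rebuild the archive's directory structure
        printf 'extracting %s...' "$f"
        if unzip -p "$archive" "$f" > "$tempdir/partial" < /dev/null; then
            mv "$tempdir/partial" "$dest/$f"
            printf 'done!\n'
        else
            printf 'failed!\n'
        fi
    done
    rm -rf "$tempdir"
}
```

You would call it as extract_complete my_compressed_file.zip destination. Because unzip -p writes the data to a pipe, permissions and timestamps stored in the archive are not restored on the extracted files.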

See also Is mv atomic on my fs?

JigglyNaga