Let's say I have a Python script that generates a new file, file_i.yo, in the folder folder every second. After a while I decide that I have to zip the current state of folder and send it somewhere, so I run in bash:

zip -r current_folder.zip folder

and the zipping takes around 1 minute.

What will happen in such a case? What interests me most is whether there is a chance that the zipped file will be broken.

The archive itself will be OK (i.e. it will be a valid zip archive), but file_i.yo inside it may or may not be damaged. It depends on how file_i.yo is updated. Compare this answer:

When you want to modify a file, you have two options, each with its benefits and drawbacks.

  • You can overwrite the file in place. This does not use any extra space, and conserves the hard links, permissions and any other attribute beyond the content of the existing file. The major drawback of doing this is that if anything happens while the file is being written (the application crashes, or the power goes out), you end up with a partially written file.
  • You can write the new version of the file to a new file with a different name, then move it into place. This uses more space and breaks hard links, and if you have write permission on a file but not on the directory that contains it, you can't do it at all. On the flip side, the old version of the file is atomically replaced by the new version, so at every point in time the file name points to a valid, complete version of the file.
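The two strategies from the quoted answer can be sketched in Python, the language of the asker's generator. This is a minimal sketch, assuming a POSIX system, where os.replace() performs an atomic rename as long as the temporary file and the target are on the same filesystem; the function names and the ".tmp-" prefix are hypothetical. Because the temporary file is created inside the watched folder, a concurrent zip -r may archive it too, so a recognisable prefix makes it easy to exclude.

```python
import os
import tempfile


def write_in_place(path: str, data: bytes) -> None:
    """Option 1 (hypothetical helper): truncate and rewrite the same file.

    The inode stays the same, so a reader that is in the middle of reading
    the file can observe a truncated or half-written version.
    """
    with open(path, "wb") as f:  # "wb" truncates the existing file first
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def write_atomically(path: str, data: bytes) -> None:
    """Option 2 (hypothetical helper): write a temp file, then rename it.

    Assumes POSIX semantics: os.replace() atomically swaps the directory
    entry, so the name always refers to a complete version of the file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory => same filesystem, which the atomic rename requires.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        os.unlink(tmp_path)
        raise
```

Note that mkstemp() creates the file with owner-only permissions, which is one concrete case of the "attributes beyond the content" that the second option does not preserve.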

In the first case the file may change while your zip process is reading it. In effect, the content read will be the binary equivalent of a panorama fail: different fragments of the result will come from different "versions" of reality. If this happens, the resulting archive will contain a mangled, empty or partial file_i.yo (depending on how the file is updated in place and during which phase(s) of the update the zip process managed to read it).
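The "panorama fail" can be reproduced deterministically with a sketch like the one below. It assumes a reader that consumes the file in chunks (as an archiver does with large files) and opens it unbuffered so that Python does not read ahead; the file name and contents are made up for illustration.

```python
import os


def torn_read_demo(directory: str) -> bytes:
    """Show a reader picking up fragments of two versions of one file."""
    path = os.path.join(directory, "file_1.yo")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"old-old-")  # version 1 (8 bytes)

    # buffering=0: every read() goes to the file, no read-ahead buffer.
    with open(path, "rb", buffering=0) as reader:
        first = reader.read(4)  # reader has consumed b"old-"
        # Meanwhile the generator overwrites the SAME file in place.
        with open(path, "r+b") as writer:
            writer.write(b"NEW!NEW!")  # version 2, same length
        rest = reader.read()  # continues at offset 4 of version 2
    # The result matches neither version 1 nor version 2.
    return first + rest
```

Calling torn_read_demo() on an empty temporary directory returns b"old-NEW!": the first half comes from the old content and the second half from the new one, which is exactly the kind of mixed file that can end up in the archive.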

In the second case zip will open the file and read a single "version". Even if a new file (a new "version") replaces the old one in the directory, the descriptor used by zip will still point to the old file, and the tool will read it to the end. The resulting archive will contain a valid, complete file_i.yo, though not necessarily the newest version.
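The claim that an already-open descriptor keeps reading the old file can be checked with a short sketch. It assumes POSIX semantics (on Windows, renaming over a file that another handle has open typically fails); the file names are hypothetical.

```python
import os


def read_through_replace(directory: str) -> tuple[bytes, bytes]:
    """Return (what an open reader sees, what a fresh open sees)."""
    path = os.path.join(directory, "file_2.yo")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"version 1")

    with open(path, "rb") as reader:  # plays the role of zip
        # Another process atomically replaces the file via rename.
        tmp = os.path.join(directory, ".tmp-file_2.yo")
        with open(tmp, "wb") as f:
            f.write(b"version 2")
        os.replace(tmp, path)
        # The reader's descriptor still refers to the old inode.
        seen_by_reader = reader.read()

    with open(path, "rb") as f:  # a new open follows the new name
        seen_by_new_open = f.read()
    return seen_by_reader, seen_by_new_open
```

On Linux this returns (b"version 1", b"version 2"): the reader gets one complete, consistent version, while anyone opening the name afterwards gets the new one.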

  • By "generating file_i.yo" I mean that each time a new file with a different index i is created. Nonetheless, I understand that the bunch of files created before calling zip will be fully functional and those created after calling zip may be mangled. Am I right? – Fallen Apart May 07 '21 at 15:44
  • @FallenApart Basically you're right. A file that is created anew and written to while zip is running may be archived as incomplete (I mean it may miss some data you would expect; I don't mean corruption in the filesystem or in the archive). A general way to deal with such problems may be to cp -l the entire directory to a new place. This creates hard links. You wait until all files in the "copy" stop being edited (written to). New files may appear in the original location, but the "copy" at some point becomes static. Then you zip the "copy". – Kamil Maciorowski May 07 '21 at 16:15
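The hard-link approach from the last comment can be sketched in Python as well. This is roughly what cp -rl folder snapshot does, under the assumptions that folder is flat (no subdirectories) and that the snapshot directory is on the same filesystem, since hard links cannot cross filesystems. The helper names are hypothetical. A hard link shares the inode with the original, so a file that is still being written in place keeps changing through the link; that is why you still wait until the files in the snapshot are no longer written to before zipping. Files created later simply do not appear in the snapshot.

```python
import os
import zipfile


def snapshot_with_hardlinks(src: str, dst: str) -> list[str]:
    """Hard-link every regular file of a flat directory into dst."""
    os.makedirs(dst)
    names = sorted(
        n for n in os.listdir(src) if os.path.isfile(os.path.join(src, n))
    )
    for name in names:
        # A second name for the same inode: no data is copied.
        os.link(os.path.join(src, name), os.path.join(dst, name))
    return names


def zip_directory(directory: str, archive_path: str) -> None:
    """Zip the (now static) snapshot, like `zip -r` on a flat folder."""
    with zipfile.ZipFile(
        archive_path, "w", compression=zipfile.ZIP_DEFLATED
    ) as zf:
        for name in sorted(os.listdir(directory)):
            zf.write(os.path.join(directory, name), arcname=name)
```

Once the generator is done with every file in the snapshot, you can call zip_directory() on it or just run zip -r on the snapshot directory from bash; new files the script keeps producing in folder do not affect the archive.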