I have a shell script that uses unzip to extract a very large file from a remote directory to a local directory. The operation takes a long time, roughly 20-30 minutes.

#!/bin/sh

unzip RemoteHostNFSDirectory -d LocalHostDirectory > output.log

(The archive is a 6.2 GB file.)

How can I wrap the above command in a progress bar or hourglass indicator so users don't think it is hanging? Once the extraction to the local directory finishes, I can print a success or failure message.

(I'm new to shell scripting; apologies for any inconvenience.)

1 Answer

Preliminary note

Displaying "a progress or hour-glass bar so users don't think that it is just hanging", regardless of whether unzip is actually hanging or not, would be quite easy. In this answer I assume you don't want to mislead users. The answer tries to show the real progress of unzip, not just a fake indicator.


Most straightforward method

If your unzip can read from stdin and you want to extract the whole archive then measure progress when reading the archive:

< /path/to/archive.zip pv | unzip -

pv can be replaced by any tool that displays progress information and passes the data through.
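
As a rough sketch of such a replacement, the function below picks whichever pass-through tool happens to be available. The name progress_cat is hypothetical, and the fallback order (pv, then a dd that accepts status=progress, then plain cat with no progress at all) is only an assumption about what may be installed:

```shell
# progress_cat FILE: copy FILE to stdout; progress (if any) goes to stderr.
# Hypothetical helper, not part of pv, dd or unzip.
progress_cat() {
    if command -v pv >/dev/null 2>&1; then
        pv "$1"
    elif dd if=/dev/null of=/dev/null status=progress 2>/dev/null; then
        # This dd accepts status=progress (older versions do not).
        dd if="$1" bs=1M status=progress
    else
        # No progress tool found; the data still flows, just silently.
        cat "$1"
    fi
}
```

It would be used like pv above, under the same condition that your unzip can read from stdin: progress_cat /path/to/archive.zip | unzip -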


Other general methods

If your unzip cannot read from stdin but you're sure there is just one file in the archive and you know the name you want to use for the extracted file, then extract to stdout and pass through pv to get a progress indicator:

unzip -p /path/to/archive.zip | pv > /extracted/name

If there may be more files in the archive then you need to specify a single file to extract:

unzip -p /path/to/archive.zip internal/path/to/compressed/file | pv > /extracted/name

Extracting many files with a single unzip -p will concatenate them in /extracted/name. To extract more than one file, run unzip many times, redirecting to a different pathname each time.

If you don't know internal names then you need to parse unzip -l or unzip -v beforehand. This way you can also learn the uncompressed size, if you want to use it with pv -s. I admit I don't know how stable and parsable (unambiguous) in general these formats are.
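
A minimal sketch of getting the size for pv -s, assuming the listing layout I see from Info-ZIP unzip -l, where the last line starts with the total uncompressed size in bytes. The function name total_from_listing is hypothetical, and the caveat about format stability applies in full:

```shell
# total_from_listing: read `unzip -l` output on stdin and print the first
# field of the last non-empty line, assumed to be the total uncompressed size.
# Fails (non-zero status, no output) if that field is not a plain number.
total_from_listing() {
    awk 'NF { last = $1 }
         END { if (last ~ /^[0-9]+$/) print last; else exit 1 }'
}
```

Possible use: size=$(unzip -l /path/to/archive.zip | total_from_listing) and then unzip -p /path/to/archive.zip | pv -s "$size" > /extracted/name. Note the total covers all files in the archive, so it matches the extracted size only when you extract everything or the archive holds a single file.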

With unzip -p you won't get any log from unzip. Rely on its exit status. If you need some kind of log then the shell script itself should write to it. The script must know at least the /extracted/name, so it can log at least this.
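
One thing to watch: in a pipeline, plain sh reports the exit status of the last command (here pv), not of unzip. The sketch below is one way to keep the producer's status and write a log line yourself. The function name extract_logged, the log wording and the use of a temporary file are my own choices, not anything unzip or pv provide:

```shell
# extract_logged LOG OUT CMD [ARGS...]
# Run CMD, send its stdout through pv (or cat if pv is missing) into OUT,
# append one line to LOG, and return CMD's own exit status.
extract_logged() {
    log=$1 out=$2
    shift 2
    status_file=$(mktemp) || return 1
    # Record the producer's status in a file, because the pipeline itself
    # only reports the status of its last command.
    { "$@"; echo "$?" > "$status_file"; } |
        if command -v pv >/dev/null 2>&1; then pv; else cat; fi > "$out"
    status=$(cat "$status_file")
    rm -f "$status_file"
    if [ "$status" -eq 0 ]; then
        echo "$(date) OK: wrote $out" >> "$log"
    else
        echo "$(date) FAILED (exit $status): $out may be incomplete" >> "$log"
    fi
    return "$status"
}
```

For example: extract_logged output.log /extracted/name unzip -p /path/to/archive.zip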


FUSE?

I expect any FUSE-based solution to allow you to use any tool able to copy a regular file. The progress bar can then depend on the tool. The command may be as simple as:

pv /mountpoint/internal/path/to/compressed/file > /extracted/name

This won't necessarily help you. I tested fuse-zip. It seems to extract the file (to a temporary file or into memory, I don't know which) before the actual copying tool can start its job. So the extraction itself still happens without any progress indicator; the chosen tool can only show the later copying of an already extracted file. Caching a "very huge file" has its own problems, and I'm not sure if and how the tool tries to solve them. That is moot, though, because fuse-zip cannot solve your original problem anyway.

I also tested archivemount. The progress bar of pv started immediately, but the whole setup was painfully slow. I discovered that archivemount seeks back and forth within the archive even if the reading process reads sequentially. This will probably be impractical for your "very huge file". Maybe some tweaks are possible; maybe I missed them.


Trick with pv

A clever but somewhat cumbersome method is with pv -d:

unzip /path/to/archive.zip > output.log &
pv -d "$!"
wait "$!"

The method should be fine to make "users not think that it is just hanging", although in its basic form it will show users more than you probably want. Some options of pv or even "manually" parsing /proc/$!/fd and /proc/$!/fdinfo without pv may help.

unzip working in the background will not be able to get a response from the user easily, so consider unzip -o (with caution).

pv -d will terminate after unzip terminates, so wait is not needed merely to block until the extraction is done. wait "$!" is there to retrieve the exit status of unzip.
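
Where pv is not installed at all, the same background-then-wait pattern can drive a plain spinner. This is only a sketch: the function name run_with_spinner and the one-second polling interval are arbitrary choices of mine, and the spinner merely shows that the process is still alive, not how far it has got:

```shell
# run_with_spinner CMD [ARGS...]
# Run CMD in the background, spin on stderr while it is alive,
# then return CMD's exit status.
run_with_spinner() {
    "$@" &
    pid=$!
    i=0
    # kill -0 sends no signal; it only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        case $((i % 4)) in
            0) c='|' ;;
            1) c='/' ;;
            2) c='-' ;;
            3) c='\' ;;
        esac
        printf '\r%s working...' "$c" >&2
        i=$((i + 1))
        sleep 1
    done
    printf '\r             \r' >&2   # wipe the spinner line
    wait "$pid"                      # collect the command's exit status
}
```

For example: run_with_spinner unzip -o /path/to/archive.zip -d /local/dir > output.log, followed by a test of $? to print success or failure. The caution about -o applies here too, since the backgrounded unzip cannot ask the user anything.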

  • Hello Sir, I am on RedHat 7.8 and pv is not available, nor is it even an option, as it is not part of the standard corporate build. I was thinking of some kind of spinner or hourglass that rotates while the files are being extracted, although pv looks pretty simple and handy. – Naveed Iftikhar Apr 16 '21 at 06:11
  • @NaveedIftikhar In this case pv (but not pv -d) can be replaced by any tool that displays progress information and passes the data through. Is mbuffer available? Does your dd support status=progress? Or is ifne available? (it does not print progress but can be used to create a tool that does). With tee the tool does not even need to pass the data through. "What to use instead of pv?" and "how to make unzip work with pv or like?" are different questions. This site works best when there is one question per question. But I'm willing to help, just answer my questions. – Kamil Maciorowski Apr 16 '21 at 06:35
  • @NaveedIftikhar some kind of spinner – If the input is of undetermined size (and this is the case when reading from a pipe, unless you use -s) pv displays a progress indicator that moves back and forth. pv -p displays the indicator only. It's not that different from a spinner. Or did you mean a spinner in GUI? – Kamil Maciorowski Apr 16 '21 at 06:41
  • Sir Kamil, no, not in a GUI, just in a simple bash shell. I was able to find a spinner (thanks to Google) and implement it. I appreciate all of your input and help. Thanks a lot. – Naveed Iftikhar Apr 16 '21 at 17:35