
I have been using the following snippet pulled from a similar post - https://unix.stackexchange.com/a/101273/212793 - to get a filename from a tar.gz file:

tar tzf "archive.tar.gz" | awk -F/ '{ if($NF != "") print $NF }'

For my case, I only need one specific file, so I use something like:

tar tzf "archive.tar.gz" | awk -F/ '{ if($NF != "") print $NF }' | grep -e "^..*my-file-name\\.ext$"

The key point is that my .tar.gz is very big and contains a lot of files. However, each file name starts with the same "hash" prefix (hence the ^..* part of my grep regex).

So the files might look like:

- 4b77e4e1_file-a.ext
- 4b77e4e1_file-b.ext
- 4b77e4e1_file-c.ext
# etc.

I noticed the command to get all the files (tar tzf "archive.tar.gz" | awk -F/ '{ if($NF != "") print $NF }') streams the output.

My thought is, if I could "break" the stream, then extract that first hash part, I could build my filename that I'd eventually need without having to loop through the entire contents of the .tar.gz file.

So my question is: how can I "break" awk on its first output, as opposed to waiting for the whole command to finish (which takes several minutes) and grepping the result to get the filename I eventually want?

EDIT: Looks like I actually want to break tar, as simply exiting after the first result doesn't change the execution time.

romellem
  • I think you could use expect to solve your problem: let it spawn tar, ask it to print the relevant information, and close the process once that information is found. – Aaron Feb 22 '17 at 17:23

3 Answers


You can tell AWK to exit after printing something:

awk -F/ '$NF != "" { print $NF; exit }'

Since you're looking for a specific filename:

awk -F/ '/my-file-name\.ext$/ && $NF != "" { print $NF; exit }'

The $NF test is redundant then:

awk -F/ '/my-file-name\.ext$/ { print $NF; exit }'
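As a rough sketch of the goal described in the question (read the hash from the first file entry, then build the wanted name), something like the following might work. The `listing` function and its file names are hypothetical stand-ins for `tar tzf "archive.tar.gz"`, and the sketch assumes the hash itself never contains an underscore:

```shell
# Hypothetical stand-in for: tar tzf "archive.tar.gz"
listing() {
  printf '%s\n' 'backup/' 'backup/4b77e4e1_file-a.ext' 'backup/4b77e4e1_file-b.ext'
}

# Take the first non-directory entry, drop everything from the first
# underscore onwards, print the remaining hash, and stop reading.
hash=$(listing | awk -F/ '$NF != "" { sub(/_.*/, "", $NF); print $NF; exit }')

# Build the full name from the hash (the suffix is an assumed example).
wanted="${hash}_my-file-name.ext"
printf '%s\n' "$wanted"
```

Because awk exits after the first match, only the start of the listing has to be read before the hash is known.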
Stephen Kitt

try something like

 tar tzf "archive.tar.gz" | awk -F/ '$NF ~ /my-filename$/ {print $NF ; exit }'

or

 tar tzf "archive.tar.gz" | awk -F/ 'substr($NF,4,11) == "my-filename" {print $NF ; exit }'

(where 4 and 11, the start position and length of the compared substring, must be adjusted to match the actual filename).
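To illustrate how the two numbers could be chosen, here is a small sketch using the sample names from the question. It assumes the prefix is always 9 characters long (an 8-character hash plus the underscore), so the name starts at position 10; the target `file-b.ext` is 10 characters long. The `printf` input is a hypothetical stand-in for the tar listing:

```shell
# Hypothetical stand-in for: tar tzf "archive.tar.gz"
result=$(
  printf '%s\n' 'dir/' 'dir/4b77e4e1_file-a.ext' 'dir/4b77e4e1_file-b.ext' 'dir/4b77e4e1_file-c.ext' |
    # Compare the 10 characters starting at position 10 of the last path component.
    awk -F/ 'substr($NF, 10, 10) == "file-b.ext" { print $NF; exit }'
)
printf '%s\n' "$result"
```

If the prefix length can vary, the regex form above is probably the safer choice.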

Archemar

If one of the programs in the pipeline exits, then the programs to the left of it will also exit. The way this works is:

  • In foo | bar, bar exits.
  • Exiting the process closes the read end of the pipe.
  • When foo tries to write to the pipe, it receives a SIGPIPE signal.
  • foo dies.

This assumes that foo hasn't protected against SIGPIPE; programs can do that, but typical command line programs don't.
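A quick way to see this behaviour without a large archive is to pair an endless or very long writer with a reader that exits early. This is only an illustrative sketch; `yes` and `seq` stand in for tar here:

```shell
# `yes` would print "y" forever, but `head` exits after one line,
# so the pipeline as a whole still terminates.
first=$(yes | head -n 1)
printf 'first line: %s\n' "$first"

# Same idea with awk: stop at the third line instead of reading a million.
third=$(seq 1 1000000 | awk 'NR == 3 { print; exit }')
printf 'third line: %s\n' "$third"
```

Both commands return almost immediately, because the writer is stopped once the reader is gone.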

Since you only want one match, make awk exit as soon as it's found the line you're interested in:

tar tzf "archive.tar.gz" | awk -F/ '$NF ~ /.my-file-name\.ext$/ {print $NF; exit}'

or (not really advantageous here)

tar tzf "archive.tar.gz" | sed -n '/[^/]my-file-name\.ext$/ {s!.*/!!; p; q;}'

or, sticking with your more complicated approach of using grep separately from awk

tar tzf "archive.tar.gz" | awk -F/ '{ if($NF != "") print $NF }' | grep -e "^..*my-file-name\\.ext$" | head -n 1

Exiting from the reader causes tar to exit when it next writes to the pipe, which can take a little while, because of output buffering. (It'll take especially long if there are more than two processes on the pipe, since there'll be some delay for each one to receive SIGPIPE.) After awk exits, tar will spend a little while reading the archive and filling the next buffer with file names, then finally try to write the buffer and get killed with SIGPIPE. For this application, it would very likely be faster to switch tar to line buffering for output, which you can do with stdbuf:

stdbuf -oL tar tzf "archive.tar.gz" | awk -F/ '$NF ~ /.my-file-name\.ext$/ {print $NF; exit}'

Alternatively, you could arrange to kill the tar program when awk exits, but that's more complicated.

sh -mc 'tar tzf "archive.tar.gz" | {
         awk -F/ "$0";
         kill -TERM -$$;
       }' '$NF ~ /.my-file-name\.ext$/ {print $NF; exit}'
[ $? -eq 143 ]