This code was written to extract multiple .tar files to another location, and it works fine:

ls -1 ${short_filename} \
| while read file; do \
  tar -zxvf "$file" -C ${pentaho_temp_path}/${sessionkey}; \
done

I want to do the same for .gz files on a Linux machine. My files have no whitespace in their names and they look like this:

xport_RAN_Maa_LIM_10.93.217.170_20220629030929.xml.gz
xport_RAN_Maa_LIM_10.93.217.170_20220630030936.xml.gz 
xport_RAN_Mau_MPU_10.188.83.138_20220629031403.xml.gz 
xport_RAN_Mau_MPU_10.188.83.138_20220630031444.xml.gz 
Asked by Wessley; edited by terdon
  • never ever parse the output of ls; not even with -1: Filenames can have all kinds and numbers of line breaks in them! A for file in ${short_filename}/*; instead of the ls .. | while read… would be safer, shorter, faster and nicer to read! – Marcus Müller Jul 01 '22 at 09:16
  • Please edit your question and show example values for short_filename. Instead of parsing the output of ls, you should use a for loop. The details depend on the value of short_filename. – Bodo Jul 01 '22 at 09:18
  • The filenames do not have line breaks. Examples of short_filename are: xport_RAN_Maa_LIM_10.93.217.170_20220629030929.xml.gz
    xport_RAN_Maa_LIM_10.93.217.170_20220630030936.xml.gz
    xport_RAN_Mau_MPU_10.188.83.138_20220629031403.xml.gz
    xport_RAN_Mau_MPU_10.188.83.138_20220630031444.xml.gz
    – Wessley Jul 01 '22 at 09:22
  • Please don't use comments to provide requested information. Edit your question instead. Please make clear in your question if the value of short_filename is a single filename, a directory name or a wildcard. – Bodo Jul 01 '22 at 09:28
  • Is $short_filename a single filename, a pattern, or something else? – Kusalananda Jul 01 '22 at 10:17

3 Answers

A .gz file does not contain "files" the way a tar archive does; it is just a compressed stream of bytes. So, to uncompress one, just run gzip -d --stdout "${file}" > "${target_directory}/${target_file}".
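A minimal sketch of that idea applied to a whole directory, assuming (as the comments leave open) that short_filename names a directory of .gz files. The function name gunzip_to_dir and its two arguments are hypothetical. Each output file gets the input name minus .gz, and the originals are kept because --stdout writes to standard output instead of replacing the input file.

```shell
#!/bin/sh
# Hypothetical helper: decompress every *.gz in a source directory into a
# target directory, keeping the original .gz files.
# Assumes: $1 = directory holding the .gz files, $2 = existing target directory.
gunzip_to_dir() {
    src_dir=$1
    target_dir=$2
    for file in "$src_dir"/*.gz; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$file" ] || continue
        # ${file##*/} drops the directory, ${name%.gz} drops the suffix.
        name=${file##*/}
        gzip -d --stdout "$file" > "$target_dir/${name%.gz}"
    done
}
```

Usage, under the same assumption about short_filename: gunzip_to_dir "$short_filename" "$pentaho_temp_path/$sessionkey"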

Notice that your script already only works on *.tar.gz files, because of the -z flag.
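If the same directory may hold both tarballs and plain .gz files, one way to handle both is to pick the tool by suffix. This is a hypothetical helper (extract_one is an illustrative name), not part of the original script. The *.tar.gz pattern has to come before *.gz, since case runs the first pattern that matches.

```shell
#!/bin/sh
# Hypothetical helper: choose tar or gzip based on the file name.
# Assumes: $1 = compressed file, $2 = existing target directory.
extract_one() {
    file=$1
    target_dir=$2
    name=${file##*/}
    case $name in
        *.tar.gz|*.tgz)
            # A gzipped tarball: tar -z unpacks its members into target_dir.
            tar -zxf "$file" -C "$target_dir" ;;
        *.gz)
            # A plain gzip stream: one output file, named without .gz.
            gzip -d --stdout "$file" > "$target_dir/${name%.gz}" ;;
        *)
            echo "skipping $file: not a .gz file" >&2
            return 1 ;;
    esac
}
```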

Your script is needlessly complicated. A line can be broken after | and after do, as well as after any complete command, so none of your \ are needed. The exact same script could be written like this:

ls -1 "$short_filename" | 
    while read file; do
        tar -zxvf "$file" -C "$pentaho_temp_path/$sessionkey"
    done

But that is a bad idea. As others have pointed out, parsing ls is very fragile and should be avoided. It is also pointless in this case, since all you need for the above script is:

for file in "$short_filename"/*; do
    tar -zxvf "$file" -C "$pentaho_temp_path/$sessionkey"
done
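To see the fragility concretely, the sketch below counts what each approach sees in a directory holding a single file whose name contains a newline. The helper names are hypothetical, and the ls result assumes GNU coreutils ls, which prints names literally when its output is a pipe rather than a terminal.

```shell
#!/bin/sh
# Count entries the way "ls | while read" does.
count_with_ls() {
    # The braces keep the counter and the echo in the same subshell.
    ls -1 "$1" | {
        n=0
        # A name containing a newline is read as two separate lines.
        while IFS= read -r line; do n=$((n + 1)); done
        echo "$n"
    }
}

# Count entries with a glob: each file is exactly one word, whatever its name.
count_with_glob() {
    n=0
    for f in "$1"/*; do
        [ -e "$f" ] || continue
        n=$((n + 1))
    done
    echo "$n"
}
```

With GNU ls, count_with_ls reports 2 for that one file, while count_with_glob reports 1.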

But since your files are all just .gz and not tarred, you don't need tar at all and can just do:

for file in "$short_filename"/*; do
    zcat "$file" > "$pentaho_temp_path/$sessionkey"/"$(basename "$file" .gz)"
done
Answered by terdon
  • yeah, zcat is probably the rightest tool – Marcus Müller Jul 01 '22 at 10:19
  • +1 for zcat. Running each process in the background may make better use of resources; also use basename to avoid the full path in $file: for file in "$short_filename"/*.gz; do zcat "$file" > "$pentaho_temp_path/$sessionkey"/"$(basename "$file" .gz)" & ; done ; wait – Thibault LE PAUL Jul 01 '22 at 11:19
  • @ThibaultLEPAUL depending on the number of files, that could easily crash the machine, and if doing it over a network, disrupt all users of the network. Not something to suggest without knowing more context. (PS. you would want for file ...; zcat ... & done and not & ;, that's a syntax error). – terdon Jul 01 '22 at 11:22
  • Will not work as is, because the mv target file name is not stripped of its parent directory. – Thibault LE PAUL Jul 01 '22 at 21:40
  • @ThibaultLEPAUL oh wow, of course! I had completely missed that and also didn't notice you mentioned it in your previous comment. I... need to learn to read. Fixed now, thank you! – terdon Jul 01 '22 at 22:34
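For reference, a sketch of the background variant with the syntax corrected as terdon describes: & followed directly by done, then a single wait. The function name and its arguments are hypothetical, and, as terdon warns, it starts one zcat per file all at once, so it only suits a modest number of files.

```shell
#!/bin/sh
# Hypothetical helper: decompress every *.gz concurrently, one job per file.
# Assumes: $1 = directory with the .gz files, $2 = existing target directory.
gunzip_background() {
    src_dir=$1
    target_dir=$2
    for file in "$src_dir"/*.gz; do
        [ -e "$file" ] || continue
        # "&" already terminates the command, so "done" follows directly.
        zcat "$file" > "$target_dir/$(basename "$file" .gz)" &
    done
    # Block until every background zcat has finished.
    wait
}
```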

Consider this:

for file in "$short_filename"/*.gz; do
  zcat "$file" > "$pentaho_temp_path/$sessionkey/$(basename "$file" .gz)"
done
  • avoid piping ls output
  • match only files with the .gz extension
  • strip the directory and the extension from the full path with basename "$fullpath" .gz
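The last point can be done with basename, as above, or with plain parameter expansion, which avoids starting an extra process per file. The path below is a made-up example built from one of the question's file names.

```shell
#!/bin/sh
# Hypothetical example path; the directory part is illustrative only.
path=/data/in/xport_RAN_Maa_LIM_10.93.217.170_20220629030929.xml.gz

# basename drops the directory and, given a second argument, that suffix.
with_basename=$(basename "$path" .gz)

# Parameter expansion: ##*/ drops the directory, %.gz drops the suffix.
name=${path##*/}
with_expansion=${name%.gz}

# prints xport_RAN_Maa_LIM_10.93.217.170_20220629030929.xml twice
printf '%s\n' "$with_basename" "$with_expansion"
```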

If you are sure that no file name contains a newline, you can pipe the names to GNU parallel to decompress several files at once:

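myzcat(){ zcat "$1" > "$pentaho_temp_path/$sessionkey/$(basename "$1" .gz)"; }
# parallel starts each job in a new shell: export the function (bash) and the variables it uses
export -f myzcat; export pentaho_temp_path sessionkey

ls "$short_filename"/*.gz | parallel myzcat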

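  • feed the file names through a pipe instead of a for loop's word list
  • define a function that takes a single argument, and export it (plus the variables it uses), because parallel runs each job in a separate shell
  • by default, parallel runs as many jobs at once as there are CPU cores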
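If GNU parallel is not installed, a similar effect can be sketched with a different tool, xargs: -P sets the number of concurrent jobs (here one per core via nproc, mirroring parallel's default) and -0 reads NUL-separated names, which also lifts the no-newlines restriction. -0 and -P are GNU/BSD extensions rather than POSIX, and parallel_gunzip is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical alternative to GNU parallel using xargs -0 -P.
# Assumes: $1 = directory with the .gz files, $2 = existing target directory.
parallel_gunzip() {
    src_dir=$1
    target_dir=$2
    # printf reuses its format for every argument: one NUL-terminated name each.
    printf '%s\0' "$src_dir"/*.gz |
        xargs -0 -n 1 -P "$(nproc)" sh -c '
            # $1 is the target directory, $2 the file appended by xargs.
            [ -e "$2" ] || exit 0   # unmatched glob: nothing to do
            name=${2##*/}
            zcat "$2" > "$1/${name%.gz}"
        ' sh "$target_dir"
}
```

xargs waits for all of its jobs before exiting, so every output file is complete once parallel_gunzip returns.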