16

I have a large tarball that is currently being FTP'd over from a remote system to our local system.

I want to know if it is possible to start untarring, let's say, 50 files at a time so that those files can begin being processed while the transfer takes place.

3 Answers

24

GNU tar can be used to extract one or more specific files from a tarball. To extract specific archive members, give their exact member names as arguments.

For example:

tar --extract --file={tarball.tar} {file}

You can also extract those files that match a specific globbing pattern (wildcards). For example, to extract from cbz.tar all files that begin with pic, no matter their directory prefix, you could type:

tar -xf cbz.tar --wildcards --no-anchored 'pic*'

To extract all php files, enter:

tar -xf cbz.tar --wildcards --no-anchored '*.php'

Where:

-x: instructs tar to extract files.
-f: specifies the filename / tarball name.
-v: verbose (show progress while extracting files).
-j: filter the archive through bzip2; use to decompress .bz2 files.
-z: filter the archive through gzip; use to decompress .gz files.
--wildcards: instructs tar to treat command-line arguments as globbing patterns.
--no-anchored: informs tar that the patterns apply to member names after any / delimiter.
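
These flags combine. For example, to extract all .php files from a gzip-compressed tarball (the archive name here is illustrative), you could type:

tar -xzvf cbz.tar.gz --wildcards --no-anchored '*.php'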

Eugene S
  • I don't want to extract specific files. I just want to extract the first 50 files as I don't know what the names of the files are. – Pieter van Niekerk Jul 03 '12 at 06:35
  • You can get a list of the filenames using "tar -tf", take the first 50 using "head", then feed that list into another tar command as the list of filenames to extract. Like this: "tar -xf file.tar --no-anchored `tar -tf file.tar|head -50`" – Simon Hibbs Jul 03 '12 at 09:01
  • It is quite possible (in my testing) to extract a partially transferred 50th file. It would be a good idea to avoid extracting the (current) last file in the -t list until the tarball is fully downloaded. At any point in time, the list shows only filenames which have been, or are being, transferred, i.e. not the full list, until it is fully downloaded. – Peter.O Jul 04 '12 at 03:36
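
Putting the two comments above together, a minimal sketch (assuming GNU tar; file.tar is a placeholder name): list the members transferred so far, keep the first 50, drop the last listed name since it may still be mid-transfer, then extract the rest from a list file:

tar -tf file.tar | head -n 50 | sed '$d' > first50.txt
tar -xf file.tar -T first50.txt

The -T (--files-from) option makes tar read exact member names from first50.txt, so there are no shell quoting issues with odd filenames.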
3
tar -tvf tarfile.tar

gives you the whole list of files in tarfile.tar

tar -xvf tarfile.tar fileToRestore  

This command restores fileToRestore.

To untar multiple files, but not all of them, you can:

  • Put the full list of files from tarfile.tar into tar.txt (use -t without -v, so the file contains only member names rather than full verbose listing lines):

    tar -tf tarfile.tar > tar.txt
    
  • Now tar.txt has the whole list of files in tarfile.tar; edit it to leave only the files you want to restore, or take the first 50 with head:

    head -n50 tar.txt > tar2.txt
    

You can put these lines into a file (reading from tar2.txt, the trimmed list):

while read -r line
do
   tar -xvf tarfile.tar "${line}"
done < tar2.txt
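
With GNU tar you can also skip the loop entirely and pass the whole list in one invocation with -T (--files-from), which reads the archive only once:

tar -xvf tarfile.tar -T tar2.txt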

Or the complete script file:

#!/bin/bash

# Expects a tarball and a text file listing the members to extract
if [[ "$1" = "" || "$2" = "" ]]
   then
   echo ""
   echo "Usage: untar-list.sh tarfile.tar listfile.txt"
   echo ""
   exit 1
fi

tarfile=$1
file=$2

if [[ ! -f ${tarfile} ]]
   then
   echo ""
   echo "File ${tarfile} does not exist"
   echo ""
   exit 1
fi

if [[ ! -f ${file} ]]
   then
   echo ""
   echo "File ${file} does not exist"
   echo ""
   exit 1
fi

# Extract each listed member; quoting handles names containing spaces
while read -r line
do
  tar -xvf "${tarfile}" "${line}"
done < "${file}"

echo ""
echo "Done"
echo ""
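
A hypothetical invocation, assuming the script above is saved as untar-list.sh and tar2.txt holds the names to extract:

chmod +x untar-list.sh
./untar-list.sh tarfile.tar tar2.txt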

And that's all

HalosGhost
  • That has to be painfully slow, calling the tar command in a loop. It reads the whole file each time, right? – swdev Nov 20 '14 at 17:58
2

Didn't try this myself, but how about this:

tar xvf archive.tar | head -n50

Tar outputs a line to stdout for each file extracted, and head closes the pipe after 50 lines. On its next write to the dead pipe, tar receives SIGPIPE and terminates, so it may extract a few files past 50 before dying, and the last file may be incomplete.
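
If the goal is to start processing files as they arrive, a minimal sketch along the same lines (process_file is a hypothetical placeholder for whatever per-file processing you need):

# Extract, stop after roughly 50 entries, process each file as it is listed.
# Note: tar prints a name as it starts on that member, so the newest file
# may still be being written; lagging one entry behind would be safer.
tar -xvf archive.tar | head -n 50 | while read -r name; do
    [ -f "$name" ] && process_file "$name"   # skip directories
done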

jippie