37

Respectable projects release tar archives that contain a single directory. For instance, zyrgus-3.18.tar.gz contains a zyrgus-3.18 folder, which in turn contains src, build, dist, etc.

But some punk projects put everything at the root :'-( This results in a total mess when unarchiving. Creating a folder manually every time is a pain, and unnecessary most of the time.

  • Is there a super-fast way to tell whether a .tar or .tar.gz file contains more than a single directory at its root? Even for a big archive.
  • Or even better, is there a tool that in such cases would create a directory (name of the archive without the extension) and put everything inside?
  • I think broken packaging is worth a bug report to the package author. –  Nov 11 '15 at 10:58
  • I've historically (since the mid 90's) just always untarred into a subdirectory. If it's all put into a single directory (as it should be), its contents can then be moved to the right place with mv, then you can delete the superfluous extra directory. Two extra steps, yes, but it beats cleaning up the mess from a mis-made tar file. – T.E.D. Nov 11 '15 at 15:04
  • "But some punk projects put everything at the root :'-(" And some punk projects put everything inside a folder completely unnecessarily, considering that they're already putting everything inside an enclosing archive, so that when you download and unzip it into its own folder like any smart user would do, you end up with all the content buried another layer down. ;-) – Mason Wheeler Nov 11 '15 at 18:31
  • We need a --nobomb switch that creates a directory if more than one dir or file would be created in root. Who's up for writing a patch? – Roger Dahl Nov 11 '15 at 19:37
  • @MasonWheeler There is a kind of "de-facto standard" for tar archives to have everything in one folder inside. – glglgl Nov 12 '15 at 09:28
  • As a sysadmin I would repack my own tar for deployment. If this is indeed not considered a bug and no maintainer is concerned about this filth, then that's a strong indication that the project is not ready for a production system either. I try my best to purge my life and my systems of such programs and projects. – ThorSummoner Nov 18 '15 at 01:36

5 Answers

32

patool handles different kinds of archives and creates a subdirectory in case the archive contains multiple files to prevent cluttering the working directory with the extracted files.

Extract archive

patool extract archive.tar

To obtain a list of the supported formats, use patool formats.

Marco
  • FYI: Found it at http://sourceforge.net/projects/patool/ . It's an rpm and I used alien to convert it to a deb for Ubuntu. – Joe Nov 13 '15 at 23:06
  • patool should be in the repos for Debian and Ubuntu if you're running a current version. – Marco Nov 14 '15 at 12:30
12

You could do something like

tar tf thefile.tar | cut -d/ -f1 | sort -u

to see what top-level entries a tar has; pipe to wc -l to check if there's more than one. Note that there are a few cases where this would fail, e.g. if the tar contains file paths of the form somedir/whatever and also ./somedir/whatever (or something crazier); this should be uncommon, though.

This will read the whole tar file before outputting anything, because of the sort, though it should be faster than actually extracting because it's just one sequential read and it can skip large files.

If you're doing this interactively and the file might be large, you can change sort -u to uniq and Control+C if it prints out more than one thing.
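
A variant of the pipeline above, sketched as a shell function. The name has_multiple_roots is made up for illustration, and the sketch assumes a tar that detects compression on its own when listing a file (GNU tar and bsdtar do; otherwise add z). It strips a leading ./ so that somedir/whatever and ./somedir/whatever count as one entry, and awk stops reading as soon as a second top-level name shows up, so nothing needs to be sorted:

```shell
# Hypothetical helper: exit status 0 if the archive has more than one
# top-level entry, 1 otherwise.
has_multiple_roots() {
    tar -tf "$1" | awk -F/ '
        { sub(/^\.\//, "") }                 # treat ./dir/x and dir/x alike
        $0 == "" || $0 == "." { next }       # skip the bare root entry
        !($1 in seen) {
            seen[$1] = 1
            if (++n > 1) { found = 1; exit } # second root seen: stop early
        }
        END { exit !found }
    '
}

# Usage (thefile.tar is the placeholder name from the answer):
#   has_multiple_roots thefile.tar && echo "more than one top-level entry"
```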

Danica
7

you can do:

pax <some.tar

...to list the contents of a tar file.

if you want to know how many levels deep it goes, you can do:

pax <some.tar | tr -dc /\\n | sort -r | head -n1
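
Where pax is not installed, roughly the same depth check can be sketched with tar and awk. The function name max_depth is hypothetical, and it prints a number of path components instead of a row of slashes. It assumes tar detects compression itself when listing, and a leading ./ in the entries would count as one extra component:

```shell
# Hypothetical helper: print the largest number of path components
# found in any entry of the archive.
max_depth() {
    tar -tf "$1" | awk -F/ '
        { sub(/\/$/, "") }        # "dir/" has the same depth as "dir"
        NF > max { max = NF }     # NF is the number of /-separated parts
        END { print max + 0 }     # prints 0 for an empty listing
    '
}

# Usage: max_depth some.tar
```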

you can explicitly forbid an explosion on extraction by prefixing every path with a directory name (not the archive's own name, which is already taken by the file):

mkdir some
pax -'rs|^|some/|' <some.tar
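
A tar-only counterpart of the pax commands above, as a minimal sketch. The function name extract_into is made up here, and it assumes a tar that supports -C (GNU tar and bsdtar do) and picks the decompressor itself on extraction:

```shell
# Hypothetical helper: extract an archive into a directory of its own,
# so nothing lands in the current directory.
extract_into() {
    archive=$1
    dest=$2
    # Create the target first; tar -C changes into it before extracting.
    mkdir -p "$dest" && tar -xf "$archive" -C "$dest"
}

# Usage: extract_into some.tar some
```

If the archive turns out to contain a single directory after all, that directory can be moved up with mv afterwards, as T.E.D. describes in the comments.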
mikeserv
3

This should do what you want. I'm sure someone can improve it. In these examples I assume a gzip compressed tar archive since this is the most common.

You want an archive with no sibling nodes at the root level of its directory tree.

Every entry in the tar content list must begin with the same prefix. This prefix is the base directory path that all entries in the archive must share. If any two entries do not begin with the same prefix, the archive has siblings at the root.

The first line in the tar content list gives you the minimal prefix you need to check for, provided the archive lists its top-level directory as its own first entry (as an archive created with tar cf archive.tar somedir does). This is the BASEPATH.

BASEPATH=$(tar ztf example.tar.gz | head -n 1)

Then to test for explosive tarballs you need to check if any line of the tar content list does not begin with the BASEPATH.

tar ztf example.tar.gz | grep -qv "^${BASEPATH}"

Turn this into a shell function:

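is_explosive() {
    TARBALL_NAME=$1
    # The first entry in the listing is the candidate base path.
    BASEPATH=$(tar ztf "${TARBALL_NAME}" | head -n 1)
    # grep -v selects entries that do not start with the base path;
    # -q makes grep return 0 (true) as soon as one such entry exists.
    # Note: the base path is used as a regex, so "." matches any character.
    tar ztf "${TARBALL_NAME}" | grep -qv "^${BASEPATH}"
}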

From here you can write a safe tar archive extraction function.

safe_tar_x() {
    TARBALL_NAME=$1
    if is_explosive "${TARBALL_NAME}"; then
        # Name the subdirectory after the archive:
        # strip the ".tar.gz" suffix, then any leading directories.
        SUBDIR=${TARBALL_NAME%.tar.gz}
        SUBDIR=${SUBDIR##*/}
        mkdir -p "${SUBDIR}" || return 1
        echo "WARNING: This tarball is explosive. Opening in subdirectory, ${SUBDIR}, for safety." >&2
    else
        SUBDIR="."
    fi
    # Tar quirks: "--directory" must be last, and using more than
    #     one option group requires that all groups start with a dash.
    tar -zxf "${TARBALL_NAME}" --directory "${SUBDIR}"
}
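
The functions above only strip a .tar.gz suffix, while the question also mentions plain .tar files. A minimal sketch of a helper that derives the directory name for a few common suffixes follows; the name archive_stem and the list of suffixes are assumptions for illustration, not part of the answer:

```shell
# Hypothetical helper: print the archive name without leading
# directories and without a known tar suffix.
archive_stem() {
    name=${1##*/}                          # drop any leading directories
    case $name in
        *.tar.gz)  name=${name%.tar.gz} ;;
        *.tar.bz2) name=${name%.tar.bz2} ;;
        *.tar.xz)  name=${name%.tar.xz} ;;
        *.tgz)     name=${name%.tgz} ;;
        *.tar)     name=${name%.tar} ;;
    esac
    printf '%s\n' "$name"
}

# Usage: SUBDIR=$(archive_stem "${TARBALL_NAME}")
```

Extracting other compression formats would also require dropping the z flag from the tar calls, or using a tar that detects compression by itself.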
0

aunpack archive.tar is what I use.

Part of the good old atool package. Manpage: https://linux.die.net/man/1/atool