0

How do I extract only files from a directory in a tar archive? I need to extract all the files in the directory, not specific files. For example, with a tar archive of this structure:

wordpress
 --wp-admin
 --wp-content
 --wp-includes
 --index.php
 --license.txt
 --readme.html

I want to extract only the three (or more) files directly inside the wordpress directory, not the sub-directories.

Note that the other solutions I could find on this site are about extracting specific files. My question is about extracting all files.

  • could there be folders with . in their names within the archive (like folder.suff)? – RomanPerekhrest Jun 10 '17 at 13:47
  • Good question. I think the --no-wildcards-match-slash option prevents any sub-directories from being matched. I haven't tested this, so I can't confirm it, though. – Chetan Crasta Jun 10 '17 at 14:05
  • @ChetanCrasta, I've tested those options --wildcards --no-wildcards-match-slash and they do not prevent subdirectories extraction at all – RomanPerekhrest Jun 10 '17 at 14:10
  • Do you mean it doesn't work with subdirectories of type folder.name? Or with all sub-directories? I can confirm that the solution given works on Debian. – Chetan Crasta Jun 10 '17 at 14:57
  • you want to extract specific files -- the ones that aren't subdirectories; does the --files-from option not work for you? You'd list the contents of the tar file, then filter for only the wordpress folder, then exclude the sub-folders, then pass that file to tar. – Jeff Schaller Jun 12 '17 at 20:29
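A minimal sketch of the --files-from approach described in the last comment, assuming GNU tar and member names that contain no newlines. The temporary directory, sample.tar and members.txt are illustrative names, not part of the question:

```shell
# Build a small stand-in archive (illustrative layout only).
work=$(mktemp -d)
mkdir -p "$work/src/wordpress/wp-admin" "$work/src/wordpress/wp-includes"
touch "$work/src/wordpress/index.php" \
      "$work/src/wordpress/license.txt" \
      "$work/src/wordpress/wp-admin/admin.php"
tar -cf "$work/sample.tar" -C "$work/src" wordpress

# List the members and keep only names directly under wordpress/.
# Directory members are listed with a trailing slash, so the pattern
# (no further slash allowed after "wordpress/") drops them as well.
tar -tf "$work/sample.tar" | grep -E '^wordpress/[^/]+$' > "$work/members.txt"

# Extract exactly the listed members into a separate directory.
mkdir "$work/out"
tar -xf "$work/sample.tar" -C "$work/out" --files-from="$work/members.txt"
ls -R "$work/out"
```

Because only regular files end up in the list, no empty sub-directories are created and nothing has to be cleaned up afterwards.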

3 Answers

0

Unless I can make an assumption about the format of directory names and file names, I can't find any way of extracting files at a particular level without also extracting (empty) directories at that same level. (If file names contain a dot but directories do not, then another answer will be sufficient.)

Consider 370358.tar with this content

wordpress/
wordpress/index.php
wordpress/license.txt
wordpress/readme.html
wordpress/wp-admin/
wordpress/wp-admin/admin.php
wordpress/wp-content/
wordpress/wp-content/2021/
wordpress/wp-content/2021/01/
wordpress/wp-content/2021/01/19/
wordpress/wp-content/2021/01/19/today.txt
wordpress/wp-content/content.txt
wordpress/wp-content/here.txt
wordpress/wp-content/some.txt
wordpress/wp-includes/

You can extract files matching wordpress/* like this

tar xf /tmp/370358.tar --wildcards --exclude '*/*/*' && rmdir */* 2>/dev/null

ls -R wordpress
wordpress:
index.php  license.txt  readme.html

Notice the trailing rmdir. We cannot tell tar to skip directory creation even though the resulting directories will be empty, so we simply delete them afterwards.
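To try this without a real WordPress archive, here is a sketch that builds a stand-in archive with a similar layout and runs the same command in an empty directory. It assumes GNU tar; the temporary paths and sample.tar are made up for the illustration:

```shell
# Build a stand-in archive resembling the listing above.
work=$(mktemp -d)
mkdir -p "$work/src/wordpress/wp-admin" \
         "$work/src/wordpress/wp-content/2021/01/19" \
         "$work/src/wordpress/wp-includes"
touch "$work/src/wordpress/index.php" \
      "$work/src/wordpress/license.txt" \
      "$work/src/wordpress/readme.html" \
      "$work/src/wordpress/wp-admin/admin.php" \
      "$work/src/wordpress/wp-content/content.txt" \
      "$work/src/wordpress/wp-content/2021/01/19/today.txt"
tar -cf "$work/sample.tar" -C "$work/src" wordpress

# Extract in an empty directory so the rmdir glob only sees new entries.
mkdir "$work/out"
(
    cd "$work/out" || exit 1
    tar xf "$work/sample.tar" --wildcards --exclude '*/*/*'
    # rmdir fails on the regular files matched by the glob; that is expected.
    rmdir */* 2>/dev/null || true
    ls -R wordpress
)
```

Running the cleanup in a scratch directory matters: rmdir */* would also remove unrelated empty directories two levels below the current directory.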

If you don't want the wordpress directory itself to appear in the extracted paths, strip the leading path component:

tar xf /tmp/370358.tar --strip-components 1 --wildcards --exclude '*/*/*' && rmdir * 2>/dev/null

ls -R
.:
index.php  license.txt  readme.html

Chris Davies
0

Assuming GNU tar, you could extract the archive through a short bash script:

tar -x -f archive.tar --to-command=./script.sh

... where script.sh is the executable script

#!/bin/bash

if [ "$TAR_FILETYPE" = f ] &&
   [[ $TAR_FILENAME == wordpress/* ]] &&
   [[ $TAR_FILENAME != wordpress/*/* ]]
then
    mkdir -p "${TAR_FILENAME%/*}" &&
    cat >"$TAR_FILENAME" &&
    chmod "$TAR_MODE" "$TAR_FILENAME" &&
    chown "$TAR_UNAME:$TAR_GNAME" "$TAR_FILENAME" || true
else
    cat >/dev/null
fi

The TAR_* variables used here are set in the script's environment by GNU tar, and the data of the file currently being extracted is arriving over standard input.
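To see what the command receives, the following sketch prints a few of the TAR_* variables for each member passed to it. It assumes GNU tar, which runs the --to-command string through the shell; the archive layout and the report variable are illustrative only:

```shell
# Build a tiny stand-in archive (illustrative layout only).
work=$(mktemp -d)
mkdir -p "$work/src/wordpress/wp-admin"
echo '<?php' > "$work/src/wordpress/index.php"
echo '<?php' > "$work/src/wordpress/wp-admin/admin.php"
tar -cf "$work/sample.tar" -C "$work/src" wordpress

# Print type, mode, size and name for each member handed to the command,
# then drain standard input so the file data is consumed and discarded.
report=$(tar -x -f "$work/sample.tar" \
    --to-command='echo "$TAR_FILETYPE $TAR_MODE $TAR_SIZE $TAR_FILENAME"; cat >/dev/null')
printf '%s\n' "$report"
```

Nothing is written to disk here, which makes this a safe way to check the names and types before running a command that creates files.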

The tests in the script make sure that

  1. The file we're extracting is a regular file, and
  2. The file is located in the wordpress directory and nowhere else.

To extract a file, we simply make sure that the file's directory exists, redirect the standard input to the correct filename, and then modify the file mode and ownerships so that the permissions and owners are correct.

You could obviously also give the script in-line on the tar command line:

tar -x -f archive.tar --to-command='bash -c "
    if [ \"\$TAR_FILETYPE\" = f ] &&
       [[ \$TAR_FILENAME == wordpress/* ]] &&
       [[ \$TAR_FILENAME != wordpress/*/* ]]
    then
        mkdir -p \"\${TAR_FILENAME%/*}\" &&
        cat >\"\$TAR_FILENAME\" &&
        chmod \"\$TAR_MODE\" \"\$TAR_FILENAME\" &&
        chown \"\$TAR_UNAME:\$TAR_GNAME\" \"\$TAR_FILENAME\" || true
    else
        cat >/dev/null
    fi"'
Kusalananda
-1

This extracts every file whose name contains a dot from a directory in a tar file, without descending into the sub-directories. Quote the pattern so that the shell passes it to tar unexpanded:

tar --wildcards --no-wildcards-match-slash -xvf file.tar.gz 'directory_in_tar/*.*'