18

I have a .zip created on a Windows machine (outside of my control). The zip file contains paths that I need to preserve when I unzip.

However, when I unzip, all files end up like:
unzip_dir/\window\path\separator\myfile.ext

I've tried both with and without the -j option. My issue is that I need the path information under \window\path\separator\: that directory structure has to be created when I unzip.

I can mv the file and flip the \ to / easily enough in a script, but then there are errors that the destination path directories do not exist. My workaround for now is to mkdir -p the paths (after converting \ to /) and then cp the files to those paths.

But there are a lot of files, and these redundant mkdir -p statements for every file really slow things down.

Is there any more elegant way to convert a zip file with Windows paths to Linux paths?

Slav
  • 283
  • Go Back to Windows. Tell whoever creates the Zip File not to use the Native Zip Interface, but a program like 7-Zip, and have them create a tar file. If that can't be done you need to unzip the files while ignoring the path, using the unzip -j -d options. See Forcing Unzip - No Paths – eyoung100 Nov 05 '14 at 17:04
  • Even with -j, I still get a file named \window\path\separator\myfile.ext, because Linux/unzip doesn't treat it as a path. And I have absolutely no control over the zip file creation. – Slav Nov 05 '14 at 17:25
  • 2
    I have a hunch, that the zip file creator is using the native Windows Zip Interface, i.e, create an empty file and then adding files into the zip file. This method is not portable, as you have discovered. You need to use a program like WinZip or 7-Zip that has a CLI, like Anton used below. I just tried to use Zip.exe and got not recognized. – eyoung100 Nov 05 '14 at 17:35

8 Answers

11

Use 7z rn to rename the files within the archive so that they have a forward slash. Then when you extract the archive, directories will be created.

To rename the files, list the paths within the archive that contain backslashes, and generate a list of old/new name pairs in which each backslash is replaced by a forward slash, using awk, for example.

7z rn windows.zip $(7z l windows.zip | grep '\\' | awk '{ print $6, gensub(/\\/, "/", "g", $6); }' | paste -s)
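The same pairing can be sketched in Python, which avoids shell word splitting when member names contain spaces. This is only a sketch: rename_args and rename_in_archive are hypothetical helper names, stripping a leading slash from the new name is an assumption that goes beyond the command above, and it assumes 7z is on the PATH and that zipfile reports the names the same way 7z does (which should hold for plain ASCII names on Linux).

```python
import subprocess
import zipfile


def rename_args(names):
    """Build the flat [old, new, old, new, ...] list that `7z rn` expects."""
    args = []
    for name in names:
        if "\\" not in name:
            continue  # already uses forward slashes, nothing to rename
        # assumption: also drop a leading slash so the new name is relative
        new_name = name.replace("\\", "/").lstrip("/")
        if new_name:
            args.extend([name, new_name])
    return args


def rename_in_archive(zip_path):
    """Rename members in place by calling 7z with one argument per name."""
    with zipfile.ZipFile(zip_path) as zf:
        args = rename_args(zf.namelist())
    if args:
        subprocess.run(["7z", "rn", zip_path, *args], check=True)
```

Calling rename_in_archive("windows.zip") would then stand in for the one-liner; because every name is passed as its own argument, no quoting is needed.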
xn.
  • 323
8

I think something went wrong with the creation of the zip file, because when I create a zip file on Windows it has (portable) forward slashes:

zip.exe -r pip pip
updating: pip/ (244 bytes security) (stored 0%)
  adding: pip/pip.log (164 bytes security) (deflated 66%)

But now that you have the files with file names that contain "paths" with backslashes, you can run the following program in unzip_dir:

#! /usr/bin/env python

# already created directories, walk works topdown, so a child dir
# never creates a directory if there is a parent dir with a file.
made_dirs = set()

for root, dir_names, file_names in os.walk('.'):
    for file_name in file_names:
        if '\\' not in file_name:
            continue
        alt_file_name = file_name.replace('\\', '/')
        if alt_file_name.startswith('/'):
            alt_file_name = alt_file_name[1:]  # cut off the leading dir separator
        alt_dir_name, alt_base_name = alt_file_name.rsplit('/', 1)
        print 'alt_dir', alt_dir_name
        full_dir_name = os.path.join(root, alt_dir_name)
        if full_dir_name not in made_dirs:
            os.makedirs(full_dir_name)  # only create if not done yet
            made_dirs.add(full_dir_name)
        os.rename(os.path.join(root, file_name),
                  os.path.join(root, alt_file_name))

This handles files in any directory under the directory from where the program is started. Given the problem that you describe, the unzip_dir probably doesn't have any subdirectories to start with, and the program could just walk over the files in the current directory only.
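A minimal sketch of that simpler variant, assuming Python 3 on a POSIX file system and that every affected file sits directly in one directory. The function name fix_backslash_names is hypothetical. It collects the distinct target directories first and creates each one once, so the number of makedirs calls depends on the number of directories rather than the number of files. Skipping names with ".." components is an added safety assumption.

```python
import os


def fix_backslash_names(directory="."):
    """Move files named like '\\a\\b\\c.ext' to directory/a/b/c.ext.

    Returns the number of files moved.
    """
    moves = []
    for name in os.listdir(directory):
        if "\\" not in name:
            continue
        if not os.path.isfile(os.path.join(directory, name)):
            continue
        rel = name.replace("\\", "/").lstrip("/")
        # skip empty names, directory-like names and anything with '..'
        if not rel or rel.endswith("/") or ".." in rel.split("/"):
            continue
        moves.append((name, rel))

    # create every distinct target directory exactly once
    for sub in {os.path.dirname(rel) for _, rel in moves}:
        if sub:
            os.makedirs(os.path.join(directory, sub), exist_ok=True)

    for name, rel in moves:
        os.rename(os.path.join(directory, name), os.path.join(directory, rel))
    return len(moves)
```

Run it as fix_backslash_names("unzip_dir") after unzipping; exist_ok=True means a directory that already exists is not an error.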

Anthon
  • 79,293
  • This is pretty much what I am doing in shell, but there are a lot of files (and not that many directories), so there are a lot of redundant os.makedirs(os.path.join(root, alt_dir_name)) statements, and the script really seems to slow down around that block – Slav Nov 06 '14 at 15:57
  • @Slav actually with 2 files in the same directory the script gave an error. I now only make each directory once. The names are cached in memory in the set made_dirs. That way it doesn't try, or even have to check the disk for files in the same directory. This could be optimized some more if necessary, as makedirs() actually excepts on all the intermediate directories. That makes sense if most files live alone in a separate directory. – Anthon Nov 06 '14 at 16:11
  • That last sentence of the previous comment refers to the optimization. – Anthon Nov 06 '14 at 16:21
  • 1
    The script errored because a folder already existed. I replaced the line
    os.makedirs(full_dir_name)
    
    

    with

            try:
                os.makedirs(full_dir_name)
            except OSError as exc:
                if exc.errno == errno.EEXIST and os.path.isdir(full_dir_name):
                    # the path already exists and is a folder, let's just ignore it
                    pass
                else:
                    raise
    
    – madmuffin Aug 18 '16 at 13:01
  • Note: The script is missing an import os to be able to run out of the box. – madmuffin Aug 18 '16 at 13:03
  • import errno for errno.EEXIST. – abdulwadood Jun 16 '21 at 11:22
3

This is just an update of @Anthon's answer. It includes the fixes by @madmuffin (for FileExistsError: [Errno 17] File exists, and for the missing import of the os module), a fix for Python 3 (SyntaxError: Missing parentheses in call to 'print') and a fix for the missing import of the errno module (NameError: name 'errno' is not defined).

#! /usr/bin/env python

import os
import errno

# already created directories, walk works topdown, so a child dir
# never creates a directory if there is a parent dir with a file.
made_dirs = set()

for root, dir_names, file_names in os.walk('.'):
    for file_name in file_names:
        if '\\' not in file_name:
            continue
        alt_file_name = file_name.replace('\\', '/')
        if alt_file_name.startswith('/'):
            alt_file_name = alt_file_name[1:]  # cut off the leading dir separator
        alt_dir_name, alt_base_name = alt_file_name.rsplit('/', 1)
        print('alt_dir', alt_dir_name)
        full_dir_name = os.path.join(root, alt_dir_name)
        if full_dir_name not in made_dirs:
            try:
                os.makedirs(full_dir_name)
            except OSError as exc:
                if exc.errno == errno.EEXIST and os.path.isdir(full_dir_name):
                    # the path already exists and is a folder, let's just ignore it
                    pass
                else:
                    raise 
            made_dirs.add(full_dir_name)
        os.rename(os.path.join(root, file_name),
                  os.path.join(root, alt_file_name))
Daishi
  • 196
3

Had to make a few changes to @xn.'s answer for Mac. This worked for me:

  1. brew install gawk
  2. 7z rn windows.zip $(7z l windows.zip | grep '\\' | gawk '{ print $6, gensub(/\\/, "/", "g", $6); }' | paste -s -)
DozyBrat
  • 31
  • 2
2

The standard makes it clear that all slashes must be forward slashes

https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

4.4.17.1 The name of the file, with optional relative path.
   The path stored MUST NOT contain a drive or
   device letter, or a leading slash.  All slashes
   MUST be forward slashes '/' as opposed to
   backwards slashes '\' for compatibility with Amiga
   and UNIX file systems etc.  If input came from standard
   input, there is no file name field.
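As a rough illustration of the quoted rule, the following sketch reports which of the three restrictions a stored name breaks. The function name appnote_violations and the labels it returns are hypothetical, and the drive-letter test (a letter followed by a colon at the start of the name) is a simplifying assumption, not wording from the specification.

```python
import re


def appnote_violations(name):
    """Return a list of the section 4.4.17.1 rules that `name` violates."""
    problems = []
    if "\\" in name:
        problems.append("backslash")
    if name.startswith("/"):
        problems.append("leading slash")
    # assumption: a drive letter looks like 'C:' at the start of the name
    if re.match(r"[A-Za-z]:", name):
        problems.append("drive letter")
    return problems
```

Applied to the names returned by zipfile.ZipFile(...).namelist(), an empty list for every entry would suggest the archive follows this part of the specification.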

Here is a script that fixes a non-compliant zip file. It also handles whitespace in file names, which a previous answer did not:

#!/bin/sh
CMD_INSTRUCTIONS=$(7z l -ba -slt "$1" | grep '\\' | sed 's/^Path = //g' | sed 's/.*/"&"/' | gawk '{ print $0, gensub(/\\/, "/", "g", $0); }' | sed 's|\\|\\\\|g' | paste -s -)
CMD_7Z="7z rn \"$1\" $CMD_INSTRUCTIONS"
eval "$CMD_7Z"

Source:

Toby Speight
  • 8,678
Emilio
  • 131
1

or, after unzipping use bash to fix the filenames:

for i in *; do new=${i//\\/\/}; newd=$(dirname "$new"); mkdir -p "$newd"; mv "$i" "$new"; done
Henrik
  • 11
  • @DozyBrat's method failed for me for filenames that had slashes and spaces in them. There weren't too many of them, and the remaining ones were moved to the correct folder with this line. – dmcontador Mar 12 '21 at 06:43
0

I had the same problem yesterday: I had downloaded a Windows driver whose paths use a backslash ("\") as the path separator. To install Windows from my Linux machine I needed to unzip this file to a USB drive using Linux, but Linux refused to do so due to invalid paths. The above solutions didn't work for me (I can't really tell why).

This is how I solved it.

Requirements:

  • Python 3 (apt-get install python3)

Steps:

  1. Copy this code into an empty file in the zip's directory:
#!/usr/bin/env python3

import sys
import zipfile


def main(argv):
    # take the archive name from the command line, or ask for it
    if len(argv) != 1:
        filename = input("Which file do you want to extract? Input here:\n")
    else:
        filename = argv[0]
    zipObj = zipfile.ZipFile(filename, 'r')
    for info in zipObj.infolist():
        # repair the stored name before extracting, so directories get created
        info.filename = info.filename.replace('\\', '/')
        zipObj.extract(info)


if __name__ == "__main__":
    main(sys.argv[1:])

  2. Make the file executable.
  3. Run the file. Either specify a file on the command line or input the zip's filename. It will be extracted to the same folder, but path separators will be repaired.
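If a corrected archive is wanted instead of extracted files, a related sketch (a different technique from the script above, not part of this answer) copies every member into a new zip under a repaired name. The function name write_repaired_copy is hypothetical, stripping a leading slash is an assumption, and each member is read fully into memory, so this may not suit very large files.

```python
import zipfile


def write_repaired_copy(src_path, dst_path):
    """Copy src_path to dst_path, storing member names with forward slashes."""
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            # assumption: stored names should also be relative
            fixed = info.filename.replace("\\", "/").lstrip("/")
            if not fixed:
                continue
            # build a fresh entry that keeps timestamp, compression and mode
            new_info = zipfile.ZipInfo(fixed, date_time=info.date_time)
            new_info.compress_type = info.compress_type
            new_info.external_attr = info.external_attr
            dst.writestr(new_info, src.read(info))
```

The copy written by write_repaired_copy("windows.zip", "fixed.zip") can then be extracted with a plain unzip.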
0

I realize this script is probably similar to what OP used, but I thought I'd add it here for others who might need a bash solution instead of python.

I had the same problem with a big zip file that I unzipped on a Linux box. Everything was dumped into one directory, with backslashes in the file names. The script below

  1. loops through all of those files
  2. separates the directory path from the file name
  3. flips backslashes into forward slashes
  4. tests to see if the directory path exists, and creates it if it doesn't
  5. moves the file into the correct directory

The first argument is the directory where the files exist. If you name this script "correct-paths.sh" you can call it with

$ ./correct-paths.sh dir-with-backslash-files

Here's the content of the bash script

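#!/bin/bash

if [[ ! -d "$1" ]]
then
    echo "The argument must be a directory where the files live."
    echo "'$1' is not a directory"
    exit 1
fi

cd "$1" || exit 1

for fullpath in *
do
    # skip names without a backslash; there is nothing to fix
    [[ "$fullpath" == *\\* ]] || continue

    # file name: everything after the last backslash
    filename=${fullpath##*\\}
    # directory path: everything before it, with \ flipped to /
    dirpath=${fullpath%\\*}
    dirpath=${dirpath//\\//}
    # drop a leading / so the path stays relative; fall back to "."
    dirpath=${dirpath#/}
    dirpath=${dirpath:-.}

    if [[ ! -d "$dirpath" ]]
    then
        mkdir -p "$dirpath"
    fi

    mv "$fullpath" "$dirpath/$filename"
done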