3

I have hundreds of folders containing thousands of zip files, and those zip files contain further zip files nested inside them, as shown in the tree below.

start tree structure
012016/
├── 2016-01
│   └── 2016-01
│       ├── build
│       ├── DOC
│       │   ├── WONWA1
│       │   │   ├── WO1NWA1
│       │   │   │   ├── WO2016000001NWA1.xml
│       │   │   ├── WO1NWA1.zip
│       │   │   ├── WO2NWA1
│       │   │   │   ├── WO2016000002NWA1_tr.xml
│       │   │   ├── WO2NWA1.zip
└── 2016-01.zip

end tree structure

I have written the short script below, which walks a folder and its contents recursively. If it finds a zip file, it extracts the contents and then continues to check the contents of the extracted folder.

When I try to run the script below:

recurse() {
    for i in "$1"/*;
    do
        currentItem="$i"
        extension="${currentItem##*.}"

        if [ -d "$i" ]; then
            #echo "dir: $i"
            recurse "$i"
        elif [ -f "$i" ];   then
            #echo "file: $i"
            #echo "ext: $extension"

            [[ ${extension} = +(sh|xslt|dtd|log|txt) ]] && break

            extractionDirectory=$(dirname $currentItem)/$(basename -s .zip $currentItem )

            [[ ${extension} = "zip" ]] && unzip -uq $currentItem -d "${extractionDirectory}"

            recurse ${extractionDirectory}
        fi
    done
}
recurse $PWD

However, when I run the above script, I get the error:

Segmentation fault (core dumped)

  • So basically you just recurse and unzip the archives in the same directory as the archive right? – Kenpachi Jul 18 '16 at 15:20
  • Scripts do not core-dump, applications do. Do you have any idea which executable is dumping core? If not, either run your script with the -x option, as in bash -x /path/to/myscript.sh, or insert the line set -x in your script before any processing occurs. The verbose output can show you which executable is dumping core. – MelBurslan Jul 18 '16 at 15:25
  • @MelBurslan Bash does core dump if the function call stack grows too large, which I think is what happens here due to an out-of-control recursion. – Gilles 'SO- stop being evil' Jul 18 '16 at 23:34
  • @Gilles Thanks for the response. You are right. The recursion got out-of-control. If you fopen a file, it fails and the file pointer returned is NULL and you try to read from that file pointer. This will give you a segmentation fault. last comment here: link – Noel Alex Makumuli Jul 19 '16 at 00:15
  • @NoelAlexMakumuli Attempting to read from a file when fopen returned NULL is just one example among many, many, many of a segfault. – Gilles 'SO- stop being evil' Jul 19 '16 at 00:38

2 Answers

4

There are many reasons for a segmentation fault. The most common low-level cause is that the process tried to access a memory address which isn't defined, i.e. an invalid pointer dereference. This is often a bug in the program.

Here, you're running a shell program. The shell is a high-level programming language, without pointers, so your script can't cause an invalid pointer dereference as such.

Many programs have limited space for their call stack and die of a segmentation fault if the stack size is exceeded. In most cases, the stack size limit is large enough for any reasonable data, but an infinite recursion can blow the stack.

In bash, infinite recursion in a function call does cause a segmentation fault. (The same goes for dash and mksh; ksh and zsh are smarter and apply a maximum function call nesting depth at the shell level so that they don't segfault.)


Your script has several bugs. The one that's biting you is that in the case of a regular file, you always call recurse at the end, whereas you clearly meant to do it only for zip files.

Don't use && or || when you mean if. It's clearer to write what you mean; brevity through obscurity is not a good idea and it bit you here.

if [[ ${extension} = "zip" ]]; then
  unzip -uq $currentItem -d "${extractionDirectory}"
  recurse ${extractionDirectory}
fi

Another bug is that you're missing double quotes around variable substitutions, so your program will choke on file names containing whitespace (among others). Always use double quotes around variable substitutions unless you know that you need to leave them off.
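To see why the missing quotes matter, here is a minimal sketch that counts how many arguments a command receives from an unquoted versus a quoted expansion. The helper `count_args` and the file name are made up for illustration.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it prints the number of arguments it received.
count_args() { echo "$#"; }

# Hypothetical file name containing a space.
name="my archive.zip"

# Unquoted: the shell splits the value on whitespace before calling the command.
unquoted=$(count_args $name)

# Quoted: the value is passed as a single argument.
quoted=$(count_args "$name")

echo "unquoted: $unquoted argument(s)"   # prints "unquoted: 2 argument(s)"
echo "quoted: $quoted argument(s)"       # prints "quoted: 1 argument(s)"
```

With the unquoted form, unzip would be handed two file names, `my` and `archive.zip`, neither of which exists.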

Use parameter expansion instead of calling basename and dirname. It's easier to deal with special cases (e.g. file name beginning with -) and it's faster.
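As a rough illustration, the pieces that the script gets from dirname and basename can be obtained like this. The path is a made-up example modelled on the tree in the question.

```shell
#!/usr/bin/env bash
# Hypothetical path, for illustration only.
currentItem="/tmp/012016/DOC/WO1NWA1.zip"

extension="${currentItem##*.}"            # strip the longest prefix ending in "."
dir="${currentItem%/*}"                   # strip the shortest suffix starting with "/"
base="${currentItem##*/}"                 # strip the longest prefix ending in "/"
extractionDirectory="${currentItem%.zip}" # strip a trailing ".zip"

echo "extension: $extension"                     # zip
echo "dir: $dir"                                 # /tmp/012016/DOC
echo "base: $base"                               # WO1NWA1.zip
echo "extractionDirectory: $extractionDirectory" # /tmp/012016/DOC/WO1NWA1
```

One difference to keep in mind: for a name without any slash, `${var%/*}` leaves the value unchanged, whereas dirname prints `.`. That doesn't matter in this script, since every path is built as `"$1"/*` and so always contains a slash.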

Another bug I happened to spot is that the pattern +(sh|xslt|dtd|log|txt) is clearly meant to be @(sh|xslt|dtd|log|txt) (match these extensions, not shsh, dtdtxtshdtd etc.).
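The difference between the two patterns can be checked with a small sketch like the following. The function name `classify` is made up for this example; `+(...)` matches one or more occurrences of the alternatives, `@(...)` exactly one.

```shell
#!/usr/bin/env bash
# Enable extended globbing patterns such as +(...) and @(...).
shopt -s extglob

# classify is a hypothetical helper: it reports whether an extension
# matches the +(...) pattern and whether it matches the @(...) pattern.
classify() {
    local ext=$1 plus=no at=no
    if [[ $ext = +(sh|xslt|dtd|log|txt) ]]; then plus=yes; fi
    if [[ $ext = @(sh|xslt|dtd|log|txt) ]]; then at=yes; fi
    echo "$ext plus=$plus at=$at"
}

classify sh       # sh plus=yes at=yes
classify shsh     # shsh plus=yes at=no
classify dtdtxt   # dtdtxt plus=yes at=no
classify zip      # zip plus=no at=no
```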

Here's the regular file case, with the bugs above fixed and rewritten with case for clarity:

case "$extension" in
  sh|xslt|dtd|log|txt) break;;
  zip)
    extractionDirectory="${currentItem%.zip}"
    unzip -uq "$currentItem" -d "${extractionDirectory}"
    recurse "${extractionDirectory}"
esac

Note that I haven't verified the logic or tested the code. This seems to be a complicated way of writing

find -type f -name '*.zip' -exec sh -c 'unzip -uq "$0" -d "${0%.zip}"' {} \;
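Putting the fixes above together, the whole function might look like the sketch below. It has not been run against real archives. It also makes two assumptions that are not in the original script: the loop variable is declared `local`, and files with other extensions are simply skipped (no `break`), which seems to be the intent.

```shell
#!/usr/bin/env bash
# Sketch only: recursively extract every zip file below the given directory
# into a sibling directory named after the archive (foo.zip -> foo/).
recurse() {
    local item target
    for item in "$1"/*; do
        if [ -d "$item" ]; then
            # Plain directory: descend into it.
            recurse "$item"
        elif [ -f "$item" ]; then
            case "${item##*.}" in
                zip)
                    target="${item%.zip}"
                    # Recurse only into what was actually extracted.
                    if unzip -uq "$item" -d "$target"; then
                        recurse "$target"
                    fi
                    ;;
                # Any other extension: nothing to do, and in particular no recursion.
            esac
        fi
    done
}
```

It would be called as `recurse "$PWD"`. The key point is that `recurse` is only ever called on a directory or on the result of a successful extraction, never on an arbitrary regular file, so the recursion has to end.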
  • Thanks a bunch for pointing out my bugs. The extension block is unnecessary; just putting the zip extension check in a condition solved my problem, as you pointed out: if [[ ${extension} = "zip" ]]; then unzip -uq $currentItem -d "${extractionDirectory}"; recurse ${extractionDirectory}; fi – Noel Alex Makumuli Jul 19 '16 at 10:08
1

From Gilles' answer:

In bash, infinite recursion in a function call does cause a segmentation fault. (The same goes for dash and mksh; ksh and zsh are smarter and apply a maximum function call nesting depth at the shell level so that they don't segfault.)

In Bash you can also set the maximum function call nesting depth by setting FUNCNEST. This is described in man bash:

The FUNCNEST variable, if set to a numeric value greater than 0, defines a maximum function nesting level. Function invocations that exceed the limit cause the entire command to abort.

Here you can see it in action:

$ f () { f; }
$ FUNCNEST=10 f
bash: f: maximum function nesting level exceeded (10)
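In a script, this can serve as a safety net around a recursive function. The sketch below assumes a bash recent enough to support FUNCNEST (it was added in bash 4.2, as far as I know); the function name `runaway` and the limit of 50 are arbitrary choices for the example. The call runs in a subshell, so that the abort ends only the subshell and the rest of the script can inspect the result.

```shell
#!/usr/bin/env bash
# Deliberately broken function: it calls itself forever.
runaway() { runaway; }

# Run it in a subshell with a nesting limit, capturing the error message.
msg=$( (FUNCNEST=50; runaway) 2>&1 )
status=$?

# Instead of a segmentation fault, we get a non-zero status and a readable message.
echo "status: $status"
echo "message: $msg"
```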