
I've run across some scripting like this recently:

( set -e ; do-stuff; do-more-stuff; ) || echo failed

This looks fine to me, but it does not work! The set -e has no effect once you add the ||. Without the ||, it works fine:

$ ( set -e; false; echo passed; ); echo $?
1

However, if I add the ||, the set -e is ignored:

$ ( set -e; false; echo passed; ) || echo failed
passed

Using a real, separate shell works as expected:

$ sh -c 'set -e; false; echo passed;' || echo failed
failed

I've tried this in multiple different shells (bash, dash, ksh93) and all behave the same way, so it's not a bug. Can someone explain this?

Ciro Santilli OurBigBook.com
MadScientist

5 Answers


According to this thread, it's the behavior POSIX specifies for using "set -e" in a subshell.

(I was surprised as well.)

First, the behavior:

The -e setting shall be ignored when executing the compound list following the while, until, if, or elif reserved word, a pipeline beginning with the ! reserved word, or any command of an AND-OR list other than the last.

The second post notes,

In summary, shouldn't set -e in (subshell code) operate independently of the surrounding context?

No. The POSIX description is clear that surrounding context affects whether set -e is ignored in a subshell.

There's a little more in the fourth post, also by Eric Blake,

Point 3 is not requiring subshells to override the contexts where set -e is ignored. That is, once you are in a context where -e is ignored, there is nothing you can do to get -e obeyed again, not even a subshell.

$ bash -c 'set -e; if (set -e; false; echo hi); then :; fi; echo $?' 
hi 
0 

Even though we called set -e twice (both in the parent and in the subshell), the fact that the subshell exists in a context where -e is ignored (the condition of an if statement), there is nothing we can do in the subshell to re-enable -e.

This behavior is definitely surprising and counter-intuitive: one would expect re-enabling set -e inside the subshell to have an effect, and the surrounding context not to take precedence. The wording of the POSIX standard does not make this particularly clear, either. Read from the point where the command fails, the rule seems not to apply, since there is no AND-OR list inside the subshell. It applies to the surrounding context instead, and it covers everything nested inside that context.
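
To see the rule in action, here is a minimal sketch (not taken from the thread; the variable names are made up for illustration). It starts with errexit off so that only each subshell's own set -e is in play, and captures what each subshell prints:

```shell
#!/bin/sh
# Start from a known state: errexit off in the parent.
set +e

# 1. Subshell on the left-hand side of an AND-OR list: -e is ignored,
#    so `false` does not stop the subshell and `echo reached` runs.
out_or=$( (set -e; false; echo reached) || echo fallback )

# 2. Subshell used as the condition of an if statement: -e is ignored too.
out_if=$( if (set -e; false; echo reached); then :; fi )

# 3. Subshell as a plain command: -e is obeyed, the subshell exits at
#    `false`, and the following echo reports its exit status.
out_plain=$( (set -e; false; echo reached); echo "status=$?" )

printf '%s\n' "or:    $out_or" "if:    $out_if" "plain: $out_plain"
```

In bash, dash and similar POSIX shells this should report `reached` for the first two cases and `status=1` for the third, which matches the quoted rule: the context around the subshell decides, not the set -e inside it.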

Thanatos
  • Thanks for those links, they were very interesting. However, my example is (IMO) substantively different. Most of that discussion is whether set -e in a parent shell is inherited by the subshell: set -e; (false; echo passed;) || echo failed. It does not surprise me, actually, that -e is ignored in this case given the wording of the standard. In my case, though, I'm explicitly setting -e in the subshell, and expecting the subshell to exit on failure. There's no AND-OR list in the subshell... – MadScientist Feb 21 '13 at 12:36
  • I disagree. The second post (I can't get the anchors to work) says "The POSIX description is clear that surrounding context affects whether set -e is ignored in a subshell." - the subshell is in the AND-OR list. – Aaron D. Marasco Feb 21 '13 at 23:38
  • The fourth post (also Eric Blake) also says "Even though we called set -e twice (both in the parent and in the subshell), the fact that the subshell exists in a context where -e is ignored (the condition of an if statement), there is nothing we can do in the subshell to re-enable -e." – Aaron D. Marasco Feb 21 '13 at 23:41
  • You're right; I'm not sure how I misread those. Thanks. – MadScientist Feb 22 '13 at 05:00
  • I am delighted to learn that this behavior I'm tearing my hair out over turns out to be in POSIX spec. So what is the workaround?! if and || and && are infectious? This is absurd – Steven Lu Jun 04 '15 at 20:40
  • BTW set -e works as we might expect it to, inside bash in OS X so i am running my test script on Linux now and it happily keeps on going when everything fails. Because I wrote my script on OS X. – Steven Lu Jun 04 '15 at 20:50
  • @StevenLu, see my answer which explains how to fix this. – skozin Jan 11 '16 at 16:47
  • very un-intuitive. arg. :|. – Trevor Boyd Smith Mar 20 '19 at 19:42
  • @AaronD.Marasco :waves: Just stumbled on your answer while looking this up :) – Iguananaut Feb 02 '22 at 15:24
  • @Iguananaut LOL yeah about every two to three months I get a notification that somebody else stumbled upon this giant "WTF?" 9 years and counting... – Aaron D. Marasco Feb 02 '22 at 23:46
  • 11 years LOL... @Shayan I wouldn't expect it to crash on if because the if is using the result. set -e is to exit on an uncaptured error condition. – Aaron D. Marasco Mar 19 '24 at 23:29

Indeed, set -e has no effect inside a subshell if you use the || operator after it; for example, this does not work:

#!/bin/sh

# prints:
#
# --> outer
# --> inner
# ./so_1.sh: line 16: some_failed_command: command not found
# <-- inner
# <-- outer

set -e

outer() {
  echo '--> outer'
  (inner) || {
    exit_code=$?
    echo '--> cleanup'
    return $exit_code
  }
  echo '<-- outer'
}

inner() {
  set -e
  echo '--> inner'
  some_failed_command
  echo '<-- inner'
}

outer

Aaron D. Marasco in his answer does a great job of explaining why it behaves this way.

Here is a little trick that can be used to fix this: run the inner command in background, and then immediately wait for it. The wait builtin will return the exit code of the inner command, and now you're using || after wait, not the inner function, so set -e works properly inside the latter:

#!/bin/sh

# prints:
#
# --> outer
# --> inner
# ./so_2.sh: line 27: some_failed_command: command not found
# --> cleanup

set -e

outer() {
  echo '--> outer'
  inner &
  wait $! || {
    exit_code=$?
    echo '--> cleanup'
    return $exit_code
  }
  echo '<-- outer'
}

inner() {
  set -e
  echo '--> inner'
  some_failed_command
  echo '<-- inner'
}

outer

Here is the generic function that builds upon this idea. It should work in all POSIX-compatible shells if you remove local keywords, i.e. replace all local x=y with just x=y:

# [CLEANUP=cleanup_cmd] run cmd [args...]
#
# `cmd` and `args...` A command to run and its arguments.
#
# `cleanup_cmd` A command that is called after cmd has exited,
# and gets passed the same arguments as cmd. Additionally, the
# following environment variables are available to that command:
#
# - `RUN_CMD` contains the `cmd` that was passed to `run`;
# - `RUN_EXIT_CODE` contains the exit code of the command.
#
# If `cleanup_cmd` is set, `run` will return the exit code of that
# command. Otherwise, it will return the exit code of `cmd`.
#
run() {
  local cmd="$1"; shift
  local exit_code=0

  local e_was_set=1
  if ! is_shell_attribute_set e; then
    set -e
    e_was_set=0
  fi

  "$cmd" "$@" &

  wait $! || {
    exit_code=$?
  }

  if [ "$e_was_set" = 0 ] && is_shell_attribute_set e; then
    set +e
  fi

  if [ -n "$CLEANUP" ]; then
    RUN_CMD="$cmd" RUN_EXIT_CODE="$exit_code" "$CLEANUP" "$@"
    return $?
  fi

  return $exit_code
}


is_shell_attribute_set() { # attribute, like "x"
  case "$-" in
    *"$1"*) return 0 ;;
    *)    return 1 ;;
  esac
}
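
As a quick illustration of how the helper reads the shell's option flags from `$-`, here is a small sketch. The helper is repeated so the snippet runs on its own; the variables `before` and `after` are made up for this example:

```shell
#!/bin/sh
# Same helper as above: succeeds if the given flag letter appears in "$-".
is_shell_attribute_set() { # attribute, like "x"
  case "$-" in
    *"$1"*) return 0 ;;
    *)      return 1 ;;
  esac
}

set +e
if is_shell_attribute_set e; then before=on; else before=off; fi

set -e
if is_shell_attribute_set e; then after=on; else after=off; fi

set +e   # leave errexit off again for whatever follows
echo "errexit before: $before, after: $after"
```

The helper is only ever called as an if condition here, so its non-zero return never triggers an exit, even while -e is on.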

Example of usage:

#!/bin/sh
set -e

# Source the file with the definition of `run` (previous code snippet).
# Alternatively, you may paste that code directly here and comment the next line.
. ./utils.sh


main() {
  echo "--> main: $@"
  CLEANUP=cleanup run inner "$@"
  echo "<-- main"
}


inner() {
  echo "--> inner: $@"
  sleep 0.5; if [ "$1" = 'fail' ]; then
    oh_my_god_look_at_this
  fi
  echo "<-- inner"
}


cleanup() {
  echo "--> cleanup: $@"
  echo "    RUN_CMD = '$RUN_CMD'"
  echo "    RUN_EXIT_CODE = $RUN_EXIT_CODE"
  sleep 0.3
  echo '<-- cleanup'
  return $RUN_EXIT_CODE
}

main "$@"

Running the example:

$ ./so_3 fail; echo "exit code: $?"

--> main: fail
--> inner: fail
./so_3: line 15: oh_my_god_look_at_this: command not found
--> cleanup: fail
    RUN_CMD = 'inner'
    RUN_EXIT_CODE = 127
<-- cleanup
exit code: 127

$ ./so_3 pass; echo "exit code: $?"

--> main: pass
--> inner: pass
<-- inner
--> cleanup: pass
    RUN_CMD = 'inner'
    RUN_EXIT_CODE = 0
<-- cleanup
<-- main
exit code: 0

The one thing to be aware of when using this method is that any changes to shell variables made by the command you pass to run will not propagate to the calling function, because the command runs in a subshell.
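
A minimal sketch of that caveat, using a made-up `bump` function and `counter` variable (neither appears in the code above):

```shell
#!/bin/sh
counter=0

# Hypothetical helper that modifies a shell variable.
bump() {
  counter=$((counter + 1))
}

# Run in the background, as `run` does: the increment happens in a
# subshell, so the parent's copy of `counter` is untouched.
bump &
wait $!
after_background=$counter

# Run directly in the current shell: the increment is visible.
bump
after_direct=$counter

echo "after background call: $after_background"
echo "after direct call:     $after_direct"
```

If the inner command needs to hand data back, it has to do so through its exit code, its output, or a file rather than through variables.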

skozin
  • Another workaround: Do something like my_func() { ( set -x; my_commands; ...; ) }; set +x; my_func; return_code=$?; set -x. It's dumb, but maybe more compact, and still prevents the parent script from prematurely exiting? – Eric Cousineau Jul 27 '20 at 17:04
  • Ah, Ciro had done this here: https://unix.stackexchange.com/a/452584/304032 – Eric Cousineau Jul 27 '20 at 17:04
  • I think wait exit status 127 (i.e. the background job has already exited by the time wait is called) needs to be treated as 0. Of course, you'd lose the detection of a status 127 in the inner command, but that's the lesser of two evils, I guess. – EndlosSchleife Apr 13 '23 at 11:13
  • Nice trick with the background job. However, why set -e in the generic run function? run shouldn't need it, and without it, it would be more flexible by letting the caller choose (plus simpler). – EndlosSchleife Apr 13 '23 at 12:52

Workaround when using top-level set -e

I came to this question because I was using set -e as an error detection method:

#!/usr/bin/env bash
set -e
do_stuff
( take_best_sub_action_1; take_best_sub_action_2 ) || do_worse_fallback
do_more_stuff

and without ||, the script would stop running and never reach do_more_stuff.

Since there seems to be no clean solution, I think I will just use a simple set +e in my scripts:

#!/usr/bin/env bash
set -e
do_stuff
set +e
# The subshell inherits set +e from the parent, so re-enable -e inside it.
( set -e; take_best_sub_action_1; take_best_sub_action_2 )
exit_status=$?
set -e
if [ "$exit_status" -ne 0 ]; then
  do_worse_fallback
fi
do_more_stuff
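
One detail worth checking in this pattern: a subshell inherits the parent's options, so after set +e it runs with errexit off unless it turns it back on itself. Here is a minimal runnable sketch of the pattern, with `false` standing in for a failing take_best_sub_action_1 and a made-up `ran_fallback` flag standing in for do_worse_fallback:

```shell
#!/bin/sh
set -e

ran_fallback=no

set +e   # the parent must not exit when the subshell fails
# The subshell is a plain command here (no ||, no if), so its own
# set -e is obeyed: it stops at `false` and never reaches the echo.
( set -e; false; echo "second sub-action ran" )
exit_status=$?
set -e

if [ "$exit_status" -ne 0 ]; then
  ran_fallback=yes   # stand-in for do_worse_fallback
fi

echo "exit_status=$exit_status ran_fallback=$ran_fallback"
```

An exit status of 1 shows that the subshell stopped at the failing command; had the echo run, the subshell would have returned 0 and the fallback would have been skipped.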
Ciro Santilli OurBigBook.com

I wouldn't rule out that it's a bug just because several shells behave that way. ;-)

I have more fun to offer:

start cmd:> ( eval 'set -e'; false; echo passed; ) || echo failed
passed

start cmd:> ( eval 'set -e; false'; echo passed; ) || echo failed
failed

start cmd:> ( eval 'set -e; false; echo passed;' ) || echo failed
failed

May I quote from man bash (4.2.24):

The shell does not exit if the command that fails is [...] part of any command executed in a && or || list except the command following the final && or || [...]

Perhaps the eval over several commands leads to ignoring the || context.

Hauke Laging
  • Well, if all the shells behave that way it's by definition not a bug... it's standard behavior :-). We may lament the behavior as non-intuitive but... The trick with eval is very interesting, that's for sure. – MadScientist Feb 21 '13 at 12:42
  • What shell do you use? The eval trick does not work for me. I tried bash, bash in posix mode, and dash. – Dunatotatos Sep 12 '17 at 16:40
  • @Dunatotatos, as Hauke said, that was bash4.2. It was "fixed" in bash4.3. pdksh-based shells will have the same "issue". And several versions of several shells have all sorts of different "issues" with set -e. set -e is broken by design. I wouldn't use it for anything but the simplest of shell scripts without control structures, subshells or command substitutions. – Stéphane Chazelas Sep 22 '17 at 14:04

The approach of running the command in the background, suggested by @skozin, won't work in bash: bash still disables set -e for background commands. dash and ash work fine, though.

Paulo Tomé