0

I can't append to an array when I use parallel, but I have no issues when using a for loop.

Parallel example:

append() { arr+=("$1"); } 
export -f append

parallel -j 0 append ::: {1..4}
declare -p arr

Output:

-bash: declare: arr: not found

For loop:

for i in {1..4}; do arr+=("$i"); done
declare -p arr

Output:

declare -a arr=([0]="1" [1]="2" [2]="3" [3]="4")

I thought the first example was a translation of the for loop into functional style, so what's going on?

Nickotine
  • 467

3 Answers

2

Your parallel appears to be the GNU one, which is a Perl script that runs commands in parallel.

It tries very hard to work out which shell it is being invoked from, so that the command you pass to it is interpreted by that shell. To do that, however, it runs new invocations of that shell in separate processes.

If you run:

bash-5.2$ env SHELLOPTS=xtrace PS4='bash-$$> ' strace -qqfe /exec,/exit -e signal=none  parallel -j 0 append ::: {1..4}
execve("/usr/bin/parallel", ["parallel", "-j", "0", "append", ":::", "1", "2", "3", "4"], 0x7ffe5e848c90 /* 56 vars */) = 0
[...skipping several commands run by parallel during initialisation...]
[pid  7567] execve("/usr/bin/bash", ["/usr/bin/bash", "-c", "append 1"], 0x55a2615f03e0 /* 67 vars */) = 0
bash-7567> append 1
bash-7567> arr+=("$1")
[pid  7567] exit_group(0)               = ?
[pid  7568] execve("/usr/bin/bash", ["/usr/bin/bash", "-c", "append 2"], 0x55a2615f03e0 /* 67 vars */) = 0
[pid  7568] exit_group(0)               = ?
[pid  7569] execve("/usr/bin/bash", ["/usr/bin/bash", "-c", "append 3"], 0x55a2615f03e0 /* 67 vars */) = 0
bash-7568> append 2
bash-7568> arr+=("$1")
[pid  7569] exit_group(0)               = ?
[pid  7570] execve("/usr/bin/bash", ["/usr/bin/bash", "-c", "append 4"], 0x55a2615f03e0 /* 67 vars */) = 0
bash-7569> append 3
bash-7569> arr+=("$1")
[pid  7570] exit_group(0)               = ?
bash-7570> append 4
bash-7570> arr+=("$1")
exit_group(0)                           = ?

Here, strace shows which commands are executed by which process, and the xtrace option makes each shell print what it runs.

You'll see each bash shell append an element to its own $arr and then exit. At that point its memory, including its own $arr array, is gone: the $arr array is not automagically shared between all bash shell invocations on your system.

In any case, running commands concurrently implies running them in different processes, so there is no way parallel can run those functions in the invoking shell. They are run in new shell instances in separate processes, and they update the arr variables of those shells, not that of the shell you run parallel from.

Given that bash has no builtin multithreading support, even if parallel were an internal command of the shell or implemented as a shell function, it would still need to run the commands in separate processes, each with its own memory. You'll find that in:

append 1 & append 2 & append 3 & wait

Or:

append 1 | append 2 | append 3

The $arr array of the parent shell is not modified either.
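A minimal sketch to check this yourself, assuming bash with its default options (in particular lastpipe off). The initial `parent` element and the extra direct call are only there for illustration:

```bash
#!/bin/bash
# Background jobs and pipeline members run in forked subshells,
# so appends made there never reach the parent shell's array.
arr=(parent)
append() { arr+=("$1"); }

append 1 & append 2 & append 3 &
wait                 # all three jobs have finished...
append 4 | append 5  # ...and both sides of the pipeline ran in subshells too
declare -p arr       # still only the "parent" element

append 6             # called directly in the current shell: this one sticks
declare -p arr
```

Only the direct call changes the array, because it is the only one that runs in the process that owns it.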

If you want to collect the result of each job started by parallel, you can do it via stdout or via files.

For instance:

#! /bin/bash -
do_something() {
  output=$(
    echo "$1: some complex computation or otherwise there would
          be no point using GNU parallel and its big overhead"
  )
  # output the result NUL delimited.
  printf '%s\0' "$output"
}
export -f do_something
readarray -td '' arr < <(
  PARALLEL_SHELL=/bin/bash parallel do_something ::: {1..4}
)
typeset -p arr

(here telling parallel which shell to use, so that it doesn't have to guess).

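Note that parallel stores the output of each job in a temporary file and writes it to stdout in one piece, so the outputs of different jobs are never interleaved. By default they are written as the jobs complete; add the -k (--keep-order) option if the array elements must come out in the same order as the arguments.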
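The "via files" route mentioned earlier can be sketched in a similar way. This is only an illustration: it uses plain background jobs instead of parallel so that it runs without GNU Parallel installed, and the `do_something` body and the temporary directory layout are assumptions:

```bash
#!/bin/bash
# Hypothetical worker standing in for a real long-running job.
do_something() { printf '%s: done' "$1"; }

tmpdir=$(mktemp -d) || exit
trap 'rm -rf -- "$tmpdir"' EXIT

# Start one background job per argument, each writing to its own file.
for i in 1 2 3 4; do
  do_something "$i" > "$tmpdir/$i" &
done
wait

# Read the files back in a fixed order, one array element per file.
arr=()
for i in 1 2 3 4; do
  arr+=("$(< "$tmpdir/$i")")
done
declare -p arr
```

Because the parent reads the files back in the order it chooses, the order in which the jobs finish does not matter. Note that `$(< file)` strips trailing newlines and cannot hold NUL bytes.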

  • I get it now thanks seems that the array is set for that parallel instance and so is lost. – Nickotine Jun 23 '23 at 20:17
  • @Nickotine, not really, see if the edit makes it any clearer? – Stéphane Chazelas Jun 24 '23 at 08:35
  • thanks that makes sense, I had it working with mapfile and writing to a file but I stopped doing mapfile since I got the much simpler arr=(whatever) from you or arr+=(whatever) and yes I normally use parallel when downloading a bunch of large video files or just because I was bored of loops and wanted to try functional style instead. – Nickotine Jun 24 '23 at 13:45
  • like mimicking python list comprehensions with parallel – Nickotine Jun 24 '23 at 13:49
  • @Nickotine, arr=( $(cmd) ) does split+glob which you generally need to tune before using and bash can't split on NULs, while its readarray (same as mapfile, but mapfile is a misnomer) can properly read a list of records into an array. – Stéphane Chazelas Jun 24 '23 at 13:52
  • still hung up on split+glob, I have no issues as I have IFS=$'\n' always set, does it only apply when using wildcards? – Nickotine Jun 24 '23 at 13:54
  • Try arr=( $(echo '/*/'; echo '/???/') ); typeset -p arr. that unquoted $(...) undergoes splitting (which you want) and globbing (which you don't want), hence the split+glob name for that "operator" (or misfeature depending on PoV), and why when using it, you need to tune it (set $IFS and enable or disable the noglob option) or use a proper shell with proper splitting operators such as zsh. – Stéphane Chazelas Jun 24 '23 at 14:01
  • I think I get it now but to make sure, so noglob should be set when you want to take * literally rather than as a wildcard? – Nickotine Jun 24 '23 at 14:05
  • 1
    You use noglob when you use $var or $(cmd) or $(( arith )) unquoted in order to split those expansions (and don't want the *, ?, and other wildcard operators in those to trigger filename generation, which you almost never want). That's the "What about when you do need the split+glob operator?" section at "Security implications of forgetting to quote a variable in bash/POSIX shells" – Stéphane Chazelas Jun 24 '23 at 14:08
  • apologies, I've read your answer on that post a couple of times and I know that to be safe you should just always quote, but can I get away with thinking that as long as I don't use bash for api stuff I'm ok? And that my ssh keys are secure? Of course that's a risk but just hypothetically? – Nickotine Jun 24 '23 at 14:17
1

You should be aware of parset which is part of GNU Parallel:

$ parset arr echo ::: 1 2 3 4
$ declare -p arr
declare -a arr=([0]="1" [1]="2" [2]="3" [3]="4")

It does not append (so it is not a valid answer for your question), but it may still be useful to you.

Ole Tange
  • 35,514
  • I remember I was very excited at your parset recommendation a while ago but I failed to set it up properly then forgot about it. Yes I really should be using parset since I use parallel for creating arrays a lot. – Nickotine Jun 25 '23 at 09:32
0

This works:

arr+=($(parallel -j 0 echo ::: {1..4}))

The array was lost because it was set in the parallel instance; it has nothing to do with the echo.
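One caveat, following the split+glob discussion in the comments on the other answer: an unquoted `$(...)` is split on `$IFS` and each resulting word is then subject to filename generation. A small sketch of the difference, where `produce` is a hypothetical stand-in for a command whose output contains wildcard characters:

```bash
#!/bin/bash
# Hypothetical stand-in for a command whose output contains wildcards.
produce() { printf '%s\n' '/*/' 'a b' '/???/'; }

# Untuned split+glob: splits on spaces, tabs and newlines, then globs.
split_glob=( $(produce) )

# Tuned split+glob: split on newline only, with globbing disabled.
old_ifs=$IFS; IFS=$'\n'; set -o noglob
tuned=( $(produce) )
set +o noglob; IFS=$old_ifs

# readarray reads one line per element with no splitting or globbing at all.
readarray -t lines < <(produce)

declare -p split_glob tuned lines
```

The contents of `split_glob` depend on which directories exist on the system, while `tuned` and `lines` hold the three lines exactly as produced.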

Nickotine
  • 467