I have a script equijoin2:

#! /bin/bash

# default args
delim="," # CSV by default
outer=""
outerfile=""
# Parse flagged arguments:
while getopts "o:td:" flag
do
  case $flag in
    d) delim=$OPTARG;;
    t) delim="\t";;
    o) outer="-a $OPTARG";;
    ?) exit;;
  esac
done
# Delete the flagged arguments:
shift $(($OPTIND -1))
# two input files
f1="$1"
f2="$2"
# cols from the input files
col1="$3"
col2="$4"


join "$outer" -t "$delim" -1 "$col1" -2 "$col2" <(sort "$f1") <(sort "$f2")

and two files

$ cat file1
c c1
b b1
$ cat file2
a a2
c c2
b b2

Why does the last command fail? Thanks.

$ equijoin2 -o 2  -d " " file1 file2 1 1
a a2
b b1 b2
c c1 c2
$ equijoin2 -o 1  -d " " file1 file2 1 1
b b1 b2
c c1 c2
$ equijoin2   -d " " file1 file2 1 1
join: extra operand '/dev/fd/62'
Tim

1 Answer

"$outer" is a quoted scalar variable, so it always expands to exactly one argument. If it is empty or unset, it still expands to one empty argument to join (and when you call your script with -o 2, it becomes the single argument -a 2 instead of the two arguments -a and 2).

Your join is probably GNU join, since it accepts options after non-option arguments. When empty, "$outer" doesn't start with -, so it is a non-option argument and is treated as a file name; join then complains about the third file name, which it doesn't expect.
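The argument counting described above can be checked directly. This is a minimal sketch; count_args is a hypothetical helper introduced here for illustration, not part of the original script.

```bash
#!/bin/bash
# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

outer=""
count_args "$outer" -t ","   # quoted empty variable is still one argument: prints 3
count_args $outer -t ","     # unquoted empty variable disappears: prints 2

outer="-a 2"
count_args "$outer"          # a single argument "-a 2", not -a and 2: prints 1
```

The first call mirrors what join receives when the script is run without -o: an empty first argument, which join takes for a file name.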

If you want a variable with a variable number of arguments, use an array:

outer=()
...
(o)
   outer=(-a "$OPTARG");;

...
join "${outer[@]}"
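
Putting the array fragments together, the whole script might look roughly like the sketch below. It is wrapped in a function with the hypothetical name equijoin_sketch so it can be tried in an interactive shell, and it assumes bash plus a join that accepts -a, such as GNU join. It also uses delim=$'\t' for -t and sort < file, as suggested in the notes further down.

```bash
#!/bin/bash
# Sketch only: equijoin_sketch is a hypothetical name for the fixed script.
equijoin_sketch() {
  local delim="," flag OPTIND=1 OPTARG
  local -a outer=()              # an empty array expands to zero arguments
  while getopts "o:td:" flag; do
    case $flag in
      d) delim=$OPTARG ;;
      t) delim=$'\t' ;;          # a literal TAB, not backslash + t
      o) outer=(-a "$OPTARG") ;; # two separate arguments: -a and the file number
      *) return 1 ;;
    esac
  done
  shift "$((OPTIND - 1))"
  # $1 and $2 are the input files, $3 and $4 the join columns.
  join "${outer[@]}" -t "$delim" -1 "$3" -2 "$4" <(sort < "$1") <(sort < "$2")
}
```

With the two sample files from the question, equijoin_sketch -d " " file1 file2 1 1 should now print the inner join instead of failing with "extra operand".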

Though here you could also do:

outer=
...
(o)
   outer="-a$OPTARG";;
...
join ${outer:+"$outer"} ... <(sort < "$f1") <(sort < "$f2")

Or:

unset -v outer
...
(o)
   outer="$OPTARG";;
...
join ${outer+-a "$outer"} ...

(that one doesn't work in zsh except in sh/ksh emulation).
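
To see how the two parameter-expansion variants behave in bash, here is a small check. n_args is a hypothetical helper that only reports the argument count.

```bash
#!/bin/bash
# Hypothetical helper: print how many arguments it received.
n_args() { echo "$#"; }

outer=
n_args ${outer:+"$outer"}     # empty: expands to nothing, prints 0
outer="-a2"
n_args ${outer:+"$outer"}     # non-empty: the single argument -a2, prints 1

unset -v outer
n_args ${outer+-a "$outer"}   # unset: expands to nothing, prints 0
outer=2
n_args ${outer+-a "$outer"}   # set: two arguments, -a and 2, prints 2
```

The expansions are deliberately left unquoted on the outside; the inner quotes protect the value of $outer from split+glob.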

Some other notes:

  • join -t '\t' doesn't work. You'd need delim=$'\t' to store a literal TAB in $delim.
  • Remember to use -- when passing arbitrary arguments to commands (or use redirections where possible). So sort -- "$f1" or better sort < "$f1" instead of sort "$f1".
  • Arithmetic expansions are also subject to split+glob, so they should be quoted too: shift "$((OPTIND - 1))". It's not a problem here, as you're using bash, which doesn't inherit $IFS from the environment, and you're not modifying IFS earlier in the script, but it's still good practice.
  • Thanks. For sort -t '\t', (1) does it also apply to join -t? (2) The coreutils manual doesn't mention that, or I missed it. The manual says "To specify ASCII NUL as the field separator, use the two-character string ‘\0’, e.g., ‘sort -t '\0'’." Is \t '\0' an exception? – Tim Jul 24 '18 at 20:41
  • @Tim, my bad, I meant join -t, not sort -t. join -t '\0' is a GNU extension. Generally, other implementations of text utilities can't cope with NUL bytes as that's not text. NUL is the one byte that can't be passed as argument to an executed command, so it has to be represented by some form of encoding. – Stéphane Chazelas Jul 24 '18 at 20:46
  • @Tim, that's not bash, that's GNU join which chooses to understand \0 as the NUL byte. bash's $'\0' actually expands to the empty string, not a NUL byte. zsh's $'\0' expands to a NUL byte but only works for builtins or functions. A NUL byte can't be passed as argument to a command that is executed because the list of argument passed to the execve() system call is a list of NUL-delimited strings. – Stéphane Chazelas Jul 24 '18 at 21:08
  • Sorry deleted the comment. But please keep your reply. The reason I asked if it is bash's ANSI C quoting is "sort won’t accept ‘\t’, since it treats it as a multi-byte character. The solution is to place a $ before it. The dollar sign tells bash to use ANSI-C quoting" https://robfelty.com/2008/07/14/sort-using-tab-as-field-separator-in-bash Is it wrong? – Tim Jul 24 '18 at 21:08
  • "join -t '\0' is a GNU extension". Do you mean the GNU extension allows just for join -t '\0' or also for other such as join -t '\t'? – Tim Jul 24 '18 at 21:09
  • bash expands $'\t' to a TAB character. '\t' passes a string of two characters \ and t. The point is that you need to pass the character as-is to join -t, you can also do join -t '<literal-tab-here-entered-with-Ctrl-V-tab-for-instance>' or join -t "$(printf '\t')", but obviously that can't be done for the NUL character as a NUL character can never be passed in an argument, that's a limitation of the execve() system call. – Stéphane Chazelas Jul 24 '18 at 21:17
  • Another question "when you call your script with -o 2, that's one -a 2 argument" to join, and why does that one -a 2 argument still work for join in the same way as -a and 2 two arguments for join? See the last three examples in my post. – Tim Jul 24 '18 at 21:34
  • @Tim, because join '-a 2' is like join -a ' 2' and when join parses ' 2' to extract the number, it skips and ignores the leading spaces. – Stéphane Chazelas Jul 24 '18 at 21:38
  • Thanks. Which provides the way of typing tab by Ctrl-V-tab: bash's readline or terminal emulator or X window system? More at https://unix.stackexchange.com/q/458242/674 – Tim Jul 25 '18 at 16:21
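
The quoting points from the notes and comments above ('\t' versus $'\t', and bash's handling of $'\0') can be checked with a short sketch. It assumes bash; the variable names are made up for illustration.

```bash
#!/bin/bash
two_chars='\t'               # backslash followed by t
real_tab=$'\t'               # ANSI-C quoting: a single TAB character
via_printf=$(printf '\t')    # alternative that does not rely on $'...'
nul_attempt=$'\0'            # in bash this expands to the empty string

echo "${#two_chars}"         # prints 2
echo "${#real_tab}"          # prints 1
echo "${#via_printf}"        # prints 1
echo "${#nul_attempt}"       # prints 0
```

Either $real_tab or $via_printf can be passed as join -t "$real_tab"; the two-character string cannot stand in for a TAB.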