I have a script equijoin2:
#! /bin/bash
# default args
delim="," # CSV by default
outer=""
outerfile=""
# Parse flagged arguments:
while getopts "o:td:" flag
do
case $flag in
d) delim=$OPTARG;;
t) delim="\t";;
o) outer="-a $OPTARG";;
?) exit;;
esac
done
# Delete the flagged arguments:
shift $(($OPTIND -1))
# two input files
f1="$1"
f2="$2"
# cols from the input files
col1="$3"
col2="$4"
join "$outer" -t "$delim" -1 "$col1" -2 "$col2" <(sort "$f1") <(sort "$f2")
and two files:
$ cat file1
c c1
b b1
$ cat file2
a a2
c c2
b b2
Why does the last command fail? Thanks.
$ equijoin2 -o 2 -d " " file1 file2 1 1
a a2
b b1 b2
c c1 c2
$ equijoin2 -o 1 -d " " file1 file2 1 1
b b1 b2
c c1 c2
$ equijoin2 -d " " file1 file2 1 1
join: extra operand '/dev/fd/62'
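A minimal sketch of what may be going on in the last call, assuming the failure comes from the quoted "$outer": when outer is empty, "$outer" is still passed as one empty argument, which join would count as a file operand. The helper count_args and the array outer_args are hypothetical names, not part of the script above.

```bash
#!/bin/bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

outer=""
# Quoted empty string: still passed as one (empty) argument.
quoted_empty=$(count_args "$outer" f1 f2)

# Empty array: expands to no arguments at all.
outer_args=()
array_empty=$(count_args "${outer_args[@]}" f1 f2)

# Array holding the option and its value as two separate words.
outer_args=(-a 2)
array_set=$(count_args "${outer_args[@]}" f1 f2)

echo "$quoted_empty $array_empty $array_set"   # prints: 3 2 4
```

Under that assumption, one possible fix is to collect the optional arguments in an array in the o) branch, for example outer=(-a "$OPTARG"), and to pass "${outer[@]}" to join in place of "$outer".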
sort -t '\t', (1) does it also apply to join -t? (2) The coreutils manual doesn't mention that, or I missed it. The manual says "To specify ASCII NUL as the field separator, use the two-character string ‘\0’, e.g., ‘sort -t '\0'’." Is \t '\0' an exception? – Tim Jul 24 '18 at 20:41

join -t, not sort -t. join -t '\0' is a GNU extension. Generally, other implementations of text utilities can't cope with NUL bytes as that's not text. NUL is the one byte that can't be passed as an argument to an executed command, so it has to be represented by some form of encoding. – Stéphane Chazelas Jul 24 '18 at 20:46

bash, that's GNU join which chooses to understand \0 as the NUL byte. bash's $'\0' actually expands to the empty string, not a NUL byte. zsh's $'\0' expands to a NUL byte but only works for builtins or functions. A NUL byte can't be passed as an argument to a command that is executed because the list of arguments passed to the execve() system call is a list of NUL-delimited strings. – Stéphane Chazelas Jul 24 '18 at 21:08

"join -t '\0' is a GNU extension". Do you mean the GNU extension applies just to join -t '\0', or also to others such as join -t '\t'? – Tim Jul 24 '18 at 21:09

bash expands $'\t' to a TAB character. '\t' passes a string of two characters, \ and t. The point is that you need to pass the character as-is to join -t; you can also do join -t '<literal-tab-here-entered-with-Ctrl-V-tab-for-instance>' or join -t "$(printf '\t')", but obviously that can't be done for the NUL character, as a NUL character can never be passed in an argument; that's a limitation of the execve() system call. – Stéphane Chazelas Jul 24 '18 at 21:17

-o 2, that's one -a 2 argument" to join, and why does that one -a 2 argument still work for join in the same way as -a and 2 passed as two arguments to join? See the last three examples in my post. – Tim Jul 24 '18 at 21:34

join '-a 2' is like join -a ' 2', and when join parses ' 2' to extract the number, it skips and ignores the leading spaces. – Stéphane Chazelas Jul 24 '18 at 21:38
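The points made in the comments can be checked from bash itself. The sketch below first prints the length of each candidate delimiter string, then shows how a single '-a 2' word is split into an option and its value. Note the assumptions: join has its own option parser, so the getopts builtin is used here only as a stand-in that is assumed to split the word the same way, and first_optarg is a hypothetical helper name.

```bash
#!/bin/bash
# Lengths of the candidate delimiter strings as bash builds them.
two_chars='\t'            # backslash followed by t
ansi_tab=$'\t'            # ANSI-C quoting: one TAB character
printf_tab=$(printf '\t') # command substitution: one TAB character
ansi_nul=$'\0'            # bash: empty string, not a NUL byte
echo "${#two_chars} ${#ansi_tab} ${#printf_tab} ${#ansi_nul}"   # prints: 2 1 1 0

# first_optarg is a hypothetical helper: it parses one -a option from its
# arguments with the getopts builtin and prints the option value in brackets.
first_optarg() {
  local OPTIND=1 OPTARG flag
  getopts "a:" flag "$@"
  printf '[%s]\n' "$OPTARG"
}
first_optarg '-a 2'   # prints: [ 2]
first_optarg -a 2     # prints: [2]
```

The bracketed output makes the leading space in the first case visible; a parser that skips leading whitespace when reading the number would treat both values as 2.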