Because the input to join must be sorted, the command is often invoked like this:

join <(sort file1) <(sort file2)

This is not portable as it uses process substitution, which is not specified by POSIX.

join can also use the standard input by specifying - as one of the file arguments. However, this only allows for sorting one of the files through a pipeline:

sort file1 | join - <(sort file2)

It seems there should be a simple way to sort both files and then join the results using only POSIX-specified features. Perhaps something using redirection to a third file descriptor, or perhaps it will require creating a FIFO. However, I'm having trouble visualizing it.

How can join be used POSIXly on unsorted files?

Wildcard

1 Answer

You can do it with two named pipes (or of course you could use one named pipe and stdin):

mkfifo a b
sort file1 > a &
sort file2 > b &
join a b
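
The commands above leave the two FIFOs behind in the current directory. As a rough sketch of how the same idea could be wrapped up with cleanup, here is a small function; the name sorted_join, the temporary directory layout, and the use of $TMPDIR are my own assumptions for illustration, and it sticks to mkdir, mkfifo, sort, join, kill, wait, and rm, which are all specified by POSIX:

```shell
#!/bin/sh
# sorted_join FILE1 FILE2
# Hypothetical helper (name and temp-dir layout are illustrative assumptions).
# Sorts both files in the background into two FIFOs and joins the results.
sorted_join() {
    # Private scratch directory; mkdir fails if it already exists.
    dir=${TMPDIR:-/tmp}/sorted_join.$$
    ( umask 077 && mkdir "$dir" ) || return 1
    mkfifo "$dir/a" "$dir/b" || { rm -rf "$dir"; return 1; }

    # Each sort blocks until join opens the other end of its FIFO.
    sort "$1" > "$dir/a" &
    pid_a=$!
    sort "$2" > "$dir/b" &
    pid_b=$!

    join "$dir/a" "$dir/b"
    status=$?

    # If join bailed out early, a writer may still be blocked; stop it.
    kill "$pid_a" "$pid_b" 2>/dev/null
    wait "$pid_a" "$pid_b" 2>/dev/null

    rm -rf "$dir"
    return "$status"
}
```

It would be called as sorted_join file1 file2, and it returns join's exit status. The kill before wait is only there so the function does not hang if join exits without ever opening one of the FIFOs.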

Process substitution works essentially by setting up those fifos (using /dev/fd/ instead of named pipes where available). For example, in bash:

$ echo join <(sort file1) <(sort file2)
join /dev/fd/63 /dev/fd/62

Note how bash has substituted the process with a file name in /dev/fd. (Without /dev/fd/, new enough versions of zsh, bash, and ksh93 will use named pipes.) Bash leaves those descriptors open when invoking join, so when join opens those paths, it reads from the two sorts. You can see them passed with some lsof trickery:

$ sh -c 'lsof -a -d 0-999 -p $$; exit' <(sort file1) <(sort file2)
COMMAND  PID    USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
sh      1894 anthony    0u   CHR  136,5      0t0      8 /dev/pts/5
sh      1894 anthony    1u   CHR  136,5      0t0      8 /dev/pts/5
sh      1894 anthony    2u   CHR  136,5      0t0      8 /dev/pts/5
sh      1894 anthony   62r  FIFO   0,10      0t0 237085 pipe
sh      1894 anthony   63r  FIFO   0,10      0t0 237083 pipe

(The exit is there to prevent a common optimization where the shell doesn't fork when there is only one command to run.)

derobert