1

From coreutils' manual about join

-e string

Replace those output fields that are missing in the input with string. I.e., missing fields specified with the -12jo options.

I don't understand the option at all. What do the following mean

  • "those output fields that are missing in the input"

  • "missing fields specified with the -12jo options"?

Thanks.

Tim
  • 101,790

1 Answers1

2

The slightly cryptic string -12jo refers to the four separate options -1, -2, -j and -o, of which the first three has to do with selecting what field in each file to join on and the last has to do with what fields from each file should be outputted. The -j option is an extension in GNU join and and -j n is the same as -1 n -2 n (where n is some integer).

The -e option comes into effect when you, with -a, request to get unpaired lines from one or both of the files that you join. An unpaired line will have missing data, as the line from one file did not correspond to a line in the other file. The -e option replaces those fields with the given string. Likewise, if you request, with -o, a field that does not exist on a particular line in a file, you would use -e to replace the empty values with a string.

Example: Two files that contain manufacturing costs and sales income for a number of products. Each file has the fields

  1. Product ID
  2. Product name
  3. Some number
$ cat expenses.txt
1   teacup  5
2   spoon   7
3   bowl    10

$ cat sales.txt
1   teacup  30
2   spoon   24

To get the expenses and sales for all products, while replacing the number (from either the first or second file) that may be missing with the string NONE, I would do

$ join -a1 -a2 -o0,1.2,1.3,2.3 -e NONE expenses.txt sales.txt
1 teacup 5 30
2 spoon 7 24
3 bowl 10 NONE

Here, I use the -a option twice to request all lines from both files (a "full outer join" in SQL speak). The -o option is used to get specific fields from each file (field 0 is the join field, which is the first field in each file by default), and -e to specify the string NONE to replace missing value with.

As you can see, we get NONE as the "sales value" since the product with ID 3 was not mentioned in that second file.

Kusalananda
  • 333,661
  • 1
    Note that it's not only about -a. join -o 1.2 -e NONE <(echo a) <(echo a) would also output NONE – Stéphane Chazelas Jul 24 '18 at 06:59
  • Thanks. Why does join -a 1 -a 2 -t " " -e "NULL" -1 1 -2 1 <(sort file1) <(sort file2) not output NULL, suppose file2 has unpairable line? – Tim Jul 24 '18 at 13:21
  • file1 and file2 are here https://unix.stackexchange.com/q/458068/674 – Tim Jul 24 '18 at 13:27
  • @Tim join -a 2 -o0,1.2,2.2 -t " " -e "NULL" <(sort file1) <(sort file2) – Kusalananda Jul 24 '18 at 13:31
  • Thanks. Wondering why the command doesn't output NULL? When does -e work? Does -e only work with -o? – Tim Jul 24 '18 at 13:33
  • @Tim Correct. join does not output missing values unless you explicitly request those fields with -o. Missing values will then be blank, unless you use -e. – Kusalananda Jul 24 '18 at 13:35
  • Thanks. What does "missing fields specified with the -12jo options" in the manual mean? Does it mean those missing fields which are specified with -12jo options? How are missing fields specified with -12j options? -12j options are used only to specify join fields, not missing fields. – Tim Jul 25 '18 at 05:35
  • @Tim The join field itself, which is affected by those options, may be missing for one or several entries in one of the files. – Kusalananda Jul 25 '18 at 06:01
  • Thanks. https://unix.stackexchange.com/questions/458344/what-happens-if-i-specify-nonexisting-join-fields-for-one-or-both-of-the-input – Tim Jul 25 '18 at 12:36