You can use join
for that. From man join
:
For each pair of input lines with identical join fields, write a line to standard output. The default join
field is the first, delimited by blanks.
Note that the the first line mustn't be sorted.
TLDR :
head -n 1 file1.csv; join -1 2 <(cat -n sort_keys.txt | sort -k 2) <(tail -n +2 file1.csv | sort) | sort -n -k 2 | awk '{ print $1, $3 }'
will do the job.
Explanations
We'll basically :
- extract the first line of file1.csv
- join the remaining of file1.csv with sort_keys, on the first field
- sort the result on the order of sort_keys
Additionaly, join
need the files to be sorted.
This will lead us with :
- as first input, number (in a prepended field) the sort_keys file (to be able to resort on this original order at the end), and sort on the 2nd field
cat -n sort_keys.txt | sort -k 2
3 apple
1 cat
4 cow
2 dog
- as second input, we take the csv file, skipping the first line, and sort it on the first field.
tail -n +2 file1.csv | sort
apple fruit
cat animal
cow animal
dog animal
- we can then join this alltogether, using the second field for the first process (
-1 2
) :
join -1 2 <(cat -n sort_keys.txt | sort -k 2) <(tail -n +2 file1.csv | sort)
apple 3 fruit
cat 1 animal
cow 4 animal
dog 2 animal
- the join result can now be sorted on the second field, numericaly (in case sort_keys has more than 9 entries), and we keep only 1st and 3rd fields
`... | sort -n -k 2 | awk '{ print $1, $3 }'
cat animal
dog animal
apple fruit
cow animal
- finally, prepend that with the first line of file1.csv
head -n 1 file1.csv; join -1 2 <(cat -n sort_keys.txt | sort -k 2) <(tail -n +2 file1.csv | sort) | sort -n -k 2 | awk '{ print $1, $3 }'
COLUMN1 COlUMN2
cat animal
dog animal
apple fruit
cow animal
Going further
Depending on your real data, you will have to adjust the field numbers, and the field separator.
You may also want to keep data lines whose key is not in sort_keys and or keep lines of sort_keys having no corresponding data lines (see -a
option of join).
Enjoy using join
!