-1

I have a file.csv

"ItemNo","Name","Weight"
"a001","Item a","1.1"
"a002","Item x","1.2"
"a003","Item_4","1.0"
"a004","Item b","1.1"
"a005","Itemb2","2.0"
"a006","a004","2.0"

I also have a few files like this itemno.csv

"a003"
"a001"
"a004"

I am looking for a command to create a list of the "Names" that are associated with the "ItemNo"...

so my output.csv should be

"Item_4"
"Item a"
"Item b"

Can somebody help?


Each ItemNo in the first column of file.csv is unique. There are also item numbers like a001, a001-b1, a001-b2 and so on, but if you search for "a001", "a001-b1", ... each search still matches exactly one row.

I don't need a specific tool; every useful solution would be OK. But it would be good if only the first column is searched (note that the Name of ItemNo "a006" (column 1) is "a004" (column 2)).


I tried the grep command

grep -f itemno.csv file.csv | awk -F, '{print $2}'

But the result was only the output from the last line:

"Item b"

I tried the awk command

awk -F, 'NR==FNR{a[$1]; next} $1 in a{print $2}' itemno.csv file.csv

But the result was only the output from the last line:

"Item b"

Maybe a loop command is a better idea?

So I tried this loop

while read -r line; do
    grep "${line}" file.csv | awk -F "," '{print $2}';
done < itemno.csv 

But there was no output... It seems like each line ends in an extra \r.
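A quick way to confirm and remove such stray carriage returns (a sketch; cat -A is a GNU coreutils option, and dos2unix or fromdos do the same conversion where available):

```shell
# show non-printing characters: CRLF lines end in "^M$", LF-only lines in "$"
cat -A itemno.csv

# write a copy with the carriage returns removed
tr -d '\r' < itemno.csv > itemno.unix.csv
```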

So I tried this command

while read line; do
    grep $(printf ${line} | sed 's/\r//g') file.csv | awk -F "," '{print $2}';
done < itemno2.csv 

with this itemno2.csv

"a003"
"a001"
"a002"
"a004"

And the output was:

"Item a"
"Item x"

Only with this strange loop command do I manage to search for the item numbers (and even this command ignores the first and last lines).

Kusalananda
R 9000
  • Welcome to the site. Is it guaranteed that there is a unique 1-to-1 mapping between item numbers and names (i.e. no name applies to more than one item number, and vice versa)? Do you require the solution to use a specific tool (such as awk or grep)? – AdminBee Jan 26 '23 at 13:18
  • Were your .csv files created on a dos/windows system? i.e. do they have CRLF line-endings rather than just LF? if so, then convert them to unix style text files (LF only). e.g. with fromdos. The itemno.csv file in particular needs to be converted to LF-only line-endings (otherwise the ^M in each itemno.csv line will prevent the matches from working) – cas Jan 27 '23 at 05:29
  • if you don't have fromdos, you can use sed -i.bak -e 's/\r$//' itemno.csv file.csv (sed reads one line at a time, so the \n itself is never in the pattern space). Or just tell awk to use CRLF as the record separator - i.e. add BEGIN{RS="\r\n"}; to the start of the awk script, or BEGIN{RS="\r?\n"}; to work with both DOS and Unix text files. – cas Jan 27 '23 at 05:31
  • @cas SORRY... you were right! I don't know how but my itemno.csv had \r on each ending of a line... With a normal itemno.csv awk and grep are working fine. – R 9000 Jan 27 '23 at 07:47
  • Regarding Maybe a loop command is a better idea? - no, that'd be extremely slow and fragile, see why-is-using-a-shell-loop-to-process-text-considered-bad-practice. – Ed Morton Jan 27 '23 at 16:27
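cas's record-separator suggestion can be applied without converting the files; a sketch (this assumes an awk such as GNU awk or mawk that accepts a regular expression as RS):

```shell
# RS="\r?\n" ends each record at LF, optionally preceded by CR,
# so the same script works for both DOS and Unix input files
awk -F, 'BEGIN{RS="\r?\n"} NR==FNR{a[$1]; next} $1 in a{print $2}' itemno.csv file.csv
```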

3 Answers

2

Your input data is one CSV file and one header-less CSV file.

We start by adding a header to the header-less CSV file so that we can later refer to the field we're joining on by name as ItemNo. We do this with Miller (mlr), reading the data as header-less using --implicit-csv-header and then employing the label sub-command to give the first column the label ItemNo.

$ mlr --csv --implicit-csv-header label ItemNo itemno.csv
ItemNo
a003
a001
a004

By using --implicit-csv-header, Miller internally labels the first field as 1 instead of picking the label from the first line. The label sub-command then changes this to ItemNo.

The fact that the data in the output is unquoted does not matter, as it doesn't need to be quoted (it contains no embedded delimiters, newlines, etc.); Miller automatically quotes fields that need quoting.

We can then use this in a join operation in Miller:

$ mlr --csv --implicit-csv-header label ItemNo itemno.csv | mlr --csv join -f file.csv -j ItemNo
ItemNo,Name,Weight
a003,Item_4,1.0
a001,Item a,1.1
a004,Item b,1.1

This does a relational "inner join" operation between the ItemNo fields of the data in file.csv and the data coming from the first mlr command in the pipeline.

We may then chain on a cut operation that extracts the Name field:

$ mlr --csv --implicit-csv-header label ItemNo itemno.csv | mlr --csv join -f file.csv -j ItemNo then cut -f Name
Name
Item_4
Item a
Item b

Adding --headerless-csv-output, we get header-less CSV output as in the question, and with --quote-all we can force Miller to quote all output fields, even though it's not necessary to do so:

$ mlr --csv --implicit-csv-header label ItemNo itemno.csv | mlr --csv --headerless-csv-output --quote-all join -f file.csv -j ItemNo then cut -f Name
"Item_4"
"Item a"
"Item b"

Miller does not care whether the input files are DOS or Unix text files, and it's capable of parsing CSV files with complex fields.

Kusalananda
1

Using any awk:

$ awk -F, '{sub(/\r$/,"")} NR==FNR{a[$1]; next} $1 in a{print $2}' itemno.csv file.csv
"Item a"
"Item_4"
"Item b"

I added that sub() at the front since you updated your question to tell us you have DOS line endings.
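If the surrounding quotes are not wanted in the output, a small variation of the same script (a sketch; it simply strips a leading and trailing double quote from the Name field):

```shell
# gsub(/^"|"$/,"",$2) removes the quote at the start and end of field 2
awk -F, '{sub(/\r$/,"")} NR==FNR{a[$1]; next} $1 in a{gsub(/^"|"$/,"",$2); print $2}' itemno.csv file.csv
```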

Ed Morton
0

Now I know what my problem was.

The itemno.csv was:

"a003"\r
"a001"\r
"a004"

Without the \r at the end of each line, the command

grep -f itemno.csv file.csv | awk -F, '{print $2}'

works, and the output is

"Item_4"
"Item a"
"Item b"
"a004"

Without the \r at the end of each line, the command

awk -F, 'NR==FNR{a[$1]; next} $1 in a{print $2}' itemno.csv file.csv

works as expected, and the output is

"Item_4"
"Item a"
"Item b"

Without the \r at the end of each line, the loop command

while read line; do
    grep "${line}" file.csv | awk -F "," '{print $2}';
done < itemno.csv 

gives this output

"Item_4"
"Item a"

Because read returns a non-zero status on the last line when the file has no trailing newline, so the loop body never runs for it.
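A sketch of a loop that also handles a file without a trailing newline (though, as noted in the comments, the awk solution is preferable to a shell loop):

```shell
# "|| [ -n "$line" ]" keeps the body running for a final line
# that read filled but could not terminate at a newline
while IFS= read -r line || [ -n "$line" ]; do
    grep -F -- "$line" file.csv | awk -F, '{print $2}'
done < itemno.csv
```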

R 9000