There are various ways of solving this problem with shell scripts, but I prefer to reach for a tool that is not quite standard yet: Miller. You can install it with apt install miller on Ubuntu/Debian. I find that Miller's verbs are a more natural tool for thinking about this kind of problem than bash or awk.
If the data specified in the question is stored in INPUT_FILE:
A,val1
A,val2
A,val3
B,val1
B,val2
B,val3
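Then Miller's nest verb can be used to pack multiple records (rows) into a single record with multiple values in field 2, and then to expand field 2 into multiple fields. Because no input format is given, Miller reads the file as DKVP, and fields without an = sign are keyed by their position (1, 2, ...), which is why -f 2 refers to the second column: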
mlr --ocsv --headerless-csv-output \
nest --implode --values --across-records -f 2 then \
nest --explode --values --across-fields -f 2 INPUT_FILE
This produces the output you want:
A,val1,val2,val3
B,val1,val2,val3
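For comparison, here is a rough sketch of the same grouping in plain awk, which is more or less the bookkeeping the two nest steps do for you. It assumes exactly two comma-separated fields per line and no quoted commas, and group_rows is a hypothetical helper name, not part of Miller:

```shell
# group_rows: hypothetical helper (not part of Miller).
# Reads "key,value" lines on stdin and prints one line per key,
# with keys in first-seen order and values in their input order.
# Assumes exactly two fields per line and no quoted commas.
group_rows() {
  awk -F, -v OFS=, '
    !($1 in vals) { order[++n] = $1; vals[$1] = $1 }  # first time we see this key
    { vals[$1] = vals[$1] OFS $2 }                      # append the value to its key
    END { for (i = 1; i <= n; i++) print vals[order[i]] }
  '
}

# Sample data from the question:
printf '%s\n' A,val1 A,val2 A,val3 B,val1 B,val2 B,val3 | group_rows
```

Even for this small case you have to track key order and string concatenation by hand, which is the part the nest verb hides.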
There's probably an even simpler way to do this in Miller, but that was the first solution I found.