I wish to keep only certain columns of a CSV file, based on the structure of the header line.
Description of data:
- In the header line (the country fields), several fields are empty.
- The number of columns per country varies. There can be 3 columns or 10 columns per country.
- The number of columns in the header line, counted from the position where the fields are no longer empty, is dynamic. It can be 2 columns or 100 columns.
The objective is to keep the first column of each country, if countries exist on the first line.
How can I do this using awk, please?
The example is like this:
- input: file.csv
,,,fr,fr,fr,ch,ch,ch
num,nom,date reg,match flag,date1,date2,match flag,date1,date2
0001,AA,2020-05-15,reg1,2019-02-03,2019-02-05,reg2,2019-05-06,2019-06-10
0002,AAA,2020-05-20,,,,reg3,2020-05-06,2020-06-10
- Desired output: file1.csv
,,,fr,ch
num,nom,date reg,match flag_fr,match flag_ch
0001,AA,2020-05-15,reg1_fr,reg2_ch
0002,AAA,2020-05-20,,reg3_ch
Thank you for your help.
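The rule described in this question could be sketched in awk roughly as follows. This is only a sketch under assumptions, not a definitive solution: it assumes that no field contains a quoted comma, that all columns of one country are adjacent, and that lines end with a plain newline (no carriage return). The function name `keep_first_per_country` is hypothetical and used only for illustration. As in the desired output, non-empty values in country columns get a `_<country>` suffix from the second line on, and empty values stay empty.

```shell
# Hypothetical helper: keep the leading columns whose header field is empty,
# plus the first column of each country named on line 1.
keep_first_per_country() {
  awk -F, -v OFS=, '
    NR == 1 {
      # Select the columns to keep: every column with an empty header
      # field, and the first column of each run of the same country code.
      for (i = 1; i <= NF; i++) {
        if ($i == "" || $i != prev) {
          keep[++n] = i
          suffix[n] = $i    # stays empty for the leading non-country columns
        }
        prev = $i
      }
    }
    {
      line = ""
      for (j = 1; j <= n; j++) {
        val = $(keep[j])
        # From line 2 on, tag non-empty country values with "_<country>".
        if (NR > 1 && suffix[j] != "" && val != "")
          val = val "_" suffix[j]
        line = line (j > 1 ? OFS : "") val
      }
      print line
    }
  ' "$@"
}
```

It could then be called as `keep_first_per_country file.csv > file1.csv`. If the file can contain quoted fields with embedded commas, splitting on `,` alone is not enough and a CSV-aware tool would be a safer choice.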
abc,"def,ghi",jkl
– Chris Davies Jul 09 '20 at 08:23