Using csvkit
,
$ csvsql -H --query 'SELECT a,min(b),max(c),d FROM file GROUP BY a' file.csv
a,min(b),max(c),d
164318,1449,1457,1922
841422,1221,1228,1860
842179,2115,2118,1485
846354,1512,1513,1590
This would load the CSV data into a temporary database (SQLite by default I believe), and then apply the given SQL query to it. The table will by default have the same name as the input file (sans suffix) and since the data lacks column headers, the default field names will be alphabetical.
The -H
options tells csvsql
that the data has no column headers.
To delete the generated header in the output, pipe the result through something like sed '1d'
.
To get zero-filled integers:
$ csvsql -H --query 'SELECT printf("%07d,%06d,%06d,%06d",a,min(b),max(c),d) FROM file GROUP BY a' file.csv
"printf(""%07d,%06d,%06d,%06d"",a,min(b),max(c),d)"
"0164318,001449,001457,001922"
"0841422,001221,001228,001860"
"0842179,002115,002118,001485"
"0846354,001512,001513,001590"
Here, the lines gets quoted since we're actually only requesting a single output field for each result record (and it contains commas). Another way to do it, which involves a bit more typing, but does not generate extra double quotes:
$ csvsql -H --query 'SELECT printf("%07d",a),printf("%06d",min(b)),printf("%06d",max(c)),printf("%06d",d) FROM file GROUP BY a' file.csv
"printf(""%07d"",a)","printf(""%06d"",min(b))","printf(""%06d"",max(c))","printf(""%06d"",d)"
0164318,001449,001457,001922
0841422,001221,001228,001860
0842179,002115,002118,001485
0846354,001512,001513,001590
Again, the output header can be removed by piping the result through sed '1d'
.
csvkit
based answers you have) OR 2) manipulate your file using something that is not SQL, nor an RDBMS (e.g. shell constructs,awk
, ...) in a way equivalent to the SQL command you provided. – fra-san Jan 17 '19 at 13:53