I went through the answers in this helpful thread, but my problem seems to be different enough that I can't think of a good answer (at least with sed).
I have a large CSV file (200+ GB) with rows that look like the following:
<alphanumerical_identifier>,<number>
where <alphanumerical_identifier> is unique across the entire file. I would like to create a separate file that replaces the first column with an index, i.e.
<index>,<number>
so that we get:
1, <number>
2, <number>
3, <number>
Can awk generate an increasing index without loading the full file into memory?
Since the index increases monotonically, it may be even better to just drop it. Would the solution for that be very different? I.e.:
<number>
<number>
<number>
perl -pe '$_ = "$.,$_"' bigfile.csv > newfile.csv
but it would load the full file into memory, not sure – Raza Sep 18 '14 at 18:45
awk -F, '{print ++n, $2}' would work. Or awk -F, '{print $2}' for the second variation. – G-Man Says 'Reinstate Monica' Sep 18 '14 at 18:52
FNR would serve just as well as ++n – iruvar Sep 18 '14 at 18:56
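For what it's worth, here is a minimal sketch of both variations along the lines of the awk suggestions above. awk reads its input one record at a time, so memory use should not grow with file size. The file names (sample.csv, indexed.csv, numbers.csv) and the three sample rows are made up for illustration. One detail to watch: with awk's default output separator, print ++n, $2 joins the two fields with a space, so the sketch sets OFS to a comma to get <index>,<number>.

```shell
# Hypothetical three-row sample standing in for the 200+ GB file.
printf '%s\n' 'a1b2,10' 'zz9,20' 'q7w,30' > sample.csv

# Variation 1: replace the first column with a running index.
# -F, splits input on commas; -v OFS=, makes print join fields with a comma.
# NR is awk's built-in record counter (equivalent to ++n for a single input file).
awk -F, -v OFS=, '{ print NR, $2 }' sample.csv > indexed.csv

# Variation 2: drop the first column and keep only the number.
awk -F, '{ print $2 }' sample.csv > numbers.csv
```

On the real data, the same commands would be run with bigfile.csv in place of sample.csv. This assumes the number is always the second comma-separated field and that the identifier itself never contains a comma.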