I have a large CSV file and want to blank out a column's value on every row where that value has already appeared in an earlier row.
So I have (to illustrate my problem):
Category | Subcategory
---------+------------
foo | bar
foo | bar
foo | foobar
foo | foobar
And I want:
Category | Subcategory
---------+------------
foo | bar
|
| foobar
|
The whole CSV is sorted (with sort --stable -k 1,2), so I just need a way to do this for one column and can then apply the same method to the other column.
Basically: delete every occurrence of "foo" except the first.
It is similar to this question, but I don't want to remove the complete line.
I'm not sure how to do this, since I'm not that into awk. Can anyone help me?
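
For reference, here is a rough sketch of the kind of thing being asked for, not a definitive solution. It assumes the real file is comma-separated with no quoted fields containing commas, and that the input is already sorted so repeated values sit on adjacent rows. The function name blank_repeats is made up for illustration. It compares each row's value in the chosen column with the previous row's value and empties the field when they match:

```shell
# Hypothetical helper: blank a column's value when it repeats the previous row's value.
# Usage: blank_repeats COLUMN_NUMBER < input.csv
# Assumes a plain comma-separated file (no quoted commas) that is already sorted.
blank_repeats() {
  awk -F, -v OFS=, -v col="$1" '
    {
      current = $col ""                          # append "" to force string comparison
      if (NR > 1 && current == prev) $col = ""   # same as the row above: empty the field
      prev = current                             # remember the original value, not the blanked one
      print
    }'
}
```

Running it once per column, for example `blank_repeats 1 < input.csv | blank_repeats 2`, turns the four sample rows above into `foo,bar`, `,`, `,foobar`, `,`. Note that the second pass only looks at the subcategory column, so a subcategory that repeats across a category boundary would also be blanked.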