I have a large CSV file and want to blank out the values in certain columns if they have already appeared on an earlier line.

So I have (to illustrate my problem):

Category | Subcategory
---------+------------
foo      | bar
foo      | bar
foo      | foobar
foo      | foobar

And I want:

Category | Subcategory
---------+------------
foo      | bar
         | 
         | foobar
         |

The whole CSV is sorted (with sort --stable -k 1,2), so I just need a way to do this for one column and can then apply the same method to the other column. Basically: delete every occurrence of "foo" except the first.

This is similar to this question, but I don't want to remove the whole line.

I'm not sure how to do this, since I'm not that into awk. Can anyone help me?

Anonym

1 Answer

I suppose this is the usual task of printing only the first occurrence of a field value in awk:

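awk -F"[| ]+" -v OFS=" | " '
# Build a run of spaces as wide as the first header field
NR==1 {
    for (i=0;i<length($1);i++)
        blank=" " blank
}
{
    # Blank the subcategory if this (category, subcategory) pair was seen before
    if (($1,$2) in b)
        $2=""
    else
        b[$1,$2]=1
    # Blank the category if it was seen before
    if ($1 in a)
        $1=blank
    else
        a[$1]=1
    print
 }' large.csv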
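
Since the question says the file is already sorted, a lighter variant is possible: instead of keeping every seen value in an array, compare each field only with the last value kept from an earlier line. The following is a minimal sketch, not a drop-in solution. It assumes a plain comma-separated file with no quoted fields containing commas, and `blank_repeats` is a hypothetical helper name that takes the 1-based column number as its argument.

```shell
# Hypothetical helper: read comma-separated lines on stdin and blank
# column $1 (1-based) whenever it equals the last value that was kept.
# Assumes sorted input and no quoted fields containing commas.
blank_repeats() {
    awk -F, -v OFS=, -v col="$1" '
    # Repeat of the remembered value: empty the field and print.
    # The appended "" forces a string comparison, so 1 and 1.0 stay distinct.
    NR > 1 && ($col "") == (prev "") { $col = ""; print; next }
    # First occurrence (or header line): remember the value and print as-is.
    { prev = $col; print }
    '
}
```

It can be chained once per column, for example:

    printf 'Category,Subcategory\nfoo,bar\nfoo,bar\nfoo,foobar\nfoo,foobar\n' | blank_repeats 1 | blank_repeats 2

which prints the header, then `foo,bar`, `,`, `,foobar`, `,`. One caveat: each pass looks at a single column, so a subcategory that happens to repeat across the boundary between two different categories would also be blanked. If that matters, the pair-based check from the command above is the safer choice.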
Costas