Empty columns which were seen before

Question

I have a large CSV file and want to empty certain columns if they were seen before.

So I have (to illustrate my problem):

Category | Subcategory
---------+------------
foo      | bar
foo      | bar
foo      | foobar
foo      | foobar

And I want:

Category | Subcategory
---------+------------
foo      | bar
         | 
         | foobar
         |

The whole CSV is sorted (with sort --stable -k 1,2), so I just need a way to do this for one column and can then apply the same method to the other column. Basically: delete every occurrence of "foo" except the first.

It is similar to this question, but I don't want to remove the complete line.

I'm not sure how to do this, since I'm not that into awk. Can anyone help me?

You should try something before posting a question, and include your attempts; SO is not a code-writing service. — Roman Kiselenko, Jan 23 '15 at 15:08
Additionally, I assume that construction you're showing is just for illustration, right? Because that's definitely not a csv. — HalosGhost, Jan 23 '15 at 15:18
@Зелёный: Yes, you're right, sorry. ... and I forgot to ask a question :( — Anonym, Jan 23 '15 at 15:25

Costas · Answer 1 · 2015-01-23T16:29:50.503

1

I suppose this is the usual awk task of printing each field value only once:

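awk -F"[| ]+" -v OFS=" | " '
# Build a run of spaces as wide as the first header field,
# used to overwrite repeated values in column 1.
NR==1 {
    for (i=0;i<length($1);i++)
        blank=" " blank
}
{
    # Empty column 2 if this (column 1, column 2) pair was seen before.
    if (($1,$2) in b)
        $2=""
    else
        b[$1,$2]=1
    # Blank column 1 if its value was seen before.
    if ($1 in a)
        $1=blank
    else
        a[$1]=1
    print
}' large.csv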
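
If the real file is comma-separated rather than laid out like the illustration, a variant along the following lines might work. This is only a sketch under assumptions that are not stated in the question: fields are separated by plain commas with no quoted fields containing commas, and the input is already sorted on the first two columns. Because of the sorting, each row only needs to be compared with the previous one, so no arrays are kept in memory, which may matter for a large file. The function name blank_repeats is made up for illustration.

```shell
# Hypothetical helper (name is illustrative): empty repeated values in the
# first two columns of a simple comma-separated file.
# Assumes: no quoted fields containing commas; input sorted on columns 1 and 2.
blank_repeats() {
    awk -F',' -v OFS=',' '
    {
        # Keep the original values as strings so comparisons are textual.
        c1 = $1 ""; c2 = $2 ""
        if (NR > 1 && c1 == p1) {
            # Same category and subcategory as the previous row.
            if (c2 == p2) $2 = ""
            # Same category as the previous row.
            $1 = ""
        }
        # Remember this row for the next comparison.
        p1 = c1; p2 = c2
        print
    }' "$@"
}

# Example run on the data from the question, written as real CSV.
printf '%s\n' 'foo,bar' 'foo,bar' 'foo,foobar' 'foo,foobar' | blank_repeats
```

For the four sample rows this prints foo,bar then a lone comma, then ,foobar and another lone comma. A header row, if present, passes through unchanged because it differs from the row after it.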

edited Jan 23 '15 at 16:29

answered Jan 23 '15 at 15:16
