Remove duplicates from particular columns

Question

I have a file in the format below, where columns are separated by commas.

[1], Value1,   UAC,                 AB
[2.2], Check1, BOH D2A D2A BOH,     SD
[63], name2,   MFB MFB,              k
...

I want to remove duplicate values from a column (say, the 3rd column), like below:

[1], Value1,   UAC,             AB
[2.2], Check1, BOH D2A ,        SD
[63], name2,   MFB,              k
...

How can I use uniq or AWK on a particular column?

Does the order of entries in the 3rd column matter after removing duplicates? — αғsнιη, Apr 11 '19 at 09:14

score 0 · Accepted Answer · answered Apr 11 '19 at 12:35

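With awk:

awk -F, '{
    # print columns 1 and 2 as data, never as a printf format string
    printf "%s", $1 FS $2 FS;
    split($3, arr, / +/); for(val in arr) !uniq_arr[ arr[val] ]++;
    for (key in uniq_arr) { 
        # skip the empty piece left by leading spaces in column 3
        if (key != "") printf " %s", key;
        delete uniq_arr[key]
    };
    printf "%s\n", FS $4
}' infile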

[1], Value1, UAC, AB
[2.2], Check1, D2A BOH, SD
[63], name2, MFB, k

split($3, arr, / +/) splits column 3 into the array arr, using one or more spaces as the separator.
In for(val in arr) !uniq_arr[ arr[val] ]++, each value from arr is used as a key of a second array, uniq_arr; since array keys are unique, uniq_arr ends up holding each distinct value of column 3 exactly once.
Next, we print the keys saved in uniq_arr and delete each key after it is printed, so the array is empty again for the next line; the empty key produced by the leading space in the column is skipped. Columns 1, 2 and 4 are printed separately. Note that for (key in uniq_arr) visits the keys in an unspecified order, so the values in column 3 may not come out in their original order.
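
If the original order of the values matters (as asked in the comment on the question), a variant that walks the split pieces by index, rather than with for-in, might look like the sketch below. The function name dedupe_col3 and the sample contents written to infile are made up for illustration, and the sketch assumes column 3 is the one to clean and that its values are separated by spaces.

```shell
# Hypothetical sample data in the same layout as the question.
printf '%s\n' \
    '[1], Value1, UAC, AB' \
    '[2.2], Check1, BOH D2A D2A BOH, SD' \
    '[63], name2, MFB MFB, k' > infile

# dedupe_col3 is an illustrative name, not part of the original answer.
dedupe_col3() {
    awk -F, '{
        n = split($3, arr, / +/)          # pieces of column 3, in input order
        out = ""
        for (i = 1; i <= n; i++)          # an index loop keeps that order
            if (arr[i] != "" && !seen[arr[i]]++)
                out = out " " arr[i]      # keep only the first occurrence
        split("", seen)                   # portable way to empty the array
        print $1 FS $2 FS out FS $4
    }' "$@"
}

dedupe_col3 infile
```

With this sample input the second line comes out as [2.2], Check1, BOH D2A, SD, that is, in first-seen order rather than whatever order for-in happens to produce.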
