The following does not sort the data within each line; it only extracts the unique values, keeping their original order. It's unclear from the question whether sorting is needed.
Using awk:
$ awk '{ n=split($0,a,FS); $0=""; j=1; delete seen; for (i=1; i<=n; i++) if (!seen[a[i]]++) $(j++) = a[i]; print }' <file
1 2 5
1 5 3
1 5
5 2
2 4 3
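The input file isn't shown above; a file like the following (hypothetical data, one of many possible inputs, not necessarily the original) would produce exactly that output:

```shell
# Hypothetical input data (an assumption for the example, not the original file)
printf '%s\n' '1 2 1 5' '1 5 3 5' '1 5 1 1' '5 2 5' '2 4 3 4' > file

# Remove duplicate fields from each line, keeping first occurrences in order
awk '{ n=split($0,a,FS); $0=""; j=1; delete seen;
       for (i=1; i<=n; i++) if (!seen[a[i]]++) $(j++) = a[i]; print }' <file
```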
The program, laid out nicely, with comments:
{
    # split the current record into fields in the array a
    n = split($0, a, FS)

    # empty the current record
    $0 = ""

    # j is the next field number that we are to set
    # in the record that we are building
    j = 1

    # seen is an associative array that we use to
    # keep track of whether we've seen a bit of
    # data before from this record
    delete seen

    # loop over the entries in a (the original
    # fields of the input data)
    for (i = 1; i <= n; i++)
        # if we haven't seen this data before,
        # mark it as seen and...
        if (!seen[a[i]]++)
            # add it to the j-th field in the new record
            $(j++) = a[i]

    print
}
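If you prefer to keep the commented layout, the program can be saved to a file (the name uniqfields.awk is just an example) and run with awk -f:

```shell
# Write the program to a script file (the filename is an arbitrary choice)
cat > uniqfields.awk <<'EOF'
{
    n = split($0, a, FS)
    $0 = ""
    j = 1
    delete seen
    for (i = 1; i <= n; i++)
        if (!seen[a[i]]++)
            $(j++) = a[i]
    print
}
EOF

awk -f uniqfields.awk <file
```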
The idea I've gone with here is to build, for each line of input, an output record containing the unique fields from the original data.
"Record" is synonymous with "line" by default, and "field" is synonymous with "column" (these are just more general words that depend on the current values in RS
and FS
).
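To illustrate that generality: with comma-separated data, setting FS (for splitting the input) and OFS (for joining the rebuilt record) makes the same logic work unchanged. The data below is made up for the example:

```shell
# Same dedup logic on comma-delimited fields; only FS and OFS change
printf '%s\n' 'a,b,a,c' 'x,x,y' |
awk -F, -v OFS=, '{ n=split($0,a,FS); $0=""; j=1; delete seen;
                    for (i=1; i<=n; i++) if (!seen[a[i]]++) $(j++) = a[i]; print }'
```

This prints a,b,c followed by x,y.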