0

I have a list as below:

1 2 5 2
1 5 5 3
1 5 5 5
5 2 2 2
2 2 4 3

I would like to sort each line and obtain the unique values, as below (something similar to sort | uniq):

1 2 5
1 3 5
1 5
2 5
2 3 4

I have been searching the net for a solution, but I could only find how to sort by column. How can I get this output? Thanks in advance.

bison72
  • 161

5 Answers

5

Since sorting lines is easier than sorting the fields within a line, one approach is to transpose each line (so that each field becomes a line of its own), apply sort and uniq, and then transpose the result back.

Here is a naive implementation, assuming GNU tools:

$ while read -r line; do echo "$line" | grep -o '[^ ]*' | sort -h | uniq | paste -s; done <file

It loops through the file and, for each line:

  • grep with the -o option (print only the matching parts of a line) splits its input into one line per matching substring. Here we match every run of characters other than spaces.
  • The resulting lines are sorted with the -h option, which compares human-readable numbers (if you want to sort your fields as alphanumeric strings, remove -h).
  • The uniq command removes duplicates.
  • paste -s joins all lines from standard input into a single line, with fields separated by tabs. You can append a final | tr '\t' ' ' to change the tabs into spaces. A worked example of the whole pipeline follows this list.
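
For illustration, here is roughly what each stage produces for the first input line, 1 2 5 2 (GNU tools assumed; the fields in the final paste -s output are separated by tabs):

$ echo '1 2 5 2' | grep -o '[^ ]*'
1
2
5
2
$ echo '1 2 5 2' | grep -o '[^ ]*' | sort -h | uniq
1
2
5
$ echo '1 2 5 2' | grep -o '[^ ]*' | sort -h | uniq | paste -s
1	2	5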

Note, however, that using loops to process text is usually considered bad practice.

fra-san
  • 10,205
2

The following does not sort the data on each line; it only extracts the unique values. It's unclear whether the sorting is actually needed.

Using awk:

$ awk '{ n=split($0,a,FS); $0=""; j=1; delete seen; for (i=1; i<=n; i++) if (!seen[a[i]]++) $(j++) = a[i]; print }' <file
1 2 5
1 5 3
1 5
5 2
2 4 3

The program, laid out nicely, with comments:

{
    # split the current record into fields in the array a
    n = split($0, a, FS)

    # empty the current record
    $0=""

    # j is the next field number that we are to set
    # in the record that we are building
    j=1

    # seen is an associative array that we use to
    # keep track of whether we've seen a bit of
    # data before from this record
    delete seen

    # loop over the entries in a (the original
    # fields of the input data)
    for (i=1; i<=n; i++)
        # if we haven't seen this data before,
        # mark it as seen and...
        if (!seen[a[i]]++)
            # add it to the j:th field in the new record
            $(j++) = a[i]

    print
}

The idea I've gone with here is to build an output record with the unique fields from the original data, for each line of input.

"Record" is synonymous with "line" by default, and "field" is synonymous with "column" (these are just more general words that depend on the current values in RS and FS).

Kusalananda
  • 333,661
2

With Perl:

perl -MList::Util=uniq -alne 'print join " ", sort { $a <=> $b } uniq @F' file
1 2 5
1 3 5
1 5
2 5
2 3 4
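
For reference: -MList::Util=uniq imports the uniq function from the core List::Util module, -a autosplits each input line into the @F array, -n loops over the input lines, -l handles the line endings, and -e supplies the program; uniq drops the duplicate fields and sort { $a <=> $b } puts them in numerical order before they are joined back together with spaces.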
steeldriver
  • 81,074
0

Try this awk approach to sort and uniquify:

awk '
        {MX = 0                                                 # reset MAX
         split ("", C)                                          # reset C array
         for (i=1; i<=NF; i++)  {C[$i]++                        # for each number encountered, set C element to "true"
                                 if ($i > MX) MX = $i           # record MAX for this line
                                }
         for (i=1; i<=MX; i++) if (C[i]) printf "%s ", i        # only print the index of elements being "true", sorted
         printf ORS                                             # print end-of-line
        }
' file
1 2 5 
1 3 5 
1 5 
2 5 
2 3 4 
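
Note that this approach relies on the values being (reasonably small) positive integers: the C array is indexed by the numbers themselves, and the final loop counts up from 1 to the per-line maximum, which is also what produces the sorted order without an explicit sort.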
RudiC
  • 8,969
0

Another bash approach, similar to @fra-san's.

while read X;do tr<<<$X ' ' \\n|sort -u|paste -sd" ";done<file
1 2 5
1 3 5
1 5
2 5
2 3 4
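
One caveat: sort -u compares lexically, so with multi-digit values (where, for example, 10 sorts before 2) you would probably want sort -nu instead.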
steve
  • 21,892