0

I have a list as below:

1 2 5 2
1 5 5 3
1 5 5 5
5 2 2 2
2 2 4 3

I would like to sort each line and obtain the unique values, as below (something similar to sort | uniq):

1 2 5
1 3 5
1 5
2 5
2 3 4

I have been searching the net for a solution, but I could only find how to sort by column. How can I get this output? Thanks in advance.

bison72
  • 161

5 Answers

5

Since sorting lines is easier than sorting the fields within a line, one approach is to transpose each line (so that each field becomes a line of its own), apply sort and uniq, and then transpose the result back.

Here is a naive implementation, assuming GNU tools:

$ while read -r line; do echo "$line" | grep -o '[^ ]*' | sort -h | uniq | paste -s; done <file

It loops through the file and, for each line:

  • grep with the -o option (print only the matching parts of a line) splits its input into one line per matching substring. Here we match every run of characters other than spaces.
  • The resulting lines are sorted with the -h option, which compares human-readable numbers (if you want to sort your fields as alphanumeric strings, remove -h).
  • The uniq command removes duplicates.
  • paste -s joins all lines from standard input into a single line, with fields separated by tabs. You can append a final | tr '\t' ' ' to change the tabs into spaces. A worked example of the whole pipeline follows this list.
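
For illustration, here is roughly what each stage produces for the first input line, 1 2 5 2 (GNU tools assumed; the fields in the final paste -s output are separated by tabs):

$ echo '1 2 5 2' | grep -o '[^ ]*'
1
2
5
2
$ echo '1 2 5 2' | grep -o '[^ ]*' | sort -h | uniq
1
2
5
$ echo '1 2 5 2' | grep -o '[^ ]*' | sort -h | uniq | paste -s
1	2	5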

Note, however, that using loops to process text is usually considered bad practice.

fra-san
  • 10,205
2

The following does not sort the data on each line; it only extracts the unique values. It's unclear whether the sorting is actually needed.

Using awk:

$ awk '{ n=split($0,a,FS); $0=""; j=1; delete seen; for (i=1; i<=n; i++) if (!seen[a[i]]++) $(j++) = a[i]; print }' <file
1 2 5
1 5 3
1 5
5 2
2 4 3

The program, laid out nicely, with comments:

{
    # split the current record into fields in the array a
    n = split($0, a, FS)

    # empty the current record
    $0=""

    # j is the next field number that we are to set
    # in the record that we are building
    j=1

    # seen is an associative array that we use to
    # keep track of whether we've seen a bit of
    # data before from this record
    delete seen

    # loop over the entries in a (the original
    # fields of the input data)
    for (i=1; i<=n; i++)
        # if we haven't seen this data before,
        # mark it as seen and...
        if (!seen[a[i]]++)
            # add it to the j:th field in the new record
            $(j++) = a[i]

    print
}

The idea I've gone with here is to build an output record with the unique fields from the original data, for each line of input.

"Record" is synonymous with "line" by default, and "field" is synonymous with "column" (these are just more general words that depend on the current values in RS and FS).

Kusalananda
  • 333,661
2

With Perl:

perl -MList::Util=uniq -alne 'print join " ", sort { $a <=> $b } uniq @F' file
1 2 5
1 3 5
1 5
2 5
2 3 4
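
For reference: -MList::Util=uniq imports the uniq function from the core List::Util module, -a autosplits each input line into the @F array, -n loops over the input lines, -l handles the line endings, and -e supplies the program; uniq drops the duplicate fields and sort { $a <=> $b } puts them in numerical order before they are joined back together with spaces.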
steeldriver
  • 81,074
0

Try this awk approach to sort and uniquify:

awk '
        {MX = 0                                                 # reset MAX
         split ("", C)                                          # reset C array
         for (i=1; i<=NF; i++)  {C[$i]++                        # for each number encountered, set C element to "true"
                                 if ($i > MX) MX = $i           # record MAX for this line
                                }
         for (i=1; i<=MX; i++) if (C[i]) printf "%s ", i        # only print the index of elements being "true", sorted
         printf ORS                                             # print end-of-line
        }
' file
1 2 5 
1 3 5 
1 5 
2 5 
2 3 4 
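
Note that this approach relies on the values being (reasonably small) positive integers: the C array is indexed by the numbers themselves, and the final loop counts up from 1 to the per-line maximum, which is also what produces the sorted order without an explicit sort.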
RudiC
  • 8,969
0

Another bash approach, similar to @fra-san's.

while read X;do tr<<<$X ' ' \\n|sort -u|paste -sd" ";done<file
1 2 5
1 3 5
1 5
2 5
2 3 4
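
One caveat: sort -u compares lexically, so with multi-digit values (where, for example, 10 sorts before 2) you would probably want sort -nu instead.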
steve
  • 21,892