Sorting blocks of lines

Question

I have a file that contains 4n lines. Here is an 8-line excerpt from it:

6115 8.88443
6116 6.61875
6118 16.5949
6117 19.4129
6116 6.619 
6117 16.5979 
6118 19.4111
6115 8.88433

What I want to do is sort each block of 4 lines by the first column. The output for the excerpt should look as shown below.

6115 8.88443
6116 6.61875
6117 19.4129
6118 16.5949
6115 8.88433 
6116 6.619 
6117 16.5979 
6118 19.4111

iruvar · Answer 1 · 2017-05-04T16:30:43.210

17

One option is to use awk to prefix every line with a block number that increases by one every N lines (N=4 in your case), then feed that prefix to sort as the primary sort key, with the original first column as the secondary key.

Example with N=4:

awk '{print int((NR-1)/4), $0}' file.txt | sort -n -k1,1 -k2,2 | cut -f2- -d' '
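The same pipeline can be wrapped in a small shell function so that the block size becomes a parameter. This is only a sketch, not part of the original answer: the function name `sort_blocks` and the variable `n` are made up for illustration, and it assumes the first column is a plain number and that fields are separated by single spaces.

```shell
# sort_blocks N [FILE]: sort every block of N lines numerically by column 1.
# Hypothetical helper; reads FILE, or standard input if FILE is omitted.
sort_blocks() {
    n=$1
    shift
    # 1. Prefix each line with its block number (0, 1, 2, ...).
    # 2. Sort by block number first, by the original first column second.
    # 3. Strip the block-number prefix again.
    awk -v n="$n" '{ print int((NR - 1) / n), $0 }' "$@" |
        sort -n -k1,1 -k2,2 |
        cut -d' ' -f2-
}
```

Called as `sort_blocks 4 file.txt`, it should behave like the one-liner above.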

edited May 04 '17 at 16:30

answered Nov 09 '13 at 16:50


Anthon · Answer 2 · 2013-11-09T17:05:45.997

If this is a one-off and you don't want to learn Python, Perl or awk, you can get by with the basic split and sort commands.

First split the file into 4-line chunks with the -l option, then sort each chunk in place and concatenate the chunks again:

split -a 6 -l 4 input_file my_prefix_
for fn in my_prefix_*; do
    sort -n -o "$fn" "$fn"
done
cat my_prefix_* > output_file
rm my_prefix_*

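sort -n sorts by the numerical value of the first column (so 999 comes before 1234). -a 6 gives six-letter suffixes, which is enough for a file of up to 26^6*4 lines. my_prefix_ should be a prefix that no other file in your working directory uses, because the final cat and rm act on everything that matches my_prefix_*.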
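One way to avoid having to pick a unique prefix is to put the chunks in a scratch directory. The following is a sketch under a few assumptions: `mktemp -d` is available (it is common, but not specified by POSIX), and the names `sort_blocks_split`, `tmpdir` and `chunk_` are invented for this example. Instead of sorting in place and running cat afterwards, it appends each sorted chunk directly to the output file.

```shell
# sort_blocks_split INPUT OUTPUT: sort each 4-line block of INPUT numerically
# and write the result to OUTPUT. Hypothetical helper name.
sort_blocks_split() {
    tmpdir=$(mktemp -d) || return 1
    # Split into 4-line chunks named chunk_aaaaaa, chunk_aaaaab, ...
    split -a 6 -l 4 "$1" "$tmpdir/chunk_"
    : > "$2"                          # start with an empty output file
    # The glob expands in sorted order, which is the original chunk order.
    for fn in "$tmpdir"/chunk_*; do
        [ -e "$fn" ] || continue      # no chunks at all: input was empty
        sort -n "$fn" >> "$2"         # append the sorted chunk
    done
    rm -rf "$tmpdir"
}
```

Usage would be `sort_blocks_split input_file output_file`; the output file must not be the same file as the input.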

Joseph R. · Answer 3 · 2013-11-09T16:51:17.493

You can do it with Perl:

perl -nle '
   push @a,$_;
   unless($. % 4){
       print join "\n",sort {$a <=> $b} @a; # Sort @a, and print its contents
       @a = (); # Empty @a to start a new block
   }
' your_file

How this works

-n --> run the code for each input line (and put the current line in $_)
-l --> strip the trailing newline from each input line and append a newline to the output of any print
-e --> execute the following string as Perl code
Each line is appended to the array @a.
$. holds the current line number. As long as that number is not a multiple of 4, we only collect lines. When it is congruent to zero modulo 4, we have reached the end of a block, so we sort the entries in @a in ascending numerical order and print them to standard output, joined by newlines. The numeric comparison uses the number at the start of each line, which is the first column.

score 2 · Answer 4 · answered Nov 10 '13 at 05:23

Using a Bourne-like shell,

while IFS= read -r a ; do                                   # Try reading a line.
    IFS= read -r b ; IFS= read -r c ; IFS= read -r d        # OK, read 3 more.
    printf '%s\n%s\n%s\n%s\n' "$a" "$b" "$c" "$d" | sort -n # Sort them.
done < data
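The loop above hard-codes a block size of four. A possible generalization, offered only as a sketch, takes the block size as an argument and also flushes a final block that is shorter than N lines. The names `sort_blocks_read`, `block`, `count` and `line` are hypothetical. Note that it starts one sort process per block, so it is likely to be slow on large files.

```shell
# sort_blocks_read N: read standard input and sort each block of N lines.
sort_blocks_read() {
    n=$1
    count=0
    block=
    while IFS= read -r line; do
        # Append the line, followed by a newline, to the current block.
        block="$block$line
"
        count=$((count + 1))
        if [ "$count" -eq "$n" ]; then
            printf '%s' "$block" | sort -n   # block complete: sort and print
            block=
            count=0
        fi
    done
    # Flush a trailing block shorter than N lines, if there is one.
    if [ "$count" -gt 0 ]; then
        printf '%s' "$block" | sort -n
    fi
}
```

It would be called as `sort_blocks_read 4 < data`.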

score 2 · Answer 5 · answered May 04 '17 at 18:31

Here are some "pure" awk solutions:

If the indexes are always the same incrementing integer sequence (6115-6118), as in your sample data, you can use an algorithmic "shortcut":

awk '{a[$1]=$0} !(NR%4){for(i=6115;i<6119;print a[i++]);}'

This does the following:

Store every line in the array a, at index positions 6115-6118.
On every 4th line (!(NR%4)), loop through the array contents to print them in the desired order.

If your numeric indexes are always the four same ones, but not an incrementing integer sequence, you'll have to sort:

awk '{a[$1]=$0} !(NR%4){asort(a,b); for(i=1;i<5;print b[i++]);}'

Note: This is with GNU awk, others may not support asort.

If every block-of-four could have different numeric IDs:

awk '{a[$1]=$0} !(NR%4){asort(a); for(i=1;i<5;print a[i++]); delete a}'

Note: I learned from @Gilles' self-answer (+2) that this use of delete on a whole array is not (yet) POSIX, but it is universally supported.

A version with the correct™ use of delete:

awk '{a[$1]=$0} !(NR%4){asort(a); for(i=1;i<5;delete a[i++]){print a[i]}}'

A version without delete, using more memory and dimensions:

awk '{a[n][$1]=$0} !(NR%4){asort(a[n]); for(i=1;i<5;print a[n][i++]); n++}'
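Since asort is specific to GNU awk, another option for other awk implementations is to let awk hand each block to an external sort and close the pipe at every block boundary. This is a different technique from the array-based versions above and is only a sketch; the function name `sort_blocks_awk` and the variable `n` are hypothetical.

```shell
# sort_blocks_awk N [FILE]: sort each block of N lines numerically,
# delegating the sorting to sort(1) so that asort() is not needed.
sort_blocks_awk() {
    n=$1
    shift
    awk -v n="$n" '
        { print | "sort -n" }             # feed the line to a running sort
        NR % n == 0 { close("sort -n") }  # end of block: sort finishes and prints
    ' "$@"
}
```

Closing the pipe makes awk wait for that sort to finish, so each block is printed before the next one starts. A final block shorter than N lines is flushed when awk exits.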

score 1 · Answer 6 · edited May 23 '17 at 11:33

You can get a clean solution with R. If the table above is in a file called "table.txt", then perform the following steps. The desired result will be in the file "tableout.txt".

> x = read.table("table.txt", col.names=c("a", "b"))
> x
     a        b
1 6115  8.88443
2 6116  6.61875
3 6118 16.59490
4 6117 19.41290
5 6116  6.61900
6 6117 16.59790
7 6118 19.41110
8 6115  8.88433
> x["index"] = c(rep(1, 4), rep(2, 4))
> x
     a        b index
1 6115  8.88443     1
2 6116  6.61875     1
3 6118 16.59490     1
4 6117 19.41290     1
5 6116  6.61900     2
6 6117 16.59790     2
7 6118 19.41110     2
8 6115  8.88433     2     
> xord = x[with(x, order(index, a)), ]
> xord
     a        b index
1 6115  8.88443     1
2 6116  6.61875     1
4 6117 19.41290     1
3 6118 16.59490     1
8 6115  8.88433     2
5 6116  6.61900     2
6 6117 16.59790     2
7 6118 19.41110     2
> write.table(xord[,1:2], "tableout.txt", row.names=FALSE, col.names=FALSE)

See also How to sort a dataframe by column(s) in R.
