Collapsing a series of comma separated numbers in a sequence to Beginning-End

Question

I am trying to enhance a Bash script that produces a number sequence. I use a topology-aware tool (lstopo-no-graphics) to extract physical processor numbers, which I then pass to numactl for processor binding.

Example output for L3 L#4 shared cache physical core

lstopo-no-graphics --no-io|sed -n "/L3 L#3/,/L3/p"|grep -v "L3\|L2"|tr -s '[:space:]'|cut -d " " -f4|grep -o "[0-9]*"|sort -g|tr '\n' ','|sed '$s/,$//'

results in the number series string:

32,33,34,35,36,37,38,39,96,97,98,99,100,101,102,103

All well and good: I pass this series to numactl as --physcpubind=32,33,34,35,36,37,38,39,96,97,98,99,100,101,102,103. I would like to collapse the sequence down to numactl --physcpubind=32-39,96-103, that is, to collapse each run of consecutive numbers into an "a-n" range, with the ranges themselves comma separated.

The existing Bash script works; I am just looking for a cleaner implementation, if anyone has ideas.

Is anything here adaptable to your case? How to collapse consecutive numbers into ranges? — steeldriver, Apr 27 '21 at 17:07
hmmm, maybe. I didn't think about using dc. :) Researching time. Thanks — Goomba1050, Apr 27 '21 at 17:18

cas · Answer 1 · 2021-05-08T14:55:56.983

Using a perl one-liner with the Set::IntSpan module:

$ perl -MSet::IntSpan -l -e 'print Set::IntSpan->new(shift)' 32,33,34,35,36,37,38,39,96,97,98,99,100,101,102,103
32-39,96-103

This takes one argument, a comma-separated list of integers on the command line. You can enclose it in quotes if there are spaces, tabs, or other whitespace in the list. Set::IntSpan is extremely forgiving of whitespace anywhere in a list of numbers: it ignores it all.

If the list already contains a mixture of ranges and integers, it will deal with them seamlessly:

$ perl -MSet::IntSpan -l -e 'print Set::IntSpan->new(shift)' 32,33,34-38,39,96-100,101,102,103
32-39,96-103

Set::IntSpan is packaged as libset-intspan-perl on Debian and related distros like Ubuntu, and as perl-Set-IntSpan on Fedora. For other systems, if you can't find a package, it can be installed with cpan.

To use this in your script, you can use command substitution:

numactl --physcpubind=$(perl -MSet::IntSpan -l -e 'print Set::IntSpan->new(shift)' 32,33,34,35,36,37,38,39,96,97,98,99,100,101,102,103)

This is fine if you only use it once in a script, but otherwise it is tedious and hurts readability. So wrap it in a function in your bash script (with a small improvement so that it optionally works with multiple args on the command line, which is useful if you want, e.g., to populate an array with cpu-sets):

collapse () {
 perl -MSet::IntSpan -le 'for (@ARGV) {print Set::IntSpan->new($_)}' "$@"
}

and then use it as:

cpus=$(collapse 32,33,34-38,39,96-100,101,102,103)
numactl --physcpubind="$cpus"

or

numactl --physcpubind=$(collapse 32,33,34,35,36,37,38,39,96,97,98,99,100,101,102,103)
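
Since Set::IntSpan is not installed everywhere, here is a minimal sketch of the same collapse function with a fallback. It is only an illustration: the fallback branch reuses the awk loop from the other answer in this thread, and it assumes that the input is already sorted and contains plain integers only (no pre-existing ranges), which Set::IntSpan itself does not require.

```shell
# collapse: print each argument's number list as collapsed ranges.
# Uses Set::IntSpan when perl can load it; otherwise falls back to awk.
# Assumption: in the fallback branch the lists are sorted plain integers.
collapse() {
    if perl -MSet::IntSpan -e 1 2>/dev/null; then
        perl -MSet::IntSpan -le 'for (@ARGV) {print Set::IntSpan->new($_)}' "$@"
    else
        # One list per line, so awk handles each argument as one record.
        printf '%s\n' "$@" | awk -F, '
        {
            for (i = 2; i <= NF + 1; i++) {
                if ($i == $(i-1) + 1) {
                    if (f == "") f = $(i-1)   # remember start of the run
                    continue
                }
                printf("%s%s%s%s", f, (f != "" ? "-" : ""), $(i-1), (i > NF ? RS : FS))
                f = ""
            }
        }'
    fi
}

cpus=$(collapse 32,33,34,35,36,37,38,39,96,97,98,99,100,101,102,103)
echo "numactl --physcpubind=$cpus"
```

Either branch prints one collapsed list per argument, so the calling code does not need to know which one ran.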

Here's a fancier stand-alone script version that can take multiple args directly from the command line, from files listed on the command line, from stdin, or any combination thereof. Multiple args are processed in the order provided, with STDIN processed last. Input from files and STDIN is processed one line at a time.

#!/usr/bin/perl
use strict;
use Set::IntSpan;
my @series = ();
# take args from files and from command line
foreach my $arg (@ARGV) {
  if ( -e $arg ) { # if the arg is a filename, slurp it in
    open(my $fh, "<", $arg) || die "couldn't open $arg: $!\n";
    while(<$fh>) { push @series, $_; }
  } else { # otherwise, treat the arg as a series
    push @series, $arg;
  }
};
# take args from stdin too, if stdin isn't a terminal
if (! -t STDIN) { while(<STDIN>) { push @series, $_; } };
foreach (@series) {
  print Set::IntSpan->new($_) . "\n";
};

Save as, e.g. collapse.pl, make executable with chmod +x collapse.pl and run like:

$ printf '1,2,3\n4,5,6' | ./collapse.pl 7,8,9 32-39,50,51,52,53
7-9
32-39,50-53
1-3
4-6

score 2 · Accepted Answer · answered Apr 27 '21 at 18:44


Save this as range.awk.

{
    for(i=2;i<=NF+1;i++){     #Visit each number from the 2nd on
        if($i==$(i-1)+1){
            if(f=="")f=$(i-1) #Candidate to first number of a range
            continue
        }
        printf("%s%s%s%s", f, (f!="" ? "-" : ""), $(i-1), (i>NF ? RS : FS))
        f="" #Unset the candidate
    }
}

Run it: awk -F, -f range.awk.

Or copy-paste the collapsed one-liner:

awk -F, '{for(i=2;i<=NF+1;i++){if($i==$(i-1)+1){if(f=="")f=$(i-1);continue}printf("%s%s%s%s",f,f!=""?"-":"",$(i-1),i>NF?RS:FS);f=""}}'

I chose not to hardcode the field separator, so it must be specified with -F.

Sample outputs:

$ awk -F, -f range.awk <<< 32,33,34,35,36,37,38,39,96,97,98,99,100,101,102,103
32-39,96-103
$ awk -F, -f range.awk <<< 0,1,2,5,8,9,11
0-2,5,8-9,11
$ awk -F, -f range.awk <<< 4
4
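
To use this from a shell script without a separate range.awk file, one option is to wrap the same awk program in a function. The sketch below assumes a hypothetical helper name, collapse_ranges, and an optional second argument for the separator (defaulting to a comma); neither is part of the answer above.

```shell
# collapse_ranges LIST [SEP]
# Hypothetical wrapper around the awk program above.
# LIST is a sorted list of integers; SEP is the field separator (default ",").
collapse_ranges() {
    printf '%s\n' "$1" | awk -F"${2:-,}" '
    {
        for (i = 2; i <= NF + 1; i++) {       # visit each number from the 2nd on
            if ($i == $(i-1) + 1) {
                if (f == "") f = $(i-1)       # candidate first number of a range
                continue
            }
            printf("%s%s%s%s", f, (f != "" ? "-" : ""), $(i-1), (i > NF ? RS : FS))
            f = ""                            # unset the candidate
        }
    }'
}

cpus=$(collapse_ranges 32,33,34,35,36,37,38,39,96,97,98,99,100,101,102,103)
echo "numactl --physcpubind=$cpus"

# The separator is reused in the output, so a space-separated list stays
# space separated:
collapse_ranges '1 2 3 5' ' '
```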

answered Apr 27 '21 at 18:44

Quasímodo

+1 nice ternary operators inside the printf cmd. – Cbhihe Apr 28 '21 at 08:31

@Cbhihe Thank you. Since I'm here, I'll also add that the f!="" and f=="" tests cannot be written as f and !f, because an input of 0 would make them give the wrong result. – Quasímodo Apr 28 '21 at 10:56
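
A small sketch to make the point about 0 concrete. The two function names are hypothetical; collapse_truthy is the one-liner with the tests rewritten as !f and f, included only so that the two can be compared on a list that starts at 0.

```shell
# Form used in the answer: f is compared against the empty string,
# so a range that starts at 0 is still recognised as "started".
collapse_ok() {
    printf '%s\n' "$1" | awk -F, '{for(i=2;i<=NF+1;i++){if($i==$(i-1)+1){if(f=="")f=$(i-1);continue}printf("%s%s%s%s",f,f!=""?"-":"",$(i-1),i>NF?RS:FS);f=""}}'
}

# Truthiness variant, for comparison only: a start value of 0 is liable to
# be treated as "no range started", so the start of the range can be lost.
collapse_truthy() {
    printf '%s\n' "$1" | awk -F, '{for(i=2;i<=NF+1;i++){if($i==$(i-1)+1){if(!f)f=$(i-1);continue}printf("%s%s%s%s",f,f?"-":"",$(i-1),i>NF?RS:FS);f=""}}'
}

collapse_ok 0,1,2
collapse_truthy 0,1,2
```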

And if you want to make it slightly more robust against malformed csv sequences: awk 'BEGIN {FS=",[ ]*,*"}{for(i=2;i<=NF+1;i++){if($i==$(i-1)+1){if(f=="") f=$(i-1);continue} printf ("%s%s%s%s",f,f!=""?"-":"",$(i-1),i>NF?RS:",");f=""}}' <<< "0,1,,2,5,6, 7,33" – Cbhihe Apr 28 '21 at 17:33
Wow, excellent! Works for me. – Goomba1050 Apr 28 '21 at 19:20
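
Another way to approach robustness is to clean the list before collapsing it. The sketch below is an assumption-laden illustration, not part of either answer: normalize_list and collapse_robust are hypothetical names, and the approach assumes that the order of the numbers does not matter (as for a CPU set), because the list is sorted and de-duplicated first.

```shell
# normalize_list LIST
# Split on commas, spaces and tabs, keep only non-negative integers,
# sort numerically, drop duplicates, and rejoin with commas.
normalize_list() {
    printf '%s\n' "$1" | tr -s ', \t' '\n' | grep -E '^[0-9]+$' | sort -n -u | paste -sd, -
}

# collapse_robust LIST
# Normalize first, then collapse with the awk loop from the answer.
collapse_robust() {
    normalize_list "$1" | awk -F, '{for(i=2;i<=NF+1;i++){if($i==$(i-1)+1){if(f=="")f=$(i-1);continue}printf("%s%s%s%s",f,f!=""?"-":"",$(i-1),i>NF?RS:FS);f=""}}'
}

collapse_robust "0,1,,2,5,6, 7,33"
collapse_robust "3,1,2,2,10"
```

Anything that is not a plain integer is silently dropped here; a script that needs to reject bad input instead would have to check for that separately.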
