2

I have a comma delimited string of numbers as follows:

1,2,3,5,6,7,8,9,12,14

I'm looking for a command to use in a bash script that can combine adjacent numbers into range/hyphenated entries as follows:

1-3,5-9,12,14

The initial string is guaranteed to be sorted in ascending order.

Chris Davies

6 Answers

1

Using perl:

perl -pe 's/\b(\d+)(?{$q=$1+1})(?:,(??{$q})\b(?{$p=$q++})){2,}/$1-$p/g'

This is using a regex with embedded perl code via the (?{...}) and (??{...}) expressions; the first just evaluates the embedded code, while the second uses the value returned by it as a pattern. See perlre(1) for a complete description.

As written, only runs of three or more numbers are collapsed. Replace the {2,} quantifier with + if you also want ranges of just two numbers (e.g. 1,2,7 -> 1-2,7).
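For readers who would rather stay in bash, the effect of that quantifier can be sketched as a minimum run length. This is only an illustration, not a translation of the regex: the function name compact_min and its optional second argument are made up here, and the input is assumed to be an ascending list of plain decimal integers.

```shell
# compact_min LIST [MIN]  (hypothetical helper, not from the answer above)
# Collapse runs of consecutive integers in LIST into "start-end", but only
# when a run holds at least MIN numbers. MIN defaults to 3, which mirrors
# the {2,} quantifier; MIN=2 mirrors the + quantifier.
compact_min() {
  local min=${2:-3} out="" n i nums run=()
  IFS=, read -r -a nums <<< "$1"
  # The trailing empty element is a sentinel that flushes the final run.
  for n in "${nums[@]}" ""; do
    if (( ${#run[@]} )) && [[ -n $n ]] && (( n == run[${#run[@]}-1] + 1 )); then
      run+=("$n")                                # still consecutive: extend
      continue
    fi
    if (( ${#run[@]} >= min )); then
      out+="${run[0]}-${run[${#run[@]}-1]},"     # long enough: hyphenate
    elif (( ${#run[@]} )); then
      for i in "${run[@]}"; do out+="$i,"; done  # too short: list each number
    fi
    run=()
    if [[ -n $n ]]; then run=("$n"); fi          # start the next run at n
  done
  printf '%s\n' "${out%,}"
}
```

With this sketch, compact_min 1,2,7 prints 1,2,7, while compact_min 1,2,7 2 prints 1-2,7.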

0

Here's a short awk script that walks through a comma-delimited list of sorted integers and populates two arrays, a and b, while doing so.

The a array will contain the start integer of each run of consecutive integers, while b will contain the corresponding end integer. The variable n in the code holds the number of runs found.

BEGIN {
    OFS = FS = ","
}

{
    n = 0

    a[++n] = $1
    for (i = 1; i < NF; ++i)
        if ($i != $(i+1) - 1) {
            b[n] = $i
            a[++n] = $(i+1)
        }
    b[n] = $NF

    $0 = ""

    for (i = 1; i <= n; ++i)
        if (a[i] == b[i])
            $i = a[i]
        else
            $i = sprintf("%d-%d", a[i], b[i])

    print
}

The output is created by iterating over the n found distinct ranges and constructing a record where each field is either a single integer (for ranges of length 1) or a string representing the start and end of the range.

Testing this on the data that you have supplied, reading the data from a file:

$ awk -f script.awk file
1-3,5-9,12,14

You could also feed the script a string on standard input, for example with a here-string:

$ awk -f script.awk <<<"1,2,3,5,9,10,11,12,13"
1-3,5,9-13
Kusalananda
0

With zsh instead, you could define a function like:

reduce() {
  local i=1
  argv=(${(nus:,:)1}) # split $1 on ",", numerically sort and remove dups
  while ((i < $#)) {
    if ((${argv[i]#*-} + 1 == ${argv[i+1]%-*})) {
      argv[i]=${argv[i]%-*}-${argv[i+1]#*-}
      argv[i+1]=()
    } else {
      ((i++))
    }
  }
  print ${(j:,:)@}
}

Which would also accept ranges on input:

$ reduce 1,2,3,5,6,7,8,9,12,14
1-3,5-9,12,14
$ reduce 1,2,3,5-7,8,9-11,12,13-20
1-3,5-20
$ reduce 5,2,4,5,6
2,4-6

Note that it won't work properly if the input has overlapping ranges:

$ reduce 1-3,2
1-3,2
$ reduce 1-3,2-4
1-3,2-4

From bash, you'd define the function as:

reduce() { zsh -c '
  i=1
  argv=(${(nus:,:)1}) # split $1 on ",", numerically sort and remove dups
  while ((i < $#)) {
    if ((${argv[i]#*-} + 1 == ${argv[i+1]%-*})) {
      argv[i]=${argv[i]%-*}-${argv[i+1]#*-}
      argv[i+1]=()
    } else {
      ((i++))
    }
  }
  print ${(j:,:)@}' zsh "$@"
}
  • ZSH is not an option for me in this situation. I'm modifying an existing BASH script that already implements the reverse problem (converting "1-3,5" to "1,2,3,5") but your handling of additional ranges within the original input is a nice touch. – David Christensen Jan 15 '20 at 23:12
  • @DavidChristensen, note that there's nothing stopping you from calling another language interpreter like zsh in your bash script. After all, it's a shell's role to execute other commands. Calling awk, perl, sed, bc (other language interpreters) in bash is common. – Stéphane Chazelas Jan 15 '20 at 23:16
  • True, but the script is being called on a test system and zsh is not installed by default in the distro being used, it would create an additional software dependency that doesn't currently exist. Not too difficult to overcome but not my first choice either. – David Christensen Jan 15 '20 at 23:36
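Where zsh is not available, the sort, de-duplicate and range-on-input handling can be approximated in bash by expanding the list first. The sketch below is an assumption-laden illustration: expand_sorted is a hypothetical name (it is not the existing function mentioned in the comments), and it expects non-negative integers with each range written low-high.

```shell
# expand_sorted LIST  (hypothetical helper)
# Expand every "a-b" range in a comma-separated LIST, then sort the numbers
# numerically and drop duplicates. Assumes non-negative integers and a <= b.
expand_sorted() {
  local item items i
  IFS=, read -r -a items <<< "$1"
  for item in "${items[@]}"; do
    if [[ $item == *-* ]]; then
      # Range: print every member on its own line.
      for (( i = ${item%-*}; i <= ${item#*-}; i++ )); do
        printf '%s\n' "$i"
      done
    else
      printf '%s\n' "$item"     # single number
    fi
  done | sort -n -u | paste -s -d, -
}
```

Here expand_sorted 1-3,2-4 prints 1,2,3,4, and the result can then be passed to any of the compaction approaches on this page, which sidesteps the overlapping-range case.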
0

I would be very surprised if this can be done in sed.  You could write a pure bash script, but this is fairly easy in awk.  This program is somewhat similar to Kusalananda’s, but doesn’t use an array.

awk -F, '
    {
        begin=$1
        prev=$1
        for (i=2; i<=NF; i++) {
                if ($i == prev+1) {
                        prev=$i
                        continue
                }
                if (begin==prev) {
                        printf "%s,",    begin
                } else {
                        printf "%s-%s,", begin, prev
                }
                begin=$i
                prev=$i
        }
        if (begin==prev) {
                printf "%s",    begin
        } else {
                printf "%s-%s", begin, prev
        }
        print ""
    }'
  • -F, sets the field separator, so we can treat each number as a field.
  • begin is the first number in the current range of consecutive numbers.
  • prev is always the most recent number we looked at.  We could just say $(i-1); I thought it was clearer to give it a name.
  • If the current number is one more than the previous one, we are just adding one more number to a range of consecutive numbers, so make a note of it and move on.
  • Otherwise, we’re starting a new range.  Print the range we just finished.  If the range begins and ends with the same number, just print that number, and don’t print it twice with a hyphen.  Print a comma and no newline.
  • Repeat the above logic at the end of the line.  Since this is the last range (or number) on the line, don’t print a comma after it; rather, use print "" to print a newline.
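The same begin/prev walk can also be written in pure bash. What follows is a minimal sketch rather than a drop-in replacement for the awk program: the function name compact is chosen here for illustration, and the input is assumed to be a non-empty, ascending list of plain decimal integers.

```shell
# compact LIST  (hypothetical name)
# Pure-bash version of the begin/prev walk: begin is the first number of the
# current run, prev is the most recent number seen.
compact() {
  local nums begin prev n out=""
  IFS=, read -r -a nums <<< "$1"
  begin=${nums[0]} prev=${nums[0]}
  for n in "${nums[@]:1}"; do
    if (( n == prev + 1 )); then
      prev=$n                   # still inside the current run
      continue
    fi
    # Run ended: emit "begin" or "begin-prev", then start a new run at n.
    if (( begin == prev )); then out+="$begin,"; else out+="$begin-$prev,"; fi
    begin=$n prev=$n
  done
  # Emit the final run without a trailing comma.
  if (( begin == prev )); then out+="$begin"; else out+="$begin-$prev"; fi
  printf '%s\n' "$out"
}
```

Called as compact 1,2,3,5,6,7,8,9,12,14, this prints 1-3,5-9,12,14.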
0

I ended up with the following BASH function using an array to carry the identified ranges. The input string is the first argument to the function and the result is passed back through the second argument:

function compact_range {
  arr=()
  start=""
  for cpu in ${1//,/ }; do
    # Start a new range definition if necessary
    [ -z "$start" ] && start=$cpu && range=$cpu && last=$cpu && continue
    prev=$(( $cpu - 1 ))
    # If the current CPU is not adjacent to the last CPU, start a new range
    [ "$prev" -ne "$last" ] && arr+=($range) && start=$cpu && range=$cpu && last=$cpu && continue
    # Current CPU is adjacent to an existing range, expand the current range
    range="${start}-${cpu}" && last=$cpu
  done
  # Append the last range to the array of ranges
  arr+=($range)
  # Return a comma delimited list of ranges
  # (printf -v assigns to the variable named by $2 without going through eval)
  printf -v "$2" '%s' "$(IFS=,; printf '%s' "${arr[*]}")"
}

Thanks everyone for the ideas.

0

A very short awk answer for single-line input that handles all of the cases below (the numbers are always scanned from left to right). The other answers posted so far fail on some of these cases, and most of them handle only case 2:

$ awk -v RS=, 'function prnt(){ printf sep start (end==start?"":"—"end) ; sep=RS }
    end!="" && ( end==$0-1 || end==$0+1) { end=$0; next }
    end!=""                              { prnt() }
                                         { start=end=$0 }
END{ prnt() }'

1. numbers are in descending order:

$ awk -v RS=, 'function prnt(){ printf sep start (end==start?"":"—"end) ; sep=RS }
    end!="" && ( end==$0-1 || end==$0+1) { end=$0; next }
    end!=""                              { prnt() }
                                         { start=end=$0 }
END{ prnt() }' <<<'14,13,12,11,9,8,7,3,2,1,0,-1'
14—11,9—7,3—-1

2. numbers are in ascending order:

$ awk -v RS=, 'function prnt(){ printf sep start (end==start?"":"—"end) ; sep=RS }
    end!="" && ( end==$0-1 || end==$0+1) { end=$0; next }
    end!=""                              { prnt() }
                                         { start=end=$0 }
END{ prnt() }' <<<'1,2,3,5,6,7,8,9,12,14'
1—3,5—9,12,14

3. numbers are in unsorted order:

$ awk -v RS=, 'function prnt(){ printf sep start (end==start?"":"—"end) ; sep=RS }
    end!="" && ( end==$0-1 || end==$0+1) { end=$0; next }
    end!=""                              { prnt() }
                                         { start=end=$0 }
END{ prnt() }' <<<'10,3,4,5,6,2,1,5,6,7,9,7,2,3'
10,3—6,2—1,5—7,9,7,2—3

To apply this to a file containing several such lines, like:

$ cat infile
14,13,12,11,9,8,7,3,2,1,0,-1
1,2,3,5,6,7,8,9,12,14
-1,0,1,2,4,3,2,1,0,-1,-2,-2,-2,-2,-4
10,3,4,5,6,2,1,5,6,7,9,7,2,3

adjust the script a bit:

$ awk -v RS=, 'function prnt(){ printf sep start (end==start?"":"—"end) ; sep=RS }
   /\n/{
         printf "%s\n", sep start ((end==$1-1 || end==$1+1)?"—":sep) $1;
         sep=""; start=end=$2; next
       }
   end!="" && ( end==$0-1 || end==$0+1) { end=$0; next }
   end!=""                              { prnt() }
{ start=end=$0 }' infile

Output:

14—11,9—7,3—-1
1—3,5—9,12,14
-1—2,4—-2,-2,-2,-2,-4
10,3—6,2—1,5—7,9,7,2—3
αғsнιη