Brace compression

Question

I love brace expansion (e.g. rm file.{0..5}). I find it a lot easier to read than the expanded version. Is there a quick and easy way of doing the opposite?

For example, given the input www.example.com www1.example.com www2.example.com, the output would be www{,1,2}.example.com.

This is one of those problems that are way more difficult to do in reverse. It's basically artificial intelligence, and doesn't even necessarily have a unique answer. So: definitely not in bash! Maybe an elaborate perl script. — orion, Mar 19 '14 at 09:05
The point of brace expansion is to reduce typing. What use do you see for this kind of brace "collapsing"? — chepner, Mar 21 '14 at 02:26
"I find it a lot easier to read than the expanded version." -- so anything where making something easier to read may be useful. If my example wasn't a big enough clue, my current problem is analysing a messy hierarchy of DNS servers. I have output with a lot of very similar looking (but not quite identical) names. — Samuel Harmer, Mar 21 '14 at 08:31

score 2 · Answer 1 · answered Mar 19 '14 at 09:11

The short answer is no, there is no ready-made way to do this. It does seem related to the longest common substring problem, though, which you can look into as a starting point if you are interested in coding this yourself.

Adrian Frühwirth

score 2 · Answer 2 · edited Apr 13 '17 at 12:36

Finding a nice-looking expression in general is a difficult and poorly defined problem (what does nice-looking mean?). If you're just looking for a single brace expression, i.e. you have a bunch of strings and you want to express them in the form PREFIX{MIDDLE1,MIDDLE2,...,MIDDLEn}SUFFIX with maximal PREFIX and SUFFIX, then the problem is well-defined and there is a simple algorithm:

Find the longest common prefix.
Find the longest common suffix.
Split up the strings.
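
As a rough illustration of the three steps above, here is a minimal sketch that uses only parameter expansion and trims one character at a time instead of doing a binary search. The function name brace_compress is made up for this example, and it assumes at least two distinct strings that contain none of the characters ,{}.

```shell
# brace_compress is a hypothetical helper name, used only for this sketch.
# It sets the global variables prefix, suffix, out, sep and s.
brace_compress () {
  # Step 1: longest common prefix, found by trimming one character at a time
  # until every string starts with it.
  prefix=$1
  for s in "$@"; do
    while case $s in "$prefix"*) false ;; *) true ;; esac; do
      prefix=${prefix%?}
    done
  done
  # Step 2: longest common suffix of what remains after the prefix.
  suffix=${1#"$prefix"}
  for s in "$@"; do
    s=${s#"$prefix"}
    while case $s in *"$suffix") false ;; *) true ;; esac; do
      suffix=${suffix#?}
    done
  done
  # Step 3: split each string and join the middle parts with commas.
  out= sep=
  for s in "$@"; do
    s=${s#"$prefix"}
    out=$out$sep${s%"$suffix"}
    sep=,
  done
  printf '%s{%s}%s\n' "$prefix" "$out" "$suffix"
}

brace_compress www.example.com www1.example.com www2.example.com
```

Run on the example from the question, this prints www{,1,2}.example.com. The suffix is computed only after the prefix has been stripped, so the two can never overlap.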

I'll reuse my longest_common_prefix function. It takes two strings and leaves their longest common prefix in the global variable prefix.

longest_common_prefix () {
  prefix=
  ## Truncate the two strings to the minimum of their lengths
  if [[ ${#1} -gt ${#2} ]]; then
    set -- "${1:0:${#2}}" "$2"
  else
    set -- "$1" "${2:0:${#1}}"
  fi
  ## Binary search for the first differing character, accumulating the common prefix
  while [[ ${#1} -gt 1 ]]; do
    n=$(((${#1}+1)/2))
    if [[ ${1:0:$n} == "${2:0:$n}" ]]; then
      prefix=$prefix${1:0:$n}
      set -- "${1:$n}" "${2:$n}"
    else
      set -- "${1:0:$n}" "${2:0:$n}"
    fi
  done
  ## Add the one remaining character, if common
  if [[ $1 = "$2" ]]; then prefix=$prefix$1; fi
}

I use rev for the suffix because I don't feel like writing the corresponding common suffix search. I assume that the strings don't contain any newlines.

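## Common suffix: the reverse of the common prefix of the reversed strings
first=$1; shift
prefix=$(rev <<<"$first")
for x; do
  longest_common_prefix "$prefix" "$(rev <<<"$x")"
done
suffix=$(rev <<<"$prefix")
## Common prefix of what is left once the suffix is stripped
first=${first%"$suffix"}
prefix=$first
for x; do
  longest_common_prefix "$prefix" "${x%"$suffix"}"
done
## Print PREFIX{MIDDLE1,...,MIDDLEn}SUFFIX
printf '%s{%s' "$prefix" "${first#"$prefix"}"
for x; do
  x=${x%"$suffix"}
  printf ',%s' "${x#"$prefix"}"
done
printf '}%s\n' "$suffix"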
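
The rev trick can be checked in isolation: the common suffix of two strings is the reverse of the common prefix of their reverses. The snippet below is only a sketch. It assumes rev is installed, and it uses a simple trimming loop rather than longest_common_prefix so that it runs on its own.

```shell
# Reverse both strings (assumes the rev utility is available).
a=$(printf '%s\n' www1.example.com | rev)
b=$(printf '%s\n' www2.example.com | rev)
# Trim a from the right until b starts with it: that is their common prefix.
p=$a
while case $b in "$p"*) false ;; *) true ;; esac; do
  p=${p%?}
done
# Reversing the common prefix of the reversed strings gives the common suffix.
suffix=$(printf '%s\n' "$p" | rev)
printf '%s\n' "$suffix"
```

For these two names the result is .example.com.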

Note that if the strings may contain the characters ,{}, you'll need to figure out some form of quoting.
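
One possible form of quoting, sketched under the assumption that the result will be pasted back into bash: backslash-escape every comma, brace and backslash in each piece (prefix, middle parts and suffix) before assembling the expression. escape_braces is a hypothetical helper name, and it deliberately ignores other shell-special characters such as spaces, $ or glob characters.

```shell
# escape_braces is a hypothetical helper, not part of the script above.
# It puts a backslash in front of each backslash, comma, { and }.
escape_braces () {
  printf '%s\n' "$1" | sed 's/[\\,{}]/\\&/g'
}

escape_braces 'a,b{c}'
```

This prints a\,b\{c\}. Bash treats a backslash-quoted comma or brace as a literal character during brace expansion, so an escaped middle part such as a\,b should survive as a single alternative.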
