I'm trying to use sed to pull out a brace-delimited config block like this from a long file (Junos config):

                group foo {
                    command;
                    setting {
                        value;
                    }
                    command;
                }

The trick is to stop at the } that's indented the same as the first line.

I learned how to use sed to match from one pattern to another, and tried this:

$ sed -rn '/^( *)group foo/,/^\1\}/p' config.txt
sed: -e expression #1, char 41: Invalid back reference

Is the problem that /^( *)group foo/ and /^\1\}/ are two separate patterns, and backreferences won't work between them? If so, how can I accomplish this?

Kusalananda
Jacktose

3 Answers

3

You are right. Backreferences are defined for basic regular expressions (BRE), and each sed address is a regular expression of its own (a BRE by default, an ERE with -r or -E), so a backreference does work inside a single address. It cannot, however, refer to a capture group defined in a different regular expression. The group captured by the address /^( *)group foo/ is therefore not available to the other address /^\1\}/.
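If you would rather keep the sed range from the question, one possible workaround is to do it in two passes: read the indentation of the opening line first, then put it literally into both addresses. This is only a sketch. The sample file and the variable names cfg, indent and block are made up for the demonstration, and it assumes the indentation consists of spaces only.

```shell
# Hypothetical sample file, written to a temporary path for the demonstration.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
system {
    group foo {
        command;
        setting {
            value;
        }
        command;
    }
    group bar {
        other;
    }
}
EOF

# Pass 1: print only the leading spaces of the first "group foo {" line, then quit.
indent=$(sed -n '/^ *group foo {/{s/group foo.*//p;q;}' "$cfg")

# Pass 2: use the captured indentation literally in both range addresses,
# so the range ends at the first "}" with exactly that indentation.
block=$(sed -n "/^${indent}group foo {/,/^${indent}}/p" "$cfg")
printf '%s\n' "$block"
rm -f "$cfg"
```

Because the indentation is interpolated by the shell, no backreference has to cross from one address to the other.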

This test.awk does it by counting opening and closing braces:

brk && /\{/{brk++} #Increment brk if brk is not zero and line contains {
brk && /\}/{brk--} #Decrement brk if brk is not zero and line contains }
/^[[:blank:]]*group foo \{/{brk=1;prt=1} #Set brk and prt if match initial pattern
prt                #Print line if prt is set
!brk && prt{prt=0} #If brk is zero and prt is not, set prt=0

$ cat file
foo bar
        foo bar2
        }
                group foo {
                    command;
                    setting {
                        value;
                    }
                    command;
                }
        dri {
    }
end
$ awk -f test.awk file
                group foo {
                    command;
                    setting {
                        value;
                    }
                    command;
                }
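In case a line can carry more than one brace, a variant of the counting idea can add up every brace on the line instead of testing once per line. The following is a sketch under assumptions: the awk variable name, the sample file and the shell variables sample and counted are hypothetical, and the header is located with a plain substring search rather than an anchored regular expression.

```shell
# Hypothetical sample with two opening braces on one line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
system {
    group foo {
        a { b {
            value;
        }
        }
        command;
    }
    group bar {
        other;
    }
}
EOF

counted=$(awk -v name='group foo' '
    # Start at the first line containing the header text followed by " {".
    !printing && index($0, name " {") { printing = 1; depth = 0 }
    printing {
        print
        line = $0
        depth += gsub(/[{]/, "", line)   # gsub returns how many braces it replaced
        depth -= gsub(/[}]/, "", line)
        if (depth <= 0) printing = 0     # back at the starting level: stop printing
    }
' "$sample")
printf '%s\n' "$counted"
rm -f "$sample"
```

Like the script above, this still miscounts braces that appear inside quoted values.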

Another, less elegant option relies on counting the leading spaces, which was the idea behind your attempt. It breaks if the indentation contains tabs.

/^ *group foo \{/{
    match($0,/^ */) #Sets RLENGTH to the length in characters of the matched string
    i=RLENGTH
}
i                   #If i is set, the current line is printed
i&&/^ *\}$/{
    match($0,/^ */)     #Again, sets RLENGTH to the length of the matched string
    if(RLENGTH==i){i=0} #If the value is equal to the one from group foo line, unset i
}
Quasímodo
  • That is not true about back-references. In sed they work in // too, besides s. Ex.: seq 99|sed -rn '/(.)\1/p'. – seshoumara Jun 24 '20 at 07:38
  • @seshoumara Thank you for the correction, updated answer to make it accurate. – Quasímodo Jun 24 '20 at 11:40
  • +1 Bracket counting is the better solution indeed. With some effort it can be done in sed too. A potential issue would be if a bracket is inside quotes/values. Then a specific parser would be needed. – seshoumara Jun 24 '20 at 17:31
  • Thanks, counting {} is better, and I was able to extend your awk example to fit my needs. – Jacktose Jun 30 '20 at 21:59
2

Back-references can be used in /pattern/, but they are not remembered from one such expression to the other.

There are many solutions in sed, for example (using GNU sed):

sed -rz 's@.*\n(( *)group foo.*\2}).*@\1@;s@^(( *).*)@\1\2@;s@(\n( *)}).*\2$@\1\n@' config.txt

The -z flag is used to load the entire config in pattern space. The first s deletes everything before the start of group foo and after the last closing bracket (greedy *) with the appropriate indentation.

The second s copies that indentation to the end. The last s deletes everything after the first closing bracket with the appropriate indentation. These last two commands are only needed when there are multiple config blocks at the same level of indentation as the one of interest.
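To make the three steps easier to follow, here is the same command written as a multi-line script with one comment per substitution, run against a made-up sample that has a sibling block at the same indentation. It is a sketch that assumes GNU sed (for -z and -r); the file contents and the variables conf and picked are hypothetical.

```shell
# Hypothetical sample: "group bar" sits at the same indentation as "group foo".
conf=$(mktemp)
cat > "$conf" <<'EOF'
system {
    group foo {
        command;
        setting {
            value;
        }
        command;
    }
    group bar {
        other;
    }
}
EOF

picked=$(sed -rz '
    # 1. Keep the text from the "group foo" line up to the LAST "}" that is
    #    preceded by the same run of spaces (group 2); drop everything else.
    s@.*\n(( *)group foo.*\2}).*@\1@
    # 2. Append a copy of the leading indentation to the end as a marker.
    s@^(( *).*)@\1\2@
    # 3. Cut after the FIRST closing brace whose indentation equals the marker.
    s@(\n( *)}).*\2$@\1\n@
' "$conf")
printf '%s\n' "$picked"
rm -f "$conf"
```

Step 1 alone would still include group bar here, which is why steps 2 and 3 matter when sibling blocks share the indentation.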

seshoumara
  • If you allow a suggestion, I would avoid : as the delimiter as it could be mistaken by a label. Also, I can't grasp the machinery behind it, maybe a little explanation of what it does could help both me and the OP :) – Quasímodo Jun 24 '20 at 20:11
1

sed cannot use backreferences across patterns, but it does let you bring the two lines into a single pattern space and apply a backreference there.

$ sed -Ene '
    /^\s+group foo \{$/,$!d
    p;/^\s+group foo \{$/h;/\}/!d
    G;/^(\s+)\S.*\n\1\S/q
' file

sed commands used:

  • p prints the contents of the pattern space.
  • $!d on its own deletes every line except the last. Here, though, the ! negates the range /^\s+group foo \{$/,$, so d deletes every line that falls outside that range. The range runs from the group foo line to the end of the file, so this skips all lines before the first group foo line.
  • h copies the pattern space to the hold space.
  • G appends the contents of the hold space to the pattern space.
  • q quits without processing any further input, akin to exit.

An alternative approach is to first identify the starting line, then keep printing while tracking the nesting depth of the braces, and stop when the nesting depth returns to zero.

$ sed -Ene '/^\s*group foo \{$/,${
    p;// {x;s/.*//;x;}
    /\{/ {x;s/^/./;x;}
    /\}/ {x;s/^.//;x;}
    /\}/G;/\n$/q
}' file

With Perl it is almost trivial when you want to match the leading whitespace.

$ perl -lne 'print if /^(\s+)(?{ $k=$1 })group\s+foo\s+\{/x ... /^$k\}/' file
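The same idea can also be sketched without the embedded code block (?{ ... }) by storing the indentation in an ordinary variable and quoting it with \Q...\E when matching the closing brace. This is an illustrative variant, not the command above; the sample file and the shell variables junos and found are hypothetical.

```shell
# Hypothetical sample file for the demonstration.
junos=$(mktemp)
cat > "$junos" <<'EOF'
system {
    group foo {
        command;
        setting {
            value;
        }
        command;
    }
    group bar {
        other;
    }
}
EOF

found=$(perl -ne '
    # Remember the indentation of the first "group foo {" line.
    $k = $1 if !defined $k && /^(\s*)group\s+foo\s+\{/;
    next unless defined $k;      # skip everything before the block
    print;
    last if /^\Q$k\E\}/;         # stop at the "}" with the same indentation
' "$junos")
printf '%s\n' "$found"
rm -f "$junos"
```

Since the indentation is captured with \s*, this variant should cope with tabs as long as the opening and closing lines are indented identically.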