
Preface: Every couple of days a question of this type comes up. It is easy to solve with sed, but takes time to explain. I'm writing this question and answer so I can later refer to this generic solution and only explain the adaptation for the specific case. Feel free to contribute.

I have files with variable definitions. Variable names consist of uppercase letters and the underscore _, and their values follow the :=. The values can contain other variables. This is Gnom.def:

NAME:=Gnom
FULL_NAME:=$FIRST_NAME $NAME
FIRST_NAME:=Sman
STREET:=Mainstreet 42
TOWN:=Nowhere
BIRTHDAY:=May 1st, 1999

Then there is another file form.txt with a template form:

$NAME
Full name: $FULL_NAME
Address: $STREET in $TOWN
Birthday: $BIRTHDAY
Don't be confused by $NAMES

Now I want a script which replaces the variables (marked with $ and the identifier) in the form by the definitions in the other file, recursively, if necessary, so I get this text back:

Gnom
Full name: Sman Gnom
Address: Mainstreet 42 in Nowhere
Birthday: May 1st, 1999
Don't be confused by $NAMES

The last line is to ensure that no substrings of variables get replaced accidentally.

Philippos

2 Answers


The basic idea for problems like this is to pass both files to sed. The definitions come first and are stored in sed's hold space. Then the hold space is appended to each line of the other file, and each occurrence of a variable that also appears in the appended definitions is replaced by its value.

Here is the script:

sed '/^[A-Z_]*:=.*/{H;d;}
G
:b
s/$\([A-Z_]*\)\([^A-Z_].*\n\1:=\)\([^[:cntrl:]]*\)/\3\2\3/
tb
P
d' Gnom.def form.txt

And now the detailed explanation:

/^[A-Z_]*:=.*/{H;d;}

This collects the definitions to the hold space. /^[A-Z_]*:=.*/ selects all lines starting with a variable name and the sequence :=. On these lines the commands in {} are performed: The H appends them to the hold space, the d deletes them and starts over, so they won't get printed.

If you can't ensure that all lines in the definition file follow this pattern, or if lines in the other file could match the given pattern, this part needs to be adapted, as explained later.

G

At this point of the script, only lines from the second file are processed. The G appends the hold space to pattern space, so we have the line to be processed with all definitions in the pattern space, separated by newlines.
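To see what the G leaves behind, one can stop the script right after it and dump the pattern space with sed's l command, which shows newlines as \n and marks the end of the buffer with $. This is only a small sketch: the file names defs and text and their contents are made up for the demonstration.

```shell
# Hypothetical demo files in a temporary directory.
tmp=$(mktemp -d)
printf 'FOO:=bar\nBAR:=baz\n' > "$tmp/defs"
printf 'At the $FOO\n' > "$tmp/text"

# Collect the definitions, append them to the text line with G,
# then dump the pattern space with l instead of processing it further.
after_g=$(sed -n '/^[A-Z_]*:=.*/{H;d;}
G
l' "$tmp/defs" "$tmp/text")
printf '%s\n' "$after_g"
rm -r "$tmp"
```

The dump shows two \n between the text line and the first definition. As far as I can tell, the script relies on this: a variable at the very end of a line uses the first newline as its terminating non-variable character and still finds the second one in front of its definition.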

:b

This starts a loop.

 s/$\([A-Z_]*\)\([^A-Z_].*\n\1:=\)\([^[:cntrl:]]*\)/\3\2\3/

This is the key part, the replacement. Right now we have something like

At the $FOO<newline><newline>FOO:=bar<newline>BAR:=baz
       ----==================---  ###

in the pattern space. (Detail: there are two newlines before the first definition, one produced by appending to the hold space with H, the other by appending to the pattern space with G.)

The part underlined with ---- matches $\([A-Z_]*\). The \(\) makes it possible to backreference to that string later on.

The beginning of the second group, [^A-Z_].*\n, matches the part underlined with ===, which is everything up to the backreference \1. Starting with a non-variable character ensures we don't match substrings of a variable name. Surrounding the backreference with a newline and := makes sure that a substring of a definition will not match.

Finally, \([^[:cntrl:]]*\) matches the ### part, which is the definition. Note that we assume the definition contains no control characters. If it may contain them, you can use [^\n] with GNU sed or use a workaround for POSIX sed.

Now the $ and the variable name get replaced by the variable value \3, the middle part and definition are left as they were: \2\3.
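A single application of the substitution, without the loop, can be tried in isolation. The sketch below uses made-up files (defs, text) and an invented variable $FOOD to show that a longer name sharing a prefix with a defined variable is left alone.

```shell
# Hypothetical demo files in a temporary directory.
tmp=$(mktemp -d)
printf 'FOO:=bar\n' > "$tmp/defs"
printf 'At the $FOO, not the $FOOD\n' > "$tmp/text"

# Apply the substitution exactly once; P prints only the text line,
# not the definitions appended by G.
once=$(sed -n '/^[A-Z_]*:=.*/{H;d;}
G
s/$\([A-Z_]*\)\([^A-Z_].*\n\1:=\)\([^[:cntrl:]]*\)/\3\2\3/
P' "$tmp/defs" "$tmp/text")
printf '%s\n' "$once"
rm -r "$tmp"
```

$FOO is replaced by bar, while $FOOD stays: after the name FOO the next character would have to be a non-variable character, and there is no FOOD:= line to match the full name.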

 tb

If a replacement has been made, the t command loops to mark b and tries another replacement.
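To watch what the loop does for nested definitions, the substitution can be unrolled by hand and the text line printed after each pass. This is a sketch for illustration only; the shell variable step and the file names defs and text are my own, and the definitions are a subset of Gnom.def.

```shell
# Hypothetical demo files in a temporary directory.
tmp=$(mktemp -d)
printf 'NAME:=Gnom\nFULL_NAME:=$FIRST_NAME $NAME\nFIRST_NAME:=Sman\n' > "$tmp/defs"
printf 'Full name: $FULL_NAME\n' > "$tmp/text"

# The substitution from the script, kept in a variable so it can be repeated.
step='s/$\([A-Z_]*\)\([^A-Z_].*\n\1:=\)\([^[:cntrl:]]*\)/\3\2\3/'

# Three passes written out instead of the :b ... tb loop,
# with P after each pass to show the intermediate text line.
steps=$(sed -n -e '/^[A-Z_]*:=.*/{H;d;}' -e G \
    -e "$step" -e P \
    -e "$step" -e P \
    -e "$step" -e P "$tmp/defs" "$tmp/text")
printf '%s\n' "$steps"
rm -r "$tmp"
```

Each pass replaces only the leftmost replaceable variable, so $FULL_NAME first becomes $FIRST_NAME $NAME, then Sman $NAME, then Sman Gnom. The t command simply repeats this until a pass changes nothing.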

 P

If no further replacements were possible, the uppercase P prints everything up to the first newline (thus, the definition section will not get printed) and

 d

will delete the pattern space and start the next cycle. Done.
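For anyone who wants to reproduce the whole run, here is a self-contained sketch that recreates Gnom.def and form.txt from the question in a temporary directory and runs the script on them. The temporary directory and the variable result are my additions.

```shell
tmp=$(mktemp -d)

# The definition file from the question.
cat > "$tmp/Gnom.def" <<'EOF'
NAME:=Gnom
FULL_NAME:=$FIRST_NAME $NAME
FIRST_NAME:=Sman
STREET:=Mainstreet 42
TOWN:=Nowhere
BIRTHDAY:=May 1st, 1999
EOF

# The template form from the question.
cat > "$tmp/form.txt" <<'EOF'
$NAME
Full name: $FULL_NAME
Address: $STREET in $TOWN
Birthday: $BIRTHDAY
Don't be confused by $NAMES
EOF

# The complete script: definitions first, then the form.
result=$(sed '/^[A-Z_]*:=.*/{H;d;}
G
:b
s/$\([A-Z_]*\)\([^A-Z_].*\n\1:=\)\([^[:cntrl:]]*\)/\3\2\3/
tb
P
d' "$tmp/Gnom.def" "$tmp/form.txt")
printf '%s\n' "$result"
rm -r "$tmp"
```

The quoted 'EOF' delimiter keeps the shell from expanding the $ variables while the sample files are written.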

Limitations

  • You can do a nasty thing like including FOO:=$BAR and BAR:=$FOO in the definition file and make the script loop forever. You can define a processing order to avoid this, but it will make the script more difficult to understand. Leave this out if your script doesn't need to be idiot proof.

  • If the definition can contain control characters, we can swap the newline with another character after the G, e.g. y/\n#/#\n/, and swap back with the same command before printing. I don't know a better workaround.

  • If the definition file can contain lines with different format or the other file can contain lines with definition format, we need a unique separator between both files, either as last line of the definition file or as first line of the other file or as separate file you pass to sed between the other files. Then you have one loop to collect the definitions until the separator line is met, then do a loop for the lines of the other file.
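The separator variant from the last point could look roughly like the sketch below. The separator line %% is an arbitrary choice of mine, passed here as a small file of its own between the two inputs, and the file names and contents are made up. It assumes the separator is not the very first input line, since a 1,/re/ range only starts looking for its end on line 2.

```shell
# Hypothetical demo files in a temporary directory.
tmp=$(mktemp -d)
printf '# personal data\nFOO:=bar\n' > "$tmp/defs"
printf '%s\n' '%%' > "$tmp/sep"
printf 'NOTE:=see $FOO\nplain $FOO\n' > "$tmp/text"

# Up to and including the separator: keep well-formed definitions in the
# hold space and print nothing. After it: every line is text, even one
# that looks like a definition.
separated=$(sed '1,/^%%$/{
/^[A-Z_]*:=/H
d
}
G
:b
s/$\([A-Z_]*\)\([^A-Z_].*\n\1:=\)\([^[:cntrl:]]*\)/\3\2\3/
tb
P
d' "$tmp/defs" "$tmp/sep" "$tmp/text")
printf '%s\n' "$separated"
rm -r "$tmp"
```

With the original script, the line NOTE:=see $FOO would have been swallowed as a definition. Here it is printed with $FOO expanded, and the comment line in the definition file is ignored.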

Philippos

For comparison with the sed script, here's a POSIX awk script:

$ cat tst.awk
BEGIN { FS=":=" }
NR==FNR {
    map["$"$1] = $2
    next
}
{
    mappedWord = 1
    while ( mappedWord ) {
        mappedWord = 0
        head = ""
        tail = $0
        while ( match(tail,/[$][[:alnum:]_]+/) ) {
            word = substr(tail,RSTART,RLENGTH)
            if ( word in map ) {
                word = map[word]
                mappedWord = 1
            }
            head = head substr(tail,1,RSTART-1) word
            tail = substr(tail,RSTART+RLENGTH)
        }
        $0 = head tail
    }
    print
}

$ awk -f tst.awk Gnom.def form.txt
Gnom
Full name: Sman Gnom
Address: Mainstreet 42 in Nowhere
Birthday: May 1st, 1999
Don't be confused by $NAMES

The script doesn't care about whatever characters or strings you use (unlike the sed version which apparently relies on no control characters being present and := not appearing in the 2nd file), just include whichever characters can make up your words to be replaced in the regexp arg to match(), currently [$][[:alnum:]_]+.
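As an illustration of adapting that regexp, the sketch below passes it in as a parameter and compares two choices. The function name subst_words, the variable re and the sample files are made up for the demonstration, the replacement is done in a single pass for brevity, and [A-Za-z0-9_] is spelled out in place of [[:alnum:]_] only to keep the example independent of character-class support.

```shell
# Hypothetical demo files in a temporary directory.
tmp=$(mktemp -d)
printf 'NAME:=Gnom\n' > "$tmp/defs"
printf 'Hello $NAME and $NAME2\n' > "$tmp/text"

# subst_words REGEXP: replace known words once, using REGEXP to decide
# which characters belong to a word.
subst_words() {
    awk -v re="$1" '
    BEGIN { FS=":=" }
    NR==FNR { map["$"$1] = $2; next }
    {
        head = ""; tail = $0
        while ( match(tail, re) ) {
            word = substr(tail, RSTART, RLENGTH)
            if ( word in map ) word = map[word]
            head = head substr(tail, 1, RSTART-1) word
            tail = substr(tail, RSTART+RLENGTH)
        }
        print head tail
    }' "$tmp/defs" "$tmp/text"
}

wide=$(subst_words '[$][A-Za-z0-9_]+')   # digits are part of a name
narrow=$(subst_words '[$][A-Z_]+')       # names as defined in the question
printf '%s\n%s\n' "$wide" "$narrow"
rm -r "$tmp"
```

With the wider class, $NAME2 is one unknown word and is left untouched. With the question's definition of a variable name, the word ends before the 2, so $NAME2 becomes Gnom2, which matches what the sed script would do.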

The above would loop forever given circular definitions (such as FOO:=$BAR and BAR:=$FOO), but it's an easy tweak to detect, report, and handle them reasonably if you like, e.g.:

$ head Gnom.def form.txt
==> Gnom.def <==
FOO:=$BAR
BAR:=$FOO
NAME:=Gnom
FULL_NAME:=$FIRST_NAME $NAME
FIRST_NAME:=Sman
STREET:=Mainstreet 42
TOWN:=Nowhere
BIRTHDAY:=May 1st, 1999

==> form.txt <==
$NAME
Full name: $FULL_NAME
Address: $STREET in $TOWN
Birthday: $BIRTHDAY
Don't be confused by $NAMES
testing recursive $FOO
testing recursive $BAR

$ cat tst.awk
BEGIN { FS=":=" }
NR==FNR {
    map["$"$1] = $2
    next
}
{
    mappedWord = 1
    iter = 0
    while ( mappedWord ) {
        # Give up on the 100th pass; circular definitions never settle.
        if ( ++iter == 100 ) {
            printf "%s[%d]: Warning: Breaking out of recursive definitions.\n", FILENAME, FNR | "cat>&2"
            break
        }
        mappedWord = 0
        head = ""
        tail = $0
        # Replace every known $word in the current line, left to right.
        while ( match(tail,/[$][[:alnum:]_]+/) ) {
            word = substr(tail,RSTART,RLENGTH)
            if ( word in map ) {
                word = map[word]
                mappedWord = 1
            }
            head = head substr(tail,1,RSTART-1) word
            tail = substr(tail,RSTART+RLENGTH)
        }
        $0 = head tail
    }
    print
}

$ awk -f tst.awk Gnom.def form.txt
Gnom
Full name: Sman Gnom
Address: Mainstreet 42 in Nowhere
Birthday: May 1st, 1999
Don't be confused by $NAMES
testing recursive $BAR
testing recursive $FOO
form.txt[6]: Warning: Breaking out of recursive definitions.
form.txt[7]: Warning: Breaking out of recursive definitions.

Note that the above warnings are being printed to stderr, not stdout, so they won't clutter up your output:

$ awk -f tst.awk Gnom.def form.txt 2>err
Gnom
Full name: Sman Gnom
Address: Mainstreet 42 in Nowhere
Birthday: May 1st, 1999
Don't be confused by $NAMES
testing recursive $BAR
testing recursive $FOO

$ cat err
form.txt[6]: Warning: Breaking out of recursive definitions.
form.txt[7]: Warning: Breaking out of recursive definitions.
Ed Morton