The basic idea for problems like this is to pass both files to sed. First come the definitions, which are stored in sed's hold space. Then each line of the other file gets the hold space appended, and every variable that also shows up in the appended definitions is replaced by its value.
Here is the script:
sed '/^[A-Z_]*:=.*/{H;d;}
G
:b
s/$\([A-Z_]*\)\([^A-Z_].*\n\1:=\)\([^[:cntrl:]]*\)/\3\2\3/
tb
P
d' Gnom.def form.txt
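To try the script, a small test setup could look like the following. The contents of both files are made up for illustration (only the file names come from the command above), and I assume mktemp is available.

```shell
# Hypothetical sample data; only the file names are taken from the answer.
tmp=$(mktemp -d)
printf 'FOO:=bar\nBAR_X:=baz\n' > "$tmp/Gnom.def"
printf 'At the $FOO\nMix $BAR_X with $FOO.\n' > "$tmp/form.txt"

# Run the script unchanged on the two files.
out=$(cd "$tmp" && sed '/^[A-Z_]*:=.*/{H;d;}
G
:b
s/$\([A-Z_]*\)\([^A-Z_].*\n\1:=\)\([^[:cntrl:]]*\)/\3\2\3/
tb
P
d' Gnom.def form.txt)

printf '%s\n' "$out"
# At the bar
# Mix baz with bar.
rm -r "$tmp"
```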
And now the detailed explanation:
/^[A-Z_]*:=.*/{H;d;}
This collects the definitions in the hold space. /^[A-Z_]*:=.*/ selects all lines starting with a variable name followed by the sequence :=. On these lines the commands in {} are performed: the H appends the line to the hold space, and the d deletes the pattern space and starts the next cycle, so the line won't get printed.
If you can't guarantee that all lines in the definition file follow this pattern, or if lines in the other file could match it, this part needs to be adapted, as explained later.
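To see what ends up in the hold space, a diagnostic variant like this may help. It is not part of the script: it only uses H (without d), swaps the hold space in on the last line with x, and prints it with l so the newlines become visible. The two definitions are sample data.

```shell
# -n suppresses automatic printing; l shows newlines as \n and marks the end with $.
held=$(printf 'FOO:=bar\nBAR:=baz\n' | sed -n '/^[A-Z_]*:=.*/H
${
x
l
}')
printf '%s\n' "$held"
# \nFOO:=bar\nBAR:=baz$
```

Note the leading \n: H always puts a newline in front of what it appends, even when the hold space is still empty.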
G
At this point of the script, only lines from the second file are processed. The G appends the hold space to the pattern space, so the pattern space now holds the line to be processed followed by all definitions, separated by newlines.
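The state after G can be inspected by stopping the script right there and printing the pattern space with l. This is just a debugging sketch on made-up input, fed through stdin instead of two files.

```shell
# Collect the definition, then show the pattern space of the text line after G.
space=$(printf 'FOO:=bar\nAt the $FOO\n' | sed -n '/^[A-Z_]*:=.*/{H;d;}
G
l')
printf '%s\n' "$space"
# At the $FOO\n\nFOO:=bar$
```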
:b
This starts a loop.
s/$\([A-Z_]*\)\([^A-Z_].*\n\1:=\)\([^[:cntrl:]]*\)/\3\2\3/
This is the key part, the replacement. Right now we have something like
At the $FOO<newline><newline>FOO:=bar<newline>BAR:=baz
       ----==================---  ###
in the pattern space. (Detail: there are two newlines before the first definition, one produced by H when appending to the hold space, the other by G when appending to the pattern space.)
The part underlined with ---- matches $\([A-Z_]*\). The \(\) makes it possible to backreference that string later on.
The second group starts with [^A-Z_].*\n, which matches the part underlined with ===, that is, everything up to the backreference \1 (underlined with ---). Starting with a non-variable character ensures we don't match just a prefix of a longer variable name (like $FOO inside $FOOBAR). Surrounding the backreference with a newline and := makes sure that a substring of a definition will not match.
Finally, \([^[:cntrl:]]*\) matches the ### part, which is the definition. Note that we assume the definition contains no control characters. If it might, you can use [^\n] with GNU sed or use a workaround for POSIX sed.
Now the $ and the variable name get replaced by the variable value \3, while the middle part and the definition are left as they were: \2\3.
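A single pass of the substitution, without the loop, can be watched like this. The input is sample data; the text line deliberately contains the variable twice to show that one pass replaces only the first occurrence.

```shell
# One substitution only (no :b / tb), then dump the pattern space with l.
once=$(printf 'FOO:=bar\nAt the $FOO and $FOO\n' | sed -n '/^[A-Z_]*:=.*/{H;d;}
G
s/$\([A-Z_]*\)\([^A-Z_].*\n\1:=\)\([^[:cntrl:]]*\)/\3\2\3/
l')
printf '%s\n' "$once"
# At the bar and $FOO\n\nFOO:=bar$
```

The second $FOO is still there, which is why the loop is needed.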
tb
If a replacement has been made, the t command jumps to the label b and tries another replacement.
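The loop also takes care of values that contain variables themselves. A small sketch with made-up definitions, where FOO refers to BAR:

```shell
# First pass turns $FOO into $BAR, second pass turns $BAR into baz.
nested=$(printf 'FOO:=$BAR\nBAR:=baz\nx $FOO\n' | sed '/^[A-Z_]*:=.*/{H;d;}
G
:b
s/$\([A-Z_]*\)\([^A-Z_].*\n\1:=\)\([^[:cntrl:]]*\)/\3\2\3/
tb
P
d')
printf '%s\n' "$nested"
# x baz
```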
P
If no further replacement is possible, the uppercase P prints everything up to the first newline (thus, the definition section will not get printed) and
d
will delete the pattern space and start the next cycle. Done.
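The difference between P and the lowercase p is easy to check on the pattern space right after G. This is only an illustration on sample input, with the substitution left out.

```shell
# P prints up to the first newline only.
first=$(printf 'FOO:=bar\nAt the $FOO\n' | sed -n '/^[A-Z_]*:=.*/{H;d;}
G
P')

# p prints the whole pattern space, including the appended definitions.
whole=$(printf 'FOO:=bar\nAt the $FOO\n' | sed -n '/^[A-Z_]*:=.*/{H;d;}
G
p')

printf '%s\n' "$first"
printf '%s\n' "$whole"
```

With p, the empty line and FOO:=bar would leak into the output after every text line.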
Limitations
You can do a nasty thing like including FOO:=$BAR and BAR:=$FOO in the definition file and make the script loop forever. You can define a processing order to avoid this, but it will make the script more difficult to understand. Leave this out if your script doesn't need to be idiot-proof.
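If you want to see the endless loop without hanging your terminal, something like this should do. It assumes the timeout command from GNU coreutils is installed, which is not part of the original setup; the input is sample data.

```shell
# timeout (an assumption, GNU coreutils) kills sed after one second
# and then exits with status 124.
status=0
printf 'FOO:=$BAR\nBAR:=$FOO\nx $FOO\n' | timeout 1 sed '/^[A-Z_]*:=.*/{H;d;}
G
:b
s/$\([A-Z_]*\)\([^A-Z_].*\n\1:=\)\([^[:cntrl:]]*\)/\3\2\3/
tb
P
d' || status=$?
printf 'exit status: %s\n' "$status"
```

sed keeps flipping between $FOO and $BAR and never reaches the P, so nothing is printed.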
If the definition can contain control characters, we can swap newline with another character after the G, e.g. with y/\n#/#\n/, and repeat this before printing. I don't know a better workaround.
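What the swap does can be checked in isolation. The snippet below is only a demonstration of the y command on sample input, not the adapted script; presumably the substitution would then have to use # wherever it used \n before.

```shell
# N joins two lines, y swaps newline and #, l makes the result visible.
swapped=$(printf 'a#b\nc\n' | sed -n 'N
y/\n#/#\n/
l')
printf '%s\n' "$swapped"
# a\nb#c$
```

Running the same y command a second time restores the original text.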
If the definition file can contain lines with a different format, or the other file can contain lines in definition format, we need a unique separator between both files: either as the last line of the definition file, as the first line of the other file, or as a separate file you pass to sed between the other two. Then you have one loop that collects the definitions until the separator line is met, followed by a loop over the lines of the other file.
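A rough sketch of the separator variant, under my own assumptions: the separator is a line consisting of --- passed as a separate file, the definition file is not empty, and all file names and contents are hypothetical.

```shell
# Hypothetical files: definitions, a one-line separator file, and the text.
tmp=$(mktemp -d)
printf 'FOO:=bar\n# a free-form comment line\n' > "$tmp/defs.txt"
printf '%s\n' '---' > "$tmp/sep.txt"
printf 'NAME:=$FOO\n' > "$tmp/text.txt"

# Everything up to the separator goes to the hold space (the separator
# itself is dropped); everything after it is treated as text.
sep_out=$(cd "$tmp" && sed '1,/^---$/{
/^---$/!H
d
}
G
:b
s/$\([A-Z_]*\)\([^A-Z_].*\n\1:=\)\([^[:cntrl:]]*\)/\3\2\3/
tb
P
d' defs.txt sep.txt text.txt)

printf '%s\n' "$sep_out"
# NAME:=bar
rm -r "$tmp"
```

The text line has definition format here, but since it comes after the separator it gets expanded and printed instead of being collected.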