Find variable names in a template file using Bash/Sed/Awk?

Question

I have some template files which I'm currently processing with envsubst, which works great.

<?php

$config['db_host'] = '${DB_HOST}';
$config['db_port'] = '${DB_PORT}';
$config['url'] = 'http://${WEB_HOST}/${WEB_PATH}';

// Please do NOT change this value
$config['maxSize'] = 25;

What I'm trying to find is a way to scan the file with a bash script and generate a list of all the environment variables that need to be set, so I can then dump them into a .env file like so:

DB_HOST=
DB_PORT=
WEB_HOST=
WEB_PATH=

I think it's possible with sed, however all of the examples I've found after 30 minutes of googling have been about how to replace variables inline, and nothing about just printing out the matches.

Are all variables enclosed in braces? Are they the only uppercase words in the file? — jesse_b, Jan 30 '20 at 22:11
Yes, they're all uppercase and enclosed, the only special character would be an underscore. — jwh, Jan 30 '20 at 22:15
Ahh I see what you meant. No, some of the non-variablized config lines have uppercase characters in them, and that explains the results when I tried your awk. Let me update my original example. — jwh, Jan 30 '20 at 22:35

score 5 · Accepted Answer · 2020-01-31T13:57:33.297

Your criteria seem to be: match one or more uppercase letters or underscores enclosed between ${ and }.

gawk can do this on its own, or grep can simplify the pattern-matching part (but will need extra formatting afterwards).

GNU `awk`:

gawk -v 'RS=[$]{' -F '}' '$1 ~ /^[A-Z_]+$/ && !a[$1]++ {printf "%s=\n", $1}' FILE

GNU awk accepts a regex as the record separator, so setting RS=[$]{ splits the input FILE into records wherever ${ appears.
The field separator is set to }, so the first field of each record can be checked against your other criterion: it must consist of one or more characters from [A-Z_] and nothing else.
The condition && !a[$1]++ removes duplicates: the array entry for a name is zero (false) the first time it is seen and non-zero afterwards.
The printf statement appends an equals sign = to each name, to match your desired output.
Also note: the start of the file always counts as the first record, even if it didn't begin with ${. So if your file began with [A-Z_]+} (unlikely), those uppercase letters/underscores would "match" and be printed on the first line of output.
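
If gawk is not installed, roughly the same result can be sketched in plain POSIX awk by looping over match() on each line instead of using a regex RS. This is a different technique from the gawk command above, not a drop-in copy of it. The file name template.php and its contents (borrowed from the question) are assumptions made only for illustration.

```shell
#!/bin/sh
# Sketch: list ${VAR} names with plain POSIX awk (no regex RS required).
# template.php is a hypothetical file name; its contents mirror the question.
workdir=$(mktemp -d)
cat > "$workdir/template.php" <<'EOF'
<?php
$config['db_host'] = '${DB_HOST}';
$config['db_port'] = '${DB_PORT}';
$config['url'] = 'http://${WEB_HOST}/${WEB_PATH}';
$config['maxSize'] = 25;
EOF

vars=$(awk '{
    line = $0
    # Consume the line one match at a time so that several variables
    # on the same line are all found.
    while (match(line, /[$][{][A-Z_]+[}]/)) {
        name = substr(line, RSTART + 2, RLENGTH - 3)   # strip "${" and "}"
        if (!seen[name]++) printf "%s=\n", name         # first occurrence only
        line = substr(line, RSTART + RLENGTH)
    }
}' "$workdir/template.php")

printf '%s\n' "$vars"
```

Because each line is scanned from the beginning, this variant does not have the first-record quirk described above.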

`grep` + formatting

grep is perhaps easier to understand (thanks to its -o / --only-matching option):

grep -o '${[A-Z_]\+}' FILE

But this doesn't format the output. A pipe through sed could do that, e.g.:

grep -o '${[A-Z_]\+}' FILE | sed 's/${\(.*\)}/\1=/'

This doesn't remove duplicates. Pipe the output through sort -u to do that, or alternatively pipe once through awk:

grep -o '${[A-Z_]\+}' FILE | awk -F '[{}]' '!a[$0]++{printf "%s=\n", $2}'
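
The two de-duplication options differ in output order, which may matter for a generated .env file. The sketch below compares them on a small made-up input (template.txt and its contents are assumptions, chosen so that one variable repeats), and assumes GNU grep and sed for the \+ operator.

```shell
#!/bin/sh
# Sketch: compare "sort -u" with the awk filter for removing duplicates.
# template.txt is a hypothetical file; WEB_HOST appears twice on purpose.
workdir=$(mktemp -d)
cat > "$workdir/template.txt" <<'EOF'
url = 'http://${WEB_HOST}/${WEB_PATH}'
host = '${DB_HOST}'
mirror = 'http://${WEB_HOST}/mirror'
EOF

# sort -u removes duplicates but returns the names in sorted order.
sorted=$(grep -o '${[A-Z_]\+}' "$workdir/template.txt" | sort -u | sed 's/${\(.*\)}/\1=/')

# The awk filter removes duplicates and keeps first-seen order.
ordered=$(grep -o '${[A-Z_]\+}' "$workdir/template.txt" | awk -F '[{}]' '!a[$0]++{printf "%s=\n", $2}')

printf 'sort -u:\n%s\n\n' "$sorted"
printf 'awk:\n%s\n' "$ordered"

# Either list can then be written out as the starting .env file.
printf '%s\n' "$ordered" > "$workdir/.env"
```

If the .env file should list variables in the order they appear in the template, the awk form is the one to use; sort -u is shorter to type when order does not matter.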

Thank you! I went with grep+awk, as gawk isn't currently installed on my build machine and I wanted to keep the requirements to a minimum. — jwh, Jan 31 '20 at 20:01
How to make logical expression with find and ||, if there is at least one such variable? — Rafis Ganeev, Mar 31 '22 at 06:53

jwh · Answer 2 · 2020-01-30T23:06:13.667

1

This is what I ended up doing:

cat file | grep -o '${\w*}' | sed -e 's|${||g' -e 's|}|=|g'

DB_HOST=
DB_PORT=
WEB_HOST=
WEB_PATH=

Which seems to work well enough even if it's not pretty!

Or another method (modified from https://unix.stackexchange.com/a/13467/229729):

cat file | sed -n -e 's/.*${\(\w\+\)}.*/\1=/p'

edited Jan 30 '20 at 23:06

answered Jan 30 '20 at 22:49

cat file | sed stuff can be replaced with sed stuff file – Chris Davies Jan 30 '20 at 23:31
This is true, however when you're writing/testing a regex it's easier to edit stuff at the end of the line than the beginning :) – jwh Jan 31 '20 at 00:05
@jwh Your second sed command will only find the last ${<match>} on each line: it matches ${WEB_PATH}, but in doing so skips ${WEB_HOST} – Jan 31 '20 at 05:01
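
The behaviour described in the comment above can be checked with a short test. This is only a sketch: it assumes GNU sed and grep (for \w and \+), and line.txt is a hypothetical file holding the one line from the question that contains two variables.

```shell
#!/bin/sh
# Sketch: show that the single-substitution sed form keeps only the last
# ${...} on a line, while grep -o reports every match.
workdir=$(mktemp -d)
cat > "$workdir/line.txt" <<'EOF'
$config['url'] = 'http://${WEB_HOST}/${WEB_PATH}';
EOF

# The leading .* is greedy, so it swallows everything up to the last ${...}.
sed_out=$(sed -n -e 's/.*${\(\w\+\)}.*/\1=/p' "$workdir/line.txt")

# grep -o prints each match on its own line, so nothing is skipped.
grep_out=$(grep -o '${\w*}' "$workdir/line.txt" | sed -e 's|${||g' -e 's|}|=|g')

printf 'sed only: %s\n' "$sed_out"
printf 'grep+sed:\n%s\n' "$grep_out"
```

For templates that can hold more than one variable per line, the grep -o based pipelines are therefore the safer choice.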
