A. sed and grep
The shell displays the values of some variables on several lines because they contain embedded newlines.
grep and sed are designed to search for patterns within a single line, one line at a time (the newline character is used as a hard-coded delimiter).
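A small illustration of this limitation, using a hand-written stand-in for the declare -p output rather than a live shell (the input variable and its contents are assumptions made for the example): grep keeps the first line of the multi-line value and drops the rest.

```shell
# Stand-in for `declare -p` output; HOSTNAME holds an embedded newline.
input='declare -- HOSTNAME="retro-
host"
declare -i HISTCMD'

# grep tests each line on its own, so the continuation line is lost.
out=$(printf '%s\n' "$input" | grep '^declare --')
printf '%s\n' "$out"
```

Only the line that itself starts with declare -- is printed; the second half of the value never reaches the output.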
B. Using awk
Awk can select the lines that match a pattern, and it can also apply rules conditionally.
1. Select the matching lines
- Display lines starting with the shell words declare --
/^declare --/
- Display lines that don't begin with the word declare
!/^declare/
Together, the two preceding rules display 1. the variables that don't have attributes and 2. the continuation lines of multi-line values.
We can use a sample input to show an overview of the pattern matching.
declare -- HOSTNAME="retro-
host"
declare -a GROUPS=()
declare -x GREETINGS="Hello
World!"
declare -i HISTCMD
declare -- PROMPT_COMMAND="printf \"\\033]0;%s@%s:%s\\007\" \"\${USER}\" \"\${HOSTNAME%%.*}\" \"\${PWD/#\$HOME/\\~}\""
The value of the variable HOSTNAME is displayed in full because the two rules match consecutively: the first rule matches one line and the second rule matches the next line. However, we also see that the value of the GREETINGS variable is partially displayed. Although its first line does not start with the pattern declare --, the subsequent line (the rest of the value, i.e. World!") is displayed because that line (or "record") matches the second rule !/^declare/.
declare -- HOSTNAME="retro-
host"
World!"
declare -- PROMPT_COMMAND="printf \"\\033]0;%s@%s:%s\\007\" \"\${USER}\" \"\${HOSTNAME%%.*}\" \"\${PWD/#\$HOME/\\~}\""
Since the lines of a multi-line value are consecutive, only the continuation lines that directly follow a selected variable should be displayed.
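To reproduce this behaviour, the two rules can be combined in a single Awk invocation. This is a minimal sketch: the sample is pasted into a shell variable (named sample here, an arbitrary choice) and the long PROMPT_COMMAND line is left out for brevity.

```shell
# Sample input (PROMPT_COMMAND omitted for brevity).
sample='declare -- HOSTNAME="retro-
host"
declare -a GROUPS=()
declare -x GREETINGS="Hello
World!"
declare -i HISTCMD'

# A pattern without an action prints the matching record.
out=$(printf '%s\n' "$sample" | awk '
/^declare --/
!/^declare/
')
printf '%s\n' "$out"
```

The stray World!" line shows up in the result because nothing ties the second rule to the variable selected by the first one.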
2. Check the value of the shell variable
Before reading the code, it helps to look at the algorithm, which explains where the complexity comes from.
First of all, Awk checks all the rules each time it reads a record (by default, the current line). However, the program also has to take into account the processing done on the previous record. In other words, it needs to know its state while it processes the next line: which shell variable is being parsed? For this we define a "boolean" variable named scan.
The code is undoubtedly complex. It is an ordered sequence of mutually exclusive conditional instructions (like the logical XOR). For more information, read the section "Why this code is so convoluted?".
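The idea of carrying a state from one record to the next can be sketched with a deliberately simplified program. It is not the program developed in this answer: it only handles double-quoted values, ignores escaped quotes and arrays, and resets scan on every new declare line, all of which are simplifications assumed for the illustration.

```shell
sample='declare -- HOSTNAME="retro-
host"
declare -x GREETINGS="Hello
World!"
declare -i HISTCMD'

out=$(printf '%s\n' "$sample" | awk '
# Any new declaration ends the previous value.
/^declare/ { scan = 0 }
# Selected variable: print it and remember whether its value is still open.
/^declare --/ { print; if ($0 !~ /"$/) scan = 1; next }
# Continuation line: printed only while a selected value is open.
!/^declare/ && scan { print; if ($0 ~ /"$/) scan = 0 }
')
printf '%s\n' "$out"
```

With the flag in place, the continuation line of HOSTNAME is printed while the one belonging to GREETINGS is skipped.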
A. The variable has no value
if ($3 !~ /=/) {
    scan = 0
    print $3
}
If the third field doesn't contain an equal sign, it is not a shell variable assignment; it is only a variable name.
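A quick way to check this test in isolation, assuming two made-up declarations (UNSET_VAR and SET_VAR are hypothetical names, not taken from a real environment):

```shell
# Two hypothetical declarations: one without a value, one with a value.
out=$(printf '%s\n' 'declare -- UNSET_VAR' 'declare -- SET_VAR="x"' |
    awk '/^declare --/ { if ($3 !~ /=/) print $3 }')
printf '%s\n' "$out"
```

Only the name of the variable that has no value is printed.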
B. The variable has a value
The value of a variable can contain special characters which are escaped using double quotes.
There are two types of variable assignment: the declaration of a scalar variable and the declaration of an array variable.
# A common shell variable
VARIABLE="VALUE"
# A Bash array
VARIABLE=(VALUE)
In both cases, the value is delimited by a pair of specific characters, either " and " or ( and ). If the corresponding delimiters are on the same line, then the value of the variable does not contain newline characters. Otherwise the following lines must be displayed up to the corresponding closing delimiter.
The opening delimiter is at a known position and is stored in the beg variable, while the closing delimiter must be searched for (using the end variable).
match($3, /=./)
beg = substr($3, RSTART + 1, 1)
match($0, /.$/)
end = substr($0, RSTART, 1)
Note: match() is a built-in Awk function that returns the position at which the matched substring starts (0 if there is no match) and sets the predefined variables RSTART and RLENGTH to the start position and the length of that substring. Here we save the character just after the equal sign in beg and the last character of the record in end.
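The four statements can be tried on a single record. In this sketch the record is the first line of the HOSTNAME sample, passed through a shell variable named line (an arbitrary name), and the result is printed with a made-up beg=... end=... layout:

```shell
line='declare -- HOSTNAME="retro-'

out=$(printf '%s\n' "$line" | awk '{
    match($3, /=./)                  # RSTART: position of the equal sign in $3
    beg = substr($3, RSTART + 1, 1)  # character right after the equal sign
    match($0, /.$/)                  # RSTART: position of the last character
    end = substr($0, RSTART, 1)      # last character of the record
    printf "beg=%s end=%s\n", beg, end
}')
printf '%s\n' "$out"
```

Here beg is a double quote and end is a hyphen, so the delimiters are not paired on this line and the value continues on the next one.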
a. The value contains no embedded newlines
The characters beg and end are paired delimiters.
if (match($0, /[[:alpha:]_][[:alnum:]_]*=[("].*[^\\][")]$/)) {
scan = 0
if (beg == "(") {
If the second and third tests are false, scan keeps the value 0, which tells the program not to scan the next line. Indeed, in Awk the value assigned by scan = 0 is equivalent to the truth value "false".
In concrete terms, this means that lines containing substrings that are not associated with the selected patterns (wanted variables) are not displayed. Taking the example from the section "1. Select the matching lines", this prevents the display of the substring World!", which belongs to the value of GREETINGS (declare -x).
b. The value contains embedded newlines
if (beg == "(") {
    if (end != ")") {
        scan = 1
    }
}
This conditional instruction indicates that the value is not yet closed by its delimiter. Therefore, the closing delimiter is necessarily located on a subsequent line. In this case, scan = 1 tells the program to scan the next line: scan becomes "true" in a logical test.
else {
    scan = 0
    if (match($0, /[[:alpha:]_][[:alnum:]_]*=.*$/)) {
        scan = 1
    }
}
Strictly speaking, setting the variable scan to 0 here is not functionally necessary, but it reminds the reader that this variable defaults to 0. The same "sequence" appears in the corresponding if part, where the assignment scan = 0 is really needed (see the section "The value contains no embedded newlines").
C. Display the next (next...) line
;; TBD
Why this code is so convoluted?
This code is convoluted because it checks the various combinations of the delimiters. The logic is not complicated, but each test is specific to one combination.
The Awk program
/^declare --/ {
    # Display shell variables with no value
    if ($3 !~ /=/) {
        scan = 0
        print $3
    }
    else {
        # Check whether the value is spread over several lines
        match($3, /=./)
        beg = substr($3, RSTART + 1, 1)
        match($0, /.$/)
        end = substr($0, RSTART, 1)
        if (match($0, /[[:alpha:]_][[:alnum:]_]*=[("].*[^\\][")]$/)) {
            scan = 0
            if (beg == "(") {
                if (end != ")") {
                    scan = 1
                }
            }
            else if (beg == "\"") {
                if (end != "\"") {
                    scan = 1
                }
            }
        }
        else {
            scan = 0
            if (match($0, /[[:alpha:]_][[:alnum:]_]*=.*$/)) {
                scan = 1
            }
        }
        print substr($0, RSTART, RLENGTH)
    }
}
Display the multi-line value of a matching pattern
!/^declare/ && scan {
    # Check if this is the last substring of the variable value
    if ($0 ~ /[^\\][")]$/ || $0 ~ /\\[")]$/ || $0 ~ /^[")]$/) {
        match($0, /.$/)
        end = substr($0, RSTART, 1)
        if (end == ")") {
            if (beg == "(") {
                scan = 0
            }
        }
        else if (end == "\"") {
            if (beg == "\"") {
                scan = 0
            }
        }
    }
    print $0
}
What if there's a VAR=$'\ndeclare -- reboot; #'? Seems like this would be some type of security injection, but I don't think I have enough knowledge to understand what would execute that reboot command. – Daniel Kaplan Mar 25 '22 at 08:21
[...] sed or grep -P on the output of declare -p in that environment would output a reboot #" line, which if fed to a shell (which may be what you ultimately want to do with those variable assignments) would cause a reboot. – Stéphane Chazelas Mar 25 '22 at 08:24
[...] a=($'new\nline'); declare -p a gives declare -a a=([0]=$'new\nline'). If it only did that to scalars too, parsing the output would be at least somewhat safer. – ilkkachu Mar 30 '22 at 10:20
[...] aaa1=$'abc\ndef'; aaa2=($'ghi\njkl'), if I do declare | grep -A1 aaa or set | grep -A1 aaa, I get the scalar quoted:
aaa1=$'abc\ndef'
aaa2=([0]="ghi
jkl")
Meanwhile, declare -p | grep -A1 aaa (or declare -p aaa1 aaa2) gives raw newlines (i.e., multi-line output) for both:
declare -- aaa1="abc
def"
declare -a aaa2='([0]="ghi
jkl")'
– G-Man Says 'Reinstate Monica' Apr 01 '22 at 01:54
[...] -p vs no option too, sigh. I get the array members quoted in newer versions anyway. – ilkkachu Apr 01 '22 at 12:54