
I have parametrised configuration files in the following style. The values between the # characters are replaced with real values from a database, depending on the environment.

ABC=#PARAMETER_1#:#PARAMETER_2#
SOMETHING_ELSE=#PARAMETER_1#
SOMETHING_NEW=#PARAMETER_2##PARAMETER_3#

I would like to extract from these files the names between the hash/pound (#) characters, so that I can easily identify the parameters required. There is no standard column width or anything like that; the only rule is that anything between two # characters is replaced with a value from the database.

This is the ideal cleaned, deduped output:

PARAMETER_1
PARAMETER_2
PARAMETER_3

I have seen this question, but the crucial difference is that there can be any number of variables on a particular line in my situation.

I have tagged this question with Bash, but it doesn't have to be Bash; it could be Perl etc. It just needs to run from the command line in Unix.

Rich

1 Answer


As a first idea, awk (this needs GNU awk, because the RT variable is a gawk extension):

awk -vRS='#[^#]+#' 'RT{gsub(/#/,"",RT);p[RT]=1}END{for(i in p)print i}' the_file

But this decision may depend on the other operations you have to perform.
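
If nothing else has to be done with the file, a plain pipeline may be enough. The following is only a minimal sketch: it assumes a grep that supports the -o, -h and -E options (GNU and BSD grep do, but -o is not in POSIX), and the function name list_params is made up for illustration.

```shell
# list_params FILE...
# Print every distinct name found between two # characters, sorted.
list_params() {
  # -o prints each match on its own line, so several parameters on the
  # same input line are all reported; -h suppresses file name prefixes.
  grep -ohE '#[^#]+#' "$@" |
    tr -d '#' |   # strip the # delimiters
    sort -u       # sort and drop duplicates
}
```

Called as list_params the_file on the sample from the question, this should print PARAMETER_1, PARAMETER_2 and PARAMETER_3, one per line.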


Explanation, as requested in a comment:

awk -vRS='#[^#]+#' '   # use /#[^#]+#/ as record separator
RT {   # record terminator not empty?
  gsub(/#/,"",RT)    # remove the # parameter delimiter markup
  p[RT]=1   # store it as key in array p
}
END {   # end of input?
  for (i in p) print i   # loop through array p and print each key
}' the_file

The essential part is the use of the built-in variable RT (record terminator), described in the gawk man page as follows:

   RT          The record terminator.  Gawk sets RT to the input text that
               matched the character or regular expression specified by
               RS.
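
Because RT exists only in gawk, the command above will not find anything with other awk implementations such as mawk. As a hedged alternative, the same idea can be sketched with the standard match() function, which sets RSTART and RLENGTH. The function name extract_params and the array name seen are made up for illustration. The output is piped through sort because the order of a for (name in seen) loop is unspecified.

```shell
# extract_params FILE...
# Portable awk sketch: collect each name found between two # characters.
extract_params() {
  awk '{
    line = $0
    # Consume the line one #...# match at a time.
    while (match(line, /#[^#]+#/)) {
      # Drop the leading and trailing # and use the name as an array key.
      seen[substr(line, RSTART + 1, RLENGTH - 2)] = 1
      # Continue the search after the current match.
      line = substr(line, RSTART + RLENGTH)
    }
  }
  END { for (name in seen) print name }' "$@" | sort
}
```

With the sample file from the question, extract_params the_file should list PARAMETER_1, PARAMETER_2 and PARAMETER_3, one per line.
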
manatwork