5

I have a challenge to take an output file ("inventory.list") in this format:

hostname1.env1.domain   | abc  | environment1
hostname2.env1.domain   | abc  | environment1
hostname3.env2.domain   | abc  | environment2
hostname4.env2.domain   | abc  | environment2
hostname5.env1.domain   | def  | environment1
hostname6.env2.domain   | def  | environment2
(6 rows)

and write it out into another file in a different format:

[abc.environment1]
hostname1.env1.domain
hostname2.env1.domain

[abc.environment2]
hostname3.env2.domain
hostname4.env2.domain

[def.environment1]
hostname5.env1.domain

[def.environment2]
hostname6.env2.domain

abc and def are roles assigned to servers and there can be multiple servers for each role, as well as same-named roles but in different environments. I have to break out each hostname into unique groups of [role.environment], and additionally, completely delete the final line of the file which is a row count (this file is an output from an sql query).

I can read the file, strip out the pipes and whitespaces and assign/output the role/environment groupings, no problem:

#! /bin/bash
while IFS='| ' read -r certname role env; do
  printf '%s\n' "[""$role"".""$env""]"
done < "/tmp/inventory.list"

...which neatly gives me the role/environment group names:

[abc.environment1]
[abc.environment2]
[def.environment1]
[def.environment2]

but I cannot figure out how to print out the hostnames linked to each role/environment group underneath each group name, neither can I work out how to get my script to ignore the last row count line. I'm guessing I have to further assign my role and environment fields (second and third fields) to its own array to then refer to it to grab out the hostnames linked to each unique grouping, but I have no idea how to achieve this. Can anyone advise, please?

HelenH
  • 157

4 Answers4

6

I'd use a text utility to process text rather than using a shell loop to process text (though here, IFS=' |' is well adapted to your case). Like:

awk -F ' *[|] *' '
  NF == 3 {host[$2"."$3] = host[$2"."$3] $1 "\n"}
  END{for (i in host) print "[" i "]\n" host[i]}' < file

Note that the order of entries is not guaranteed. With GNU awk, add a BEGIN{PROCINFO["sorted_in"] = "@ind_str_asc"} to sort based on keys.


Depending on which RDBMS you're using, you could also have it display that in the correct format (like using GROUP_CONCAT in mysql or string_agg in postgre).

For a start, you could also probably ask your RDBMS query-utility to format the output in a format more suitable for post-processing (strip headers, footers, and use tab-separated-values for instance).

  • It's postgres that I'm running the query in, and that's a great idea, perhaps I can edit the query so that it doesn't output the header and footer lines, saving me the need to then strip them out. Thanks! – HelenH Jan 12 '16 at 14:41
  • Stephane, I notice you put the space first when you mention IFS=' |' is well suited for this. Is there any functional difference when the space is the first character vs. being later in the value of IFS? – Wildcard Jan 12 '16 at 15:20
  • 1
    @Wildcard, it matters for the expansions of "$*" or "${array[*]}". I didn't put it first on purpose though. – Stéphane Chazelas Jan 12 '16 at 15:25
  • @HelenH to help turn off the header and footer lines from postgres you can use the -t flag (assuming you're doing it from the command line) – Eric Renouf Jan 12 '16 at 15:31
  • Thanks very much @EricRenouf, I'd found that and the \pset --tuples-only from within the query. – HelenH Jan 12 '16 at 15:45
5

Use an associative array to store the certificate names per role and environment.

#! /bin/bash

unset -v envs
declare -A envs

while IFS='| ' read -r certname role env; do
    envs["$role.$env"]+="$certname"$'\n' 
done < /tmp/inventory.list

for e in "${!envs[@]}" ; do
    printf '%s\n' "[$e]" "${envs[$e]}"
done

To sort the sections, you can print the keys, sort them, and then read them back and output the associated values:

for e in "${!envs[@]}" ; do
    printf '%s\n' "$e"
done | sort | while read -r e ; do
    printf '%s\n' "[$e]" "${envs[$e]}"
done
choroba
  • 47,233
  • thank you so much! So the 'for e in ...' section of the new sort replaces the original 'for e in ...' section, such that there's no longer a printf statement? – HelenH Jan 12 '16 at 17:11
  • May I ask please, what in your array code is the bit that says 'only unique output'? I'm trying to adapt your array to work on a second file that holds the data in a different format, without the three-letter role, and out of that I have to output it all in format

    [environment1] hostname1.env1.domain hostname2.env1.domain (but the hostnames should be underneath the environment1 group name, I just can't work out how to display it right here!) But I can't work out how you're handling 'only print out unique entries as group names' ... ?

    – HelenH Jan 12 '16 at 17:17
  • @HelenH: Associative array keys are always unique, so it's the declare -A that handles it. – choroba Jan 12 '16 at 17:18
  • hmm thanks, I thought that might be it so I tried adding a new array name, including using the declare -A, but I keep getting "bad subscript" errors. Wish I knew what I was doing! – HelenH Jan 12 '16 at 17:24
  • @HelenH: Associative array is like a table: the subscripts are strings. – choroba Jan 12 '16 at 18:27
2

Perl magic:

$ perl -lne '@F=split(/\s*\|\s*/); push @{$k{"$F[1].$F[2]"}},$F[0]; 
     END{foreach (keys(%k)){print "[$_]"; print join "\n",@{$k{$_}}} }' file
[abc.environment1]
hostname1.env1.domain
hostname2.env1.domain
[def.environment1]
hostname5.env1.domain
[abc.environment2]
hostname3.env2.domain
hostname4.env2.domain
[def.environment2]
hostname6.env2.domain
terdon
  • 242,166
2

One way using awk:

$ sed 's/ //g' file | awk -F"|" '{x="["$2"."$3"]";}!(x in a){print x;a[x];}{print $1}'
Guru
  • 5,905