Here is my awk proposal:
awk -F'|' '{
    if (a[$1$2] == "") {
        a[$1$2] = $0
    }
    else {
        a[$1$2] = a[$1$2]","$3"|"$4"|"$5"|"$6
    }
}
END {
    for (key in a) {
        print a[key]
    }
}' <input.txt | sort
Explanation
The -F'|' option sets the field separator (what awk uses to split a line into fields) to the character '|', since that is how your file is formatted.
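A quick way to see what -F'|' does is to feed awk a single line and print a few fields. The sample line below is hypothetical, only shaped like the A|1|... lines in your file; NF is awk's built-in count of fields on the current line.

```shell
#!/bin/sh
# Hypothetical sample line in the A|1|... shape (not taken from your file)
line='A|1|DLT|x|y|z'

# -F'|' makes awk split on the literal | character:
# $1 is the first field, $3 the third, NF the number of fields on the line
fields=$(printf '%s\n' "$line" | awk -F'|' '{ print $1, $3, NF }')
echo "$fields"
```

The commas in print insert awk's output separator (a space by default), so this prints the first field, the third field and the field count.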
a[...] is an array. Arrays in awk are associative and work a bit like Python dictionaries: instead of numeric indexes, you use keys that are strings. For each line of the input file, the test if (a[$1$2] == "") checks whether there is already an entry at the key made of the first two fields ($1$2 is A1 for the first line, for example). If not (this is the first A|1|... line read), the whole line is stored at this key (a[$1$2] = $0). If there is already something (another A|1|... line has already been stored), a comma followed by fields 3 to 6, separated by "|", is appended to the entry (a[$1$2] = a[$1$2]","$3"|"$4"|"$5"|"$6).
Note that $1$2 glues the two fields together with nothing in between, so A|11|... and A1|1|... lines, for example, would share the key A11. If that can happen in your data, use a[$1,$2] instead, which joins the two fields with awk's SUBSEP separator.
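To see the grouping in action, the sketch below runs the same script on three hypothetical input lines (made up for illustration, not from your file) piped in directly instead of read from input.txt. The two A|1 lines should be merged into one, while A|2 stays on its own.

```shell
#!/bin/sh
# Same grouping script as above; the input is three hypothetical lines
grouped=$(printf '%s\n' 'A|1|DLT|a|b|c' 'A|1|STG|d|e|f' 'A|2|DLT|g|h|i' |
    awk -F'|' '{
        if (a[$1$2] == "") {
            a[$1$2] = $0                               # first line for this key: keep it whole
        }
        else {
            a[$1$2] = a[$1$2]","$3"|"$4"|"$5"|"$6      # later lines: append fields 3 to 6
        }
    }
    END {
        for (key in a) {
            print a[key]
        }
    }' | sort)
echo "$grouped"
```

With this input the result is one line per key: A|1|DLT|a|b|c,STG|d|e|f followed by A|2|DLT|g|h|i.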
Finally, once we have gone through the whole file, we need to output the entry for each key. We do this in an END block (its instructions are executed once the whole file has been read): we simply loop over all the keys of the array (for (key in a)) and print the entry for each one.
The final output is then piped to sort, because for (key in a) does not visit the keys in any particular order, so sorting the output is cleaner: you get the A|1|... line, followed by A|2|..., and so on.
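One caveat, assuming your second field is a number: plain sort compares lines character by character, so a hypothetical key A|10 would end up before A|1 and A|2. Telling sort that fields are separated by | (-t'|') and that the second field should be compared numerically (-k2,2n) fixes that. The sample lines are made up, and LC_ALL=C is only there so the plain sort uses byte order and gives a reproducible result.

```shell
#!/bin/sh
# Hypothetical lines; note the two-digit key A|10
plain=$(printf '%s\n' 'A|2|x' 'A|10|y' 'A|1|z' | LC_ALL=C sort)

# Sort on field 1 as text, then on field 2 as a number
numeric=$(printf '%s\n' 'A|2|x' 'A|10|y' 'A|1|z' | LC_ALL=C sort -t'|' -k1,1 -k2,2n)

echo "$plain"
echo "$numeric"
```

The plain sort gives A|10, A|1, A|2, while the field-aware sort gives A|1, A|2, A|10. You would use the second form in place of the bare sort at the end of the pipeline.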
Your last edit made the whole thing a bit trickier. The awk instructions needed are going to get a bit hairy, so I would advise creating an awk script file (a text file with an .awk extension, e.g. myScript.awk). Copy the following script into it:
BEGIN { FS = "|" }    # fields are separated by |

# One block per data type: store fields 3 to 6 at the "field1|field2" key
# and record that key in array a
$3 == "DLT" {
    dlt[$1"|"$2] = $3"|"$4"|"$5"|"$6
    a[$1"|"$2]++
}
$3 == "STG" {
    stg[$1"|"$2] = $3"|"$4"|"$5"|"$6
    a[$1"|"$2]++
}
$3 == "MAIN" {
    main[$1"|"$2] = $3"|"$4"|"$5"|"$6
    a[$1"|"$2]++
}
$3 == "UNLD" {
    unld[$1"|"$2] = $3"|"$4"|"$5"|"$6
    a[$1"|"$2]++
}

END {
    for (key in a) {
        # Pad any missing data type with 4 empty fields
        if (dlt[key] == "")  dlt[key]  = "|||"
        if (stg[key] == "")  stg[key]  = "|||"
        if (main[key] == "") main[key] = "|||"
        if (unld[key] == "") unld[key] = "|||"
        print key"|"dlt[key]"|"stg[key]"|"main[key]"|"unld[key]
    }
}
To use it:
awk -f myScript.awk <input.txt | sort
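Here is a self-contained sketch of that usage. It writes a condensed copy of the script above (same logic, one line per data type) to a temporary file standing in for myScript.awk, then pipes in three hypothetical input lines: key A|1 has a DLT and a MAIN line, key A|2 only an STG line. The temporary file and the sample data are assumptions for illustration only.

```shell
#!/bin/sh
# Write the script to a temporary file; it stands in for myScript.awk
script=$(mktemp)
cat > "$script" <<'EOF'
BEGIN { FS = "|" }
$3 == "DLT"  { dlt[$1"|"$2]  = $3"|"$4"|"$5"|"$6; a[$1"|"$2]++ }
$3 == "STG"  { stg[$1"|"$2]  = $3"|"$4"|"$5"|"$6; a[$1"|"$2]++ }
$3 == "MAIN" { main[$1"|"$2] = $3"|"$4"|"$5"|"$6; a[$1"|"$2]++ }
$3 == "UNLD" { unld[$1"|"$2] = $3"|"$4"|"$5"|"$6; a[$1"|"$2]++ }
END {
    for (key in a) {
        if (dlt[key] == "")  dlt[key]  = "|||"
        if (stg[key] == "")  stg[key]  = "|||"
        if (main[key] == "") main[key] = "|||"
        if (unld[key] == "") unld[key] = "|||"
        print key"|"dlt[key]"|"stg[key]"|"main[key]"|"unld[key]
    }
}
EOF

# Hypothetical input, piped in instead of read from input.txt
merged=$(printf '%s\n' 'A|1|DLT|a|b|c' 'A|1|MAIN|d|e|f' 'A|2|STG|g|h|i' |
    awk -f "$script" | sort)
rm -f "$script"
echo "$merged"
```

The missing STG and UNLD slots for A|1, and the missing DLT, MAIN and UNLD slots for A|2, come out as runs of empty fields, so both lines have the same layout.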
If you understood the explanation of my initial answer, you should be able to follow this algorithm. This time we make one array per data type (dlt, stg, main and unld) and store fields 3 to 6 of each line at the key made of the first two fields. Array a is used to keep track of all the keys seen. At the end we go through the keys of array a, and if one of the data arrays has no entry for a key, we fill it with "|||" (four empty fields) as you wanted, so that every line ends up with 18 fields: 2 key fields plus 4 fields for each of the 4 data types.
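If you want to double-check the 18-field claim on your real output, a one-line awk filter can count the lines whose field count differs. The sketch below runs it on two hypothetical output lines in the padded layout; on a real run you would pipe the output of the script into the same awk command instead.

```shell
#!/bin/sh
# Two hypothetical output lines in the padded 18-field layout
output='A|1|DLT|a|b|c|||||MAIN|d|e|f||||
A|2|||||STG|g|h|i||||||||'

# Count lines whose number of |-separated fields is not 18 (0 means all complete)
bad=$(printf '%s\n' "$output" | awk -F'|' 'NF != 18 { n++ } END { print n + 0 }')
echo "$bad"
```

The n + 0 forces a numeric 0 to be printed when no line is flagged, since an unset awk variable would otherwise print as an empty string.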