I want to write a data parser script. The example data is:
name: John Doe
description: AM
email: john@doe.cc
lastLogon: 999999999999999
status: active
name: Jane Doe
description: HR
email: jane@doe.cc
lastLogon: 8888888888
status: active
...
name: Foo Bar
description: XX
email: foo@bar.cc
status: inactive
The key-value pairs are always in the same order (name, description, email, lastLogon, status), but some of the fields may be missing. It is also not guaranteed that the first record is complete.
The expected output is delimiter-separated values (e.g. CSV):
John Doe,AM,john@doe.cc,999999999999999,active
Jane Doe,HR,jane@doe.cc,8888888888,active
...
Foo Bar,XX,foo@bar.cc,n/a,inactive
My solution uses a while read loop. The main part of my script:
while IFS= read -r line; do
  # a "name:" line starts a new record, so clear status
  # (status being set is what triggers the printout below)
  grep -q '^name:' <<< "$line" && status=''
  case "${line,,}" in
    name*)   name=${line#*: } ;;
    desc*)   desc=${line#*: } ;;
    email*)  email=${line#*: } ;;
    last*)   lastlogon=${line#*: } ;;
    status*) status=${line#*: } ;;
  esac
  # status is the last field of a record: print it and reset
  if test -n "$status"; then
    printf '%s,%s,%s,%s,%s\n' "${name:-n/a}" "${desc:-n/a}" "${email:-n/a}" "${lastlogon:-n/a}" "${status:-n/a}"
    unset name desc email lastlogon status
  fi
done < input.txt
This works but is obviously very slow. The execution time with 703 lines of data:
real 0m37.195s
user 0m2.844s
sys 0m22.984s
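I suspect the external grep forked for every input line accounts for most of that sys time. Here is a sketch of the same loop using bash's built-in pattern matching instead; only the record-boundary test changes, and I have not timed it beyond the sample data:

while IFS= read -r line; do
  # bash pattern match instead of forking grep for every line
  [[ $line == name:* ]] && status=''
  case "${line,,}" in
    name*)   name=${line#*: } ;;
    desc*)   desc=${line#*: } ;;
    email*)  email=${line#*: } ;;
    last*)   lastlogon=${line#*: } ;;
    status*) status=${line#*: } ;;
  esac
  if test -n "$status"; then
    printf '%s,%s,%s,%s,%s\n' "${name:-n/a}" "${desc:-n/a}" "${email:-n/a}" "${lastlogon:-n/a}" "${status:-n/a}"
    unset name desc email lastlogon status
  fi
done < input.txt

Even so, a line-by-line shell loop feels like the wrong tool for this.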
I'm thinking about an awk approach, but I'm not experienced enough with awk.
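A rough sketch of what I imagine the awk version could look like (assuming the field names shown above, and that values never contain ": "):

awk -F': ' '
  # a "name:" line starts a new record: flush the previous one, if any
  $1 == "name" && seen { flush_record() }
  { rec[$1] = $2; seen = 1 }
  # "status" is the last field of a record, so flush right away
  $1 == "status" { flush_record() }
  # flush a trailing record that ends without a status line
  END { if (seen) flush_record() }
  function flush_record() {
    printf "%s,%s,%s,%s,%s\n", val("name"), val("description"), val("email"), val("lastLogon"), val("status")
    delete rec   # clear the whole array for the next record
    seen = 0
  }
  function val(k) { return (k in rec) ? rec[k] : "n/a" }
' input.txt

The extra flush on "name" and in the END block also emits records that never get a status line; dropping those two rules would match the shell loop above, which silently skips such records.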