Can awk † find the nth iteration of a "{" and return everything up to the next "}" character?
[EDIT: yes... solution from Ed Morton at bottom]
† I've been assuming awk is the correct tool for this job. Other ideas are welcome.
I need to isolate blocks of text in hundreds of files. Some files have only one block, but others contain dozens.
sample:
$ cat samp2.txt
//////////////////////////////////
// North Carolina office
// satellite branch
//////////////////////////////////
{
first "John"
last "Doe"
address "163 Main Street"
age "25"
gender "male"
}
It may be best to redirect (>) the current block into a temp file so the script can operate on it before moving on to the next one. The blocks will end up in separate files anyway.
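A minimal sketch of the one-file-per-block idea, assuming "{" and "}" appear only on the lines that open and close a block. The function name split_blocks, the output directory, and the block_N.txt file names are made up for illustration:

```shell
# Sketch: write the body of every {...} block in a file to its own numbered file.
# Assumption: "{" and "}" appear only on the lines that open and close a block.
# split_blocks, the output directory, and the block_N.txt names are illustrative.
split_blocks() {
    mkdir -p "$2" || return 1
    awk -v dir="$2" '
        /\{/ { n++; out = dir "/block_" n ".txt"; flag = 1; next }  # start block n
        /\}/ { if (flag) close(out); flag = 0; next }               # finish it
        flag { print > out }                                        # body lines only
    ' "$1"
}
```

Called as split_blocks samp2.txt blocks, this should leave the five data lines in blocks/block_1.txt, with the comment lines and braces dropped.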
I suspect awk can be given an index to find the nth match. The bash script can manage the loop and iteration.
I've gotten close
$ awk '/\{/{flag=1;next}/\}/{flag=0}flag' samp2.txt
first "John"
last "Doe"
address "163 Main Street"
age "25"
gender "male"
However, since the above operates on the entire file, it doesn't work for files containing more than one block (e.g. below). Regardless of how many blocks a file contains, I need every block separated out and processed individually.
Some files contain comments, but many do not--with no standard. I discard them, but the inconsistency means comments can't be relied upon for tracking where we are. The only given is the curly braces (and the line separation).
The text is always newline-separated, but there is not always a blank line between blocks. The data pairs vary, so this can't be a simple "grep 5 lines and proceed" solution.
$ cat samp3.txt
//GROUP1
{
first "John"
address "124 Main Street"
last "Jones"
special "supervisor"
age "35"
gender "male"
}
//The fourth group
{
first "John"
address "125 Main Street"
last "Jacob"
age "30"
gender "male"
}
{
first "John"
address "523 Main Street"
last "Jingle"
age "40"
gender "male"
}
My above awk statement runs through all groups, mashing them all into one large paragraph.
$ awk '/\{/{flag=1;next}/\}/{flag=0}flag' samp3.txt
first "John"
address "124 Main Street"
last "Jones"
special "supervisor"
age "35"
gender "male"
first "John"
address "125 Main Street"
last "Jacob"
age "30"
gender "male"
first "John"
address "523 Main Street"
last "Jingle"
age "40"
gender "male"
I need to tell awk to look for the nth "{" and then dump to the nth "}" separately, like this instead:
first "John"
address "124 Main Street"
last "Jones"
special "supervisor"
age "35"
gender "male"
(awk exits, bash script does its thing)
first "John"
address "125 Main Street"
last "Jacob"
age "30"
gender "male"
(awk exits, bash script does its thing)
first "John"
address "523 Main Street"
last "Jingle"
age "40"
gender "male"
(awk exits, bash script does its thing)
[etc]
The intent is similar to a non-greedy regex match of the nth "{ .+ }".
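A minimal sketch of the "nth match" idea, extending the flag approach above with a counter. It assumes braces appear only as block delimiters; the function name nth_block is hypothetical:

```shell
# nth_block FILE N: print the body of the Nth {...} block, then stop reading.
# Assumption: "{" and "}" appear only on the lines that open and close a block.
nth_block() {
    awk -v n="$2" '
        /\{/ { count++; flag = (count == n); next }  # flag is on only inside block n
        /\}/ { if (flag) exit; next }                # block n is done, so quit early
        flag                                         # print body lines of block n
    ' "$1"
}
```

A calling script could run nth_block samp3.txt 1, then 2, and so on, and stop when the output comes back empty (which also happens for an empty block, so this is only a rough stop condition).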
Given that, is there perhaps a smarter perl solution?
TIA.
This code got me what I need. Adapted from Ed Morton's answer.
awk -v n="$LoopVariable" -v RS='}' 'NR==n{gsub(/.*\{\r?\n|\n$/,""); print}' "$SourceFile"
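For context, a rough sketch of how a loop might drive that command. The function name process_blocks and the printf placeholder are illustrative, and the block count assumes every block opens with exactly one line containing "{":

```shell
# Sketch: feed each block of a file, one at a time, to some processing step.
# The body runs in a subshell ( ... ) so its variables do not leak to the caller.
process_blocks() (
    src=$1
    # Assumption: one line containing "{" per block, so the line count is the block count.
    total=$(grep -c '{' "$src" || true)
    n=1
    while [ "$n" -le "$total" ]; do
        # With RS set to the closing brace, record n holds block n plus any leading comment lines.
        block=$(awk -v n="$n" -v RS='}' 'NR==n{gsub(/.*\{\r?\n|\n$/,""); print}' "$src")
        printf '%s\n' "--- block $n ---" "$block"   # placeholder for the real processing
        n=$((n + 1))
    done
)
```

Counting the blocks up front avoids running past the last one: the text after the final "}" (usually just a newline) forms one extra record, which would otherwise come back as an empty block.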
EDITS: Input really helped me isolate my question to what I need. Thank you for that.
I've found a few SE questions that seem similar, but if these contain my solution I'm not well-versed enough in awk to see the connection.
RS to either the empty string or }. – icarus Aug 09 '21 at 02:24
//GROUP lines should be ignored, right?) Can there be { and/or } anywhere other than at the beginning or end of a block? And, yeah, it would help if you gave a *hint* at what ‘processing’ you want to do, and/or show us the code you have written. … (Cont’d) – Scott - Слава Україні Aug 09 '21 at 04:53