Reading specific lines from input file

Question

I have an input file with the general structure shown below. I want to extract only the step and weight values from the hill blocks and write them to an output file using awk/sed/grep. The hill blocks are arranged in the same way throughout the input file.

Edit: I am using Mac OS X.

 configuration {
 step         5000
 dt 2.000000e+00
 }

colvar {
name d1
x  1.70882305580118e+01
v  0.00000000000000e+00
}

1.85104129628346e-02 9.71380137561312e-02 4.00538287370335e-02
1.25662994200839e-02 9.88655406140091e-02 1.41657757894898e-01

hill {
step            0
weight    1.00000000000000e-01
centers   1.23563844380284e+02
widths    1.25331413731550e+00
}
 hill {
 step          100
 weight    1.00000000000000e-01
centers   1.19065310650377e+02
widths    1.25331413731550e+00
}

Through some other answers I managed to find some help:

 sed 's/^.*weight//' diol_colvar.colvars.state > hill.txt
 sed 's/^.*step//' diol_colvar.colvars.state > hill.txt

Sadly, this does not work as I wanted.

I want my output to look something like this:

  0     1.00000000000000e-01
  100   1.00000000000000e-01

Please help me sort out this issue.

Thanks,

from what I can understand, you want step and weight values only from hill blocks and put the values side by side? is that correct? — Sundeep, Sep 03 '16 at 09:06
no issues.. but some more details will help in simpler solution... are step and weight are guaranteed to be next to each other? is there a chance of them appearing next to each other outside hill blocks? — Sundeep, Sep 03 '16 at 09:23
@sp asic yes, all hill blocks are arranged in similar fashion and step and weight are always appearing similarly. — Grayrigel, Sep 03 '16 at 09:31
alright, please add these details to question, while I will try to give an answer — Sundeep, Sep 03 '16 at 09:39

score 1 · Accepted Answer · edited Apr 13 '17 at 12:36

1) With sed

Assuming step and weight occur in consecutive lines,

$ sed -nE '/step/{N;s/.*step\s+(\S+).*\n.*weight\s+(\S+).*/\1\t\2/p}' ip.txt 
0   1.00000000000000e-01
100 1.00000000000000e-01

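-nE: do not print lines by default, and use extended regular expressions
/step/: act only on lines containing step
N: append the next line to the pattern space
s/.../\1\t\2/p: capture the value after step and the value after weight, then print them separated by a tab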

Note:

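The above was tested on GNU sed 4.2.2. The version below might help on OS X and other non-GNU versions. See this Q&A on SO for details; the main points are that \s might not work the same as in GNU sed, and that BSD sed expects a ; before the closing }, which is what the "bad flag in substitute command: '}'" error reported in the comments points to.

sed -nE '/step/{N;s/.*step[[:space:]]+([^[:space:]]+).*\n.*weight[[:space:]]+([^[:space:]]+).*/\1\t\2/p;}' ip.txt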
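
A hedged sketch of a more portable variant: some non-GNU sed builds may not treat \t in the replacement as a tab, so the script below builds a literal tab in the shell and splices it into the sed program. The sample file and the variable names sample, tab and result are illustrative and not part of the original answer. The ; before } is included because BSD sed is stricter about closing braces.

```shell
# Hypothetical sample input, trimmed from the file in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
configuration {
step         5000
dt 2.000000e+00
}
hill {
step            0
weight    1.00000000000000e-01
centers   1.23563844380284e+02
}
 hill {
 step          100
 weight    1.00000000000000e-01
}
EOF

# Build a literal tab in the shell instead of relying on \t inside sed.
tab=$(printf '\t')

# Double quotes let the shell expand ${tab}; \1, \2 and \n reach sed unchanged.
result=$(sed -nE "/step/{N;s/.*step[[:space:]]+([^[:space:]]+).*\n.*weight[[:space:]]+([^[:space:]]+).*/\1${tab}\2/p;}" "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

The step line inside the configuration block is followed by dt rather than weight, so the substitution fails there and nothing is printed for it.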

2) With awk

$ awk 'a ~ /step/ && /weight/{print v"\t"$2} {a=$0; v=$2}' ip.txt 
0   1.00000000000000e-01
100 1.00000000000000e-01

{a=$0; v=$2} saves the line and second field
a ~ /step/ && /weight/ match if previous line contains step and current line contains weight

Assuming the hill blocks are all similar to the given input, we can match three consecutive lines to restrict the match to hill blocks only:

awk 'b ~ /hill/ && a ~ /step/ && /weight/{print v"\t"$2} {b=a; a=$0; v=$2}' ip.txt

To save the results, add > output_filename to the end of the command.
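
To illustrate why the three-line match can matter, here is a small sketch that runs both awk commands on a made-up input. The bias block is hypothetical (it does not appear in the question's file) and exists only to show a step/weight pair outside a hill block; the variable names sample, loose and strict are likewise illustrative.

```shell
# Hypothetical input: a made-up "bias" block plus two hill blocks.
sample=$(mktemp)
cat > "$sample" <<'EOF'
bias {
step         5000
weight    2.00000000000000e+00
}
hill {
step            0
weight    1.00000000000000e-01
}
 hill {
 step          100
 weight    1.00000000000000e-01
}
EOF

# Two-line match: also picks up the step/weight pair in the made-up bias block.
loose=$(awk 'a ~ /step/ && /weight/{print v"\t"$2} {a=$0; v=$2}' "$sample")

# Three-line match: only pairs directly under a hill line are printed.
strict=$(awk 'b ~ /hill/ && a ~ /step/ && /weight/{print v"\t"$2} {b=a; a=$0; v=$2}' "$sample")

printf 'two-line match:\n%s\n' "$loose"
printf 'three-line match:\n%s\n' "$strict"
rm -f "$sample"
```

On this input the two-line version reports three pairs, including the one from the bias block, while the three-line version reports only the two hill pairs.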

Reference:

sed pattern matching on consecutive lines

answered Sep 03 '16 at 09:47 by Sundeep (last edited by Community)

awk works fine for me. Thanks a lot man, I wish I could have liked it (less reputation). However, sed solution shows this error (sed: 1: "/step/{N;s/.*step\s+(\S ...": bad flag in substitute command: '}' ) – Grayrigel Sep 03 '16 at 10:05
@VikasDubey, glad to hear it works.. see What should I do when someone answers my question? for next steps :) – Sundeep Sep 03 '16 at 10:06
@VikasDubey, what is your sed version? it works for me on GNU sed 4.2.2 – Sundeep Sep 03 '16 at 10:19
does sed work on OS x ? – Grayrigel Sep 03 '16 at 15:04
@VikasDubey, see http://unix.stackexchange.com/questions/13711/differences-between-sed-on-mac-osx-and-other-standard-sed and http://stackoverflow.com/questions/30003570/how-to-use-gnu-sed-on-mac-os-x – Sundeep Sep 03 '16 at 15:17
@vikasDubey, see the answer and ensuing comment by same author to https://stackoverflow.com/questions/12178924/os-x-sed-e-doesnt-accept-extended-regular-expressions. sp asic's sed one-liner is correct, but does not apply to "OS X", because -E means extended regex, but not enhanced extended as OS X would need it to be. In OS X extended regex does not include the syntax \s to account for space as it normally does for GNU sed on a non-Apple GNU linux OS. So it's normal to see that sed one-liner fails for you. See what you get if you replace \s by [:space:] ... – Cbhihe Sep 04 '16 at 16:35
sp asic: Can you modify your answer to reflect the above ? --- @vikasDubey: Can you include OS X somewhere in OP or in yr title, so people are not thrown off by subtle differences such as described above ? After modification of sp asic's answer, you could perhaps do everyone a favor by accepting it as the good answer. For that use the green check mark left of his/her answer. ;-) – Cbhihe Sep 04 '16 at 16:39
@Cbhihe, thanks for the pointer on OS X issue, updated the answer – Sundeep Sep 04 '16 at 17:01
@VikasDubey: In my comment above, you might want to try [[:space:]] rather than [:space:]. I am not clear on which of the two you should use for lack of an appropriate OS X box on which to check. I think [[:space:]] should work better. Buuuut... check man sed on your box in any case. Sorry for not being completely water-tight on that one... – Cbhihe Sep 04 '16 at 17:03

score 0 · Answer 2 · answered Sep 03 '16 at 13:15

It is easier to use awk twice: the first pass extracts the hill { } blocks and the second extracts the step/weight values (ip.txt is the input file).

awk '/hill *{/,/}/ {print}' ip.txt \
   | awk '$1 == "step" { st = $2 }; $1 == "weight" { print st "\t" $2}'

This command works only if weight comes after step within a block, but the two do not have to be on consecutive lines.
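
The same idea can be sketched as a single awk pass that keeps a flag while inside a hill block. This is an illustrative variant, not the answer's own command: the flag name in_hill, the variables sample and result, and the sample file (where centers sits between step and weight) are assumptions made for the demonstration.

```shell
# Hypothetical sample: weight does not directly follow step in the first hill block.
sample=$(mktemp)
cat > "$sample" <<'EOF'
configuration {
step         5000
dt 2.000000e+00
}
hill {
step            0
centers   1.23563844380284e+02
weight    1.00000000000000e-01
}
 hill {
 step          100
 weight    1.00000000000000e-01
}
EOF

result=$(awk '
    $1 == "hill" && $2 == "{" { in_hill = 1; next }   # entering a hill block
    $1 == "}"                 { in_hill = 0 }          # leaving any block
    in_hill && $1 == "step"   { st = $2 }              # remember the step value
    in_hill && $1 == "weight" { print st "\t" $2 }     # emit the pair
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Comparing whole fields ($1 == "hill", $1 == "}") instead of using regular expressions avoids any dependence on how a given awk treats an unescaped { in a pattern, and the step line in the configuration block is skipped because the flag is not set there.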
