
I have a bunch of .txt files in a directory that contain information about the dipole moment. This is what one looks like:

Dipole Moment: [D]
     X:     1.2808      Y:     0.2908      Z:     1.0187     Total:     1.6622
lorem ipsum
text 
that is 
not 
relevant 
Dipole Moment: [D]
     X:     1.2808      Y:     0.2908      Z:     1.0187     Total:     1.6622
more text

I want to extract the total dipole moment from these files. I am running the following script:

awk '/Dipole Moment: \[D\]/{found=1;next} found{print $NF;found=""}' *.txt > dipole_bma.txt

This script prints 1.6622 twice, and likewise prints every other file's total dipole moment twice. I can see that this happens because the regex matches twice in each file.
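
A minimal reproduction of the doubled output, assuming a POSIX shell with awk and mktemp available. The file name sample.txt is hypothetical; its contents are the excerpt above with the irrelevant lines trimmed. Each "Dipole Moment: [D]" header sets the flag again, so each block prints its own Total value.

```shell
#!/bin/sh
# Hypothetical sample file built from the excerpt in the question.
dir=$(mktemp -d)
cat > "$dir/sample.txt" <<'EOF'
Dipole Moment: [D]
     X:     1.2808      Y:     0.2908      Z:     1.0187     Total:     1.6622
lorem ipsum
Dipole Moment: [D]
     X:     1.2808      Y:     0.2908      Z:     1.0187     Total:     1.6622
more text
EOF

# The command from the question: every header match sets found=1 again,
# so the line after each of the two headers is printed.
out=$(awk '/Dipole Moment: \[D\]/{found=1;next} found{print $NF;found=""}' "$dir/sample.txt")
printf '%s\n' "$out"    # 1.6622 on two separate lines
rm -rf "$dir"
```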

My question is: how do I print the total dipole moment only once per file?

megamence
  • I’m voting to close this question because I answered this question for the OP earlier today at https://stackoverflow.com/a/65793414/1745001. – Ed Morton Jan 19 '21 at 23:25

2 Answers

Use the nextfile statement to skip to the next file once the value has been printed:

awk '/Dipole Moment: \[D\]/{found=1;next} found{print $NF;found=0;nextfile}' *.txt
pLumo
  • Thanks for your response @pLumo. However, it doesn't seem to work... I am getting blank lines after the first output... – megamence Jan 19 '21 at 18:29
  • With your example it works for me. Do you have non-Unix line endings? – pLumo Jan 19 '21 at 18:30
  • Technically, the files I am using are .out files, so in my command the .txt extension is actually .out. I don't know how that could make a difference... – megamence Jan 19 '21 at 18:36
  • no, the extension does not make any difference – pLumo Jan 19 '21 at 18:39
  • @megamence it sounds like you don't have GNU awk, which supports nextfile; instead, use a second flag, like awk 'FNR==1 {found=0; once=0}; /Dipole Moment: \[D\]/ { found=1; next}; found && !once{ print $NF; found=0; once=1; }' file*. However, this still reads each whole file even though it prints only once. – αғsнιη Jan 19 '21 at 18:59
  • That link says that nextfile has been in POSIX since 2012... – pLumo Jan 19 '21 at 19:12
  • @pLumo please see https://unix.stackexchange.com/q/588906/72456 – αғsнιη Jan 19 '21 at 19:20
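
One possible cause of blank lines with nextfile: if found is not reset before nextfile, it is still set when the next file starts, so that file's first line is printed whenever it is not a "Dipole Moment" header (an empty first line prints as a blank line, and that file's real value is then skipped). The sketch below assumes an awk that supports nextfile; the files a.txt and b.txt and the values in b.txt are hypothetical.

```shell
#!/bin/sh
# Two hypothetical input files; b.txt starts with an empty line and other
# text before its dipole block (its values are made up for the example).
dir=$(mktemp -d)
printf '%s\n' 'Dipole Moment: [D]' \
  '     X:     1.2808      Y:     0.2908      Z:     1.0187     Total:     1.6622' \
  'Dipole Moment: [D]' \
  '     X:     1.2808      Y:     0.2908      Z:     1.0187     Total:     1.6622' > "$dir/a.txt"
printf '%s\n' '' 'other text' 'Dipole Moment: [D]' \
  '     X:     0.1000      Y:     0.2000      Z:     0.3000     Total:     0.3742' > "$dir/b.txt"

# found=0 before nextfile keeps the flag from leaking into the next file.
# Without it, the empty first line of b.txt would be printed ($NF of an
# empty line is "") and nextfile would then skip b.txt's actual value.
out=$(awk '/Dipole Moment: \[D\]/ { found = 1; next }
           found { print $NF; found = 0; nextfile }' "$dir/a.txt" "$dir/b.txt")
printf '%s\n' "$out"
rm -rf "$dir"
```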

GNU sed can do it as follows:

sed -ns '
  /^Dipole Moment: \[D]/!d
  $!N;/\n/s/.* //p;:n;n;bn
' ./*.txt
  • -s (a GNU extension, not POSIX) treats the files as separate streams.
  • -n suppresses the default printing of the pattern space before the next record is read.
  • /^Dipole Moment: \[D]/!d deletes every line that is not a dipole-moment header.
  • After the dipole-moment line, $!N appends the next line to the pattern space (unless it is the last line of the file).
  • s/.* //p removes everything up to the last space (assuming no trailing whitespace) and prints what remains: the last field, i.e., the value of the total dipole moment.
  • Then the :n;n;bn loop skips to the end of the current file, and the whole process repeats for the next file.
guest_7
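
If neither nextfile nor GNU sed's -s option is available, the second-flag idea from the comments under the first answer can be packaged as a portable awk function. This is a sketch: the function name extract_first_total is hypothetical, and unlike nextfile it still reads every file to the end.

```shell
#!/bin/sh
# Print the Total value from the first "Dipole Moment: [D]" block of each
# file given as an argument. Uses only POSIX awk features (no nextfile).
extract_first_total() {
    awk 'FNR == 1 { found = 0; done = 0 }        # new file: reset both flags
         done { next }                           # value already printed for this file
         /Dipole Moment: \[D\]/ { found = 1; next }
         found { print $NF; found = 0; done = 1 }' "$@"
}
```

For example, mirroring the question's command: extract_first_total ./*.txt > dipole_bma.txt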