
I have a file of about 300 lines:

TITLE      cargas
REMARK   1 File created by GaussView 5.0.9
HETATM    1  O           0       0.957  -0.000  -0.000                       O
HETATM    2  H           0       0.000   0.000   0.000                       H
HETATM    3  H           0       1.197   0.927  -0.000                       H
HETATM    4  O           0      -1.664  -0.019   0.488                       O
HETATM    5  H           0      -2.210   0.327   1.194                       H
HETATM    6  H           0      -2.260  -0.104  -0.257                       H
HETATM    7  O           0       2.189  -2.104   1.321                       O
HETATM    8  H           0       1.559  -1.476   0.968                       H
HETATM    9  H           0       1.764  -2.955   1.216                       H
  ...

and I would like to convert it into the following form:

TITLE      cargas
REMARK   1 File created by GaussView 5.0.9
HETATM    1  O   LIG     1       0.957  -0.000  -0.000                       O
HETATM    2  H   LIG     1       0.000   0.000   0.000                       H
HETATM    3  H   LIG     1       1.197   0.927  -0.000                       H
HETATM    4  O   HOH     2      -1.664  -0.019   0.488                       O
HETATM    5  H   HOH     2      -2.210   0.327   1.194                       H
HETATM    6  H   HOH     2      -2.260  -0.104  -0.257                       H
HETATM    7  O   HOH     3       2.189  -2.104   1.321                       O
HETATM    8  H   HOH     3       1.559  -1.476   0.968                       H
HETATM    9  H   HOH     3       1.764  -2.955   1.216                       H
  ...

The first three atom rows have to say LIG and all the others HOH. The number in column 5 runs from 1 to 100, increasing by one every three rows.

Thanks in advance for any help.

αғsнιη
  • For the follow-up question https://unix.stackexchange.com/q/644599/72456, didn't you learn the idea from the previously given answers? – αғsнιη Apr 13 '21 at 11:38
  • Well, it is hard to admit; I learned a bit, but you are right. It was not enough for me to do this myself. I am sorry for being a bit slow... – patprovasi Apr 13 '21 at 12:17
  • @patprovasi don't worry, we all have to start somewhere. It's just that we expect the people who ask questions to put in some effort first. Presumably, you must have tried to figure this out and you didn't just come here to ask for others to do your work for you, so just add some of your attempts to the question. That way, we know not to repeat the solutions you have already tried that didn't work, and we also see that you have tried something and are not using us as a free script writing service. – terdon Apr 13 '21 at 12:51
  • Thanks a lot; yes, indeed I had tried. But I realise that most people also say what they tried to do, and I didn't. – patprovasi Apr 13 '21 at 16:14

1 Answer

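awk '
    (NR-2)%3==1 { inc++ }   # lines 3, 6, 9, ...: start the next residue number
    NR>2        { for (i=NF; i>=4; i--) $(i+1)=$i   # shift fields 4..NF one place right to make room
                  $4=(inc==1)?"LIG":"HOH"; $5=inc }1' infile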

The awk command above consists of two conditions, each followed by a block of actions enclosed in braces, i.e. the common awk form condition { actions }.

In awk, the dollar sign $ is the field operator: it returns the content of the column/field whose number follows it (note that by default awk treats any run of tabs/spaces as the field separator).

Accordingly, $1 is the first field, $2 the second, $3 the third and so on, while $0 is the special case that holds the whole current line/record.
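To see this on the question's data, here is a quick check (a small sketch: the line is copied from the question and given to awk on standard input instead of reading infile):

```shell
# One atom line copied from the question.
line='HETATM    1  O           0       0.957  -0.000  -0.000                       O'

# Default FS: runs of blanks separate fields.
# NF is the number of fields; $1, $4 and $5 are single fields.
printf '%s\n' "$line" | awk '{ print NF; print $1; print $4; print $5 }'
```

It prints 8, HETATM, 0 and 0.957, one per line: the blank residue-name columns produce no field at all, so the 0 placeholder is field #4 and the x coordinate is field #5.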

NR is "The total number of input records seen so far." (from man awk); since each line is a record, it is the number of the line awk is currently processing.
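A minimal illustration, with three dummy lines produced by printf standing in for infile:

```shell
# NR is the number of records read so far, i.e. the current line number.
printf 'TITLE\nREMARK\nHETATM\n' | awk '{ print NR ": " $0 }'
```

This prints 1: TITLE, 2: REMARK and 3: HETATM, one per line.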

The condition (NR-2)%3==1 increments the variable inc (inc++) on every third line, starting from the 3rd line (subtracting 2 from NR skips the first two lines). For comparison, (NR-0)%3==1 matches every third line starting from the 1st line (1, 4, 7, ...), and (NR-1)%3==1 does the same starting from the 2nd line (2, 5, 8, ...); in general, (NR-#)%3==1 matches every third line after skipping the first # lines.

You can check which lines it matches with awk '(NR-2)%3==1' infile.
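If you do not have the file at hand, seq can stand in for infile (a sketch; seq is just a convenient way to produce numbered lines). The first command lists the lines on which inc is incremented; the second adds the NR>2 part of the answer and prints each line number together with the value inc has there:

```shell
# Lines matching (NR-2)%3==1: every third line, starting at line 3.
seq 12 | awk '(NR-2)%3==1'

# Same counter as in the answer: lines 3-5 get 1, lines 6-8 get 2, and so on.
seq 9 | awk '(NR-2)%3==1 { inc++ } NR>2 { print NR, inc }'
```

The first prints 3, 6, 9 and 12; the second prints 3 1, 4 1, 5 1, 6 2, 7 2, 8 2 and 9 3. Lines 1 and 2 never match because (1-2)%3 is -1 and (2-2)%3 is 0.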

The second block, guarded by NR>2, updates fields #4 and #5, but only on lines whose number is greater than 2, i.e. it skips the two header lines.

$4=(inc==1)?"LIG":"HOH" sets field #4 to "LIG" as long as inc is 1 and to "HOH" otherwise, and $5=inc sets field #5 to the current value of inc.
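The conditional expression can be tried on its own in a BEGIN block, which runs before any input is read; here a loop stands in for the first three residue numbers (a standalone sketch, not part of the answer's script):

```shell
awk 'BEGIN {
    for (inc = 1; inc <= 3; inc++) {
        name = (inc == 1) ? "LIG" : "HOH"   # "LIG" only while inc is 1
        print name, inc                     # the comma puts OFS (a space) between the two
    }
}'
```

It prints LIG 1, HOH 2 and HOH 3. If you ever concatenate the result with something else, wrap the whole conditional in parentheses: concatenation binds more tightly than ?:, so (inc==1)?"LIG":"HOH" " " inc attaches inc to "HOH" only.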

The 1 at the end is an awk idiom: an always-true condition with no action, so awk performs the default action and prints the current line; see What is the meaning of '1' at the end of an awk script for details.
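A small demonstration of the idiom: the block changes a field only on line 2, and the trailing 1 (a pattern with no action) makes awk print every line, changed or not:

```shell
# Without the trailing 1 nothing would be printed, because the block itself never prints.
printf 'a b c\nd e f\n' | awk 'NR==2 { $2 = "X" } 1'
```

It prints a b c and d X f.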


Finally, to keep the original spacing between fields, use:

awk -F'( )' '
    (NR-2)%3==1 { inc++ }
    NR>2        { $9=(inc==1)?"LIG":"HOH"; $14=inc }1' infile

or pipe the output of the first awk script through column -t (awk ... | column -t).
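Another way to keep the layout, offered as a sketch rather than as part of the answer: PDB is a fixed-column format (residue name in columns 18-20, residue number right-justified in columns 23-26), so the two values can be written in place with substr() and sprintf(), leaving every other character of the line as it was. It reuses the answer's (NR-2)%3==1 counter and assumes, like the answer, that every line after the first two is an atom record laid out as in the question; the function name fix_pdb is made up for this example.

```shell
# fix_pdb is a hypothetical helper name; it reads the files given as arguments, or stdin.
fix_pdb() {
    awk '
        (NR-2)%3==1 { inc++ }                 # new residue on lines 3, 6, 9, ...
        NR>2 {
            name = (inc == 1) ? "LIG" : "HOH"
            # keep columns 1-17, write the residue name into 18-20,
            # keep 21-22, write the number right-justified into 23-26,
            # keep everything from column 27 on
            $0 = substr($0, 1, 17) name substr($0, 21, 2) sprintf("%4d", inc) substr($0, 27)
        }
        1' "$@"
}

# Sample: the two header lines and the first four atom lines from the question.
sample='TITLE      cargas
REMARK   1 File created by GaussView 5.0.9
HETATM    1  O           0       0.957  -0.000  -0.000                       O
HETATM    2  H           0       0.000   0.000   0.000                       H
HETATM    3  H           0       1.197   0.927  -0.000                       H
HETATM    4  O           0      -1.664  -0.019   0.488                       O'

printf '%s\n' "$sample" | fix_pdb
```

On a real file you would run fix_pdb infile > outfile. With the sample above, the header lines pass through untouched, the first three atom lines get LIG and 1, and the HETATM 4 line gets HOH and 2, all in the residue-name and residue-number columns.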

αғsнιη