Here is what I am trying to do: my data is in sets of rows within the same file (the variables vaa and vbb loop over it, and I am able to control that as required).
What I need is to add an extra column before the data is written to file: a counter that increments with each set of data.

E.g.: Set 1 is rows 5-8, Set 2 is rows 14-29, etc.

Required output:

1,row5 data
1,row6 data
1,row7 data
1,row8 data
2,row14 data
2,row15 data
.
.
.
2,row29 data
.
.
.

====== Code below ======

awk -v vaa=$varAA -v vbb=$varBB -v vcc=$varC 'NR>=vaa&&NR<=vbb' $I >> part_${I%.*}.csv

I am writing the output to a CSV file. I can handle the sets of rows and the counter variable, but I am unable to formulate the piece of code that adds the extra column, fed from the variable $varC (which holds the incrementing counter).

I have browsed through several forums, and the usage examples are either for simply printing or for adding a column to an existing dataset.

I am new to bash coding, so I am unable to work out how to accomplish this. All help is appreciated.

Thanks.

Edit: output_A.csv contains the data; output_A.txt contains the info on which row ranges make up each set of data (some task-specific arithmetic is required, which I have taken care of). Example of the txt file data: 100 200 xyz

Here is the complete code for reference:

for I in 'output_A.csv';
do
    varC=0
    while read line
    do
        varC=$(( varC + 1 ))

        varA=${line%%,*}
        varB=$(echo "$line" | cut -d',' -f2- | rev | cut -d',' -f2- | rev)

        varAA=$(echo "$varA * 100" | bc -l)
        varBB=$(echo "$varB * 100" | bc -l)

        #echo -e $varA ' \t' $varB ' \t' $line

        awk -v vaa=$varAA -v vbb=$varBB -v vcc=$varC 'NR>=vaa&&NR<=vbb' $I >> part_${I%.*}.csv
    done < ${I%.*}.txt
done
  • How should one determine when the first set of data has been exhausted and the counter should be increased? Please post a sample of the input file. – kos Jun 22 '15 at 06:26
  • Where in your example are vaa, vbb, vcc? – Costas Jun 22 '15 at 06:44
  • I just added the entire code in an edit; apologies for leaving it out in the first place. – NishantNath Jun 22 '15 at 06:48
  • It would be much better if you supplied samples of the data files and the preferred output format. – Costas Jun 22 '15 at 06:54
  • @Costas my question is more syntactical in nature; I can figure out the task-specific solution. The examples I see are something like: awk '{print "$variable name",$0}'

    I understand $0 is all columns of the row. How do I relate that to the NR conditions that control the range of rows in my code?

    – NishantNath Jun 22 '15 at 06:57
  • The data in output_A.csv has the format: 0.123,1.234,....(about 200 columns)

    The data in output_A.txt has the format: 11.2345,11.2565,xyz

    That line of output_A.txt means lines 1123-1125 are one set (hence the multiplication by 100 to obtain the row numbers in varAA and varBB).

    The desired output is: (set no., e.g. 1),0.123,1.234,....(about 200 columns)

    – NishantNath Jun 22 '15 at 07:08

2 Answers

Going by the explanation of the file formats:

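awk -F',' 'FNR==NR{
               # first file (output_A.txt): map every row number of the set to the set number
               for(i=int($1*100);i<=int($2*100);i++)
                   portion[i]=FNR
               next }
           # second file (output_A.csv): print only rows that belong to a set
           FNR in portion { print portion[FNR] "," $0 }
          ' output_A.txt output_A.csv >> output_A.result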
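A minimal sketch of the same two-file idea on made-up data, in case it helps to see it run end to end. The file names demo_ranges.txt, demo_data.csv and demo_result.csv and their contents are assumptions for illustration only; the range values imitate the 11.2345,11.2565,xyz format described in the comments, scaled down so that a six-row file is enough.

```shell
# Hypothetical range file: start,end,label. Multiplying by 100 and truncating
# gives row numbers, so 0.0245,0.0365 means rows 2-3 and 0.0512,0.0578 means row 5.
printf '%s\n' '0.0245,0.0365,xyz' '0.0512,0.0578,abc' > demo_ranges.txt

# Hypothetical data file with six rows.
printf '%s\n' 'r1,a' 'r2,b' 'r3,c' 'r4,d' 'r5,e' 'r6,f' > demo_data.csv

awk -F',' '
    FNR==NR {                                  # first file: the ranges
        for (i = int($1*100); i <= int($2*100); i++)
            portion[i] = FNR                   # row number -> set number
        next
    }
    FNR in portion { print portion[FNR] "," $0 }   # second file: the data
' demo_ranges.txt demo_data.csv > demo_result.csv

cat demo_result.csv
```

Here the line number of each range in the first file (FNR) plays the role of the incrementing counter, so no separate shell variable is needed for it.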
Costas
  • Thanks, this gives me an idea for improving some parts, but my issue is with handling the incremental counter varC, which increments with each set of rows. – NishantNath Jun 22 '15 at 07:34

As I understand it, you need to change your awk command from:

'NR>=vaa&&NR<=vbb'

to

'NR>=vaa&&NR<=vbb { print vcc "," $0 }'
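A minimal sketch of that change on made-up data. The file names demo.csv and demo_part.csv, the row range 2-3 and the counter value 1 are assumptions for illustration only; they stand in for $I, $varAA/$varBB and $varC from the loop in the question.

```shell
# Hypothetical sample input: four rows of comma-separated data.
printf '%s\n' 'a,1' 'b,2' 'c,3' 'd,4' > demo.csv

# Stand-ins for the values the loop in the question computes.
varAA=2   # first row of the set
varBB=3   # last row of the set
varC=1    # set counter

# Print only rows vaa..vbb, each prefixed with the counter and a comma.
awk -v vaa="$varAA" -v vbb="$varBB" -v vcc="$varC" \
    'NR>=vaa && NR<=vbb { print vcc "," $0 }' demo.csv > demo_part.csv

cat demo_part.csv
```

Without an action block, awk prints matching rows unchanged; adding { print vcc "," $0 } keeps the same row filter and only changes what is printed for each match.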
meuh