During my workflow I have created this file:

AAGGAGGGAGCTGCATGGAACCTGTGGATATATACACACAAGGTTAACCTCTGTCCTGTAAA  8  
GGAGTTCAGATGTGTGCTCTTCCGATCTGGAGGTCTCTGCTGGGGCCACCCTGTCCTCTCAG  30     
GAGAGAGGAAAGGAAGCGATTGCAGAACTTTCCACAAGGCTTTAGATTCCCCTGTCACAGAG  15  
GGAGGAGAAAGAATCAACTTTATAGCATCAGCCCCTTGTTTATTTTAAGTTCAGGGTTTAAG  13  
GGGAGAACATTTCCCTCCTTGTCCTCTCCTATCTCACTTACTACATTCCCACTGGTCACTGT  7  
GGGACATTTGTGATTACATGGTTGCAGTATTCTTTTTGTTCTTAGTCAGACTGTATAATTGG  4  

For each row, I would like to take the first N characters of the first column, where N is the number in the second column: the first 8 characters of the first row, the first 30 characters of the second row, and so on.

For the first two rows, for example, the output would be:

AAGGAGGG  
GGAGTTCAGATGTGTGCTCTTCCGATCTGG

Any idea would be really appreciated.

don_crissti
fusion.slope

2 Answers

With awk:

awk '{ $0 = substr($1, 1, $2) } 1' file.txt

With GNU sed:

sed -r 's/.* ([0-9]+).*/s!^(.{\1}).*!\\1!/' file.txt | \
    cat -n | \
    sed -r -f - file.txt

(This needs GNU sed, because it can read a script file from stdin with -f -.)
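
To see what the `cat -n` step is for, it can help to print the generated script before handing it to the second sed. The following is only a sketch: `gen_script` is a made-up helper name, and the two-line sample file uses shortened, invented rows in the same two-column layout as the question.

```shell
#!/usr/bin/env bash
# gen_script is a hypothetical helper name; it runs only the first two
# stages of the pipeline and prints the sed script they generate.
gen_script() {
    sed -r 's/.* ([0-9]+).*/s!^(.{\1}).*!\\1!/' "$1" | cat -n
}

# Made-up sample data in the same two-column layout as the question.
sample=$(mktemp)
printf '%s\n' 'AAGGAGGGAGCTGCATGG  8' 'GGAGTTCAGATGTGTGCT  3' > "$sample"

# Each generated line is "<line number> s!^(.{N}).*!\1!": the number that
# cat -n prepends acts as a sed address, so the command applies to that line only.
gen_script "$sample"

# Full pipeline: the generated script is read from stdin via -f - (GNU sed).
gen_script "$sample" | sed -r -f - "$sample"
```

With this sample, the last command should print AAGGAGGG and GGA, one per line.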

With perl:

perl -lpe 's/.*?([ACTG]+)\s+(\d+).*/ substr($1, 0, $2)/e' file.txt

Another way with perl:

perl -lape '$_ = substr($F[0], 0, $F[1])' file.txt
Satō Katsura

Without sed, using bash substring expansion:

while read -r d n; do echo "${d:0:$n}"; done < file.txt
Ipor Sircer
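
If the file might contain blank or malformed rows, the bash loop above could be wrapped in a small function that checks the count first. This is a sketch under assumptions, not part of either answer: `clip_reads` is a hypothetical name, and skipping bad rows silently is just one possible policy.

```shell
#!/usr/bin/env bash
# clip_reads: hypothetical helper, reads "sequence count" rows on stdin and
# prints the first <count> characters of each sequence.
clip_reads() {
    local seq count n
    # The "|| [ -n "$seq" ]" part also handles a last line without a newline.
    while read -r seq count || [ -n "$seq" ]; do
        case $count in
            ''|*[!0-9]*) continue ;;  # skip rows whose second field is not a plain integer
        esac
        n=$((10#$count))              # force base 10 so a count like "08" is not read as octal
        printf '%s\n' "${seq:0:n}"
    done
}

# Usage, assuming the question's file name:
#   clip_reads < file.txt
```

Trailing spaces after the count, as in the question's data, are stripped by `read`, so they need no special handling.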