
I want all these combinations, but I don't have enough memory. How can I release memory in my script?

use strict;
use warnings;

use Algorithm::Combinatorics 'variations_with_repetition';

my @let = qw/ A G C T /;
my @cad = variations_with_repetition(\@let, 24);
print "@$_\n" for @cad;
zorbax

2 Answers


The solution is to use an iterator. If you call variations_with_repetition in scalar context (for example, by assigning its result to a scalar), it returns an iterator instead of the full list, and each call to the iterator's next method gives you the next variation. This way the entire list is never held in memory, and you get the first elements immediately. This technique is known as lazy evaluation. Here is the code for your case:

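use strict;
use warnings;
use Algorithm::Combinatorics 'variations_with_repetition';

my @let = qw/ A G C T /;

# Scalar context: returns an iterator instead of the full list
my $cad = variations_with_repetition(\@let, 24);

# next returns one array reference per call, and undef when exhausted
while (my $c = $cad->next)
{
    print "@$c\n";
}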

Note that next returns a reference to an array, so you have to dereference it (for example with @$c) before joining it or doing any other operation on it.
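To see why an iterator needs so little memory, here is a minimal hand-rolled iterator in plain Perl that yields the variations one at a time while keeping only the current word, like an odometer with k digits. It is an illustrative sketch, not Algorithm::Combinatorics's own implementation; make_variation_iter is a hypothetical helper name, and k is set to 3 so the output stays short.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Algorithm::Combinatorics): returns a closure
# that yields one variation (an array reference) per call and an empty return
# (undef in scalar context) once every variation has been produced.
# Only the current $k positions are kept in memory, never the whole list.
sub make_variation_iter {
    my ($alphabet, $k) = @_;
    my @idx  = (0) x $k;    # one "digit" per position, like an odometer
    my $done = 0;
    return sub {
        return if $done;
        my @word = map { $alphabet->[$_] } @idx;
        # Advance the odometer: bump the last position, carrying leftwards.
        my $pos = $k - 1;
        while ($pos >= 0) {
            last if ++$idx[$pos] < @$alphabet;
            $idx[$pos] = 0;
            $pos--;
        }
        $done = 1 if $pos < 0;    # every position wrapped around: finished
        return \@word;
    };
}

my @let  = qw/ A G C T /;
my $next = make_variation_iter(\@let, 3);    # 3 rather than 24 keeps the demo short
while (my $w = $next->()) {
    print join('', @$w), "\n";
}
```

Calling make_variation_iter(\@let, 24) would walk all 4**24 variations with the same small, constant memory footprint; the join '' step shows the dereference-then-join pattern mentioned above.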

Test results: I couldn't run the original code on my machine (as expected, memory usage kept growing while it tried to build the full list of 4**24 variations), but with the iterator I started getting output rows immediately, and perl used hardly any memory.
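A quick count shows why the list-context version cannot fit in memory. With 4 letters and 24 positions there are 4^24 variations; assuming each printed line is 24 letters separated by single spaces plus a newline (48 bytes, as print "@$c\n" produces with the default list separator), the output of the iterator version alone comes to:

```latex
\begin{aligned}
4^{24} &= 2^{48} = 281\,474\,976\,710\,656 \ \text{variations} \\
48 \times 2^{48}\ \text{bytes} &= 3 \times 2^{52}\ \text{bytes} = 12\ \text{PiB}
\end{aligned}
```

So the iterator removes the memory problem, but the sheer volume of output is a separate limit worth keeping in mind.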

Bichoy
  • I was trying with other code, but your explanation seems clear: while (my $p = $iter->next) { my $result = join '', @$p; print "$result\n"; } – zorbax Apr 22 '15 at 01:06
  • Hah, much more straightforward than my solution. All along I was thinking, "hey, this library uses iterators internally, wouldn't it be nice if only it exposed them?" :). – dhag Apr 22 '15 at 20:55
  • Well, your solution is a very smart one in my opinion. Also, I didn't know how to use the iterator version beforehand; I found out about it while I was trying to solve this problem. – Bichoy Apr 22 '15 at 23:27

Well, enumerating words written over the alphabet (A, G, C, T) is much the same as counting in base four. With that in mind, you can generate them with a shell pipeline (remove the call to head; it's only there to truncate the very long output while testing):

{ echo 4o; seq 0 $((4 ** 24 - 1)) | sed 's/$/p/'; } | dc | awk '{ s = $1; while (length(s) < 24) s = "0" s; print s }' | tr 0-3 AGCT | head

Explanation:

  • echo 4o sends dc the command 4o, which sets its output radix to four;

  • seq counts over the entire range that a 24-digit base-four number covers, from 0 to 4**24 - 1;

  • sed appends a p to each line to ask dc to print each number (in base four, remember);

  • awk left-pads each number with zeros to exactly 24 digits;

  • tr translates digits (0, 1, 2, 3) to the alphabet (A, G, C, T).
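The same base-four counting idea also works directly in Perl, without dc, awk or tr: each index n from 0 to 4**len - 1 is written in base four with len digits, and each digit is mapped to a letter. A minimal sketch, where index_to_word is a hypothetical helper name and len is set to 3 so the output stays short:

```perl
use strict;
use warnings;

# Base-four digit -> letter, the same mapping as the tr step: 0->A, 1->G, 2->C, 3->T.
my @letter = qw/ A G C T /;

# Hypothetical helper: write $n in base four using exactly $len digits
# (most significant first, zero-padded) and translate each digit to a letter.
sub index_to_word {
    my ($n, $len) = @_;
    my @digits = (0) x $len;
    for my $pos (reverse 0 .. $len - 1) {
        $digits[$pos] = $n % 4;    # lowest remaining base-four digit
        $n = int($n / 4);          # drop that digit
    }
    return join '', map { $letter[$_] } @digits;
}

my $len = 3;    # 24 for the real problem; 3 keeps the output to 64 lines
for my $n (0 .. 4**$len - 1) {
    print index_to_word($n, $len), "\n";
}
```

Perl does not build a temporary list when a range is used as the expression of a foreach loop, so on a 64-bit perl setting $len to 24 should keep memory use flat, just like the iterator answer.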

dhag