I have 7 FastQ files and I want to merge them into one in the following way:

File1 line1
File1 line2
File1 line3
File1 line4
File2 line1
File2 line2
File2 line3
File2 line4
File3 line1
File3 line2
File3 line3
File3 line4
.
.
.
File7 line1
File7 line2
File7 line3
File7 line4

I have tried the paste command but that gives me the following:

File1 line1
File2 line1
File3 line1
.
.
File7 line1

It takes one line at a time from each file instead of the four lines at a time that I need.
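
For reference, a paste invocation that produces output in this shape (one line from each file per round, using newline as the delimiter) is, for example:

paste -d '\n' File[1-7]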

  • I don't see any interleaving here... It looks like you want to concatenate only the first N lines from each file. Please edit your post to clarify - see also the editing help. – don_crissti Dec 21 '17 at 13:01

2 Answers

I'm not sure what you mean by interleaving, but if you just want the first four lines of each file concatenated, as your example shows, loop over the files and use head:

for f in ./File[1-7] ; do
    head -n 4 "$f"
done > output.file

(If you use something like File* as the source pattern, don't name the output File.out. If the name of the output matches the glob pattern in the loop, it's also taken as a source file, which gets you the first file's lines twice.)

As @steeldriver noted in a comment, with GNU coreutils the loop is unnecessary and you can just do:

head -qn 4 ./File[1-7]

(-q isn't standard.)

ilkkachu
  • Couldn't you just do head -qn4 File[1-7] (and avoid the loop altogether)? – steeldriver Dec 21 '17 at 14:09
  • @steeldriver, err, yeah, that should work with GNU head. But apparently not with FreeBSD or POSIX head. – ilkkachu Dec 21 '17 at 14:17
  • Thanks. But I don't want just the first four lines. I need the complete files to be merged, taking four lines from each file at a time. Exactly like the example, but for the complete files. – Shounak Chakraborty Dec 21 '17 at 16:18
  • @ShounakChakraborty, well, that's the thing, your question wasn't very clear on that. Perhaps you'd like to clarify the question a bit? – ilkkachu Dec 21 '17 at 16:53

The following Perl script opens each file named on the command line, storing the filehandles in an array. It then repeatedly reads and prints up to 4 lines at a time from each file, round-robin, marking each file as done and decrementing a counter $numopen when it reaches EOF, until no file has unread lines.

It doesn't bother closing the file handles because perl automatically closes all open files on exit.

#!/usr/bin/perl

use strict;
use warnings;

my @filehandles = ();

# open each input file, keeping the filehandles in order
foreach my $filename (@ARGV) {
  open(my $fh, "<", $filename) ||
    die "Couldn't open '$filename': $!";
  push @filehandles, $fh;
}

my @done = (0) x @filehandles;       # per-file EOF flags
my $numopen = scalar @filehandles;   # files that still have unread lines

# print up to 4 lines at a time from each file, round-robin,
# until every file has been read to EOF
while ($numopen > 0) {
  for my $i (0 .. $#filehandles) {
    next if $done[$i];
    for (1 .. 4) {
      last if eof($filehandles[$i]);
      print scalar readline($filehandles[$i]);
    }
    if (eof($filehandles[$i])) {     # count each finished file exactly once
      $done[$i] = 1;
      $numopen--;
    }
  }
}

Save this script as, e.g., interleave4.pl, make it executable with chmod +x interleave4.pl, and run it as ./interleave4.pl File[1-7]

This script has been tested by creating 7 files with the following bash one-liner:

for i in {1..7}; do printf "File$i %s\n" {1..10} > "File$i"; done

Some of the files were then edited so that they didn't all have the same number (10) of lines, to make sure the script would cope gracefully with that situation (it does - it just moves on to the next file without complaint). Similarly, it also has no problem dealing with input files with line counts that aren't evenly divisible by 4.

Note: this script could easily be modified so that the number of lines printed on each pass through the main loop isn't a hard-coded 4 but is taken from the command line, as sketched below.
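
A minimal sketch of that change, taking the count as the first command-line argument (a positional argument for brevity, rather than a real option parsed with, say, Getopt::Long); the name interleave.pl is arbitrary:

#!/usr/bin/perl
# interleave.pl - like interleave4.pl, but the number of lines taken
# from each file per pass is given as the first argument (positional
# here for simplicity, not a real --lines option)

use strict;
use warnings;

my $blocksize = shift @ARGV;
die "usage: $0 blocksize file [file ...]\n"
  unless defined $blocksize && $blocksize =~ /^[1-9][0-9]*$/ && @ARGV;

my @filehandles = ();
foreach my $filename (@ARGV) {
  open(my $fh, "<", $filename) ||
    die "Couldn't open '$filename': $!";
  push @filehandles, $fh;
}

my @done = (0) x @filehandles;
my $numopen = scalar @filehandles;

while ($numopen > 0) {
  for my $i (0 .. $#filehandles) {
    next if $done[$i];
    for (1 .. $blocksize) {          # was: for (1 .. 4)
      last if eof($filehandles[$i]);
      print scalar readline($filehandles[$i]);
    }
    if (eof($filehandles[$i])) {     # count each finished file exactly once
      $done[$i] = 1;
      $numopen--;
    }
  }
}

Running it as ./interleave.pl 4 File[1-7] reproduces the behaviour of interleave4.pl.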

cas