I have a folder with almost 100 files, organized in groups of 16 files each. I need to concatenate each of the 16 files of each group into a single file. For example, one group of file names is:
randomString_$groupName-
I have a folder with almost 100 samples. The samples were run on a NextSeq 500 and are single stranded. Each sample was run on 4 flow cells, and each NextSeq 500 flow cell has 4 lanes, so 16 fastq files are generated per sample (see the example below). I want to concatenate all of these files and generate one output file named 102697-001-001_R1.fastq.gz:
HGTLWBGXX_102697-001-001_ATTACTCG-AGGCTATA_L001_R1.fastq.gz
HGTLWBGXX_102697-001-001_ATTACTCG-AGGCTATA_L002_R1.fastq.gz
HGTLWBGXX_102697-001-001_ATTACTCG-AGGCTATA_L003_R1.fastq.gz
HGTLWBGXX_102697-001-001_ATTACTCG-AGGCTATA_L004_R1.fastq.gz
HGVLWBGXX_102697-001-001_ATTACTCG-AGGCTATA_L001_R1.fastq.gz
HGVLWBGXX_102697-001-001_ATTACTCG-AGGCTATA_L002_R1.fastq.gz
HGVLWBGXX_102697-001-001_ATTACTCG-AGGCTATA_L003_R1.fastq.gz
HGVLWBGXX_102697-001-001_ATTACTCG-AGGCTATA_L004_R1.fastq.gz
HGWWHBGXX_102697-001-001_ATTACTCG-AGGCTATA_L001_R1.fastq.gz
HGWWHBGXX_102697-001-001_ATTACTCG-AGGCTATA_L002_R1.fastq.gz
HGWWHBGXX_102697-001-001_ATTACTCG-AGGCTATA_L003_R1.fastq.gz
HGWWHBGXX_102697-001-001_ATTACTCG-AGGCTATA_L004_R1.fastq.gz
HJJMYBGXX_102697-001-001_ATTACTCG-GCCTCTAT_L001_R1.fastq.gz
HJJMYBGXX_102697-001-001_ATTACTCG-GCCTCTAT_L002_R1.fastq.gz
HJJMYBGXX_102697-001-001_ATTACTCG-GCCTCTAT_L003_R1.fastq.gz
HJJMYBGXX_102697-001-001_ATTACTCG-GCCTCTAT_L004_R1.fastq.gz
All of the files above should be concatenated into a single file named 102697-001-001_R1.fastq.gz (that is, keeping the string between the first two _ and the string after the last _ as the name).
I have tried:
$ cat HGTLWBGXX_102697-001-001_ATTACTCG-AGGCTATA_L001_R1.fastq.gz \
HGTLWBGXX_102697-001-001_ATTACTCG-AGGCTATA_L002_R1.fastq.gz \
HGTLWBGXX_102697-001-001_ATTACTCG-AGGCTATA_L003_R1.fastq.gz \
HGTLWBGXX_102697-001-001_ATTACTCG-AGGCTATA_L004_R1.fastq.gz \
HGVLWBGXX_102697-001-001_ATTACTCG-AGGCTATA_L001_R1.fastq.gz \
HGVLWBGXX_102697-001-001_ATTACTCG-AGGCTATA_L002_R1.fastq.gz \
HGVLWBGXX_102697-001-001_ATTACTCG-AGGCTATA_L003_R1.fastq.gz \
HGVLWBGXX_102697-001-001_ATTACTCG-AGGCTATA_L004_R1.fastq.gz \
HGWWHBGXX_102697-001-001_ATTACTCG-AGGCTATA_L001_R1.fastq.gz \
HGWWHBGXX_102697-001-001_ATTACTCG-AGGCTATA_L002_R1.fastq.gz \
HGWWHBGXX_102697-001-001_ATTACTCG-AGGCTATA_L003_R1.fastq.gz \
HGWWHBGXX_102697-001-001_ATTACTCG-AGGCTATA_L004_R1.fastq.gz \
HJJMYBGXX_102697-001-001_ATTACTCG-GCCTCTAT_L001_R1.fastq.gz \
HJJMYBGXX_102697-001-001_ATTACTCG-GCCTCTAT_L002_R1.fastq.gz \
HJJMYBGXX_102697-001-001_ATTACTCG-GCCTCTAT_L003_R1.fastq.gz \
HJJMYBGXX_102697-001-001_ATTACTCG-GCCTCTAT_L004_R1.fastq.gz > 102697-001-001_R1.fastq.gz
and it works, but as I have a lot of files, I don't want to do this manually.
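One possible way to automate the grouping is sketched below. It is only a sketch under assumptions that are not stated in the question: every file name has the form flowcell_sample_index_lane_read.fastq.gz, the sample name itself contains no underscore, and the files are plain gzip files (concatenated gzip streams are themselves a valid gzip file, which is why the manual cat works). The function name merge_fastq and the output directory name merged are hypothetical.

```shell
#!/bin/sh
# Sketch only: assumes file names of the form
#   <flowcell>_<sample>_<index>_<lane>_<read>.fastq.gz
# with no underscore inside the sample name.

merge_fastq() {
    indir=$1     # directory holding the per-lane files
    outdir=$2    # hypothetical output directory; should start out empty
    mkdir -p "$outdir"
    for f in "$indir"/*_*_*_*_*.fastq.gz; do
        [ -e "$f" ] || continue      # the glob matched nothing
        name=${f##*/}                # drop the directory part
        rest=${name#*_}              # drop the flow cell prefix
        sample=${rest%%_*}           # string between the first two underscores
        read_part=${name##*_}        # string after the last underscore
        # Append, so all files of one sample end up in the same output file.
        cat "$f" >> "$outdir/${sample}_${read_part}"
    done
}

merge_fastq . merged
```

Because the loop appends with >>, running it a second time into the same output directory would duplicate the data, so the output directory should be removed or emptied before a re-run. Writing into a separate directory also keeps the merged files apart from the inputs.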
gzip, or are they compressed by bgzip and indexed with tabix? (i.e. do you also have to regenerate any Tabix indexes?) – Kusalananda Sep 26 '17 at 08:16