I'm working on a project for which I need to collate specific lines of data from multiple files into one new text file. For example, say I have 3 files, each containing a matrix of values:
Text File 1
Obs. TGCP_WM23 STT_WM189 MPO_WM496 PTP_WM724
TGCP_WM23 0.000000 0.174510 0.153292 0.177030
STT_WM189 0.174510 0.000000 0.077663 0.203359
MPO_WM496 0.153292 0.077663 0.000000 0.183706
PTP_WM724 0.177030 0.203359 0.183706 0.000000
Text File 2
Obs. TGCP_WM15 STT_WM187 MPO_WM485 PTP_WM725
TGCP_WM15 0.000000 0.157164 0.145516 0.168991
STT_WM187 0.157164 0.000000 0.051973 0.187443
MPO_WM485 0.145516 0.051973 0.000000 0.171824
PTP_WM725 0.168991 0.187443 0.171824 0.000000
Text File 3
Obs. TGCP_WM1 STT_WM184 MPO_WM489 PTP_WM721
TGCP_WM1 0.000000 0.166831 0.161654 0.192732
STT_WM184 0.166831 0.000000 0.059373 0.202718
MPO_WM489 0.161654 0.059373 0.000000 0.185286
PTP_WM721 0.192732 0.202718 0.185286 0.000000
I want to automate reading the 3 files and printing the second line from each into sequential lines of one new text file, such that the new text file contains:
New Text File
TGCP_WM23 0.000000 0.174510 0.153292 0.177030
TGCP_WM15 0.000000 0.157164 0.145516 0.168991
TGCP_WM1 0.000000 0.166831 0.161654 0.192732
Is there a relatively straightforward way to do something like this using the Terminal on a Mac? As it stands, I'm looking at 2,200 files from which I need to extract and format data so that I can run some downstream analyses. I would like to avoid having to manually open all those files, copy text and paste into a new file where the values are formatted in a more useful fashion.
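One possible approach, as a minimal sketch rather than a definitive solution: a short Python script run from the Terminal (python3 must be installed). It reads every file matching a pattern, takes the second line of each, and writes those lines to one new file. The function names, the glob pattern `*_FstRslts*`, and the output name `second_lines.txt` are assumptions made for illustration; the pattern in particular assumes the file names contain `_FstRslts`, with or without an extension.

```python
from pathlib import Path


def collect_line(paths, line_number=2):
    """Return the given 1-based line from each file, in the order of paths."""
    collected = []
    for path in paths:
        # errors="replace" keeps the script going if a file has odd bytes.
        with open(path, encoding="utf-8", errors="replace") as handle:
            for current, line in enumerate(handle, start=1):
                if current == line_number:
                    collected.append(line.rstrip("\n"))
                    break  # stop reading this file once the line is found
    return collected


def collate(folder, pattern, output_path, line_number=2):
    """Write the chosen line of every matching file to output_path.

    Files are processed in sorted name order. Returns the number of lines written.
    """
    paths = sorted(Path(folder).glob(pattern))
    lines = collect_line(paths, line_number)
    text = "".join(line + "\n" for line in lines)
    Path(output_path).write_text(text, encoding="utf-8")
    return len(lines)


# Hypothetical usage: run from the folder that holds the files.
# collate(".", "*_FstRslts*", "second_lines.txt")
```

Files shorter than two lines are skipped silently, so the returned count can be compared against the expected number of files as a sanity check.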
Edit: All of the files I'm working with are text files output by a program called GenoDive. Half of the files (1,100) are Fst matrix files that look like the examples shown above; the other 1,100 files are genetic diversity output files, the contents of which look like...
___________________________________________________________________
GenoDive 3.01, 2019-12-12 23:28:01 +0000
Genetic Diversity: Nei 1987.
File: TrkNbr_1083n1282_L1n2_PrelimPops_02SubSampPops_Rep001.txt
8 of 8 individuals included, 6843 of 6843 loci included
– Summary of indices of genetic diversity
Statistic Value Std.Dev. c.i.2.5% c.i.97.5% Description
Num 1.418 0.006 1.405 1.428 Number of alleles
Eff_num 1.086 0.002 1.082 1.088 Effective number of alleles
Ho 0.092 0.002 0.089 0.096 Observed Heterozygosity
Hs 0.098 0.002 0.094 0.101 Heterozygosity Within Populations
Ht 0.114 0.002 0.110 0.117 Total Heterozygosity
H't 0.122 0.002 0.117 0.125 Corrected total Heterozygosity
Gis 0.055 0.013 0.030 0.079 Inbreeding coefficient
Standard deviations of F-statistics were obtained through jackknifing over loci.
95% confidence intervals of F-statistics were obtained through bootstrapping over loci.
– Indices of genetic diversity per population
Population Num Eff_num Ho Hs Gis
TGCP_WM3 1.261 1.183 0.142 0.141 -0.003
STT_WM186 1.186 1.132 0.088 0.108 0.183
MPO_WM483 1.194 1.136 0.097 0.109 0.110
PTP_WM732 1.095 1.068 0.056 0.051 -0.097
___________________________________________________________________
I don't need to process the Fst files and the genetic diversity files all at once; I want to extract different data from each type of file.
The naming convention of the two file types is as follows:
Fst files are named
TrkNbr_1083n1282_L1n2_PrelimPops_02SubSampPops_Rep001_FstRslts
Genetic diversity files are named
TrkNbr_1083n1282_L1n2_PrelimPops_02SubSampPops_Rep001_GenDivRslts
The distinguishing part of the file names is the '##SubSampPops_Rep###' portion. There are 1,100 'FstRslts' files, and those 1,100 files are subdivided into 11 groups of 100 files...
02SubSampPops_Rep001
02SubSampPops_Rep002
02SubSampPops_Rep003
.
.
.
02SubSampPops_Rep100
04SubSampPops_Rep001
04SubSampPops_Rep002
04SubSampPops_Rep003
.
.
.
04SubSampPops_Rep100
Similarly, there are 1,100 'GenDivRslts' files organized in the same fashion.
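To illustrate how the naming convention could be used, here is a rough Python sketch that sorts file names into their '##SubSampPops' groups and orders each group by replicate number. The regular expression and the function name are assumptions based only on the example names above; real file names with a different layout would need a different pattern.

```python
import re
from collections import defaultdict
from pathlib import Path

# Matches the '##SubSampPops_Rep###' portion plus the file-type suffix.
NAME_PATTERN = re.compile(r"(\d+)SubSampPops_Rep(\d+)_(FstRslts|GenDivRslts)")


def group_by_subsample(paths, file_type="FstRslts"):
    """Map each subsample group (e.g. '02') to its files, ordered by replicate."""
    groups = defaultdict(list)
    for path in paths:
        match = NAME_PATTERN.search(Path(path).name)
        # Skip names that do not follow the convention or are the other type.
        if match and match.group(3) == file_type:
            groups[match.group(1)].append((int(match.group(2)), Path(path)))
    return {
        group: [p for _, p in sorted(members, key=lambda m: m[0])]
        for group, members in sorted(groups.items())
    }
```

Each list in the result holds the files of one group in replicate order, so the groups can be processed one at a time, for instance to write one output file per group.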