I have 50 files, each containing 9 columns (a sample is shown in the attached picture).
The files are named inputfile_1.assoc.logistic, inputfile_2.assoc.logistic, and so on.
The values in columns 1, 2 and 3 are identical in all 50 files.
I want to extract columns 7, 8 and 9 from all 50 files and combine them into a single .txt file, with the fields tab-separated and columns 7, 8 and 9 labelled as shown in the picture.
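For anyone who cannot see the picture: assuming the standard PLINK .assoc.logistic column names (CHR, SNP, BP for columns 1-3; OR, STAT, P for columns 7-9), the combined header row would look roughly like the line below, tab-separated in the actual file; the _1 … _50 suffixes are only placeholder labels.

CHR  SNP  BP  OR_1  STAT_1  P_1  OR_2  STAT_2  P_2  ...  OR_50  STAT_50  P_50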
I have been using the gawk loop shown below to extract the columns from each file into a separate text file, then importing those .txt files into Stata to merge them. This takes considerable time (there are over 7 million rows per file) and I need to do it for several analyses.
for i in $(seq 1 50); do
    # keep column 2 (the merge key for Stata) plus columns 7-9 from each file
    gawk -F" " '{print $2, $7, $8, $9}' inputfile_${i}.assoc.logistic >> /mnt/jw01-aruk-home01/projects/jia_mtx_gwas_2016/common_files/output/imputed_dataset/all_50_mi_datasets/acr30R_vs_acr30NR_combined_coefficients/outputfile_${i}.txt
done
Can this be made more efficient, with the extraction and the merge done in a single shell loop (avoiding the Stata step)?
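One direction I have been considering is a single gawk pass along the following lines. This is only a rough sketch and makes several assumptions: the rows appear in the same order in every file (which columns 1-3 being identical suggests), each file starts with a one-line header, and the columns follow the standard PLINK .assoc.logistic layout (CHR, SNP, BP for columns 1-3; OR, STAT, P for columns 7-9). The output name combined.txt and the OR_i/STAT_i/P_i header labels are placeholders.

gawk '
BEGIN {
    OFS = "\t"
    n = 50
    # build the combined header row; the _i suffixes are placeholder labels
    hdr = "CHR" OFS "SNP" OFS "BP"
    for (i = 1; i <= n; i++)
        hdr = hdr OFS "OR_" i OFS "STAT_" i OFS "P_" i
    print hdr
    # consume the one-line header of files 2..n so the data rows stay aligned
    for (i = 2; i <= n; i++) {
        file[i] = "inputfile_" i ".assoc.logistic"
        if ((getline junk < file[i]) <= 0) exit 1
    }
}
FNR == 1 { next }    # skip the header row of inputfile_1
{
    row = $1 OFS $2 OFS $3 OFS $7 OFS $8 OFS $9
    # pull the matching row of each remaining file with getline
    for (i = 2; i <= n; i++) {
        if ((getline rec < file[i]) > 0) {
            split(rec, a, " ")   # " " gives awk default whitespace splitting
            row = row OFS a[7] OFS a[8] OFS a[9]
        }
    }
    print row
}' inputfile_1.assoc.logistic > combined.txt

If that is on the right track, it would stream all 50 files in parallel one row at a time, so memory use stays flat even at 7 million rows, and it would write the tab-separated output directly without the 50 intermediate files or the Stata merge.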