How to collapse KO categories with the same name and print gene names that were assigned to each category in array, as in example below.
I have this:
K00002 gene_65472
K00002 gene_212051
K00002 gene_403626
K00003 gene_666
K00003 gene_5168
K00003 gene_7635
K00003 gene_12687
K00003 gene_175295
K00003 gene_647659
K00003 gene_663019
K00004 gene_88381
K00005 gene_30485
K00005 gene_193699
K00005 gene_256294
K00005 gene_307497
And want this:
K00002 gene_65472 gene_212051 gene_403626
K00003 gene_666 gene_5168 gene_7635 gene_12687 gene_175295 gene_647659 gene_663019
K00004 gene_88381
K00005 gene_30485 gene_193699 gene_256294 gene_307497
The following command worked (taken from roaima's answer):
tr -d '\r' < file| awk '$1 != p { if (p>"") {printf "\n"} printf "%s",$1; p=$1 } { printf "\t%s",$2 } END { if(p>"") {printf "\n"} }' > output
head -n 5 your_file | hexdump -C
. This will help us detecting non-printable characters. – fra-san Jan 16 '19 at 15:49tr -d '\r' < file | awk '$1 != p { if (p>"") {printf "\n"} printf "%s",$1; p=$1 } { printf "\t%s",$2 } END { if(p>"") {printf "\n"} }'
, without the finalfile
. – fra-san Jan 16 '19 at 16:10