I have a file containing SNP data called snp.bed, which looks like this:
head snp.bed
Chr17 214708483 214708484 Chr17:214708484
Chr17 214708507 214708508 Chr17:214708508
Chr17 214708573 214708574 Chr17:214708574
I also have a file called intersect.bed, which looks like this:
head intersect.bed
Chr17 214708483 214708484 Chr17:214708484 Chr17 214706266 214710783 gene50573
Chr17 214708507 214708508 Chr17:214708508 Chr17 214706266 214710783 gene50573
Chr17 214708587 214708588 Chr17:214708580 Chr17 214706266 214710783 gene50573
I want to print a modified version of snp.bed with an extra column appended to each row. If a row in snp.bed matches the first 4 columns of a row in intersect.bed, then I want to print the entire row from snp.bed followed by the last column of the corresponding row in intersect.bed (the gene name). If a row from snp.bed does not match any row in intersect.bed, the extra column should instead be the string "NA".
This is my desired output:
head snp.matched.bed
Chr17 214708483 214708484 Chr17:214708484 gene50573
Chr17 214708507 214708508 Chr17:214708508 gene50573
Chr17 214708573 214708574 Chr17:214708574 NA
How can I do this?
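One possible approach is a two-pass awk lookup. What follows is a minimal sketch, not a definitive solution. It assumes both files are whitespace-delimited, that the gene name is always the last field of intersect.bed, that intersect.bed is non-empty, and that each SNP matches at most one gene (if there are several, the last one read wins). The heredocs only recreate the sample data shown above so the example can be run as-is.

```shell
# Recreate the sample input files from the question.
cat > snp.bed <<'EOF'
Chr17 214708483 214708484 Chr17:214708484
Chr17 214708507 214708508 Chr17:214708508
Chr17 214708573 214708574 Chr17:214708574
EOF

cat > intersect.bed <<'EOF'
Chr17 214708483 214708484 Chr17:214708484 Chr17 214706266 214710783 gene50573
Chr17 214708507 214708508 Chr17:214708508 Chr17 214706266 214710783 gene50573
Chr17 214708587 214708588 Chr17:214708580 Chr17 214706266 214710783 gene50573
EOF

awk '
    # First file (intersect.bed): remember the gene name (last field),
    # keyed on the first four columns.
    NR == FNR { gene[$1, $2, $3, $4] = $NF; next }

    # Second file (snp.bed): print the row unchanged, followed by the
    # stored gene name, or "NA" when the four-column key was never seen.
    { print $0, ((($1, $2, $3, $4) in gene) ? gene[$1, $2, $3, $4] : "NA") }
' intersect.bed snp.bed > snp.matched.bed

cat snp.matched.bed
```

The appended column is separated by awk's default output separator, a single space. If the real files are tab-delimited, adding -v OFS='\t' to the awk invocation should keep the output consistent.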
join ... man page for join – RubberStamp Nov 06 '17 at 23:16
HanXRQ prefix in the fourth column of output come from? – MiniMax Nov 06 '17 at 23:23