I previously posted asking for help with counting occurrences of a string. Now I want to find where ranges of values occur, including where they overlap, and print a similarly formatted summary file. My input looks like this (within each identifier, the ranges are sorted by their start position):
500506 genome 71445 71461 0
500506 genome 308369 308384 0
500506 genome 335450 335533 0
500506 genome 425268 425293 0
500506 genome 623326 623715 0
502289 genome 308370 308384 0
502289 genome 335462 335689 0
502289 genome 425268 425290 0
I want a list showing each range, the number of times that range appears in the file, and which line identifiers have that range. Where ranges from different identifiers overlap, they are split into sub-ranges:
71445-71461 1 500506
308369-308369 1 500506
308370-308384 2 500506,502289
335450-335461 1 500506
335462-335533 2 500506,502289
335534-335689 1 502289
425268-425290 2 500506,502289
425291-425293 1 500506
623326-623715 1 500506
In the example above, a range for 502289 may exactly match a range for 500506, fall entirely within it, overlap it only partially (as 335462-335689 does with 335450-335533), or vice versa. Is this doable with a simple script, or should I use something like Perl instead?
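This does not need anything heavier than a short script in any scripting language. Below is a minimal sketch in Python (Perl or awk would work the same way), using a sweep over range boundaries: every range contributes a "start" event at its first position and an "end" event one past its last position, and between two consecutive event positions the set of covering identifiers cannot change. It assumes whitespace-separated columns with the identifier in column 1 and the start/end in columns 3 and 4, inclusive end coordinates, and that the count means the number of distinct identifiers covering a sub-range. The names `split_ranges` and `parse` are hypothetical.

```python
import sys
from collections import Counter, defaultdict


def parse(lines):
    """Yield (identifier, start, end) from whitespace-separated lines.

    Assumes column 1 is the identifier and columns 3 and 4 are the
    inclusive start and end; the other columns are ignored.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        yield fields[0], int(fields[2]), int(fields[3])


def split_ranges(records):
    """Split overlapping inclusive ranges into non-overlapping sub-ranges.

    Returns a list of (start, end, count, identifiers), where count is the
    number of distinct identifiers covering that sub-range.
    """
    events = defaultdict(list)  # position -> [(identifier, +1 or -1), ...]
    for ident, start, end in records:
        events[start].append((ident, 1))      # range begins here
        events[end + 1].append((ident, -1))   # range ends just before here

    active = Counter()  # identifier -> number of its ranges currently open
    positions = sorted(events)
    result = []
    # The covering set is constant on [pos, nxt - 1] for consecutive positions.
    for pos, nxt in zip(positions, positions[1:]):
        for ident, step in events[pos]:
            active[ident] += step
            if active[ident] == 0:
                del active[ident]
        if active:
            ids = sorted(active)
            result.append((pos, nxt - 1, len(ids), ids))
    return result


def main():
    for start, end, count, ids in split_ranges(parse(sys.stdin)):
        print(f"{start}-{end} {count} {','.join(ids)}")


if __name__ == "__main__":
    main()
```

Saved as, say, `split_ranges.py` and run as `python split_ranges.py < input.txt`, the sample input gives nine sub-ranges, including `308369-308369 1 500506` and `335534-335689 1 502289` (only 502289 extends past 335533), plus `623326-623715 1 500506`. The sweep touches each boundary once after sorting, so it scales to large files better than comparing every range against every other range.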
range because 500506 has 308369-308384 but 502289 has 308370-308384. Please indicate a way of choice. – Costas Jan 20 '15 at 22:26