Yes, I know it sounds odd. Sequential and binary splits don't mix... That is, unless the sequence is the byte offset within the file itself...
I've scrambled together a binary split search in a bash script, using dd. It has search-and-find times of 3-9 seconds for an 8 GB file, so it works (but slower than I know it can be)... I'd really prefer not to have to polish this wheel; it just took my fancy to do it as an exercise in bash (having a project is the best way to learn a language, etc). I think this would be pretty straightforward in C/C++, etc... I'm curious to see some other examples (particularly bash ones).
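For illustration only, here is a minimal sketch of the kind of byte-offset binary search described above. It is not the script from the question. It assumes the file was sorted with LC_ALL=C sort, that the key is the first space-delimited field of each line, and that lines end with a newline. The function names line_at and bsearch_file are made up for this example.

```bash
#!/usr/bin/env bash
# Sketch: binary search over byte offsets in a sorted text file, seeking with dd.
# Assumptions (not from the question): LC_ALL=C sort order, key = first field.

# Print the first line that starts at byte offset >= $2 in file $1.
# Returns non-zero if no line starts at or after that offset.
line_at() {
    local file=$1 off=$2 line
    if (( off == 0 )); then
        IFS= read -r line < "$file" || [[ -n $line ]] || return 1
    else
        # Start one byte early: if that byte is a newline, a line begins exactly
        # at $off. With bs=1, skip= counts bytes and is done by seeking on a
        # regular file; dd stops once the reader closes the pipe.
        { IFS= read -r _ && { IFS= read -r line || [[ -n $line ]]; }; } \
            < <(dd if="$file" bs=1 skip=$((off - 1)) 2>/dev/null) || return 1
    fi
    printf '%s\n' "$line"
}

# Print the first line in sorted file $1 whose key equals $2.
# Returns non-zero if the key is not present.
bsearch_file() {
    local file=$1 target=$2 lo=0 hi mid line
    local LC_ALL=C                      # byte-wise comparison, matching the sort
    hi=$(( $(wc -c < "$file") ))        # file size in bytes
    while (( lo < hi )); do
        mid=$(( (lo + hi) / 2 ))
        if line=$(line_at "$file" "$mid") && [[ ${line%% *} < $target ]]; then
            lo=$(( mid + 1 ))           # that line sorts before the target: go right
        else
            hi=$mid                     # that line is >= target, or past EOF: go left
        fi
    done
    line=$(line_at "$file" "$lo") && [[ ${line%% *} == "$target" ]] && printf '%s\n' "$line"
}
```

Usage would look like bsearch_file sorted.txt somekey. Each probe starts a new dd process, so an 8 GB file needs roughly 33 probes; with many duplicate keys this returns the first of them, and the rest can be read sequentially from there.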
head -n $(($(cat file | wc -l) / 2)) file
for the first half and tail ... for the second half. – Marcin Mar 09 '11 at 02:05

head/tail -c. It slows down dramatically, to 2 min for just one access at offset=8GB, and that's without using wc. wc alone takes 2 min on 8 GB (see the 3rd comment here: http://unix.stackexchange.com/questions/8444/is-it-possible-in-bash-to-start-reading-a-file-from-an-arbitary-byte-count-offse/8498#8498)... So a head/tail split may technically work, but a simple awk/sed search is faster. dd's skip avoids most sequential reads. (My max of 9 sec comes from a file with each key having 86400 duplicates, for which I used a combo of split and seq-reads.) – Peter.O Mar 09 '11 at 03:08