Is it beneficial to artificially prime the buffer cache when dealing with larger files?
Here's a scenario: A large file needs to be processed, line by line. Conceptually, the task is easy to parallelize to saturate a multi-core machine. However, since the lines need to be read first (before being distributed to workers round-robin), the overall process becomes IO-bound and therefore slower.
Is it reasonable to read all or part of the file into the buffer cache in advance, so that reads are faster when the actual processing occurs?
Update: I wrote a small front-end to the `readahead` system call. I'll try to add some benchmarks later.
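For illustration, a minimal sketch of what such a front-end could look like; this is an assumed reconstruction, not the actual tool mentioned above (the argument handling and error reporting are my own choices):

```c
/* Sketch of a readahead(2) front-end: pre-populate the page cache
 * for a file before the real processing starts. Linux-specific. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return EXIT_FAILURE;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) {
        perror("open");
        return EXIT_FAILURE;
    }

    struct stat st;
    if (fstat(fd, &st) == -1) {
        perror("fstat");
        return EXIT_FAILURE;
    }

    /* Ask the kernel to read the whole file into the page cache.
     * readahead(2) blocks until the specified data has been read. */
    if (readahead(fd, 0, st.st_size) == -1) {
        perror("readahead");
        return EXIT_FAILURE;
    }

    close(fd);
    return EXIT_SUCCESS;
}
```

Run it on the input file before starting the workers; since `readahead(2)` blocks, the pages should already be cached when it returns.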
If you `mmap` the file, you can use `madvise` to mark it as sequential read so the kernel will do the readahead as well and discard old pages. http://man7.org/linux/man-pages/man2/madvise.2.html – Ulrich Dangel Jul 17 '14 at 10:58
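To make that suggestion concrete, a minimal sketch of the `mmap` + `madvise` approach might look like this (the line-scanning step is left as a placeholder):

```c
/* Map the file and tell the kernel to expect sequential access, so it
 * reads ahead aggressively and can drop pages behind the read position. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return EXIT_FAILURE;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) { perror("open"); return EXIT_FAILURE; }

    struct stat st;
    if (fstat(fd, &st) == -1) { perror("fstat"); return EXIT_FAILURE; }

    char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (data == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

    /* MADV_SEQUENTIAL: pages will be accessed in order, so the kernel
     * may read ahead and free pages that have already been consumed. */
    if (madvise(data, st.st_size, MADV_SEQUENTIAL) == -1)
        perror("madvise");

    /* ... scan `data` line by line and hand lines to the workers ... */

    munmap(data, st.st_size);
    close(fd);
    return EXIT_SUCCESS;
}
```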
Why not use `posix_fadvise` instead? This should be POSIX compliant and still achieve the same result without relying on `dd` – Ulrich Dangel Jul 17 '14 at 12:49
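A sketch of that `posix_fadvise` variant, using the hint constants described in the Linux man page:

```c
/* Hint the kernel about the access pattern without mapping the file
 * or shelling out to dd. posix_fadvise returns an error number
 * directly instead of setting errno. */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return EXIT_FAILURE;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) { perror("open"); return EXIT_FAILURE; }

    /* len = 0 means "to the end of the file".
     * POSIX_FADV_SEQUENTIAL: on Linux, doubles the readahead window.
     * POSIX_FADV_WILLNEED: starts a nonblocking read into the cache. */
    int err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    if (err == 0)
        err = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
    if (err != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(err));

    /* ... read the file as usual; reads should now hit pages the
     * kernel has already fetched ... */

    close(fd);
    return EXIT_SUCCESS;
}
```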