I have a very large disk drive (2TB) but not much RAM (8GB). I'd like to run some big-data experiments on a large file (~200GB) stored on that disk's filesystem. I understand this will be expensive in terms of disk bandwidth, and I don't mind the heavy I/O.
How can I map this huge file into a C++ array, so that I can read and write the file at offsets of my choosing? Does mmap work for this purpose, and if so, what flags and parameters should I be using? I don't want to trigger the OOM killer at any point while my program is running.
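Here's a minimal sketch of what I'm imagining (Linux/POSIX, on a 64-bit build so the whole file fits in the address space). The path is a placeholder and the flag choices are my best guess, not something I'm sure about:

```cpp
#include <cstdio>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    const char* path = "/data/bigfile.bin";  // placeholder path to my 200GB file

    int fd = open(path, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }
    size_t len = static_cast<size_t>(st.st_size);

    // File-backed, shared mapping: my understanding is that the kernel pages
    // data in on demand and can write dirty pages back to the file, so
    // resident memory stays bounded even though len >> RAM.
    void* addr = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }

    // Now the file looks like one giant array:
    auto* data = static_cast<uint8_t*>(addr);
    data[0] ^= 1;                 // write at an offset of my choosing
    uint8_t b = data[len - 1];    // read near the end of the file
    (void)b;

    // Flush dirty pages back to disk before unmapping (is this needed?).
    msync(addr, len, MS_SYNC);
    munmap(addr, len);
    close(fd);
    return 0;
}
```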
I know that mmap supports both file-backed and anonymous mappings, but I'm not entirely sure which to use, or whether I want a private or a shared mapping. I've sketched the variants below as I understand them.
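For context, these are the three combinations I'm weighing; my (possibly wrong) understanding of each is in the comments. `fd` is an open descriptor for the 200GB file and `len` is its size:

```cpp
#include <cstddef>
#include <sys/mman.h>

void mapping_variants(int fd, std::size_t len) {
    // File-backed + shared: writes through the pointer go into the page cache
    // and are eventually written back to the file, so the kernel can evict
    // pages and resident memory should stay bounded.
    void* shared = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    // File-backed + private: copy-on-write. Reads come from the file, but
    // every page I dirty becomes an anonymous copy that can never be written
    // back to the file -- with only 8GB of RAM, I assume this is what risks
    // the OOM killer if I write to many pages.
    void* priv = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);

    // Anonymous: not backed by any file at all (fd must be -1), so presumably
    // not what I want for data that already lives on disk.
    void* anon = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    (void)shared; (void)priv; (void)anon;
}
```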