I have a very large disk drive (2TB), but not very much RAM (8GB). I'd like to be able to run some big data experiments on a large file (~200GB) that exists on my disk's filesystem. I understand that will be very expensive in terms of disk bandwidth, but I don't mind the high I/O usage.

How could I load this huge file into a C++ array, so that I could perform reads and writes to the file at locations of my choosing? Does mmap work for this purpose? What parameter options should I be using to do this? I don't want to trigger the OOM killer at any point while running my program.

I know that mmap supports file-backed and anonymous mappings, but I'm not entirely sure which to use. And should I use a private or a shared mapping?

1 Answer

It only makes sense to use a file-backed mapping to mmap a file, not an anonymous mapping. If you want to write to the mapped memory and have the changes get written back to the file, then you need to use a shared mapping. With a file-backed, shared mapping, you don't need to worry about the OOM killer, so, as long as your process is 64-bit, there's no problem with just mapping the entire file into memory. (And even if you weren't 64-bit, the problem would be lack of address space, not lack of RAM, so the OOM killer still wouldn't affect you; your mmap would just fail.)
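
As a minimal sketch of what that looks like in practice (assuming Linux, and using a placeholder file name), you open the file with O_RDWR, map its full size with PROT_READ | PROT_WRITE and MAP_SHARED, and then treat the returned pointer as a big array:

```cpp
// Sketch: map a large existing file as a file-backed, shared mapping.
// "data.bin" is a placeholder path; error handling is kept minimal.
#include <cstdint>
#include <cstdio>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    const char* path = "data.bin";   // placeholder for your ~200GB file

    int fd = open(path, O_RDWR);     // O_RDWR so writes can reach the file
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }
    size_t length = static_cast<size_t>(st.st_size);

    // File-backed, shared mapping: stores to this memory are written back
    // to the file by the kernel, and clean pages can be evicted and re-read
    // on demand, so the mapping can be far larger than physical RAM.
    void* addr = mmap(nullptr, length, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }

    auto* bytes = static_cast<uint8_t*>(addr);

    // Read and write at arbitrary offsets, as if it were one huge array.
    uint8_t old_value = bytes[123456789];
    bytes[123456789] = old_value + 1;

    // Optionally flush dirty pages to disk before unmapping.
    msync(addr, length, MS_SYNC);

    munmap(addr, length);
    close(fd);
    return 0;
}
```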