
I read here that I could load a file into RAM for faster access using the command below.

cat filename > /dev/null

However, I wanted to test whether the above statement is really true, so I ran the following test.

  1. Create a test file of roughly 1 GB as below.

    dd if=/dev/zero of=demo.txt bs=100M count=10
    
  2. Now, measure the file access time as below.

    mytime="$(time ( cat demo.txt ) 2>&1 1>/dev/null )"
    echo $mytime
    real 0m19.191s user 0m0.007s sys 0m1.295s
    
  3. As the advice suggests, I now needed to load the file into the cache. So I did:

    cat demo.txt > /dev/null
    
  4. Now, I assume the file is loaded into the cache, so I measure the time to read it again. This is the value I get.

    mytime="$(time ( cat demo.txt ) 2>&1 1>/dev/null )"
    echo $mytime
    real 0m18.701s user 0m0.010s sys 0m1.275s
    
  5. I repeated step 4 for 5 more iterations and these are the values I got.

    real 0m18.574s user 0m0.007s sys 0m1.279s
    real 0m18.584s user 0m0.012s sys 0m1.267s
    real 0m19.017s user 0m0.009s sys 0m1.268s
    real 0m18.533s user 0m0.012s sys 0m1.263s
    real 0m18.757s user 0m0.005s sys 0m1.274s
    

So my question is: why does the time vary even when the file is loaded into the cache? I was expecting that, since the file is cached, the time would come down with each iteration, but that doesn't seem to be the case.
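For what it's worth, a simple way to check whether the page cache grew at all (something I did not capture above) would be to watch the buff/cache column of free before and after the cat:

    free -h                     # note the buff/cache column
    cat demo.txt > /dev/null
    free -h                     # buff/cache should have grown by roughly the file size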

Ramesh
  • 39,297

2 Answers

3

Nope nope nope!

This is not how it is done. Linux (the kernel) can choose to put files in the cache and to remove them whenever it wants. You really can't be sure whether anything is in the cache or not, and this command won't change that (much).
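If you want to check rather than guess, tools such as fincore (part of util-linux) or vmtouch can report how much of a file is actually resident in the page cache, assuming they are available on your system:

    fincore demo.txt        # the RES column shows how much of the file is in the page cache
    vmtouch -v demo.txt     # prints the resident pages and a percentage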

The advice in the link you provided is so wrong in so many ways!

  1. The cache is an OS thing. You don't need to cat the file to /dev/null to take advantage of it. This is actually a very stupid thing to do, because you are forcing Linux to read the file one extra time. For instance, suppose you plan to read a file 4 times. If you don't do anything special, the first read will be quite slow and the 3 subsequent ones should be faster (because of caching). If you use this "trick", the first read (the cat) will be quite slow and all 4 subsequent reads should be faster (but not free). Just let Linux handle it; you can see this for yourself with the small loop after this list.
  2. This command is only useful if you want to make sure that Linux keeps the file in RAM, so you would have to run it regularly while your system is idle. However, as I said, this is also stupid because you can never be sure that Linux actually cached the file, and even if it did, you still spend time reading it, from RAM or from disk (if it was never cached or has already been evicted).
  3. By doing this repeatedly on a big file, you basically trick Linux into thinking that this file should be in RAM, at the expense of other files that you actually use more often.
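To see the point of item 1 for yourself, a simple loop is enough (a sketch; the timings depend on your disk, your free RAM and whatever else is running):

    # the first iteration reads from disk; later iterations should mostly hit the page cache
    for i in 1 2 3 4; do
        time cat demo.txt > /dev/null
    done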

So the conclusion here: don't use this kind of trick, it is usually counterproductive.

However, if you know that some small files (compared to your RAM size) would really benefit from being served from RAM, you can use a tmpfs mount and store your files there. On modern distributions, /tmp is usually a tmpfs mount.
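A minimal sketch (the mount point, size and file name are only examples):

    # create a RAM-backed filesystem and copy the hot file into it
    mkdir -p /mnt/ramdisk
    mount -t tmpfs -o size=512M tmpfs /mnt/ramdisk
    cp smallfile /mnt/ramdisk/

Keep in mind that tmpfs contents can still be swapped out under memory pressure and are lost on reboot.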

Another alternative that I personally find worthwhile is to compress your file at the filesystem level, with BTRFS for instance, or manually (but the manual route requires that the program accessing the file is able to decompress it). Of course, your files should benefit from compression, otherwise this is useless. This way, you can be much more confident that Linux keeps your compressed file in RAM (since it is smaller), and if your application is I/O bound, loading 100 MB from disk instead of 10 GB should be much faster.
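A sketch of the BTRFS route (the device, mount point and file name are placeholders; the zstd option needs a reasonably recent kernel and btrfs-progs):

    # mount a btrfs filesystem with transparent zstd compression
    mount -o compress=zstd /dev/sdX1 /mnt/data

    # or enable compression for a single existing file; only newly written data
    # is compressed, existing data can be recompressed by defragmenting
    btrfs property set /mnt/data/bigfile compression zstd
    btrfs filesystem defragment -czstd /mnt/data/bigfile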

1

I repeated your test and executed the command as follows:

dd if=/dev/zero of=/mnt/disk8/Marc/2GB.bin bs=100M count=20

Now, look how fast the file was generated, although the target was an HDD:

20+0 records in
20+0 records out
2097152000 bytes (2.1 GB, 2.0 GiB) copied, 0.6319 s, 3.3 GB/s

What happened:

  • The file wasn't written to disk; instead it was written to RAM. Reason: vm.dirty_ratio has a default value of 20, which means up to 20% of the available RAM is used as write cache.
  • After some time I was able to see a write transfer rate to the HDD on my server's dashboard. Reason: vm.dirty_expire_centisecs is set to 1500 (the default on my Unraid server; the Linux default is 3000), which means the write to the HDD happens with a delay. Both values can be checked as shown below.
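For reference, both settings can be read with sysctl or directly from /proc:

    sysctl vm.dirty_ratio vm.dirty_expire_centisecs
    # equivalent:
    cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_expire_centisecs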

Now let's measure the time needed to read the file:

mytime="$(time ( cat /mnt/disk8/Marc/2GB.bin ) 2>&1 1>/dev/null )"
echo $mytime
real 0m0.193s user 0m0.012s sys 0m0.181s

What happened:

  • The file is still in the Linux Page Cache

Now we clear the cache:

sync; echo 1 > /proc/sys/vm/drop_caches

Next benchmark is slow:

real 0m8.330s user 0m0.017s sys 0m0.753s

We clear the cache again (as our benchmark filled it), then read the file once more while discarding its contents (the "trick" you described):

cat /mnt/disk8/Marc/2GB.bin > /dev/null

Next benchmark is fast and works as expected:

real 0m0.233s user 0m0.008s sys 0m0.225s

Reasons why it didn't work for you:

  • While testing, you were (almost) out of free RAM, so most of the file couldn't be cached
  • Other read operations overwrote your cached file

Conclusion: You need enough RAM, and this "trick" is not persistent.

Is it useful to cache files manually at all? It depends. Let's say you are using media server software like Plex, Emby or Jellyfin. All of them need to serve movie covers to their clients. Having the covers in RAM means faster loading times, so it's a good idea to cache them. Linux does this automatically and keeps them on the active list if they are loaded often. But (and this is where the trick could be a good idea) the cache will be completely overwritten if you request a file that is as big as, or bigger than, your free RAM; Linux does not skip huge files. Now your nicely cached files aren't cached anymore until a client loads the movie covers again, and the game with the active and inactive lists starts all over. That's why it can be a good idea to request huge files with O_DIRECT or, instead of using the trick, to use vmtouch to lock files in the cache.
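A sketch of both options (the paths are placeholders; vmtouch has to be installed separately):

    # keep the cover directory resident: -t touches the pages into the cache,
    # -l locks them in RAM, -d keeps vmtouch running as a daemon
    vmtouch -t /mnt/cache/covers
    vmtouch -dl /mnt/cache/covers

    # read a huge file while bypassing the page cache (O_DIRECT)
    dd if=/mnt/disk8/hugefile.bin of=/dev/null bs=1M iflag=direct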

mgutt
  • 467