I'm using a system with multiple storage devices that have different write throughput. As explained in the question Why were "USB-stick stall" problems reported in 2013? Why wasn't this problem solved by the existing "No-I/O dirty throttling" code?, Linux by default allows a single slow device to use nearly all of the disk cache for write buffering, which results in poor performance for all processes, even those writing to other devices. The situation gets even worse if any process calls sync(), which stalls until the whole cache has been written. If a process is, for example, writing an ISO image to a slow USB memory stick, nearly the whole image may sit in the kernel cache soon after the process starts, so sync() would end up waiting for the entire ISO image to be written to the slow memory stick.
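A rough way to observe this stall, assuming /dev/sdX is a slow USB stick and image.iso is any large file (both placeholders):

    # Start writing the image; dd appears to run at RAM speed at first.
    dd if=image.iso of=/dev/sdX bs=1M &
    # Watch the system-wide dirty cache grow:
    watch grep -e Dirty: -e Writeback: /proc/meminfo
    # sync now blocks until the whole cached image reaches the stick:
    time sync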
Even without any process calling sync(), all programs suffer a slowdown because, instead of the usual background writeback, writers are throttled once the system-wide dirty cache exceeds /proc/sys/vm/dirty_background_bytes, and are forced into synchronous writing as it approaches /proc/sys/vm/dirty_bytes.
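The thresholds currently in effect can be inspected with sysctl; on typical kernels the ratio-based variants are active by default and the *_bytes counterparts read 0:

    sysctl vm.dirty_background_bytes vm.dirty_bytes
    sysctl vm.dirty_background_ratio vm.dirty_ratio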
I want to avoid this by preventing a single slow storage device (or several running in parallel) from slowing down the whole system. As far as I know, this requires limiting the amount of write-buffer cache a single slow device is allowed to use. The cache should be big enough that each device itself is the bottleneck for any process using it, but processes must not be limited by traffic to other devices, as happens with the default configuration.
I've figured out that if I limit /proc/sys/vm/dirty_background_bytes to 50 MB and /proc/sys/vm/dirty_bytes to 200 MB, the latency of the system never gets truly bad, but I still get some slowdown when I write to slow devices. I think this happens because all writing is throttled once more than 50 MB of dirty cache is in use. If we assume a situation where the process writing to the slow memory stick has 52 MB in the cache and another process wants to write a 4 KB file to a fast SSD, that 4 KB write is throttled too, which slows it down to SSD speed instead of RAM speed. On the other hand, the 200 MB cap on the write cache may be too little when writing to truly fast SSD devices, because the process generating the data may not be fast enough to fill the cache. As a result, heavily limiting those two kernel settings is a tradeoff: it avoids the worst-case latency but yields non-optimal average performance.
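For reference, this is how such limits can be applied at runtime; setting a *_bytes value automatically zeroes the corresponding *_ratio setting. The file name under /etc/sysctl.d/ is just an example:

    sysctl -w vm.dirty_background_bytes=$((50*1024*1024))
    sysctl -w vm.dirty_bytes=$((200*1024*1024))
    # To persist across reboots:
    printf 'vm.dirty_background_bytes = 52428800\nvm.dirty_bytes = 209715200\n' \
        > /etc/sysctl.d/99-writeback.conf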
I know that if I set a device's BDI max_ratio to a value less than 100, that value is used as the percentage of the whole write cache that the device may use. However, the default for every device is 100, so any slow device is allowed to force the whole system to slow down. I've tested that setting max_ratio to a value well below 100 works nicely and prevents any slowdown caused by the slow device, because that device can no longer hog the entire cache.
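For a single device this can be done through sysfs, e.g. (sdX being a placeholder for the slow device):

    # Allow this device at most 10 % of the write cache (value is a percentage):
    echo 10 > /sys/block/sdX/bdi/max_ratio
    cat /sys/block/sdX/bdi/max_ratio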
How can I set max_ratio to a value less than 100 for all devices, including devices connected in the future? I could write a script that runs during system boot and configures all currently connected devices (see the sketch below), but any newly connected storage (be it USB, eSATA or any other connection method) would still be allowed to take all of the write cache.
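A minimal sketch of such a boot-time script, assuming all backing devices of interest appear under /sys/class/bdi and that a 10 % cap is acceptable; it obviously does nothing for devices plugged in later:

    #!/bin/sh
    # Cap every currently known backing device at 10 % of the write cache.
    for f in /sys/class/bdi/*/max_ratio; do
        echo 10 > "$f"
    done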