I would like to understand this email from 2013, by Mel Gorman:
https://lore.kernel.org/lkml/20131030120152.GM2400@suse.de/
There are still problems though. If all dirty pages were backed by a slow device then dirty limiting is still eventually going to cause stalls in dirty page balancing. [...] Consciously or unconsciously my desktop applications generally do not fall foul of these problems. [...] I'm probably unconsciously avoiding doing any write-heavy work while a USB stick is plugged in.
It suggests the approach below. This has not been implemented yet, at least as of Linux 5.1:
I still suspect that we will have to bite the bullet and tune based on "do not dirty more data than it takes N seconds to writeback" using per-bdi writeback estimations. It's just not that trivial to implement as the writeback speeds can change for a variety of reasons (multiple IO sources, random vs sequential etc). Hence at one point we think we are within our target window and then get it completely wrong. Dirty ratio is a hard guarantee, dirty writeback estimation is best-effort that will go wrong in some cases.
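To make the suggested rule concrete, here is my own rough reading of it as a back-of-the-envelope calculation. This is purely my illustration, not anything from the email or the kernel source; I am assuming the kernel's per-device writeback bandwidth estimate, which on my system shows up as BdiWriteBandwidth (in kB/s) under /sys/kernel/debug/bdi/<device>/stats, and an arbitrary target of N=5 seconds:

# Illustration only: "do not dirty more data than it takes N seconds to write back".
# The 8:0 device number and N=5 are arbitrary examples.
N=5
bw_kbps=$(awk '/BdiWriteBandwidth/ {print $2}' /sys/kernel/debug/bdi/8:0/stats)
echo "the per-device dirty limit would be about $(( bw_kbps * N )) kB"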
What were the outstanding problems in the kernel code that the email talks about? Why did Gorman expect to see stalls with the existing code?
E.g. do we know if they could have caused a contemporary PC to appear completely hung for more than 5-10 seconds at a time? Or, alternatively, could they have caused open windows to appear hung, while still allowing the cursor to be moved and windows to be dragged around?
Is there any part of the email which is known to be wrong?
Do the same problems still exist in current Linux kernels?
I have seen a few reports of very similar symptoms, e.g. by Chris Siebenmann in 2017.
Mel Gorman's email is part of an archived discussion. LWN.net wrote up the discussion as "The pernicious USB-stick stall problem". Please avoid simply parroting that article. It is discussed below.
Testing ("what have you tried?")
I run kernel 5.1.6-200.fc29.x86_64. I have an 8G Imation USB drive. dd if=/dev/zero of=/var/run/media/alan/imation/test bs=1M status=progress converges to about 4MiB/s, i.e. roughly 25 times slower than my main HDD can achieve. While it is running, grep -E 'Dirty|Writeback' /proc/meminfo converges to around 1GiB of cached writes at any given time. I have 8GiB of physical RAM.
(Writeback shows only one or two MiB at a time. Note also that the Dirty level can vary: it used to be set as an approximate proportion of MemTotal, i.e. physical RAM, but currently it is set as an approximate proportion of MemAvailable.)
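For reference, this is how I watch those counters and check the knobs that set the limit. The sysctls are the standard ones; treat the exact commands as an illustration of my setup rather than anything authoritative:

# watch the global dirty/writeback counters, refreshed every second
watch -n1 "grep -E 'Dirty|Writeback' /proc/meminfo"
# the knobs behind the limit: the ratios are a percentage of available memory,
# and the *_bytes variants override the ratios when set to a non-zero value
sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_bytes vm.dirty_background_bytes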
During the USB write, my system remains responsive. I can move windows, I can continue using my web browser...
Both before and during the USB write, I ran the following latency test on my main HDD. It takes about the same time in both cases: 3-4 seconds:
mkdir "$HOME"/tmp
cd "$HOME"/tmp
# 100 iterations of: write a 16KiB file, sync that specific file, then rename it
time bash -c "for (( i=0; i < 100; i++ )); do dd if=/dev/zero bs=16k count=1 of=latencytest status=none; sync latencytest; mv latencytest latencytest2; done"
"The pernicious USB-stick stall problem" - ?
Unusually for LWN, they did not read carefully enough: the LWN article about this discussion conflated two different problems. I have cited this claim exhaustively in the following link. To try to reduce reader exhaustion, I have picked out the relevant details below.
The archived emails discuss stalls lasting many seconds, due to having built up a corresponding amount of "dirty" (unwritten) page cache. But there was a bit of confusion about the specifics.
The original complaint was quite clear.
Linus Torvalds reiterated this complaint using more specific examples. E.g. drag and drop a random 1GB linux.iso file into a mounted USB flash drive. Say the drive can write 10MB/s. You run time sync to flush the cached writes to the filesystem, and you find it takes 100 seconds. This feels like an over-large write cache!
This result depends on you having something like 8GB of RAM or more, and also a 64-bit kernel. The write cache is limited to 20% of RAM. The dramatic part was that on a 32-bit kernel, the limit was calculated based on the first 1GB of RAM only. Therefore you could see the sync part take 8 times longer (or 16 times, or ...) when you switched to a 64-bit kernel. I can understand being alarmed and dismayed by this :-).
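To spell out the arithmetic (my own illustrative numbers, assuming the default 20% dirty limit and a 10MB/s stick, not figures taken from the emails):

# 64-bit kernel, 8GiB RAM: the 20% limit applies to all of RAM
echo "$(( 8192 * 20 / 100 / 10 )) seconds to sync"   # ~163 seconds
# 32-bit kernel: the same 20% was calculated against the first ~1GiB only
echo "$(( 1024 * 20 / 100 / 10 )) seconds to sync"   # ~20 seconds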
A second complaint was also posted. This was a different, "near-stall" scenario which involved the main system disk only.
The mechanics in the second case are more complex. It filled the request queues both inside the main disk itself and in the kernel IO scheduler for the main disk. You can hit this problem even when you do not allow 20 seconds' worth or more of "dirty" cache, although there is some overlap between the two issues.
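For what it is worth, the two queue sizes involved in that second case can be inspected through sysfs; sda here is just an example device name, and the second file only exists for SCSI/SATA-style devices:

# number of requests the kernel block layer / IO scheduler will queue for the device
cat /sys/block/sda/queue/nr_requests
# command queue depth inside the disk itself
cat /sys/block/sda/device/queue_depth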
It seems significantly easier to find problems similar to the second case. Sustained writes to my /home or / filesystem seem to be much more dangerous in terms of stalling the system (this includes preventing or delaying switching to a text VT, e.g. with Ctrl+Alt+F6). I have seen this recently. Less recently: "my system becomes much, much less responsive, every time I clone a VM image."