
I am having some issues with system-wide latency/lagging when doing large disk imaging operations on an Ubuntu 18.04 system. Here are the system specs:

Processor: Intel Core i7 (never near capacity on any core)

Memory: 12GB (never near capacity)

System disk: SSD (never near capacity)

External disks: USB 3.0 5400 and 7200RPM spinning disks

These large disk imaging operations are basically:

nice ionice dd if=/dev/usbdisk1 of=/dev/usbdisk2

Since none of my system files are on any USB disks, in theory, this shouldn't introduce much latency. But I find when I'm imaging more than one USB disk, the system just comes to a crawl. Why? My understanding is that each disk has its own IO queue, so what's going on here? How can I remedy it?

Also, FWIW, I don't care at all about the imaging speed of the USB disks, so solutions which slow these operations in favor of the system running smoothly are fine by me.

fduff
Mr. T
  • Hey Mr T, you mention having this problem in Ubuntu, right? I'm also having I/O problems in MX Linux (it's built on top of Debian Buster, with an updated kernel 5.8 among other modifications...). I found out by accident that the same issue DOES NOT happen when I boot the same machine into Manjaro. In Manjaro, all disk operations are instant, no issue whatsoever. Could you boot into Manjaro for a test? I'm trying to find out what's causing this; it seems to me that the Debian developers have known about these bugs for years but gave up on them, simply closing bugs without resolving them. – Winampah May 17 '21 at 19:40

1 Answer


How can I remedy it?

When you write a disk image, use dd with oflag=direct. The O_DIRECT writes will avoid pushing the data through the page cache. Note that oflag=direct requires a larger block size in order to get good performance. Here is an example:

dd if=/dev/usbdisk1 of=/dev/usbdisk2 oflag=direct bs=32M status=progress

NOTE: Sometimes you might want to pipe a disk image from another program, such as gunzip. In this case, good performance also depends on iflag=fullblock and piping through another dd command. There is a full example in the answer here: Why does a gunzip to dd pipeline slow down at the end?
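
As a rough sketch (the compressed image filename here is hypothetical, and the linked answer adds an extra dd stage), such a pipeline could look like:

gunzip -c disk-image.gz | dd of=/dev/usbdisk2 oflag=direct iflag=fullblock bs=32M status=progress

The iflag=fullblock makes dd accumulate a full 32M block from the pipe before each O_DIRECT write, instead of writing whatever short read the pipe happened to return.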

(An alternative solution is to use oflag=sync instead of oflag=direct. This works by not building up a lot of unwritten cache pages.)
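
For example, the same copy with the sync variant would be:

dd if=/dev/usbdisk1 of=/dev/usbdisk2 oflag=sync bs=32M status=progress

With oflag=sync, dd waits for each block to reach the device before issuing the next write, so dirty pages never pile up.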

My understanding is that each disk has its own IO queue, so what's going on here?

They do. However, the written data is first stored in the system page cache (in RAM), before queuing IO...


EDIT:

Since this answer was accepted, I assume you re-tested with oflag=direct, and it fixes your problem where "the system just comes to a crawl". Great.

The safest option would be to add iflag=direct as well. Without it, dd still reads the data through the system page cache. I assume you did not add this option (you would have mentioned it); that is one hint about your specific problem.
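
For example, the command from above with direct reads and writes would be:

dd if=/dev/usbdisk1 of=/dev/usbdisk2 iflag=direct oflag=direct bs=32M status=progress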

It should be clear that reading too much data through the page cache could affect system performance. The total amount of data you are pushing through the page cache is several times larger than your system RAM :-). Depending on the pattern of reads, the kernel could decide to start dropping (or swapping) other cached data to make space.

The kernel does not have infallible foresight. If you need to use the data that was dropped from the cache, it will have to be re-loaded from your disk/SSD. The evidence seems to tell us this is not your problem.

Dirty page cache limits

However, your problem more likely has to do with writing data through the page cache. The unwritten cache, a.k.a. the "dirty" page cache, is limited. For example, you can imagine the overall dirty page cache is limited to 20% of RAM. (This is a convenient lie to imagine. The truth is messily written here.)

If your dd command(s) manage to fill the maximum dirty page cache, they will be forced to "block" (wait) until some of the data has been written out.

But at the same time, any other program which wants to write will also be blocked (unless it uses O_DIRECT). This can stall a lot of your desktop programs e.g. when they try to write log files. Even though they are writing to a different device.
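
A quick way to watch this while a copy is running is to monitor the dirty/writeback counters, for example:

watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'

If Dirty keeps climbing until it reaches the limit and then hovers there while the desktop stutters, that points to the blocking described above.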

The overall dirty limit is controlled by the sysctls vm.dirty_ratio or vm.dirty_bytes. But the full story is much more complicated. There is supposed to be some level of arbitration between the dirty cache for different devices. There are earlier thresholds that kick in and try to limit the proportion of the maximum dirty cache used by any one device. It is hard to understand exactly how well all of this works, though.
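
You can check what your system currently uses with sysctl, for example:

sysctl vm.dirty_ratio vm.dirty_background_ratio vm.dirty_bytes vm.dirty_background_bytes

Only one of each ratio/bytes pair is in effect at a time; setting one makes the other read as 0.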

I think you mention you have a problem when imaging "more than one USB disk". For example maybe the per-device thresholds work well when you are trying to write one of your disks, but break down once you are writing more than one at the same time. But that's just a thought; I don't know exactly what's happening.

Related:

Some users have observed that their whole system lags when they write to slow USB sticks, and found that lowering the overall dirty limit helped avoid the lag (there is a small sketch of how to try that after the links below). I do not know a good explanation for this.

Why were "USB-stick stall" problems reported in 2013? Why wasn't this problem solved by the existing "No-I/O dirty throttling" code?

Is "writeback throttling" a solution to the "USB-stick stall problem"?

sourcejedi
  • @Mr.T I'm afraid it is a bit messier now. I edited it because I don't trust the 2013 LWN article any more. – sourcejedi Jan 04 '19 at 13:23
  • Thanks for updating this answer. Indeed your original one fixed my issues, but I'm now using direct reads and writes on all the imaging operations and the system no longer has any performance issues. – Mr. T Jan 07 '19 at 07:00
  • In case anybody's wondering how heavy IO could possibly influence apps already running in RAM: this mail summarizes it. Basically, the reason is that the GUI uses text, and that text resides in the RAM cache (I guess by "text" they mean "fonts"). When heavy IO happens, that text may get evicted from the cache, so it needs to be read from storage anew, which results in lags and freezes for the GUI. – Hi-Angel Aug 17 '19 at 21:55
  • Hey SourceJedi, your answer seems to be the solution for me as well, but I'm having a deeper problem than that: my problem is system-wide. The slowness and unresponsiveness happen every time disk swapping is needed, and I run applications that don't have direct memory control (Dot Net, C#, etc.), which means they RELY on disk swapping to operate normally; it's not avoidable (the applications are not mine, I don't have the source). But here's the catch: the same issues DO NOT happen when I reboot into Manjaro. In Manjaro, disk operations are instant. – Winampah May 17 '21 at 19:45
  • SourceJedi, another way to reproduce the problem I'm having is running the swapoff command, or even a dd command to create an empty file on disk. This happens on MX Linux (a modified version of Debian Buster with an updated kernel 5.8, among other updates...). Running swapoff on a swapfile occupying less than 1 GB takes more than 5 minutes to complete. But when I do the same tests in Manjaro, they are instant. More details are described in a question I've just created. – Winampah May 17 '21 at 19:47
  • SourceJedi, summing it up, if I had to put it into a few words: how would I go about applying these same changes system-wide, and not only to one command? The problems I'm having are system-wide. – Winampah May 17 '21 at 19:59