I've been running Debian testing without any problems for ~6 years (I just update it regularly), but recently it started to show random behaviour that can be summarized as "Low I/O performance which persists until reboot".
The problem is that all disk reads and writes suddenly slow down to ~5 MB/s, which results in continuous reading and writing. Since the rate is so low, the disks are not mechanically stressed, but everything stays slow until I reboot.
The I/O subsystem of the computer consists of one OCZ Vertex 3 SSD and two WD Caviar Black HDDs. The SSD holds the read-heavy part of the OS, and a partition on the HDD holds the rest.
To diagnose the problem I tried the following without success:
- top doesn't show any runaway activity in either CPU or I/O usage.
- hdparm returns normal performance ratings for the disks (I only checked -t though).
- smartctl doesn't show any performance problems in the disks. Long tests showed that the disks are as good as new.
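For anyone trying to catch the slowdown as it happens, per-device throughput can also be sampled straight from the kernel counters. The following is only a minimal sketch, assuming a Linux system that exposes `/proc/diskstats`; the helper names (`parse_diskstats`, `throughput_mb_s`, `report`) are hypothetical and not part of any tool mentioned above.

```python
import time

# /proc/diskstats counts sectors in 512-byte units, independent of the device.
SECTOR_BYTES = 512


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 10:
            continue  # skip blank or malformed lines
        # Layout: major minor name reads merged SECTORS_READ ms writes merged SECTORS_WRITTEN ...
        stats[parts[2]] = (int(parts[5]), int(parts[9]))
    return stats


def throughput_mb_s(before, after, interval_s):
    """Return {device: (read_MB_s, write_MB_s)} between two samples."""
    rates = {}
    for dev, (r1, w1) in after.items():
        if dev not in before:
            continue  # device appeared between the two samples
        r0, w0 = before[dev]
        rates[dev] = (
            (r1 - r0) * SECTOR_BYTES / 1e6 / interval_s,
            (w1 - w0) * SECTOR_BYTES / 1e6 / interval_s,
        )
    return rates


def report(interval_s=5.0, path="/proc/diskstats"):
    """Take two samples interval_s apart and print per-device throughput."""
    with open(path) as f:
        first = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = parse_diskstats(f.read())
    for dev, (rd, wr) in sorted(throughput_mb_s(first, second, interval_s).items()):
        print(f"{dev:10s} read {rd:8.2f} MB/s  write {wr:8.2f} MB/s")
```

Calling `report()` while the system is in the slow state should show which device, if any, is pinned near the ~5 MB/s figure; comparing that with a run taken after a reboot gives a baseline.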
The system has a Z77 chipset, 16 GB of RAM and an Intel i7 3770K CPU, and the stats show no signs of saturation in RAM, I/O or CPU, but I'm not experienced in debugging problems like this (esp. in kernel space). Any help will be appreciated.
Update 1:
- I ran (forced) fsck on every partition as a precaution. All FS are clean.
- Incidentally I found a BIOS upgrade which came out a month ago & applied it.
- No partition is filled more than 50%.
Update 2:
The problem has not surfaced for two days. Either fsck or the BIOS update cleared some clogs in the system. I'm still monitoring the issue and will close the question with a post-mortem answer.
Update 3:
The problem just resurfaced and I did some more digging. Please see the answer.
atop would tell you how busy the disks are (like when seeking all the time). – Stéphane Chazelas Oct 08 '13 at 12:23
noop. – frostschutz Oct 08 '13 at 12:35
hdparm even in low I/O situation. This makes the situation stranger. – bayindirh Oct 08 '13 at 12:57
free) – symcbean Oct 08 '13 at 21:36
iowait shooting up? Either on the whole or on a particular process? If you try to check the logs/schedule self-tests via smartctl does that give you more information to work with? – Bratchley Oct 10 '13 at 15:53
smartctl tests are all clean. I've run them during the diagnosis of the problem. Currently I cannot reproduce the issue, but monitoring hasn't ended yet. – bayindirh Oct 10 '13 at 18:40
iowait etc if you're collecting sar data. I'd enable sysstat if it isn't already running. You can check with sar -A; most platforms have ten minute sample intervals. – Bratchley Oct 10 '13 at 18:55
hdparm -t --direct /dev/sda says: /dev/sda: Timing O_DIRECT disk reads: 78 MB in 3.07 seconds = 25.41 MB/sec So it's not a hdd issue, but something about Linux. I suspect this old ext3 filesystem mounted by the ext4 subsystem is the cause, I'll have to do the tedious job of rsyncing all. – Avio Oct 07 '16 at 14:51
ext3 and ext4 issue. My problem was the number of files in the cache. Can you please try the solution in my answer and check whether it, at least temporarily, remedies the problem? Also please take a look at the output of free, the size of the hard drive cache and the disk activity when things go down. Is your disk constantly writing something small in that case? – bayindirh Oct 10 '16 at 14:07
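Several comments ask whether iowait shoots up during the slowdown. If sysstat is not collecting data, a rough figure can be computed from two reads of `/proc/stat`. This is only a sketch under the assumption of a Linux kernel whose aggregate `cpu` line carries the usual user, nice, system, idle, iowait, irq, softirq, steal fields; the function names are hypothetical, and iowait on a multi-core machine is at best a coarse indicator.

```python
def parse_cpu_times(text):
    """Return the aggregate CPU counters (in jiffies) from /proc/stat text."""
    for line in text.splitlines():
        parts = line.split()
        # The aggregate line is labelled exactly "cpu"; per-core lines are "cpu0", "cpu1", ...
        if parts and parts[0] == "cpu":
            return [int(v) for v in parts[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Share of elapsed CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Sum user..steal only; guest time is already included in user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    # Field order: user nice system idle IOWAIT irq softirq steal ...
    return 100.0 * deltas[4] / total


def read_cpu_times(path="/proc/stat"):
    """Read the current aggregate CPU counters from the running system."""
    with open(path) as f:
        return parse_cpu_times(f.read())
```

Taking `read_cpu_times()` twice a few seconds apart and passing both results to `iowait_percent` gives the percentage for that window; a value that stays high only while the system is slow would point at processes blocked on disk.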