I am running a database server on my computer. Sometimes it takes up so much memory that the system stops responding to keyboard and mouse input. Although I can move the mouse pointer and toggle Caps Lock, I can't do anything beyond that.

Surprisingly, I am able to SSH into the computer, run the top command, kill a few processes and shut down the database server to reclaim memory. But even after doing those things, the display remains in a semi-frozen state (the mouse pointer still moves).

Having reclaimed most of my memory, is it possible to regain usage of the system without having to reboot?

1 Answer

Freeing up resources should generally return the system to a normal functioning state, so it sounds to me like the system is still struggling to free up resources or hasn't fully followed through on killing these processes. I'd investigate further to find out whether that's in fact the case. You can check, for example, whether something is still writing data to the HDD. Several tools can assist with this; I'd start with fatrace to see if you can identify a process that's trying to finish writing data to the disk.

Example

$ sudo fatrace | head -10
chrome(29486): W /home/saml/.config/google-chrome/Default/Extension State/017912.log
chrome(29486): CW /home/saml/.config/google-chrome/Default/File System/000/p/.usage
chrome(29486): W /home/saml/.config/google-chrome/Default/Extension State/017912.log
chrome(29486): W /home/saml/.config/google-chrome/Default/Extension State/017912.log
chrome(29486): W /home/saml/.config/google-chrome/Default/History-journal
chrome(29486): W /home/saml/.config/google-chrome/Default/History
chrome(29486): W /home/saml/.config/google-chrome/Default/History
chrome(29486): W /home/saml/.config/google-chrome/Default/History
chrome(29486): W /home/saml/.config/google-chrome/Default/History-journal
chrome(29486): W /home/saml/.config/google-chrome/Default/Extension State/017912.log

You'll want to run that command without the | head -10; that's only there to keep the example short.
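
A complementary check, sketched here under the assumption that the procps version of ps is installed: a process that was killed but is still blocked on disk I/O shows up in state D (uninterruptible sleep) until that I/O completes. The helper name filter_blocked is made up for this illustration.

```shell
# Keep the header line plus every process whose state starts with "D"
# (uninterruptible sleep, usually waiting on disk I/O).
# filter_blocked is a hypothetical helper name, not a standard tool.
filter_blocked() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Show PID, state and command name for currently blocked processes.
ps -eo pid,stat,comm | filter_blocked
```

If the processes you killed over SSH are still listed here, they have not actually exited yet.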

So what's wrong?

If you've ever looked at the output of free, you've likely noticed the columns buffers and cached.

$ free 
             total       used       free     shared    buffers     cached
Mem:       7969084    6673652    1295432          0     118588     893916
-/+ buffers/cache:    5661148    2307936
Swap:      8011772    3104804    4906968

These are files that the system is aggressively loading into memory to maximize performance, using as much RAM as it can for this task. When the DB process (or whichever process was consuming the RAM) grew, this cached data was evicted and other programs' memory was pushed to swap (I'm assuming), and it now cannot come back in because this other task is occupying the HDD's I/O.
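
To make the relationship between the rows concrete, here is a small sketch that recomputes the "-/+ buffers/cache" row from the "Mem:" row of the free output above. The function name app_memory is hypothetical; the numbers are the ones from that sample.

```shell
# Recompute the "-/+ buffers/cache" row of the older free layout.
# Arguments: used free buffers cached (all in kB).
# app_memory is a hypothetical helper name used only for this illustration.
app_memory() {
    used=$1 free=$2 buffers=$3 cached=$4
    # Memory used by programs, and memory that is free or easily reclaimable.
    echo "$(( used - buffers - cached )) $(( free + buffers + cached ))"
}

app_memory 6673652 1295432 118588 893916   # prints: 5661148 2307936
```

The second figure is what programs could claim if the cache were dropped, which is why a box can look "full" in the Mem: row and still have room.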

What can be done to mitigate this?

One trick is to lower the VM dirty ratio and the VM dirty background ratio, which forces the system to start writing dirty pages of memory out to disk sooner. That write-out activity is oftentimes what causes a system to seemingly hang, especially in the UI. There are other reasons, but this is one of the more frequent ones.

excerpt

By default the VM dirty ratio is set to 20% and the background dirty ratio is set to 10%. This means that when 10% of memory is filled with dirty pages (cached data which has to be flushed to disk), the kernel will start writing out the data to disk in the background, without interrupting processes. If the amount of dirty pages rises to 20%, processes will be forced to write out data to disk and cannot continue other work before they have done so.
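
As a rough worked example of what those percentages mean in kB, the sketch below applies them to the 7969084 kB total from the free output earlier. This is an approximation: as far as I know the kernel computes the thresholds against available memory rather than total RAM, so treat the results as upper bounds. The function name dirty_threshold_kb is hypothetical.

```shell
# Approximate a dirty-page threshold in kB from a memory size and a ratio.
# Assumption: total RAM is used as the base; the kernel's real base is
# smaller (available memory), so these figures are upper bounds.
dirty_threshold_kb() {
    total_kb=$1 ratio=$2
    echo $(( total_kb * ratio / 100 ))
}

dirty_threshold_kb 7969084 10   # background write-out starts near 796908 kB
dirty_threshold_kb 7969084 20   # processes are forced to write near 1593816 kB
```

You can compare these against the live counters with:

$ grep -E '^(Dirty|Writeback):' /proc/meminfo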

Here's how you can check on your system's current settings:

$ sudo sysctl -a | grep 'dirty.*ratio'
vm.dirty_background_ratio = 10
vm.dirty_ratio = 20

To override these settings, create the file /etc/sysctl.d/dirty_ratio.conf with the following content:

vm.dirty_ratio = 10
vm.dirty_background_ratio = 5

This will cause your system to be more aggressive about writing changes out as they occur. You can activate these changes immediately like so:

$ sudo sysctl -p /etc/sysctl.d/dirty_ratio.conf
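
Before loading a file like this, it can be worth a quick sanity check that the background ratio is the lower of the two values, as in the excerpt above. The following is only a sketch; check_dirty_conf is a hypothetical helper, and it assumes one "name = value" setting per line.

```shell
# Print "ok" if vm.dirty_background_ratio is set and lower than vm.dirty_ratio
# in the given sysctl fragment, otherwise print "bad".
# check_dirty_conf is a hypothetical helper name.
check_dirty_conf() {
    awk -F'[ \t]*=[ \t]*' '
        $1 == "vm.dirty_ratio"            { ratio = $2 }
        $1 == "vm.dirty_background_ratio" { bg = $2 }
        END { print ((bg != "" && ratio != "" && bg + 0 < ratio + 0) ? "ok" : "bad") }
    ' "$1"
}

# Try it on a temporary copy of the settings shown above.
conf=$(mktemp)
printf 'vm.dirty_ratio = 10\nvm.dirty_background_ratio = 5\n' > "$conf"
check_dirty_conf "$conf"   # prints: ok
rm -f "$conf"
```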

Will this resolve the issue?

In my experience you can tweak these values, but the true issue is that your system is likely just not up to the task(s) you're asking it to perform.

slm
  • Hmmm, command not found. – Question Overflow Mar 08 '14 at 05:47
  • @QuestionOverflow - yum install fatrace. I'd install ftop too, that's basically like top except it centers around file accesses instead of CPU load & memory usage. – slm Mar 08 '14 at 05:48
  • Yes, ok, if it is something to do with disk I/O, why is the graphics still frozen? – Question Overflow Mar 08 '14 at 05:51
  • I think it could be that the graphics process is being swapped to disk instead of staying in memory. But would killing all processes that showed up in fatrace cause an issue here? – Question Overflow Mar 08 '14 at 05:54
  • @QuestionOverflow - see updates. – slm Mar 08 '14 at 06:11
  • That's a lot of details. Give me some time to digest. Thanks :) – Question Overflow Mar 08 '14 at 06:29
  • @QuestionOverflow - if you're using GNOME 3: I have to restart gnome-shell every day or so. You can do this by pressing Alt+F2 and then typing the command r to restart it. Are you having this particular issue? Also check out this related Q&A: http://unix.stackexchange.com/questions/31818/what-to-do-when-a-linux-desktop-freezes – slm Mar 08 '14 at 06:41
  • The command to show disk access that I am aware of is iotop. I haven't heard of an ftop, nor has Debian. There is an iftop, but that is for network access. – Faheem Mitha Mar 08 '14 at 08:30
  • @FaheemMitha - ftop is a great little tool; I haven't mentioned it much here. I use it quite a bit on servers when debugging file access I/O issues. You can use lsof as well; I usually mention lsof in these situations b/c it's more widely available. http://code.google.com/p/ftop/. It's typically available in Fedora repos, and I usually backport it to CentOS. – slm Mar 08 '14 at 12:58
  • @slm, no, I don't have the issue of having to restart gnome-shell. Yes, I know the command prompt shortcut, but I don't think it will work when the keyboard is not responding. I even tried SysRq to no effect :( And yes, I do need to get more RAM or trim my database. I am just amazed at how SSH works when everything else fails. Thanks for your detailed response and patience. I really appreciate it :) – Question Overflow Mar 10 '14 at 15:04
  • @QuestionOverflow - anytime, it was good to work with you as well! – slm Mar 10 '14 at 16:35