So I have lately been having some server performance issues. Currently we are running a Fedora server with 4 GB of RAM and 160 GB of disk space. We are pretty much maxing out the disk with all of the files we have on board. We are running multiple websites with multiple backups for each website. Only one site actually gets traffic, though. It's an ecommerce site with a good number of visitors.

As of late there have been slow load times, and I notice our free memory getting really low (below 1 GB). I restart the server (which I have to do 3 times a day now) and everything is okay. We start off with 2.2 GB of free memory, but after 3 or 4 hours the memory gets soaked up and the load times crawl. I can't figure out where this is coming from, or whether it's just time to upgrade to a better server. I just don't want to upgrade and then realize I am bottlenecked somewhere with MySQL requests.

Any ideas or suggestions would be appreciated.

EDIT-

There are 3 vhosts, and I have well over 60,000 files.

             total       used       free     shared    buffers     cached
Mem:          4003       3372        630          0        398       1717
-/+ buffers/cache:       1256       2746
Swap:         8189          0       8189

21:21:49 up 46 min,  1 user,  load average: 3.75, 4.20, 4.03

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  2      0 592728 409640 1838360    0    0   165   411  953  473  9  8 47 36  0

And here is the top snapshot.

 1356 mysql     20   0 1374m 219m 5320 S  5.6  5.5  14:06.21 mysqld
15796 root      20   0  103m  20m  440 D  1.0  0.5   0:04.42 sendmail
 1081 root      20   0  103m  20m  440 D  0.7  0.5   0:21.73 sendmail
24013 root      20   0 97416  22m 2648 D  0.7  0.6   0:15.15 mailq
 1525 root      20   0  247m 7980 3472 S  0.3  0.2   0:06.88 vlogger (access
 1530 apache    20   0  539m  13m 3008 S  0.3  0.3   0:03.56 httpd
 2399 apache    20   0  539m  12m 2748 S  0.3  0.3   0:00.85 httpd
 5763 root      20   0  121m 4932 3868 S  0.3  0.1   0:00.07 sshd
12326 apache    20   0  539m  12m 2992 S  0.3  0.3   0:00.38 httpd
12421 apache    20   0  539m  12m 2988 S  0.3  0.3   0:00.45 httpd
16396 apache    20   0  538m  12m 2284 S  0.3  0.3   0:00.09 httpd
17050 root      20   0 15368 1256  868 R  0.3  0.0   0:00.09 top
    1 root      20   0 37336 4104 1908 S  0.0  0.1   0:02.82 systemd
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.03 ksoftirqd/0
    5 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 kworker/0:0H
    6 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kworker/u:0
    7 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 kworker/u:0H
    8 root      RT   0     0    0    0 S  0.0  0.0   0:00.11 migration/0
    9 root      RT   0     0    0    0 S  0.0  0.0   0:00.01 watchdog/0
   10 root      RT   0     0    0    0 S  0.0  0.0   0:00.14 migration/1
   12 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 kworker/1:0H
   13 root      20   0     0    0    0 S  0.0  0.0   0:00.02 ksoftirqd/1
   14 root      RT   0     0    0    0 S  0.0  0.0   0:00.01 watchdog/1
   15 root      RT   0     0    0    0 S  0.0  0.0   0:00.15 migration/2
   17 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 kworker/2:0H
   18 root      20   0     0    0    0 S  0.0  0.0   0:00.03 ksoftirqd/2
   19 root      RT   0     0    0    0 S  0.0  0.0   0:00.01 watchdog/2
   20 root      RT   0     0    0    0 S  0.0  0.0   0:00.11 migration/3
   22 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 kworker/3:0H
   23 root      20   0     0    0    0 S  0.0  0.0   0:00.02 ksoftirqd/3
   24 root      RT   0     0    0    0 S  0.0  0.0   0:00.01 watchdog/3
   25 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 cpuset
   26 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 khelper
   27 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kdevtmpfs
   28 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 netns
   29 root      20   0     0    0    0 S  0.0  0.0   0:00.00 xenwatch
0cean_
  • 1
  • 1
    Please post the output of free -m, and please detail what kind of web server you run: how many vhosts, how many files, any WAF measures. Please also post ulimit -n run as a normal user, and add the uptime and vmstat outputs. – Rui F Ribeiro Feb 11 '16 at 21:15
  • 1
    Are any of the sites running home-brew programs that might be leaking memory? Get memory usage by process to see what is eating memory. – vonbrand Feb 11 '16 at 21:32
  • Could you please also post a snapshot from the top command? If possible, extend your terminal length so it shows the top 20-25 processes in the list. What you are describing is typical behavior when running multiple Java applications, one or more of which is not doing the required janitorial sweep of shared memory. – MelBurslan Feb 11 '16 at 21:33
  • No, there is not a custom program or anything like that. – 0cean_ Feb 11 '16 at 21:34
  • What do you mean by "load times crawl"? Page load times as you serve pages? What kind of pages? – Andrew Henle Feb 11 '16 at 21:44
  • Loading up a page on the site. It takes some time for the client to load it from the server. They are PHP pages with HTML and CSS. They have been running fine for quite some time. – 0cean_ Feb 11 '16 at 21:54
  • High memory usage does not necessarily imply leakage. If it was a serious leak you would go much closer to 0. You have 0 in swap usage. Have you checked any logs? – Runium Feb 11 '16 at 22:10
  • 1
    Your free -m output indicates that a lot of your RAM is held by cache, which is how it works. It is simply files read earlier from the HDD, kept in RAM for fast access. If any process needs more than the free RAM, the cache willingly giveth it away. free + buffers + cached = 630 + 398 + 1717 = 2745, or 2746 as the -/+ line says. – Runium Feb 11 '16 at 22:32
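As a rough check of the arithmetic in the last comment, the sum can be taken straight from the Mem: line. This is only a sketch: it assumes the older free -m layout shown in the question (separate buffers and cached columns), and available_mb is a made-up helper name. Newer versions of free merge those columns and print an "available" column instead, so the field numbers would differ there.

```shell
#!/bin/sh
# available_mb: read `free -m` style output on stdin and print
# free + buffers + cached (in MiB) from the "Mem:" line.
# Assumed column layout: Mem: total used free shared buffers cached
available_mb() {
    awk '/^Mem:/ { print $4 + $6 + $7 }'
}

# The Mem: line from the question, fed in as sample input.
available_mb <<'EOF'
             total       used       free     shared    buffers     cached
Mem:          4003       3372        630          0        398       1717
EOF
# prints 2745
```

On a live system with the same layout this would be run as free -m | available_mb.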

1 Answer

Ramp up sar and output the ps table every minute. See my detailed answer here.

The next time the server blows up, use sar -r to help track down when it happened. Then use the output from ps-cronjob, or from my Perl wrapper for ps on GitHub, to figure out which process may have been the culprit.
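For readers who do not have either of those tools, a per-minute ps dump could look roughly like the sketch below. It is not the ps-cronjob or the Perl wrapper mentioned above: snapshot_ps, the column selection, and the script path in the crontab comment are all assumptions. The only thing it borrows is the idea of starting each snapshot with a "=== <timestamp>" header line so that a time window can be cut out of the log later.

```shell
#!/bin/sh
# snapshot_ps LOGFILE: append one timestamped process table to LOGFILE.
# Hypothetical helper; the header format and ps columns are illustrative.
snapshot_ps() {
    log_file=$1
    {
        # Header line marking the start of this snapshot.
        printf '=== %s\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        # One line per process, including memory columns (vsz, rss, pmem).
        ps -eo pid,ppid,user,vsz,rss,pmem,etime,args
    } >> "$log_file"
}

# Example crontab entry to run it every minute, assuming this function
# is saved in a script that calls: snapshot_ps /var/log/sa/ps/today
# * * * * * /usr/local/bin/ps-snapshot.sh
```

The log would also need rotating (for example one file per day), which this sketch leaves out.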

Let's say the server blew up between 12:00:00 and 13:00:00. Use sar -r -s 12:00:00 -e 13:00:00. From this you should see a spike in the data. (If it's easier, there's a Java-based utility to do graphing, but usually it's not worth the hassle.) Let's say you see a spike (or a trough) at 12:15. Now scan the columnized ps output for a time range between, say, 12:00 and 12:15, sort it by pid and then time, and look at the memory columns:

awk '/^=== .* 12:00:/,/^=== .* 12:16:/' /var/log/sa/ps/today |
 sort -k 1n -k 16 

(The sort options assume the time is in column 16, which may or may not be the case). Now you can filter that output through awk again to find differences between output lines:

... | awk 'lastpid && lastpid==$1 && last != $0 { print} /^[0-9]/ { lastpid=$1;last=$0; }'

That's a pretty crude filter. For some processes (whose command line changes all the time, such as with mysql and postgresql and snmpd), this won't be very helpful, but hopefully you can tweak the awk to help you find the culprit(s).
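One possible tweak, sketched here on made-up data, is to compare only the memory column instead of the whole line, so that a process is reported only when its resident size grows between snapshots. Everything about the log layout below is an assumption for illustration (a "=== date time" header per snapshot, then lines of pid, time, RSS in KB, and command); rss_growth is a hypothetical name, and the field numbers would need adjusting to the real ps output.

```shell
#!/bin/sh
# rss_growth START END: read a snapshot log on stdin, keep the snapshots
# from the one stamped START (HH:MM) up to the header stamped END, and
# print every pid whose RSS increased from one snapshot to the next.
# Assumed line layout: PID TIME RSS_KB COMMAND
rss_growth() {
    awk -v s="$1" -v e="$2" '
        # Range pattern: from the START header through the END header.
        $0 ~ ("^=== .* " s ":"), $0 ~ ("^=== .* " e ":")
    ' |
    grep '^[0-9]' |              # drop the "===" header lines
    sort -k 1,1n -k 2,2 |        # by pid, then by time
    awk '
        $1 == lastpid && $3 + 0 > lastrss { print $1, $4, lastrss " -> " $3 }
        { lastpid = $1; lastrss = $3 + 0 }
    '
}

# Synthetic sample log: httpd (pid 1530) grows, mysqld does not.
rss_growth 12:00 12:16 <<'EOF'
=== 2016-02-11 11:59:01
1356 11:59:01 219000 mysqld
1530 11:59:01 13000 httpd
=== 2016-02-11 12:00:01
1356 12:00:01 219000 mysqld
1530 12:00:01 13000 httpd
=== 2016-02-11 12:01:01
1356 12:01:01 219000 mysqld
1530 12:01:01 48000 httpd
=== 2016-02-11 12:16:01
1356 12:16:01 219000 mysqld
EOF
# prints: 1530 httpd 13000 -> 48000
```

Because only the RSS field is compared, processes whose command line changes between snapshots no longer produce noise, at the cost of missing changes in any other column.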

Otheus
  • 6,138