I'd add the following 3 tools into the mix as well. If you don't already have them installed, you should be able to get them from whatever repositories are available to your EC2 instance.
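As a rough sketch of what the install commands typically look like (package names assumed; on RHEL/CentOS-style instances nethogs and iftop usually come from EPEL, and fatrace may not be packaged at all on older releases):

$ sudo apt-get install nethogs iftop fatrace                          # Debian/Ubuntu
$ sudo yum install epel-release && sudo yum install nethogs iftop     # RHEL/CentOS/Amazon Linux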
The high load is likely being caused by either disk or network I/O so I'd focus on those 2 areas to start.
nethogs
Networking would be my first suspicion. To diagnose it further, I'd use nethogs to see which processes are generating the traffic.
Example
Determine your network interface so you can tell nethogs which one to watch.
$ ip link show up | awk '/UP/ {print $2}'
lo:
em1:
wlp3s0:
virbr0:
In my case I'm going to watch my wireless device, wlp3s0.
$ sudo nethogs wlp3s0
NetHogs version 0.8.0
PID USER PROGRAM DEV SENT RECEIVED
2151 saml /opt/google/chrome/chrome wlp3s0 2.117 2.715 KB/sec
3569 saml ..4/thunderbird/thunderbird wlp3s0 0.441 1.496 KB/sec
3144 saml ..aml/.dropbox-dist/dropbox wlp3s0 0.081 0.061 KB/sec
3383 saml pidgin wlp3s0 0.026 0.056 KB/sec
4025 saml ssh wlp3s0 0.000 0.000 KB/sec
? root unknown TCP 0.000 0.000 KB/sec
TOTAL 2.665 4.327 KB/sec
Looking at the output, we can see that chrome is using the bulk of my bandwidth.
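If you'd rather capture this over a stretch of time than watch it interactively (handy for catching load spikes on a server), nethogs has a trace mode (-t on the versions I've used) that prints each refresh to stdout, so you can redirect it to a file:

$ sudo nethogs -t wlp3s0 > /tmp/nethogs.log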
iftop
You can see if the traffic is coming from a specific set of sites using iftop.
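A minimal invocation, assuming the same interface as in the nethogs example (-i selects the interface; add -P if you also want port numbers to tell services apart):

$ sudo iftop -i wlp3s0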
195kb 391kb 586kb 781kb 977kb
└───────────────┴───────────────┴───────────────┴───────────────┴───────────────
greeneggs.bubba.net => stackoverflow.com 4.68kb 10.2kb 8.24kb
<= 33.5kb 14.7kb 21.4kb
greeneggs.bubba.net => ord08s12-in-f8.1e100.net 0b 3.90kb 3.99kb
<= 0b 3.61kb 3.72kb
greeneggs.bubba.net => ord08s10-in-f16.1e100.net 5.05kb 4.10kb 5.83kb
<= 2.43kb 2.39kb 2.79kb
greeneggs.bubba.net => stackoverflow.com 1.32kb 3.34kb 4.73kb
<= 1.30kb 1.60kb 2.30kb
greeneggs.bubba.net => cpe-67-253-170-83.rochest 0b 2.19kb 760b
<= 0b 2.60kb 862b
greeneggs.bubba.net => pop1.biz.mail.vip.ne1.yah 5.87kb 1.17kb 301b
<= 17.4kb 3.47kb 889b
greeneggs.bubba.net => 190.93.247.58 480b 2.04kb 2.66kb
<= 0b 1.34kb 1.80kb
greeneggs.bubba.net => ig-in-f95.1e100.net 448b 1.02kb 1.27kb
<= 240b 437b 534b
greeneggs.bubba.net => ord08s12-in-f2.1e100.net 896b 346b 218b
<= 480b 221b 124b
────────────────────────────────────────────────────────────────────────────────
TX: cum: 652kB peak: 85.2kb rates: 20.6kb 29.3kb 30.1kb
RX: 883kB 161kb 57.9kb 31.4kb 40.6kb
TOTAL: 1.50MB 241kb 78.5kb 60.7kb 70.7kb
fatrace
You can use the tool fatrace to see which processes are generating accesses to the disk.
$ sudo fatrace
pickup(4910): O /var/spool/postfix/maildrop
pickup(4910): C /var/spool/postfix/maildrop
sshd(4927): CO /etc/group
sshd(4927): CO /etc/passwd
sshd(4927): RCO /var/log/lastlog
sshd(4927): CWO /var/log/wtmp
sshd(4927): CWO /var/log/lastlog
sshd(6808): RO /bin/dash
sshd(6808): RO /lib/x86_64-linux-gnu/ld-2.15.so
sh(6808): R /lib/x86_64-linux-gnu/ld-2.15.so
sh(6808): O /etc/ld.so.cache
sh(6808): O /lib/x86_64-linux-gnu/libc-2.15.so
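On a busy server the interesting accesses can come in bursts, so it can help to log fatrace for a fixed window and summarize it afterwards. A rough sketch, assuming your fatrace build supports the -s (seconds to run) and -o (output file) options:

$ sudo fatrace -s 60 -o /tmp/fatrace.log
$ awk '{print $1}' /tmp/fatrace.log | sort | uniq -c | sort -rn | head

The second command simply counts events per process, so the heaviest hitters float to the top.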
What else?
I'd take a look at this Unix & Linux Q&A that I answered a while ago for more tools to try. It's titled: Determining Specific File Responsible for High I/O.
Follow-up questions from comments
Q1: Does bandwidth shown by nethogs count against IO requests in AWS? I thought that would fall under 'data transfer', which is a separate category. In iotop the biggest percentage usage was root and a command called 'kswapd0'. mysqld had the biggest disk write usage and httpd had the most disk reads.
I have no idea how this is actually tracked by Amazon. These values are from the perspective of your VM, so they may not correlate, even remotely, with the usage numbers Amazon is tracking for your instance on their side.
By the way, this kswapd0 is likely the source of your high IO requests. This is thrashing: most likely your VM doesn't have enough RAM for the applications you're running in it, so to meet the demand the system is resorting to swap.
You can confirm this a bit more via the free command.
Example
$ free -ht
total used free shared buffers cached
Mem: 7.6G 5.5G 2.1G 0B 446M 2.5G
-/+ buffers/cache: 2.6G 5.0G
Swap: 7.6G 40K 7.6G
Total: 15G 5.5G 9.7G
This shows you how much RAM & swap are in use by your system.
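free is only a snapshot, though. To see whether the box is actively thrashing, as opposed to just holding a bit of stale data in swap, I'd watch the swap-in/swap-out columns over time, e.g.:

$ vmstat 5

Sustained non-zero values in the si and so columns mean pages are constantly being moved to and from swap, which is exactly the kind of activity that keeps kswapd0 (and your IO request count) busy.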
Q2: Oh and one follow-up question. How does MB or KB of disk read/write in iotop relate to the number of IO requests? For example, if mysqld wrote 20 MB to disk, is there any easy way to know how many IO requests that generated?
There isn't really any fixed correlation that I'm aware of between the number of IO reads/writes and the aggregate amount of data read from or written to disk.
Given you're using AWS, your actual disk reads/writes may very well not even be going to a local disk; they could be going to storage over the network (SoE, i.e. SCSI over Ethernet, for example).
Your VM would be completely oblivious to this, since the SoE setup would likely be done at the host level and then exposed as disks to any VMs running on the host.
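That said, if you want a device-level view that shows request counts and data rates side by side, iostat from the sysstat package will give you both, for example:

$ iostat -xk 5 10

The r/s and w/s columns are read/write requests per second, while rkB/s and wkB/s are the corresponding data rates, so you can at least see how the two relate for your particular workload, even if there's no fixed ratio between them.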
(no h switch in my version) swap always shows 0. Is this because it's just showing an instantaneous snapshot and not a running total? This instance only has 616 MB of memory so I guess it would make sense that swapping is occurring. Is there a way to limit/optimize how much swapping happens or do I just need more memory? – Joe M Dec 10 '13 at 20:48