How would one go about quickly pulling all of a process's swapped memory out of swap without writing to disk?
The context of this issue is largely beside the point, as the systemic problem behind the question is being handled by other parties. However, right now I frequently have to free up swap space on an OpenVZ node while load and IO wait are extremely high.
The swap is often primarily consumed by a small handful of MySQL and clamd processes running on individual containers. Restarting these services frees the swap and solves the problem on the node, but is undesirable for obvious reasons.
I'm looking for a way to quickly free up the swap from those processes while the node is overloaded, and I need something faster than my current method:
unswap(){ [[ $1 && $(ls /proc/$1/maps) ]] && ((gcore -o /tmp/deleteme $1 &>/dev/null; rm -fv /tmp/deleteme.$1)&) 2>/dev/null || echo "must provide valid pid";};unswap
This core dump forces all of the process's RAM to be accessed and thus pulls it out of swap, but I've yet to find a way to keep it from writing to a file. It also seems like the process would be faster if I could isolate the address ranges that are currently swapped and dump just that portion to /dev/null, but I've yet to find a way to do that.
This is a huge node, so the usual swapoff/swapon method is prohibitively time consuming, and again, the node's configuration is not under my control, so fixing the root cause is not part of this question. However, any insight into how I could free up a significant portion of swap quickly without killing/restarting anything would be appreciated.
Environment: CentOS 6.7/OpenVZ
Update for anyone that may stumble on this later:
Using Jlong's input, I created the following function:
unswap(){ (awk -F'[ \t-]+' '/^[a-f0-9]*-[a-f0-9]* /{recent="0x"$1" 0x"$2}/Swap:/&&$2>0{print recent}' /proc/$1/smaps | while read astart aend; do gdb --batch --pid $1 -ex "dump memory /dev/null $astart $aend" &>/dev/null; done&)2>/dev/null;};
It's a bit slow, but otherwise does exactly what was requested here. The speed could probably be improved by dumping only the largest swapped address ranges and skipping the iterations for the trivially small areas, but the premise is sound.
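The idea of keeping only the largest swapped ranges can be sketched as follows. This is a minimal sketch, not part of the original function: the name largest_swapped_ranges and its arguments are hypothetical, and it assumes the usual smaps layout in which each mapping header is followed by a "Swap: <n> kB" line. It takes the smaps path as an argument so it can be tried on a saved copy.

```shell
# Sketch: print the N mappings with the most swapped memory, largest first.
# Hypothetical helper; usage: largest_swapped_ranges SMAPS_FILE [N]
largest_swapped_ranges() (
    smaps=$1
    count=${2:-20}    # default of 20 ranges is an arbitrary assumption
    awk '
        # Mapping header, e.g. "7f3a1c000000-7f3a1c021000 rw-p ..."
        /^[0-9a-f]+-[0-9a-f]+ / { split($1, r, "-"); next }
        # Exact match on "Swap:" so that "SwapPss:" lines are ignored
        $1 == "Swap:" && $2 > 0 { print $2, "0x" r[1], "0x" r[2] }
    ' "$smaps" |
        sort -rn |          # largest swap size (kB) first
        head -n "$count" |  # keep only the N largest ranges
        cut -d" " -f2-      # drop the size, leaving "0xSTART 0xEND"
)
```

The output has the same "0xSTART 0xEND" shape that the while read astart aend loop already consumes, so something like largest_swapped_ranges /proc/$1/smaps 20 could stand in for the awk stage.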
Working example:
#Find the process with the highest swap use
[~]# grep VmSwap /proc/*/status 2>/dev/null | sort -nk2 | tail -n1 | while read line; do fp=$(echo $line | cut -d: -f1); echo $line" "$(stat --format="%U" $fp)" "$(grep -oP "(?<=Name:\s).*" $fp); done | column -t
/proc/6225/status:VmSwap: 230700 kB root mysqld
#Dump the swapped address ranges and observe the swap use of the proc over time
[~]# unswap(){ (awk -F'[ \t-]+' '/^[a-f0-9]*-[a-f0-9]* /{recent="0x"$1" 0x"$2}/Swap:/&&$2>0{print recent}' /proc/$1/smaps | while read astart aend; do gdb --batch --pid $1 -ex "dump memory /dev/null $astart $aend" &>/dev/null; done&)2>/dev/null;}; unswap 6225; while true; do grep VmSwap /proc/6225/status; sleep 1; done
VmSwap: 230700 kB
VmSwap: 230700 kB
VmSwap: 230676 kB
VmSwap: 229824 kB
VmSwap: 227564 kB
... 36 lines omitted for brevity ...
VmSwap: 9564 kB
VmSwap: 3212 kB
VmSwap: 1876 kB
VmSwap: 44 kB
VmSwap: 0 kB
Final solution for bulk-dumping just the large chunks of swapped memory:
unswap(){ (awk -F'[ \t-]+' '/^[a-f0-9]*-[a-f0-9]* /{recent="0x"$1" 0x"$2}/Swap:/&&$2>1000{print recent}' /proc/$1/smaps | while read astart aend; do gdb --batch --pid $1 -ex "dump memory /dev/null $astart $aend" &>/dev/null; done&)2>/dev/null;}; grep VmSwap /proc/*/status 2>/dev/null | sort -nk2 | tail -n20 | cut -d/ -f3 | while read line; do unswap $line; done;echo "Dumps Free(m)"; rcount=10; while [[ $rcount -gt 0 ]]; do rcount=$(ps fauxww | grep "dump memory" | grep -v grep | wc -l); echo "$rcount $(free -m | awk '/Swap/{print $4}')"; sleep 1; done
I have yet to determine if this method poses any risk to the health of the process or system, especially when looped over multiple processes concurrently. If anyone has insight into any potential effect this may have on the processes or system, please feel free to comment.
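One possible way to reduce how often each process gets attached to is to attach gdb once per process and replay every dump from a command file, instead of starting a new gdb for every range. The following is only a sketch under assumptions: dump_commands and unswap_once are hypothetical names, it has not been tried on the node described here, and whether it is any safer for the target process is not established. If gdb stops reading the command file at the first failing command, later ranges would be skipped.

```shell
# Sketch: read "0xSTART 0xEND" pairs on stdin and print one gdb
# "dump memory" command per pair (blank or incomplete lines are skipped).
dump_commands() {
    while read -r astart aend; do
        [ -n "$astart" ] && [ -n "$aend" ] || continue
        printf 'dump memory /dev/null %s %s\n' "$astart" "$aend"
    done
}

# Hypothetical wrapper: attach to PID once and run all dump commands.
# Usage: <range list> | unswap_once PID
unswap_once() (
    pid=$1
    cmds=$(mktemp) || exit 1
    dump_commands > "$cmds"
    # -x runs the commands from the file; --batch exits when they are done
    gdb --batch --pid "$pid" -x "$cmds" </dev/null >/dev/null 2>&1
    status=$?
    rm -f "$cmds"
    exit "$status"
)
```

The range list would come from the same awk stage used in the function above, piped into unswap_once $1 in place of the while loop.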
... gdb instances if the process to be swapped in has lots of swapped fragments. The script will launch a parallel gdb instance for each swapped (big) fragment for the top 20 biggest processes. I think one should at least add | tail -n20 after the awk, before passing the results to the while loop, to limit the maximum number of parallel processes to 400. – Mikko Rantalainen Apr 27 '17 at 11:33