8

Is there a way to disable access to memory associated with a given NUMA node/socket on a NUMA machine?

We have a bit of a controversy with the database vendor about our HP DL560 machines. The DB sales type's technical support person was adamant that we could not use our DL560s but had to buy new DL360s, since those have fewer sockets. I believe their concern is the speed of accessing inter-socket memory. They recommended that if I insisted on keeping the DL560s, I should leave two of the sockets empty. I think they are mistaken (AKA crazy), but I need tests to demonstrate that I am on solid ground.

My configuration:
The machines have four sockets, each of which has 22 hyperthreaded physical cores, for a total of 176 logical CPUs, and a total of 1.5 TB of memory. The operating system is Red Hat Enterprise Linux Server release 7.4.

The lscpu display reads (in part):

$ lscpu | egrep 'NUMA|ore'
Thread(s) per core:    2
Core(s) per socket:    22
NUMA node(s):          4
NUMA node0 CPU(s):     0-21,88-109
NUMA node1 CPU(s):     22-43,110-131
NUMA node2 CPU(s):     44-65,132-153
NUMA node3 CPU(s):     66-87,154-175
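For completeness, the per-node split of that 1.5 TB (and the inter-node distance table) can be seen with numactl, which is already on the machine:

$ numactl --hardware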

If I had access to the physical hardware, I would consider pulling the processors from two of the sockets to prove my point but I don’t have access and I don’t have permission to go monkeying around with the hardware anyway.

The next best thing would be to virtually disable the sockets using the operating system. I read on this link that I can take a processor out of service with

echo 0 > /sys/devices/system/cpu/cpu3/online

and, indeed, the processors are taken out of service, but that says nothing about the memory.
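The NUMA nodes also have their own online list under sysfs, separate from the per-CPU files, so the memory side can be inspected directly, e.g.:

$ cat /sys/devices/system/node/online                      # nodes whose memory is online
$ grep MemTotal /sys/devices/system/node/node3/meminfo     # RAM attached to node 3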

I just turned off all the processors for socket #3 (using lscpu to find which CPU numbers belong to socket #3) with:

for num in {66..87} {154..175}
do
    echo 0 > /sys/devices/system/cpu/cpu${num}/online
    cat /sys/devices/system/cpu/cpu${num}/online
done

and got:

$ grep N3 /proc/$$/numa_maps
7fe5daa79000 default file=/usr/lib64/libm-2.17.so mapped=16 mapmax=19 N3=16 kernelpagesize_kB=4

which, if I am reading this correctly, shows my current process is using memory on node #3. But the shell was already running when I turned off the processors.

I started a new process that does its best to gobble up memory, and

$ cat /proc/18824/numa_maps | grep N3

returns no records initially, but after gobbling up memory for a long time, it starts using memory on node 3.
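A per-node summary for the same process is also available from numastat, which ships with the numactl package:

$ numastat -p 18824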

I tried running my program with numactl, binding it to nodes 0, 1, and 2, and it works as expected ... except I don't have control over how the vendor's software is launched, and there is no provision in Linux to change the memory policy of another, already-running process the way the set_mempolicy() system call (which numactl uses) does for the calling process.
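The binding I tried looks roughly like this (./gobbler is a stand-in for my memory-gobbling test program):

$ numactl --cpunodebind=0,1,2 --membind=0,1,2 ./gobbler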

Short of physically removing the processors, is there a way to force the issue?

3 Answers

3

I believe their concern is the speed of accessing inter-socket memory. They recommended that if I insisted on keeping the DL560s, I should leave two of the sockets empty.

This would have to do with the number of QPI or UPI links and the Intel scalability configuration (since you mentioned Xeon) between the n CPUs, whether it is 4S, S4S, or S8S. But the fact that there are 4 sockets means you should be able to access RAM anywhere at a reasonable speed (S4S would be faster than 4S), and even the worst case at this level is orders of magnitude faster than accessing disk or some other kind of PCIe storage.

For a given process running on some specific core of CPU 0, 1, 2, or 3 in a quad-socket system, the fastest RAM access is to the pool of RAM chips hanging off that CPU's own memory controller. If it has to hop over a QPI/UPI link to some other CPU to get to that RAM, access will be slower and not optimal. But you have to weigh all of that against not having enough shared RAM in the first place.
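A rough way to put a number on "slower", assuming a reasonably recent sysbench (or any other memory-bound benchmark) is available, is to run the same workload once with local and once with remote memory:

# node 0 CPUs with node 0 RAM (local)
numactl --cpunodebind=0 --membind=0 sysbench memory --memory-total-size=10G run
# node 0 CPUs with node 3 RAM (one QPI/UPI hop away)
numactl --cpunodebind=0 --membind=3 sysbench memory --memory-total-size=10G run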

Yes, there is a way to force the issue, and it's with

cpuset - confine processes to processor and memory node subsets

The cpuset filesystem is a pseudo-filesystem interface to the kernel cpuset mechanism, which is used to control the processor placement and memory placement of processes. It is commonly mounted at /dev/cpuset.
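On RHEL 7, systemd already mounts the cpuset controller under /sys/fs/cgroup/cpuset, so a minimal sketch looks something like this (the group name dbset is arbitrary and <pid> is a placeholder for the database process):

# create a cpuset restricted to the CPUs and memory of nodes 0-2
mkdir /sys/fs/cgroup/cpuset/dbset
echo 0-65,88-153 > /sys/fs/cgroup/cpuset/dbset/cpuset.cpus   # CPU lists from lscpu above
echo 0-2 > /sys/fs/cgroup/cpuset/dbset/cpuset.mems           # memory nodes 0-2 only
# optionally migrate pages already sitting on node 3 when a task is moved in
echo 1 > /sys/fs/cgroup/cpuset/dbset/cpuset.memory_migrate
# move an already-running process into the set
echo <pid> > /sys/fs/cgroup/cpuset/dbset/tasks

Anything in dbset then has its new allocations taken only from nodes 0-2.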

ron
  • The sales dudes explained to me in great detail about QPI and how slow it is to access memory on another node, except Intel introduced UPI as a QPI replacement in 2017. With the Gen 10 architecture, accessing memory off chip will be slower... but how much slower is another matter.

    Thanks, cpuset looks like what I will need. I'll mark this up now and when I get it running, I'll set it as an accepted solution.

    – user1683793 Jun 19 '19 at 00:37
  • The sales dudes explained to me in great detail about QPI and how slow it is... wow – ron Jun 19 '19 at 16:22
2

Try using cgroups: define your CPU sockets as a group and assign processes to that group with cgrules (cgred, the cgroups rules engine daemon). But first: processes and memory are placed within Linux according to the NUMA policy. I think the default policy is to allocate a process's memory on the NUMA node where the process is running. Whenever you need much more memory, be aware that it is better to use memory on another NUMA node than to fall back to disk swap space.
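A sketch of what that could look like on RHEL 7 with the libcgroup-tools package (the group name, user name, and CPU/node lists are placeholders to adapt):

# /etc/cgconfig.conf -- defines the group at boot
group dbset {
    cpuset {
        cpuset.cpus = "0-65,88-153";
        cpuset.mems = "0-2";
    }
}

# /etc/cgrules.conf -- cgred moves matching processes into the group
# user[:process]    controllers    destination
dbuser              cpuset         dbset

# apply both
systemctl enable cgconfig cgred
systemctl start cgconfig cgred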

Adrian
  • Part of the problem is the database vendor informs me I have no control of their processes. They run where they will and if I don't like it, tough. I was hoping to disable the socket's memory and processors external to their software to prove them wrong. Later, they thought they had us hooked and jacked up their prices so my management told them to buzz off and told me to pursue an open source solution. – user1683793 Nov 10 '20 at 23:25
0

This tool can also be applied after the process is already running:

taskset - set or retrieve a process's CPU affinity

The taskset command is used to set or retrieve the CPU affinity of a running process given its pid, or to launch a new command with a given CPU affinity. CPU affinity is a scheduler property that "bonds" a process to a given set of CPUs on the system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs...

...a user must possess CAP_SYS_NICE to change the CPU affinity of a process belonging to another user. A user can retrieve the affinity mask of any process.

http://manpages.ubuntu.com/manpages/jammy/man1/taskset.1.html
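On this machine that would be something like the following (<pid> is a placeholder). Note that taskset only constrains where the process runs; memory already allocated on node 3 stays there, and new allocations merely tend to follow the allowed CPUs under the default local-allocation policy, so a cpuset (see above) is what actually fences off node 3's memory.

# pin an already-running process to the CPUs of NUMA nodes 0-2 (lists from lscpu)
taskset -cp 0-65,88-153 <pid>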