
A simple example: I'm running a process that serves HTTP requests using TCP sockets. A request might (A) calculate something, which makes the CPU the bottleneck, (B) send a large file, which may make the network the bottleneck, or (C) run a complex database query with semi-random access, causing a disk bottleneck.

Should I try to categorize each page/API call as one or more of the above types and try to balance how much of each I should have? Or will the OS do that for me? How do I decide how many threads I want?

I'll use two numbers for hardware threads, 12 and 48 (Intel Xeons come in those sizes). I was thinking of having about 2/3rds of the threads be for heavy CPU work (8/32), 1 thread for heavy disk work (or 1 heavy thread per disk), and the remaining 3/15 for anything else, which means not trying to balance the network.

Should I have more than 12/48 threads on hardware that only supports 12/48 threads? Do I want fewer so I don't cause the CPU to go into a slower throttling mode (I forget what it's called, but I've heard it happens if too much of the chip is active at once)? If I have to load and resource balance my threads, how would I do it?


3 Answers


Linux:

The Linux kernel has a good implementation for this and many features/settings intended to manage the resources of running processes (via CPU governors, sysctl or cgroups). In such a situation, tuning those settings, along with swap adjustment (if required), is recommended; basically you will be adapting the default functioning mode to your application.

Benchmarks, stress tests and analysis of the situation after applying the changes are a must, especially on production servers. The performance gain can be significant when the kernel settings are adjusted to the actual usage; on the other hand, this requires testing and a good understanding of the different settings, which is time consuming for an admin.

Linux does use governors to load balance CPU resources between running applications. Many governors are available; depending on your distro's kernel, some governors may not be present (the kernel can be rebuilt to add missing or non-upstream governors). You can check which governor is current, change it and, more importantly in this case, tune its settings.

Additional documentation: reading, guide, similar question, frequency scaling, choice of governor, the performance governor and cpufreq.
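A minimal sketch of checking and changing the governor through sysfs (this assumes a cpufreq-capable kernel, root privileges for the write, and is roughly what tools such as cpupower do underneath):

    # Inspect and change the CPU frequency governor via sysfs (cpu0 shown;
    # every logical CPU has its own scaling_governor file).
    from pathlib import Path

    CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")

    print("current:  ", (CPUFREQ / "scaling_governor").read_text().strip())
    print("available:", (CPUFREQ / "scaling_available_governors").read_text().split())

    # Equivalent to: echo performance > .../scaling_governor (needs root)
    (CPUFREQ / "scaling_governor").write_text("performance")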

SysCtl:

Sysctl is a tool for examining and changing kernel parameters at runtime; adjustments can be made permanent with the config file /etc/sysctl.conf. This is an important part of this answer, as many kernel settings can be changed with sysctl. A full list of the available settings can be displayed with the command sysctl -a; details are available in this and this article.
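As a small sketch, sysctl keys are just files under /proc/sys, so the same values the sysctl command shows can be read (and, as root, changed) directly; /etc/sysctl.conf is what makes a change persistent:

    # Read and write kernel parameters through /proc/sys; "vm.swappiness"
    # maps to /proc/sys/vm/swappiness, as shown by `sysctl vm.swappiness`.
    from pathlib import Path

    def read_sysctl(key: str) -> str:
        return Path("/proc/sys", *key.split(".")).read_text().strip()

    def write_sysctl(key: str, value: str) -> None:   # requires root
        Path("/proc/sys", *key.split(".")).write_text(value)

    print("vm.swappiness =", read_sysctl("vm.swappiness"))
    # write_sysctl("vm.swappiness", "10")   # like: sysctl -w vm.swappiness=10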

Cgroup:

The kernel provides a feature called control groups, referred to by their shorter name, cgroups, in this guide. Cgroups allow you to allocate resources such as CPU time, system memory, network bandwidth, or combinations of these resources among user-defined groups of tasks (processes) running on a system. You can monitor the cgroups you configure, deny cgroups access to certain resources, and even reconfigure your cgroups dynamically on a running system. The cgconfig (control group config) service can be configured to start at boot time and re-establish your predefined cgroups, making them persistent across reboots.

Source, further reading and question on the matter.
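A sketch of the idea (assuming the cgroup v2 unified hierarchy mounted at /sys/fs/cgroup, root access, and the cpu/memory controllers enabled for the parent group; the group name is made up):

    # Cap the current process at half a CPU and 1 GiB of RAM with cgroup v2.
    import os
    from pathlib import Path

    cg = Path("/sys/fs/cgroup/webworkers")              # hypothetical group name
    cg.mkdir(exist_ok=True)

    (cg / "cpu.max").write_text("50000 100000")         # 50 ms of CPU per 100 ms period
    (cg / "memory.max").write_text(str(1024 ** 3))      # 1 GiB hard limit

    (cg / "cgroup.procs").write_text(str(os.getpid()))  # move this process into the group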

Ram:

This can be useful if the system has a limited amount of RAM; otherwise you can disable swap to mainly use the RAM. The swap system can be adjusted per process or with the swappiness setting. If needed, the resources (RAM) can be limited per process with ulimit (which is also used to limit other resources).
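For the per-process side, the same limits that ulimit applies from the shell can also be set from code; a sketch with a hypothetical 2 GiB address-space cap:

    # Limit this process's virtual memory, like `ulimit -v` in the shell.
    import resource

    limit = 2 * 1024 ** 3                                  # 2 GiB (made-up figure)
    resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
    # Allocations beyond the cap will now fail with MemoryError / ENOMEM.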

Disk:

Disk I/O settings (the I/O scheduler) may be changed, as well as the cluster size.
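The active scheduler is exposed per block device in sysfs; a small sketch (the device name sda is an assumption, and writing requires root):

    # Show and change the I/O scheduler for one disk; the active one is bracketed.
    from pathlib import Path

    sched = Path("/sys/block/sda/queue/scheduler")
    print(sched.read_text().strip())        # e.g. "mq-deadline kyber [none]"
    # sched.write_text("mq-deadline")       # select a different scheduler (root)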

Alternatives:

Other tools like nice, cpulimit, cpuset, taskset or ulimit can be used as alternatives here.
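A sketch of what nice and taskset do, applied from inside a process (sched_setaffinity is Linux-specific):

    # Lower CPU priority and pin the process to two cores.
    import os

    os.nice(10)                         # like: nice -n 10 <command>
    os.sched_setaffinity(0, {0, 1})     # like: taskset -c 0,1 <command>
    print(os.sched_getaffinity(0))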


The best answer to this is "suck it and see"... perform some stress tests and see what gives the best results. That's because very minor nuances in the behaviour of your threads can cause differences in performance.


The following is largely based on my own experience...

Where to start?

Linux's ability to prevent threads from getting starved is pretty good. It doesn't necessarily mean that every thread will get an even share of the pie, but all threads will at least get some pie. If you have two threads contending for CPU time... let's say one trying to use 100% CPU and another trying to use only 10%... then don't be surprised if that balances out at 91% and 9%, or somewhere around that.

Overall performance can drop where a particular resource is heavily over subscribed. This is especially true for disk IO on spinning hard disks: the head has to physically move (seek) between places on the disk, and continually oscillating between different files can cause a significant slowdown. But this effect is often fairly small if one thread is heavily IO bound and another only wants to do a little IO.

Together these two things mean that it is often better to be 20% over subscribed than 20% under subscribed. In other words, don't reserve CPU time for threads which are not trying to use much CPU.

E.g. if you have CPU bound threads and disk IO bound threads, and you have 8 cores and 1 hard disk, then start with 8 CPU bound threads and one hard disk IO bound thread. 7 and 1 might just leave a core idle most of the time. 8 and 1 will almost certainly not starve the HD thread, meaning you fully use both the HD and the CPU.
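A rough sketch of that starting point, using a process pool for the CPU-bound work (so Python's GIL doesn't serialise it) and a single worker for the disk; the two helpers are just stand-ins:

    # One CPU-bound worker per core plus one disk-IO worker, as a starting point.
    import hashlib, os
    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    def read_from_disk(path):                      # stand-in for the IO-bound step
        with open(path, "rb") as f:
            return f.read()

    def crunch(data):                              # stand-in for the CPU-bound step
        return hashlib.sha256(data).hexdigest()

    cores = os.cpu_count() or 8
    cpu_pool = ProcessPoolExecutor(max_workers=cores)   # 8 CPU workers on 8 cores
    disk_pool = ThreadPoolExecutor(max_workers=1)       # one worker per spinning disk

    def handle_request(path):
        data = disk_pool.submit(read_from_disk, path).result()
        return cpu_pool.submit(crunch, data).result()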

The danger of short lived threads

Just be wary that Linux can struggle with a lot of short lived threads. This is more obvious with deliberate attempts to damage a system. But continually spawning threads / processes can push Linux to behave badly.

In your question you have described dedicated worker threads which sound like long lived threads. This sounds like the right approach.

The London Bus Effect

You wait half an hour for a bus, then 5 come along at once. This happens because passengers getting on the front bus slow it down. The lack of passengers on the later buses speeds them up, causing a bunching effect.

The same problem can exist in threading, especially with threads contending for resources. If you have threads predictably alternating between tasks, for example reading from one disk then writing to another, then they may tend to bunch together rather than disperse stochastically as you might expect. So one resource may slow the use of another. For this reason it can sometimes be better to further subdivide the tasks of a thread.
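One way to do that subdivision, as a sketch: split the read and write stages into separate workers joined by a bounded queue, so a stall on one disk doesn't drag the other stage along with it (file names are placeholders):

    # Reader and writer run independently; the queue absorbs short stalls.
    import queue, threading

    buf: "queue.Queue[bytes]" = queue.Queue(maxsize=64)     # bounded buffer

    def reader(src):
        with open(src, "rb") as f:
            while chunk := f.read(1 << 20):
                buf.put(chunk)          # blocks only if the writer falls far behind
        buf.put(b"")                    # sentinel: end of stream

    def writer(dst):
        with open(dst, "wb") as f:
            while chunk := buf.get():
                f.write(chunk)

    threading.Thread(target=reader, args=("/tmp/in.dat",)).start()
    threading.Thread(target=writer, args=("/tmp/out.dat",)).start()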

cgroups

I'll avoid going into too much detail. But I should mention that Linux has a capability called "cgroups" which allows you to group processes and limit their collective resources. This can be very useful in further performance tuning.

There's a short discussion of them here. But I would advise you to spend a bit of time on google to see their full capability because they may help you in the long run.


You may be going about it the wrong way. Are you doing simple synchronous IO?

Two approaches are:

The Apache way: synchronous IO, one process per connection, with process pools to avoid constantly creating and destroying tasks. This is easy to code, allows powerful features and many connections per second, but only a small number of simultaneous connections.
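A sketch of that shape, using a thread pool rather than a process pool but with the same structure (the port and pool size are arbitrary):

    # Blocking IO, long-lived workers, one connection handled per worker at a time.
    import socket
    from concurrent.futures import ThreadPoolExecutor

    def handle(conn):
        with conn:
            conn.recv(4096)                        # read the request (simplified)
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")

    srv = socket.create_server(("", 8080))
    pool = ThreadPoolExecutor(max_workers=64)      # the pre-created worker pool
    while True:
        conn, _ = srv.accept()                     # simple blocking accept loop
        pool.submit(handle, conn)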

The Nginx way: asynchronous IO, one process per core. This uses the poll system call to wait for connections, data packets, and disk IO. It makes some functionality harder to code (state must be stored explicitly when needed), but it can handle many more simultaneous connections.

Both of these allow the OS to balance the threads and get the most out of the cores, disk, and network. If you go for one thread per core and synchronous IO, then you will end up with most of your cores idle most of the time.

Look up the select, poll and epoll system calls.
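A sketch of the asynchronous shape using Python's selectors module, which sits on top of epoll on Linux (the port is arbitrary, and a real server would also have to handle partial reads and writes):

    # One process multiplexing many non-blocking sockets with epoll.
    import selectors, socket

    sel = selectors.DefaultSelector()              # EpollSelector on Linux

    def accept(srv):
        conn, _ = srv.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, serve)

    def serve(conn):
        if conn.recv(4096):                        # request arrived (simplified)
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
        sel.unregister(conn)
        conn.close()

    srv = socket.create_server(("", 8081))
    srv.setblocking(False)
    sel.register(srv, selectors.EVENT_READ, accept)

    while True:
        for key, _ in sel.select():                # blocks until something is ready
            key.data(key.fileobj)                  # dispatch to accept() or serve()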