Friday, January 27, 2023

CPU affinity in Linux with taskset

taskset sets the CPU affinity of a process. For example, to run bash on only processors 0-7:
$ taskset -c 0-7 bash

Its man page says:

The taskset command is used to set or retrieve the CPU affinity of a running process given its pid, or to launch a new command with a given CPU affinity.

You use this to assign specific programs to specific CPUs on multi-core or multi-CPU systems. If you ran across this command by accident, you’d be forgiven for not thinking much more about it.
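
The same set-or-retrieve idea is available from inside a program. Here is a minimal Python sketch, assuming Linux, where the standard library exposes os.sched_getaffinity and os.sched_setaffinity. The helper name pin_to is made up for illustration and is not part of taskset.

```python
import os

def pin_to(cpus, pid=0):
    """Restrict process `pid` (0 means the calling process) to the given CPUs.

    Returns the affinity set the kernel reports afterwards.
    """
    os.sched_setaffinity(pid, set(cpus))
    return os.sched_getaffinity(pid)

if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)   # CPUs this process may run on right now
    print("before:", sorted(allowed))
    first = min(allowed)                # a CPU we are certainly allowed to use
    print("after: ", sorted(pin_to({first})))
    pin_to(allowed)                     # restore the original affinity
```

Picking a CPU out of the current affinity set, rather than hard-coding 0, keeps the sketch working inside containers or under an existing taskset restriction.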

Frequency scaling

Where it gets interesting is when you use taskset to (indirectly) control your CPU’s clock frequency.

Modern CPUs can dynamically overclock themselves to increase performance. Intel calls this Turbo Boost. The Core i7 8700, a (now modest) 6-core CPU from 2017, runs at a base frequency of 3.2GHz.

Using Turbo Boost, it can run as high as 4.6GHz. As an indicator of overall processor performance, clock speed is less important than it once was. But a higher clock speed still increases performance - and sometimes quite a bit.

But there are tradeoffs. A higher clock rate uses more power and generates more heat. It’s less efficient overall. Hardware hackers in the old days would get around this with custom cooling, like submerging their entire computers in mineral oil.

Fortunately, we don’t need to do that kind of thing anymore to take advantage of overclocking. But there are limits to how much Turbo Boost will overclock your CPU. The 8700 reaches the full 4.6GHz only when a single core is active. As more cores become busy, the maximum boost drops:

Cores       Maximum boost
1 core      4.6 GHz
2 cores     4.5 GHz
4-6 cores   4.3 GHz
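
To put numbers on that tradeoff, here is a small Python sketch that turns the table into a lookup and works out how much of the single-core boost you give up as more cores get busy. The names MAX_BOOST_GHZ, max_boost_ghz and boost_cost_percent are made up for illustration. The table has no row for 3 cores, so the sketch leaves that entry out rather than guess.

```python
# Maximum boost for the Core i7 8700, copied from the table above.
# Keys are active-core counts; the table lists no value for 3 cores.
MAX_BOOST_GHZ = {1: 4.6, 2: 4.5, 4: 4.3, 5: 4.3, 6: 4.3}

def max_boost_ghz(active_cores):
    """Return the listed maximum boost in GHz, or None if the table has no entry."""
    return MAX_BOOST_GHZ.get(active_cores)

def boost_cost_percent(active_cores):
    """Percent of the single-core boost given up with `active_cores` busy cores."""
    single = MAX_BOOST_GHZ[1]
    return round(100 * (single - MAX_BOOST_GHZ[active_cores]) / single, 1)

if __name__ == "__main__":
    for cores in sorted(MAX_BOOST_GHZ):
        print(cores, max_boost_ghz(cores), f"-{boost_cost_percent(cores)}%")
```

Going from one busy core to two costs about 2% of peak clock, and going to all six costs about 6.5%.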

Here’s a real example that shows all of this at work. You can read each processor’s current clock speed from /proc/cpuinfo. This is on a Xeon E5-1620 v3 (a 4-core, 8-thread CPU that boosts to 3.6 GHz):

$ taskset -c 0 python3 -c 'while True: pass' &   # run a busy loop
[1] 3953427
$ grep MHz /proc/cpuinfo
cpu MHz		: 3591.663
cpu MHz		: 1200.000
cpu MHz		: 1200.000
cpu MHz		: 1287.102
cpu MHz		: 1200.000
cpu MHz		: 1197.529
cpu MHz		: 1197.861
cpu MHz		: 1200.000
$ kill %1  # don't forget to turn off the python process!

The idle cores are all running at about 1.2GHz to save power (and heat). But notice how the first CPU is running at 3591 MHz (about 3.6GHz)? That’s Turbo Boost in action. The CPU sees a processor-intensive job running on cpu0. And with no other demands on the CPU, it boosts the clock speed on that core to maximum to get the best performance.

In this case, it’s doing nothing really fast. But the CPU doesn’t know that.
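
If you would rather read those numbers from a script than eyeball grep output, a sketch like the following might do. It assumes the "cpu MHz" lines look like the ones above, which is true on x86 Linux but not on every architecture. The function names parse_mhz and busiest are made up for illustration.

```python
def parse_mhz(cpuinfo_text):
    """Return the 'cpu MHz' value of each logical processor, in file order."""
    speeds = []
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "cpu MHz":
            speeds.append(float(value))
    return speeds

def busiest(speeds):
    """Index (in file order) of the processor with the highest clock."""
    return max(range(len(speeds)), key=speeds.__getitem__)

if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        speeds = parse_mhz(f.read())
    # Assumes entries appear in processor-number order, as in the output above.
    for cpu, mhz in enumerate(speeds):
        print(f"cpu{cpu}: {mhz:8.1f} MHz")
```

Run it while the busy loop is pinned to processor 0 and busiest should point at the first entry.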

Processors, cores, and threads

Now you might be seeing where taskset comes in. Suppose you have a CPU-intensive program, and you want to give it the highest possible performance.

Usage is something like this:

$ taskset -c 0 program

That will make Linux schedule program only on processor 0. As program gets more CPU intensive, Turbo Boost will increase the clock frequency accordingly.

On many CPUs, each core runs two hardware threads, and Linux presents each thread as a separate processor. Using both threads of a core usually gets you more performance for “free” while staying at the same clock speed. There is an additional benefit: both threads on the same core share that core’s cache, which can help code run faster. For maximum performance, if you allocate more than one processor to a program, allocate threads in pairs.

Figuring out which threads are pairs is not quite as easy as you might think. Fortunately, /proc/cpuinfo can help with this too. In addition to clock speed, it also displays which processor is associated with which core.

Here’s an example on a Core i7 4790K:

$ grep -E 'processor|core id' /proc/cpuinfo | paste -d " " - -
processor	: 0 core id		: 0
processor	: 1 core id		: 1
processor	: 2 core id		: 2
processor	: 3 core id		: 3
processor	: 4 core id		: 0
processor	: 5 core id		: 1
processor	: 6 core id		: 2
processor	: 7 core id		: 3

Check out this Stack Overflow answer showing how to join line pairs using paste.
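
The same pairing can be worked out in a script. This is a rough Python sketch, assuming /proc/cpuinfo separates processors with a blank line and includes "processor", "core id" and (on multi-socket machines) "physical id" fields. The function name thread_siblings is made up for illustration.

```python
from collections import defaultdict

def thread_siblings(cpuinfo_text):
    """Group logical processors by physical core.

    Returns {(physical id, core id): [processor, ...]}.
    """
    groups = defaultdict(list)
    # Each logical processor is one blank-line-separated block.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" in fields and "core id" in fields:
            # core ids repeat across sockets, so key on the socket as well.
            socket = int(fields.get("physical id", 0))
            core = (socket, int(fields["core id"]))
            groups[core].append(int(fields["processor"]))
    return dict(groups)

if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        for core, cpus in sorted(thread_siblings(f.read()).items()):
            print(core, cpus)
```

On the 4790K above, this would be expected to list processors 0 and 4 under core 0, matching the paste output.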

The 4790K is a 4 core processor with 8 threads. Processor #0 runs on core 0, and its hyper-threaded companion is processor #4. With taskset you can specify multiple processors by separating them with commas, like this:

$ taskset -c 0,4 program

If you want to allow your program to use core 0 and core 1 and their respective hyper-threads, you can indicate a range of processors using a dash:

$ taskset -c 0-1,4-5 program

This would use two cores. On a CPU like the Core i7 8700 described earlier, that lowers the maximum boost from 4.6 GHz to 4.5 GHz.
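
To make the comma-and-dash syntax concrete, here is a small Python sketch that expands a list like the ones above into individual processor numbers. It only handles plain numbers and N-M ranges, and parse_cpu_list is a made-up name, not something taskset provides.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "0-1,4-5" into a set of processor numbers."""
    cpus = set()
    for part in spec.split(","):
        first, sep, last = part.partition("-")
        if sep:
            # "N-M" is an inclusive range
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(first))
    return cpus

if __name__ == "__main__":
    print(sorted(parse_cpu_list("0,4")))       # the single-core example
    print(sorted(parse_cpu_list("0-1,4-5")))   # the two-core example
```

The resulting set is also the form that Python’s os.sched_setaffinity accepts, if you want to apply it from a script.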

Be sure to check /proc/cpuinfo, because CPU numbering might not follow the pattern you expect.

On a multi-CPU system, especially one with Xeon processors that might have dozens of cores, /proc/cpuinfo might be a little too much output to easily digest. You can get a summary of the same information with lscpu. Here’s an example on a machine with two Xeon E5-2660 v4 CPUs:

$ lscpu | grep NUMA
NUMA node(s):                       2
NUMA node0 CPU(s):                  0-13,28-41
NUMA node1 CPU(s):                  14-27,42-55

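These are two Xeon CPUs with two threads per core. But the first CPU’s processors (NUMA node0) are numbered 0-13 and 28-41, not 0-27. Assuming processors 0 and 28 are the two threads of the same core, which is what this numbering suggests (the core id in /proc/cpuinfo will confirm it), you would restrict a program to both threads of a single core on this CPU like this: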

$ taskset -c 0,28 program
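
If you need those NUMA lists in a script, a sketch along these lines could parse them. It assumes lscpu prints "NUMA nodeN CPU(s):" lines in the format shown above. The function name numa_nodes is made up for illustration.

```python
import re

# Matches lines such as "NUMA node0 CPU(s):   0-13,28-41"
NODE_LINE = re.compile(r"^NUMA node(\d+) CPU\(s\):\s*(\S+)", re.MULTILINE)

def numa_nodes(lscpu_text):
    """Map each NUMA node number to the sorted processors lscpu lists for it."""
    nodes = {}
    for match in NODE_LINE.finditer(lscpu_text):
        cpus = []
        for part in match.group(2).split(","):
            first, _, last = part.partition("-")
            # A bare number is treated as a one-element range.
            cpus.extend(range(int(first), int(last or first) + 1))
        nodes[int(match.group(1))] = sorted(cpus)
    return nodes

if __name__ == "__main__":
    import subprocess
    out = subprocess.run(["lscpu"], capture_output=True, text=True).stdout
    for node, cpus in sorted(numa_nodes(out).items()):
        print(f"node{node}: {cpus}")
```

This only tells you which processors belong to which CPU package. It does not tell you which two processors share a core, so the core id check is still needed.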

Wrapping up

Knowing how to wield taskset will help you tune performance for any CPU-intensive program - rendering in Blender, model training in PyTorch, or dealing with large datasets in QGIS, not to mention games.

In the old days (before 2016), when the majority of consumer CPUs had two or four cores and not all had multiple threads, you could just let the boost do its thing and not worry about it too much.

Today, even consumer CPUs have lots of cores. You might find that some applications perform better on fewer cores and a higher clock speed. Or, you might find that dedicating cores to an application and reserving a few cores for background tasks improves overall performance too.

Also, many modern CPUs now include cores with different performance classes: “efficiency” cores and “performance” cores. In those cases you might keep your programs running on the efficiency cores to use less power, or keep them on the performance cores to force a bump in speed.

There will be a lot of variation, as every application has its own unique performance characteristics, and each CPU is different.

Wikichip is a good resource to look up your CPU’s turbo frequencies.

