86

Looking at the source of strace I found the use of the clone flag CLONE_IDLETASK which is described there as:

#define CLONE_IDLETASK 0x00001000 /* kernel-only flag */

After looking deeper into it I found that, although that flag is not covered in man clone, it is actually used by the kernel during the boot process to create idle processes (all of which should have PID 0) for each CPU on the machine. That is, a machine with 8 CPUs will have at least 7 (see question below) such processes "running" (note the quotes).

Now, this leads me to a couple of questions about what that "idle" process actually does. My assumption is that it executes NOP operations continuously until its timeframe ends and the kernel either assigns a real process to run or assigns the idle process once again (if the CPU is not being used). Yet, that's a complete guess. So:

  1. On a machine with, say, 8 CPUs will 7 such idle processes be created? (and one CPU will be held by the kernel itself whilst not performing userspace work?)

  2. Is the idle process really just an infinite stream of NOP operations? (or a loop that does the same).

  3. Is CPU usage (say uptime) simply calculated by how long the idle process was on the CPU and how long it was not there during a certain period of time?


P.S. It is likely that a good deal of this question is due to the fact that I do not fully understand how a CPU works. i.e. I understand the assembly, the timeframes and the interrupts but I do not know how, for example, a CPU may use more or less energy depending on what it is executing. I would be grateful if someone can enlighten me on that too.

Rui F Ribeiro
grochmal
  • 20
    I had to resist the temptation to just write "Nothing at all" when I saw the title. – Vality Apr 26 '17 at 03:22
  • 4
    Most modern CPU's will dynamically lower their clock rate and power consumption when idling or under low load (dynamic frequency scaling, e.g. SpeedStep for Intel CPU's). If you overclock a CPU, it'll usually disable this behavior, causing the CPU to maintain max clock rate even when idling. – Nat Apr 26 '17 at 10:55
  • 2
    See also "ACPI power states": there are various ways in which a processor can stop executing instructions but still be wakeable. – pjc50 Apr 26 '17 at 11:18

3 Answers

104

The idle task is used for process accounting, and also to reduce energy consumption. In Linux, one idle task is created for every processor, and locked to that processor; whenever there’s no other process to run on that CPU, the idle task is scheduled. Time spent in the idle tasks appears as “idle” time in tools such as top. (Uptime is calculated differently.)
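As a rough sketch of that accounting (the idea only, not top's actual code), an idle percentage can be derived from two samples of /proc/stat. The sketch assumes the field order documented in proc(5), where the fourth number on a "cpu" line is the time spent idle, in clock ticks. The function names parse_cpu_line, idle_fraction and read_aggregate_cpu are made up for illustration.

```python
# Sketch: estimate the idle share of CPU time from two /proc/stat samples.
# Assumed field order on a "cpu" line (proc(5)):
#   user nice system idle iowait irq softirq steal guest guest_nice
import time


def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) for one 'cpu...' line of /proc/stat."""
    fields = [int(x) for x in line.split()[1:]]
    idle = fields[3]         # fourth field: ticks spent in the idle task
    total = sum(fields[:8])  # guest fields are already counted in user/nice
    return idle, total


def idle_fraction(before, after):
    """Fraction of elapsed ticks that were idle between two samples."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    return idle_delta / total_delta if total_delta > 0 else 0.0


def read_aggregate_cpu(path="/proc/stat"):
    """Read the first line of /proc/stat, which aggregates all CPUs."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


if __name__ == "__main__":
    first = read_aggregate_cpu()
    time.sleep(1)
    second = read_aggregate_cpu()
    print(f"idle: {100 * idle_fraction(first, second):.1f}%")
```

Only the difference between two samples is meaningful, because the counters accumulate from boot.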

Unix seems to always have had an idle loop of some sort (but not necessarily an actual idle task, see Gilles’ answer), and even in V1 it used a WAIT instruction which stopped the processor until an interrupt occurred (it stood for “wait for interrupt”). Some other operating systems, DOS, OS/2, and early versions of Windows in particular, used busy loops instead. For quite a long time now, CPUs have used this kind of “wait” instruction to reduce their energy consumption and heat production. You can see various implementations of idle tasks for example in arch/x86/kernel/process.c in the Linux kernel: the basic one just calls HLT, which stops the processor until an interrupt occurs (and enables the C1 energy-saving mode); the other implementations handle various bugs or inefficiencies (e.g. using MWAIT instead of HLT on some CPUs).

All this is completely separate from idle states in processes, when they’re waiting for an event (I/O etc.).
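On a running Linux system, the idle states the kernel actually uses can usually be inspected through the cpuidle directory in sysfs. The following is a minimal sketch, assuming the usual layout of /sys/devices/system/cpu/cpu0/cpuidle/stateN/ with name, usage and time files; read_idle_states is a made-up helper name. Which states appear depends on the hardware and the idle driver, and the directory may be missing altogether (for instance in some virtual machines).

```python
# Sketch: list the idle states that the kernel exposes for one CPU.
from pathlib import Path


def read_idle_states(cpu_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return [(name, usage_count, time_us), ...], one entry per stateN directory."""
    base = Path(cpu_dir)
    if not base.is_dir():
        return []  # cpuidle information is not available here
    # Sort numerically so that state10 comes after state2.
    state_dirs = sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
    result = []
    for d in state_dirs:
        name = (d / "name").read_text().strip()
        usage = int((d / "usage").read_text())   # how often the state was entered
        time_us = int((d / "time").read_text())  # total microseconds spent in it
        result.append((name, usage, time_us))
    return result


if __name__ == "__main__":
    for name, usage, time_us in read_idle_states():
        print(f"{name}: entered {usage} times, {time_us} us total")
```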

Stephen Kitt
  • 3
    Heh, i see it now, thanks. play_dead() is a very nice mnemonic name for executing HALT. Wouldn't there be a risk to send HALT to every CPU and consequently hang? (i.e. reaching that situation, HALT every CPU, would be a bug in the kernel correct?) – grochmal Apr 25 '17 at 19:09
  • 31
    The CPU wakes up from HALT via an interrupt. – Johan Myréen Apr 25 '17 at 19:14
  • 1
    @JohanMyréen - Cool, that makes sense. In such a case even an IRQ interrupt from an input device would wake it back up. Thanks. – grochmal Apr 25 '17 at 20:25
  • 17
    Or more reliably, the timer interrupt... (Tickless handling is another kettle of fish.) – Stephen Kitt Apr 25 '17 at 20:31
  • 1
    The PDP11 had a WAIT instruction that did nothing but wait for an interrupt to happen. – user207421 Apr 26 '17 at 00:07
  • 3
    @EJP indeed, it's a pretty common instruction, even though it has different names in different architectures. – user253751 Apr 26 '17 at 00:31
  • @StephenKitt tickless is not timerless. it's just variable durations, that's why it's called dyntick. – v.oddou Apr 26 '17 at 06:11
  • @v.oddou The Linux kernel can be configured so that all CPUs except the boot CPU omit clock ticks completely if the CPU is idle. Unless some other interrupt wakes up the CPU, it will sleep forever. – Johan Myréen Apr 26 '17 at 07:11
  • @v.oddou dyntick isn’t tickless, that’s why there are three different configurations for NO_HZ. You might like to read NO_HZ.txt in the kernel documentation. – Stephen Kitt Apr 26 '17 at 08:02
  • "In the past, idle tasks ... busy loops" - Do you actually know of any Unixes that ran on CPUs lacking a HALT-equivalent? I'm thinking this "past" might have predated Unix. – Lyle Apr 27 '17 at 16:46
  • What does an idle CPU process do? - The robot devil's work – david.barkhuizen Apr 27 '17 at 16:49
  • @Lyle you’re right, even V1 Unix used a WAIT instruction. I was thinking of other less sophisticated operating systems (DOS specifically, although that didn’t have an idle task, but it did busy-wait when idle). – Stephen Kitt Apr 27 '17 at 16:55
  • "All this is completely separate from idle states in processes, when they’re waiting for an event (I/O etc)" But it is a direct analogue, right? – Lightness Races in Orbit Apr 28 '17 at 12:23
  • @BoundaryImposition what kind of analogy are you thinking of? When a running process becomes idle, that allows other runnable processes to be scheduled if there are any, and if there aren’t, the idle process is scheduled. So I suppose there’s a sort of analogy: a process becoming idle deschedules itself, all processes being idle means the idle process runs and deschedules the CPU. – Stephen Kitt Apr 28 '17 at 12:39
  • @StephenKitt: The analogy is that while a process is "idling", waiting for the OS to wake it back up, it's not doing anything (e.g. spinwaiting), thus saving resources. By comparison, when the OS has a CPU idling, it's waiting for an interrupt to wake it back up and otherwise not doing anything. Same approach for the same effect, just at a different layer of abstraction. You could model the idle CPU in the computer, as an idle process in an OS! – Lightness Races in Orbit Apr 28 '17 at 14:02
  • @BoundaryImposition right, so that is indeed a direct analogy. (The sentence you were reacting to was referring to Johan Myréen’s answer.) – Stephen Kitt Apr 28 '17 at 14:08
  • @StephenKitt Yeah, DOS didn't really have a need to idle. Unix originated on multi-user machines where idle users would be wasting the CPU. DOS was made for desktops and CPUs of the time were designed to run at 100% all the time, so there was little point in anything but busy waiting (of course, the CPU still didn't run at 100% - the busywaits were simple). HLT on Intel CPUs wasn't intended for power savings or such until 1994; Windows NT/95 already supported proper waits. It's a gotcha when you try to run DOS on a modern machine though - many modern CPUs are very unhappy about running 100% :D – Luaan Nov 26 '19 at 11:15
  • @Luaan indeed, when DOS was designed none of this was much of a concern for a single-tasking system. HLT however did enable power-saving earlier than 1994; with the Intel 386SL (1991), HLT asserts HALT#, and the chipset then asserts STPCLK#, so HLT really stops the CPU and saves power. Also, MS-DOS 6.22 included POWER.EXE, which supported HLT-on-idle, and Windows 95/98 didn’t HLT-on-idle by default ;-) (hence the popularity of Andreas Goetz’s CpuIdle and Marton Balog’s DOSidle). – Stephen Kitt Nov 26 '19 at 13:10
  • @StephenKitt In theory, that worked. In practice, I remember the power savings were something like 5% on an older 386. It wasn't until the DX that it actually became useful (486 DX2 could actually turn off almost completely). Intel CPUs weren't the only ones that behaved this way either; it just wasn't all that useful for desktop computers back then. Hah, yeah, I remember power.exe, though I don't know anyone who actually used it :D – Luaan Nov 26 '19 at 19:55
  • @Luaan right, it wasn’t useful on desktops; but I was thinking of laptops, which is where the 386SL was found (and made a large difference in power consumption). POWER.EXE was quite useful on laptops, especially once APM was available. Powering down CPUs is older than lowered power use on HLT; the 80C88 could do that, and it was put to good use e.g. in the Atari Portfolio... – Stephen Kitt Nov 27 '19 at 21:12
61

In the textbook design of a process scheduler, if the scheduler doesn't have any process to schedule (i.e. if all the processes are blocked, waiting for input), then the scheduler waits for a processor interrupt. The interrupt may indicate input from a peripheral (user action, network packet, completed read from a disk, etc.) or may be a timer interrupt that triggers a timer in a process.

Linux's scheduler doesn't have special code for a nothing-to-do case. Instead, it encodes the nothing-to-do case as a special process, the idle process. The idle process only gets scheduled when no other process is schedulable (it effectively has an infinitely low priority). The idle process is in fact part of the kernel: it's a kernel thread, i.e. a thread that executes code in the kernel, rather than code in a process. (More precisely, there's one such thread for each CPU.) When the idle process runs, it performs the wait-for-interrupt operation.
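The idea of encoding "nothing to do" as an ordinary, always-runnable task can be illustrated with a toy model. This is only a sketch of the concept, not the kernel's actual data structures; the ToyCpu class, its methods and the numeric priorities are all hypothetical.

```python
# Toy model: the idle task is always runnable and has the worst priority,
# so the selection code needs no special case for an empty run queue.
import math


class ToyCpu:
    def __init__(self):
        # Lower number means higher priority; idle gets "infinitely low" priority.
        self.runnable = {"idle": math.inf}

    def wake(self, name, priority):
        """Mark a task as runnable."""
        self.runnable[name] = priority

    def block(self, name):
        """Mark a task as waiting for an event; the idle task never blocks."""
        if name != "idle":
            self.runnable.pop(name, None)

    def pick_next(self):
        """Ordinary selection: the runnable task with the best priority."""
        return min(self.runnable, key=self.runnable.get)


if __name__ == "__main__":
    cpu = ToyCpu()
    cpu.wake("shell", 20)
    print(cpu.pick_next())  # a real task wins while one is runnable
    cpu.block("shell")
    print(cpu.pick_next())  # otherwise the idle task is picked
```

Because the idle entry is never removed, pick_next always has something to return, which mirrors the point made above that no separate nothing-to-do path is required.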

How wait-for-interrupt works depends on the processor's capabilities. With the most basic processor design, that's just a busy loop:

nothing:
    goto nothing

The processor keeps running a branch instruction forever, which accomplishes nothing. Most modern OSes don't do this unless they're running on a processor where there's nothing better, and most processors have something better. Rather than spend energy doing nothing except heating the room, ideally, the processor should be turned off. So the kernel runs code that instructs the processor to turn itself off, or at least to turn off most of the processor. There must be at least one small part that stays powered on, the interrupt controller. When a peripheral triggers an interrupt, the interrupt controller will send a wake-up signal to the main part of the processor.

In practice, modern CPUs such as Intel/AMD and ARM have many, complex power management settings. The OS can estimate how long the processor will stay in idle mode and will choose different low-power modes depending on this. The modes offer different compromises between power usage while idle, and the time it takes to enter and exit the idle mode. On some processors the OS can also lower the clock rate of the processor when it finds that processes aren't consuming much CPU time.
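The trade-off described above can be sketched as a small selection function. This is a simplified illustration, not the algorithm of any particular kernel: the IdleState fields, the three states and all the numbers are invented, and a real implementation also has to produce the idle-time prediction itself.

```python
# Sketch: pick the deepest idle state that is worth entering for the
# predicted idle period and that can still wake up quickly enough.
from dataclasses import dataclass


@dataclass(frozen=True)
class IdleState:
    name: str
    exit_latency_us: int      # time needed to wake up again
    target_residency_us: int  # minimum stay for the state to pay off


# Invented values, ordered from shallowest to deepest.
STATES = [
    IdleState("shallow", 1, 1),
    IdleState("medium", 50, 200),
    IdleState("deep", 500, 2000),
]


def choose_idle_state(predicted_idle_us, latency_limit_us, states=STATES):
    """Return the deepest state that fits the prediction and the latency limit."""
    chosen = states[0]  # the shallowest state is always the fallback
    for state in states[1:]:
        if (state.target_residency_us <= predicted_idle_us
                and state.exit_latency_us <= latency_limit_us):
            chosen = state
    return chosen


if __name__ == "__main__":
    for idle_us in (100, 300, 5000):
        print(idle_us, choose_idle_state(idle_us, latency_limit_us=1000).name)
```

A short predicted idle period keeps the processor in a shallow state, since a deep state would cost more to enter and leave than it saves; a tight latency limit has the same effect even when the predicted idle period is long.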

  • 5
    Note that even the most very basic embedded CPUs like AVR-based microcontrollers have a WFI (Wait For Interrupt) instruction, even though that instruction may be equivalent to NOP depending on the exact model. – Jonas Schäfer Apr 26 '17 at 12:30
  • @JonasWielicki I thought you'd usually just go into a tight loop if you had nothing to do, or you could go into a low-power state and wait for the interrupt to knock you out of it (lower power states usually requiring more "metal" interrupts). – Nick T Apr 26 '17 at 20:40
  • @NickT I assume that WFI (if not aliased to NOP in the core, as some smaller models do) has shorter interrupt latencies compared to a busy loop because the CPU can prepare itself for entering the interrupt handler right away (it will be the next thing it does anyways). Probably depends a lot on the use-case: If you’re doing anything real-timey, looping around WFI seems sensible; if you’re after powersaving, entering a sleep state is better. – Jonas Schäfer Apr 27 '17 at 06:02
  • 1
    @JonasWielicki Architectures designed for embedded systems care about power management so WFI is important there. Many older architectures have no such thing. The original 8086 architecture didn't, AFAIR. Does 68k have WFI? Is it a standard feature on MIPS? My familiarity with low-level programming is mostly on ARM, where low power consumption is a matter of course and WFI is only the tip of the power management iceberg. – Gilles 'SO- stop being evil' Apr 27 '17 at 07:56
  • 1
    @Gilles 8086 had a halt instruction. See https://en.m.wikipedia.org/wiki/HLT_(x86_instruction) The instruction included power saving functionality only since 80486 DX4. Looking back into history HLT was already in 8080 and derivatives (like Z80). – pabouk - Ukraine stay strong Apr 27 '17 at 12:19
  • 1
    @pabouk HLT could power down SL variants of the 386 and 486, before the DX4 came out (the Wikipedia article is incorrect). – Stephen Kitt May 02 '17 at 18:50
  • I'd like to point out that "power down" is a relative concept. The original 8086 was manufactured using a process Intel called HMOS which originally was just a variant of NMOS. A property of NMOS is that a logic gate dissipates power even in the steady state, so cutting off the clock signal doesn't help much. There was not much that the 8086 HLT could do to enter a power saving mode. – Johan Myréen May 03 '17 at 06:44
  • A perfect answer. This should be the accepted answer. It's quite amazing. – KeyC0de Sep 19 '17 at 11:49
  • Actually, that is not "the textbook design". Douglas Comer's textbooks on Xinu employ a null thread. And Xinu is not the only textbook operating system to do so. – JdeBP Nov 26 '19 at 16:16
  • @Gilles'SO-stopbeingevil' The 68000 does have this with the STOP instruction that halts the processor until an interrupt occurs. See The 68000's Instruction Set, pp. 51-2. – Alex Hajnal Apr 20 '20 at 15:07
-1

No, an idle task does not waste CPU cycles. The scheduler simply does not select an idle process for execution. An idle process is waiting for some event to happen so that it can continue. For example, it can be waiting for input in a read() system call.

By the way, the kernel is not a separate process. Kernel code is always executed in the context of a process (well, except for the special case of a kernel thread), so it's not correct to say "and one CPU will be held by the kernel itself whilst not performing userspace work".

Johan Myréen
  • 3
    Hmmm... I don't think that is the kind of idle process that is created by CLONE_IDLETASK. Had it been then it would not need to be created at all, i.e. had the scheduler ignored the kernel idle processes on the CPUs it would not need create any processes for them during boot. (the DW is not mine though :) ) – grochmal Apr 25 '17 at 18:27
  • A little googling reveals that CLONE_IDLETASK is a kernel-internal flag that was introduced around kernel version 2.5.14 in 2002 and later removed in 2004. – Johan Myréen Apr 25 '17 at 18:43
  • "an" idle process but not "the" idle process. – user253751 Apr 26 '17 at 00:32
  • 2
    I once had a manager who noticed (on Windows) that the Idle Process was consuming 98% of CPU. She proposed a project to write a more efficient Idle Process. – Paul_Pedant Feb 28 '20 at 11:40