
As far as I know, in the Linux kernel:

  • the structure task_struct represents threads, i.e. lightweight processes, but not processes.

  • processes are not represented by any structure, but by groups of threads sharing the same thread group id.

So is the following from Operating System Concepts correct?

Linux also provides the ability to create threads using the clone() system call. However, Linux does not distinguish between processes and threads. In fact, Linux uses the term task —rather than process or thread— when referring to a flow of control within a program.

What does it mean?

Thanks.

Related: How does Linux tell threads apart from child processes?

Tim
  • It is a long fight in the Linux world. More exactly: the clone() syscall is a Linux invention. It is essentially a fork() in which some data structures of the child process (memory map, file descriptors, signal handlers and so on) can remain the same as the parent's (not a copy, the very same by address!). This is very different from what the POSIX standard describes (essentially, a separate threads API). Now, a decade later, the smoke has cleared from the battlefield of the mailing lists, and this is the state you can see. It is not so bad; what is bad is that threads are still not so lightweight on Linux. – peterh Dec 30 '18 at 22:14
  • Essentially, we still have clone(), and the POSIX threads library has been included in glibc, but there is only minimal thread support in the kernel. As far as I know, it is still not N:M threading, which is still bad, bad, bad. Then again, real N:M threads are probably incompatible with the essence of the Unix API (where kernel calls are still synchronous). – peterh Dec 30 '18 at 22:26
  • I think there is some... "committee" behind the POSIX API, and they should embrace asynchronous kernel calls ASAP. For now they won't do it, saying "we have threads", which is a crap reason :-( I think this is currently the largest structural danger for the future of Linux. – peterh Dec 30 '18 at 22:28
  • https://meta.stackexchange.com/questions/230676/ – JdeBP Dec 30 '18 at 22:46
  • @JdeBP I didn't post it as an answer because I am not sure whether it answers the question. The comments also contain a lot of my opinions, and I don't want to face the downvotes of others who hold the opposite opinion. My goal was only to give the OP some background in case no useful answer arrives. – peterh Dec 31 '18 at 01:54
  • It's not quite right to say that it doesn't distinguish threads and processes: you can use a PID in many syscalls where a non-main TID is not allowed. But Linux certainly doesn't have a particular structure for a process. It does have the concepts exposed to userland, but internally implements them in a weird way, with the amazing result that many things can be left unshared between the threads of one process. – 炸鱼薯条德里克 Dec 31 '18 at 02:38
  • Have a look at this answer for a clear and short explanation of the "current" (pthread) Linux process/thread model that demystifies this matter. – Totor Nov 11 '22 at 13:10

2 Answers


Linux also provides the ability to create threads using the clone() system call. However, Linux does not distinguish between processes and threads. In fact, Linux uses the term task —rather than process or thread— when referring to a flow of control within a program.

We need to distinguish between the actual implementation and the surface you see.

From the user's (system software developer's) point of view there is a big difference: threads share many resources (e.g. memory mappings and file descriptors, although each thread has its own stack).
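A minimal sketch of that user-visible difference, assuming a Unix-like system where os.fork() is available (the names counter and bump are made up for the illustration): a write made by a thread is seen by the rest of the program, while a write made by a forked child is not.

```python
import os
import threading

counter = [0]  # mutable object living in this process's address space


def bump():
    counter[0] += 1


# A thread runs in the same address space, so its write is visible here.
t = threading.Thread(target=bump)
t.start()
t.join()
after_thread = counter[0]

# A forked child gets its own (copy-on-write) address space,
# so its write never reaches the parent.
pid = os.fork()
if pid == 0:
    bump()
    os._exit(0)  # leave the child immediately, skipping normal cleanup
os.waitpid(pid, 0)
after_fork = counter[0]

print(after_thread, after_fork)
```

The counter goes up once for the thread and stays where it was after the fork, because the child incremented only its private copy.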

Internally (warning: imprecise, hand-waving argument), the Linux kernel 1) uses what it has at hand, i.e. the same structure for processes and for threads; for the threads of a single process it does not duplicate some things but instead references a single shared instance of them (e.g. the memory map description).

Thus, at the level of directly representing a thread or a process, there is not much difference in the basic structure; the devil lies in how the information is handled.
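One way to observe this from userland is sketched below, assuming Python 3.8 or later for threading.get_native_id(); the helper names are invented for the example. Every thread reports the same PID (the thread group ID), while each has its own kernel thread ID. The barrier keeps all workers alive until every ID has been recorded, so no ID can be reused in between.

```python
import os
import threading

N_WORKERS = 3
ids = []  # (pid, tid) pairs, one per thread
all_recorded = threading.Barrier(N_WORKERS)


def record(wait=True):
    # os.getpid() is the same for every thread of the process;
    # get_native_id() is the ID the kernel assigned to this one thread.
    ids.append((os.getpid(), threading.get_native_id()))
    if wait:
        all_recorded.wait()  # stay alive until all workers have recorded


record(wait=False)  # main thread
workers = [threading.Thread(target=record) for _ in range(N_WORKERS)]
for w in workers:
    w.start()
for w in workers:
    w.join()

pids = {pid for pid, _ in ids}
tids = {tid for _, tid in ids}
print("distinct PIDs:", len(pids), "distinct TIDs:", len(tids))
```

On Linux the thread ID of the initial thread normally equals the PID, which is why a single-threaded program never notices the distinction.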

You may also be interested in reading Are threads implemented as processes on Linux?


1) Remember that "Linux" these days mostly stands for the whole OS, while in fact it is only the kernel itself.

peterph
  • Remember that the word “Linux” is confusing: it is a kernel, but it sometimes refers to the whole system, and sometimes to the whole system minus Linux. See WSL for an example. – ctrl-alt-delor Jan 01 '19 at 16:28

Linux threads are implemented as a separate process but sharing the same address space as other threads. By default they are hidden in the ps command, but can be seen with the -L flag.

For example:

% ps -fp 2642
UID        PID  PPID  C STIME TTY          TIME CMD
polkitd   2642     1  0 Dec09 ?        00:00:48 /usr/lib/polkit-1/polkitd --no-d

% ps -fLp 2642
UID        PID  PPID   LWP  C NLWP STIME TTY          TIME CMD
polkitd   2642     1  2642  0    7 Dec09 ?        00:00:18 /usr/lib/polkit-1/pol
polkitd   2642     1  2680  0    7 Dec09 ?        00:00:00 /usr/lib/polkit-1/pol
polkitd   2642     1  2683  0    7 Dec09 ?        00:00:30 /usr/lib/polkit-1/pol
polkitd   2642     1  2685  0    7 Dec09 ?        00:00:00 /usr/lib/polkit-1/pol
polkitd   2642     1  2687  0    7 Dec09 ?        00:00:00 /usr/lib/polkit-1/pol
polkitd   2642     1  2688  0    7 Dec09 ?        00:00:00 /usr/lib/polkit-1/pol
polkitd   2642     1  2692  0    7 Dec09 ?        00:00:00 /usr/lib/polkit-1/pol

We can see that polkitd really consists of 7 threads. They all have the same process ID but different thread IDs (LWP), and they show up as separate entries in the ps listing because they are different processes in the kernel.
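The same per-thread view can be read straight from /proc. The sketch below assumes a Linux system with /proc mounted; thread_ids is a hypothetical helper written for this example. It lists the entries of /proc/self/task, one per thread, which correspond to the LWP column above.

```python
import os
import threading

stop = threading.Event()
workers = [threading.Thread(target=stop.wait) for _ in range(3)]
for w in workers:
    w.start()  # each worker now blocks until stop is set


def thread_ids(pid="self"):
    """Return the kernel thread IDs (the LWP column of ps -L) of a process."""
    return sorted(int(name) for name in os.listdir(f"/proc/{pid}/task"))


lwps = thread_ids()
nlwp = len(lwps)  # what ps -L reports as NLWP
print("PID", os.getpid(), "LWPs", lwps, "NLWP", nlwp)

stop.set()
for w in workers:
    w.join()
```

Passing another PID instead of "self" gives the thread list of that process, provided you have permission to read its /proc entry.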

This can have an impact, e.g. on ulimit constraints. A common issue on Red Hat and derivatives is that the default PAM configuration limits the number of your processes:

% cat /etc/security/limits.d/20-nproc.conf 
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.

*          soft    nproc     4096
root       soft    nproc     unlimited

In heavy Java web apps the number of threads, each of which counts as a process against this limit, can exceed it and cause application failures.
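To check which limit actually applies to a running program, the nproc values can be queried from inside the process. This is only a sketch, assuming a platform where Python's resource module exposes RLIMIT_NPROC (Linux does); show is a helper invented here.

```python
import resource

# Soft and hard limits on the number of processes for this user;
# on Linux every thread counts toward this number.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)


def show(limit):
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


print("nproc soft:", show(soft), "hard:", show(hard))
```

With the configuration shown above, a non-root login session would be expected to report a soft limit of 4096.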

  • The question is not about how ps massages information for user consumption. It is about what the kernel deals in. None of this answers the actual question of whether the book explanation is correct. – JdeBP Dec 30 '18 at 22:51
  • There are potentially three questions in the post; "Does Linux not distinguish between processes and threads?", "So is the following from Operating System Concepts correct?", "what does it mean?". I answered the first question, and used ps to show it, and pointed out a consequence. – Stephen Harris Dec 30 '18 at 23:38
  • I thought they don't have to share the memory space, just sharing the same tgid and namespace. Correct me if I'm wrong. – 炸鱼薯条德里克 Dec 31 '18 at 02:26
  • @炸鱼薯条德里克 They don't have to share anything. clone(2) has a lot of options. CLONE_VM allows parent and child to run in the same memory space. POSIX threads (pthreads(7)) says that threads share the same global memory, but each has its own stack. – Stephen Harris Dec 31 '18 at 02:35
  • Yeah, I know, I'm just talking about Linux, not POSIX. But I vaguely remember that threads in one process can't stay in different cgroups, or different pid_namespace – 炸鱼薯条德里克 Dec 31 '18 at 02:41
  • @炸鱼薯条德里克 It looks like glibc uses const int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SYSVSEM | CLONE_SIGHAND | CLONE_THREAD | CLONE_SETTLS | CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID | 0); when creating threads, as per sysdeps/unix/sysv/linux/createthread.c – Stephen Harris Dec 31 '18 at 02:59
  • No, you didn't actually answer that either. You showed what ps does. You did not show what Linux does. The question asked about tasks. There is not one mention of tasks in this answer. – JdeBP Dec 31 '18 at 12:10
  • @JdeBP the first line answers that: "Linux threads are implemented as a separate process but sharing the same address space as other threads. " – Stephen Harris Dec 31 '18 at 13:31
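The glibc flag set quoted in the comments above can be combined into the single number that clone() receives. The sketch below hard-codes the flag values as I understand them from the kernel's linux/sched.h header, so treat the constants as an assumption to verify against your own headers; decode is a helper made up for the example.

```python
# Flag values assumed from the kernel's linux/sched.h header.
CLONE_FLAGS = {
    "CLONE_VM": 0x00000100,              # share the address space
    "CLONE_FS": 0x00000200,              # share cwd, root and umask
    "CLONE_FILES": 0x00000400,           # share the file descriptor table
    "CLONE_SIGHAND": 0x00000800,         # share the signal handler table
    "CLONE_THREAD": 0x00010000,          # join the caller's thread group
    "CLONE_SYSVSEM": 0x00040000,         # share System V semaphore undo values
    "CLONE_SETTLS": 0x00080000,          # set up thread-local storage
    "CLONE_PARENT_SETTID": 0x00100000,   # store the new TID in the parent
    "CLONE_CHILD_CLEARTID": 0x00200000,  # clear the TID in the child on exit
}

# The combination quoted from glibc's createthread.c.
thread_flags = 0
for value in CLONE_FLAGS.values():
    thread_flags |= value


def decode(flags):
    """Return the names of the known flags that are set in flags."""
    return [name for name, value in CLONE_FLAGS.items() if flags & value]


print(hex(thread_flags))
print(decode(CLONE_FLAGS["CLONE_VM"] | CLONE_FLAGS["CLONE_FILES"]))
```

Dropping CLONE_THREAD and CLONE_VM from such a set moves the result toward what fork() does, which illustrates the point made in the comments that a clone() child does not have to share anything.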