117

I am trying to understand the concept of special files on Linux. However, having a special file in /dev seems plain silly when, as far as I know, its function could be implemented by a handful of lines of C.

Moreover, you could use it in pretty much the same manner, i.e. piping into null instead of redirecting into /dev/null. Is there a specific reason for having it as a file? Doesn't making it a file cause other problems, like too many programs accessing the same file?

Rui F Ribeiro
Ankur S
  • 20
    Incidentally, much of this overhead is also why cat foo | bar is much worse (at scale) than bar <foo. cat is a trivial program, but even a trivial program creates costs (some of them specific to FIFO semantics -- because programs can't seek() inside FIFOs, for example, a program that could be implemented efficiently with seeking can end up doing much more expensive operations when given a pipeline; with a character device like /dev/null it can fake those operations, or with a real file it can implement them, but a FIFO doesn't allow any kind of contextually-aware handling). – Charles Duffy Apr 16 '18 at 19:46
  • 13
    grep blablubb file.txt 2>/dev/null && dosomething could not work with null being a program or a function. – rexkogitans Apr 16 '18 at 20:39
  • 17
    You might find it enlightening (or at least mind-expanding) to read about the Plan 9 operating system to see where the "everything is a file" vision was going - it becomes a little easier to see the power of having resources available as file paths once you see a system fully embracing the concept (rather than mostly/partially, like modern Linux/Unix do). – mtraceur Apr 16 '18 at 21:06
  • 26
    As well as no-one pointing out that a device driver running in kernel space is a program with "a handful of lines of C", none of the answers so far have actually addressed the supposition of "too many programs accessing the same file" in the question. – JdeBP Apr 17 '18 at 08:00
  • 8
    @JdeBP Probably that's because nobody knows what "too many programs accessing the same file" is supposed to mean. – glglgl Apr 17 '18 at 11:48
  • 12
    Re "its function could be implemented by a handful of lines in C": You wouldn't believe it, but it is implemented by a handful of lines in C! For example, the body of the read function for /dev/null consists of a "return 0" (meaning it doesn't do anything and, I suppose, results in an EOF): (from https://github.com/torvalds/linux/blob/master/drivers/char/mem.c) static ssize_t read_null(struct file *file, char __user *buf, size_t count, loff_t *ppos) { return 0; } (Oh, I just see that @JdeBP made that point already. Anyway, here is the illustration :-). – Peter - Reinstate Monica Apr 17 '18 at 16:39
  • 1
    I doubt that very much. It's fairly easy to understand what "too many programs accessing the same file" means. Addressing the question that the supposition forms part of seems fairly simple. – JdeBP Apr 17 '18 at 18:32
  • 1
    One of the things Unix is famous for is reducing several seemingly unrelated interfaces to a single file interface. A file is just a source of or destination for bytes; the OS doesn't care what the interpretation of a particular read or write is. This simplified the implementation of Unix greatly compared to previous operating systems, which had to provide distinct interfaces to things like files, printers, terminals, and yes, bit buckets. – chepner Apr 18 '18 at 17:01
  • I am not sure how you propose that the simple program you want to use instead of /dev/null should work in the places where programs expect a file to write to. – PlasmaHH Apr 19 '18 at 08:24
  • 1
    @JdeBP it might be easy for you, but not for everyone. For example, the only real problem with many programs accessing the same file that I can see is file consistency, and that does not apply at all to /dev/null (as it does not store any data, but simply disregards it). Also, the increased resource usage of one extra file descriptor is infinitesimally small compared to the resource usage of even a small program using it, so that can't be a problem either. So, please explain what you think the problem is with many programs accessing /dev/null? – Matija Nalis Apr 19 '18 at 09:35
  • @glglgl My guess is they have something in mind akin to the Windows "file is already in use" error. – Izkata Apr 19 '18 at 23:56
  • Because, in UNIX, everything is a file :-) –  Apr 20 '18 at 07:26
  • Also, don't forget that a special null constant interferes with the file namespace. Sometimes when a Windows user gives me eg. a USB dongle for some data I will add a file called nul on it as well - deleting that is not an easy task: https://stackoverflow.com/questions/17883481/delete-a-file-named-nul-on-windows – j_kubik Apr 22 '18 at 21:40

9 Answers

159

In addition to the performance benefits of using a character-special device, the primary benefit is modularity. /dev/null may be used in almost any context where a file is expected, not just in shell pipelines. Consider programs that accept files as command-line parameters.

# We don't care about log output.
$ frobify --log-file=/dev/null

# We are not interested in the compiled binary, just seeing if there are errors.
$ gcc foo.c -o /dev/null || echo "foo.c does not compile!"

# Easy way to force an empty list of exceptions.
$ start_firewall --exception_list=/dev/null

These are all cases where using a program as a source or sink would be extremely cumbersome. Even in the shell pipeline case, stdout and stderr may be redirected to files independently, something that is difficult to do with executables as sinks:

# Suppress errors, but print output.
$ grep foo * 2>/dev/null
dessert
ioctl
  • 14
    Also you do not use /dev/null just in the shell's commands. You can use it in other parameters supplied to software --- for example in configuration files. --- For the software this is very convenient: it does not need to distinguish between /dev/null and a regular file. – pabouk - Ukraine stay strong Apr 17 '18 at 08:48
  • I'm not sure I understand the part about separate redirections to sink executables being difficult. In C, you just do a pipe, fork, and execve like any other process piping, just with changes to the dup2 calls that set up the connections, right? It's true most shells don't offer the prettiest ways to do that, but presumably if we didn't have so much device-as-file pattern and most of the things in /dev and /proc were treated as executables, shells would have been designed with ways to do it as easily as we redirect now. – aschepler Apr 17 '18 at 23:43
  • 6
    @aschepler It's not that redirecting to sink executables is difficult. It's that writing applications that can write to/read from both files and the null sink would be more complicated if the null sink weren't a file. Unless you're talking about a world where instead of everything being a file, everything is an executable? That'd be a very different model than what you have in *nix OS. – Cubic Apr 18 '18 at 14:02
  • @Cubic Yes, that's clear. I was more asking about the last paragraph in this post. "stdout and stderr may be redirected to files independently" is contrasted with something hypothetical, but it's not entirely clear exactly what. I jumped at first to "... redirected to sinks independently", but maybe that's not the intent? If it's meant to be about the generality between regular files and special files, it doesn't help that the following example doesn't involve any regular file. – aschepler Apr 18 '18 at 22:22
  • 1
    @aschepler You forgot wait4! You are correct, it is certainly possible to pipe stdout and stderr to different programs using POSIX apis, and it may be possible to invent a clever shell syntax for redirecting stdout and stderr to different commands. However, I'm not aware of any such shell right now, and the larger point is that /dev/null fits neatly into existing tooling (which largely works with files), and /bin/null wouldn't. We could also imagine some IO API that makes it as easy for gcc to (securely!) output to a program as to a file, but that's not the situation we are in. – ioctl Apr 19 '18 at 01:14
  • 2
    @ioctl regarding shells; both zsh and bash at least will allow you to do things like grep localhost /dev/ /etc/hosts 2> >(sed 's/^/STDERR:/' > errfile ) > >(sed 's/^/STDOUT:/' > outfile), resulting in separately processed errfile and outfile – Matija Nalis Apr 19 '18 at 10:17
  • ...and of course the >(...) syntax just expands to a special filename (you're actually running 2> /proc/x/fd/y), which neatly loops back to what ioctl wrote in the answer. – u1686_grawity Apr 19 '18 at 14:45
  • Also, even once you have your shell able to pipe into programs for you, or if you think spawning a subprocess isn't particularly complicated, you're still communicating with the process using a file handle anyway; so all you've added is a bunch of different steps to getting a writeable stream. I don't think it really matters whether those different steps are done in the shell or in the driver for the null device, you're not going to use it any different than a file anyway – millimoose Apr 23 '18 at 11:57
61

In fairness, it's not a regular file per se; it's a character special device:

$ file /dev/null
/dev/null: character special (3/2)

Because it functions as a device rather than as a regular file or program, redirecting input to or output from it is a simpler operation: it can be attached to any file descriptor, including standard input, output, and error.
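As a minimal illustration (my sketch, not from the original answer), this is all it takes in C to attach /dev/null to an arbitrary file descriptor with the ordinary open(2) and dup2(2) calls; it is essentially what the shell does for 2>/dev/null:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Open the null device like any other file. */
    int devnull = open("/dev/null", O_WRONLY);
    if (devnull == -1) {
        perror("open /dev/null");
        return 1;
    }

    /* Attach it to file descriptor 2 (stderr), as the shell does for 2>/dev/null. */
    if (dup2(devnull, STDERR_FILENO) == -1) {
        perror("dup2");
        return 1;
    }
    close(devnull);

    fprintf(stderr, "this message is silently discarded\n");
    printf("stdout still works\n");
    return 0;
}

No special API is involved: the same two calls work for a log file, a pipe, or a terminal, which is exactly the point of making the null sink a file.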

DopeGhoti
  • Thanks for the answer, it was certainly informative. Could you elaborate a bit more on why a program couldn't be used in its place in redirection? – Ankur S Apr 16 '18 at 16:15
  • 3
    A program could be used in its place for indirection, but redirection (e. g. cat file > /dev/null) would overwrite the executable with the contents of file rather than redirecting the output. – DopeGhoti Apr 16 '18 at 16:20
  • 24
    cat file | null would have a lot of overhead, first in setting up a pipe, spawning a process, executing "null" in the new process, etc. Also, null itself would use quite a bit of CPU in a loop reading bytes into a buffer that is later just discarded... The implementation of /dev/null in the kernel is just more efficient that way. Also, what if you want to pass /dev/null as an argument, instead of a redirection? (You could use <(...) in bash, but that's even way heavier handed!) – filbranden Apr 16 '18 at 16:31
  • 4
    If you had to pipe to a program named null instead of using redirection to /dev/null, would there be a simple, clear way to tell the shell to run a program while sending just its stderr to null? – Mark Plotnick Apr 16 '18 at 19:09
  • @AnkurS: If you want to do that, you can write cat file | true. – user1024 Apr 16 '18 at 20:49
  • @Ankur S: Re "I had cat file | null more in mind", so you're a perfect typist, and never accidentally type '>' when you meant '|'? – jamesqf Apr 16 '18 at 22:48
  • 5
    This is a really expensive setup for an overhead demonstration. I would suggest using /dev/zero instead. – chrylis -cautiouslyoptimistic- Apr 16 '18 at 23:19
  • 2
    @MarkPlotnick: Presumably shell grammar would have evolved syntax like foo 2>(bar) or foo 2|(bar). (In case you were wondering, in current bash 4.4, echo 2>(cat) prints 2/dev/fd/63, because bash does process substitution, but echo doesn't treat its args as filenames. (And concatenating a leading 2 doesn't help).) – Peter Cordes Apr 17 '18 at 02:33
  • 20
    Those examples are wrong. dd of=- writes to a file called -, just omit the of= to write to stdout as that's where dd writes by default. Piping to false wouldn't work as false doesn't read its stdin, so dd would be killed with a SIGPIPE. For a command that discards its input you can use... cat > /dev/null. Also the comparison would probably be irrelevant as the bottleneck would probably be the random number generation here. – Stéphane Chazelas Apr 17 '18 at 08:58
  • @StéphaneChazelas: To be fair, using cat > /dev/null isn't totally accurate, because as well as reading its input, it will write it out, so it will make roughly twice as many syscalls as a null process that just read and discarded data. Your other points are valid (and important) though! – psmears Apr 17 '18 at 10:19
  • @psmears, well it does discard its input even though not in the most efficient way. You'll probably find sed d or awk 0 or perl -ne '', which discard without writing to /dev/null, are actually still less efficient as they have extra overhead of their own. – Stéphane Chazelas Apr 17 '18 at 10:23
  • @StéphaneChazelas: True - that's why I didn't suggest them as an alternative :) – psmears Apr 17 '18 at 11:40
  • 2
    @psmears, in any case, at least on Linux, the most effective way to write such a null command would probably be to use splice() onto /dev/null, to at least avoid having to copy the content of the pipe back into user space as would happen if you used read() to flush the content of the pipe. – Stéphane Chazelas Apr 17 '18 at 11:45
  • 8
    The AST versions of dd etc. don't even bother doing a write syscall when they detect the destination is /dev/null. – Mark Plotnick Apr 17 '18 at 11:48
  • 2
    @StéphaneChazelas: Yes, indeed :) Unfortunately GNU cat doesn't use splice() (I know it's Linux-specific, but other GNU tools do take advantage of Linux-specific features e.g. tail uses inotify). Nor does it use the optimisation that MarkPlotnick mentions... – psmears Apr 17 '18 at 14:01
55

I suspect the why has a lot to do with the vision/design that shaped Unix (and consequently Linux), and the advantages stemming from it.

No doubt there's a non-negligible performance benefit to not spinning up an extra process, but I think there's more to it: Early Unix had an "everything is a file" metaphor, which has a non-obvious but elegant advantage if you look at it from a system perspective, rather than a shell scripting perspective.

Say you have your null command-line program, and /dev/null the device node. From a shell-scripting perspective, the foo | null form is actually genuinely useful and convenient, while foo >/dev/null takes a tiny bit longer to type and can seem weird.

But here are two exercises:

  1. Let's implement the program null using existing Unix tools and /dev/null - easy: cat >/dev/null. Done.

  2. Can you implement /dev/null in terms of null?

You're absolutely right that the C code to just discard input is trivial, so it might not yet be obvious why it's useful to have a virtual file available for the task.

Consider: almost every programming language already needs to work with files, file descriptors, and file paths, because they were part of Unix's "everything is a file" paradigm from the beginning.

If all you have are programs that write to stdout, they don't care whether you redirect them into a virtual file that swallows all writes, or into a pipe to a program that swallows all writes.

Now if you have programs that take file paths for either reading or writing data (which most programs do) - and you want to add "blank input" or "discard this output" functionality to those programs - well, with /dev/null that comes for free.

Notice that the elegance of it is that it reduces the code complexity of all involved programs - for each common-but-special use case that your system can provide as a "file" with an actual "filename", your code can avoid adding custom command-line options and custom code paths to handle it.
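As a hedged illustration (my sketch, not part of the original answer), consider a hypothetical little logger that writes to whatever path it is given; because /dev/null is just another path, "discard the log" needs no extra option or code path at all:

#include <stdio.h>

/* Hypothetical example: log to whichever path is given on the command line.
 * A regular file, a FIFO, and /dev/null all work identically here;
 * no special "quiet mode" logic is needed. */
int main(int argc, char *argv[]) {
    const char *path = (argc > 1) ? argv[1] : "app.log";

    FILE *log = fopen(path, "w");
    if (!log) {
        perror(path);
        return 1;
    }

    fprintf(log, "starting up\n");
    fprintf(log, "doing work\n");
    fclose(log);
    return 0;
}

Running it as ./logger app.log keeps the messages; ./logger /dev/null discards them, and the program itself never has to know the difference.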

Good software engineering often depends on finding good or "natural" metaphors for abstracting some element of a problem in a way that becomes easier to think about but remains flexible, so that you can solve basically the same range of higher-level problems without having to spend the time and mental energy on reimplementing solutions to the same lower-level problems constantly.

"Everything is a file" seems to be one such metaphor for accessing resources: you call open on a given path in a hierarchical namespace, get a reference (a file descriptor) to the object, and can then read, write, and so on using that file descriptor. Your stdin/stdout/stderr are also file descriptors that just happen to be pre-opened for you. Your pipes are just files and file descriptors, and file redirection lets you glue all these pieces together.

Unix succeeded as much as it did in part because of how well these abstractions worked together, and /dev/null is best understood as part of that whole.


P.S. It's worth looking at the Unix version of "everything is a file" and things like /dev/null as the first steps towards a more flexible and powerful generalization of the metaphor that has been implemented in many systems that followed.

For example, in Unix, special file-like objects like /dev/null had to be implemented in the kernel itself, but exposing functionality in file/directory form turned out to be useful enough that multiple systems have since provided a way for ordinary programs to do the same.

One of the first was the Plan 9 operating system, made by some of the same people who made Unix. Later, GNU Hurd did something similar with its "translators". Meanwhile, Linux ended up getting FUSE (which has spread to the other mainstream systems by now as well).

mtraceur
  • Linux's FUSE (filesystem in user-space) makes it possible for a program to let other programs access virtual files/directories, e.g. to make a .zip look like a filesystem, or whatever. But mount points and virtual filesystems are not the normal Unix mechanism, so this isn't widely used for things that don't really behave like filesystems for reading or storing file data. – Peter Cordes Apr 17 '18 at 02:24
  • 8
    @PeterCordes the point of the answer is starting from a position of not understanding the design. If everyone already understood the design, this question would not exist. – OrangeDog Apr 17 '18 at 11:46
  • @PeterCordes Last I checked, a process had to be root (or at least have CAP_SYS_ADMIN) to set up a FUSE mount. Is this still the case? In Plan 9 it was a regular operation subject to regular permissions, rather than a strictly root one. This is what I meant by "at least some kernel intervention", though I recognize that this is bad wording - I've edited it to be more explicit. – mtraceur Apr 17 '18 at 17:45
  • @OrangeDog: That was my point: it's a good and sensible design, not something that needs to be apologized for. :P Unlike with the cat foo | bar vs. <foo bar debate, or other Unix features that may seem a bit crusty these days, this one's pretty clear-cut. – Peter Cordes Apr 17 '18 at 17:46
  • 1
    @mtraceur: Mount an image file without root permission? shows some evidence that FUSE might not require root, but I'm not sure. – Peter Cordes Apr 17 '18 at 17:48
  • 1
    @PeterCordes RE: "seems weird": It's not an apology for the design, just an acknowledgement of how it can seem if you're not thinking of the system implementation under it, and haven't yet had the eureka moment about the system-wide design advantages. I tried to make that clear by opening that sentence with "from a shell scripting perspective", and alluding to the contrast between that vs. a system perspective a couple sentences prior. On further thought "can seem weird" is better, so I'll tweak it to that. I welcome further wording suggestions to make it clearer without making it too verbose. – mtraceur Apr 17 '18 at 17:55
  • @PeterCordes Thanks! And thanks for the link about FUSE mounting! As I understand it, at an underlying kernel/system level, mounting requires root, but in practice there are tools like fusermount that help initiating FUSE mounts from non-privileged contexts. You've actually helped me realize that I shouldn't be focusing on how FUSE and Plan 9 implementations differ. I've edited the ending again to reflect the greater point: that the design itself is good, with the fact that multiple systems have reimplemented generalizations of it being illustrative examples. – mtraceur Apr 17 '18 at 19:32
  • 1
    Best answer, but to nitpick, "everything is a file" isn't a metaphor, everything in unix actually is a file. Now the word "file" of course is figurative. – figtrap Apr 17 '18 at 20:54
  • 2
    The very first thing I was told as a young engineer in relation to Unix was "Everything Is A File" and I swear you could hear the capitals. And getting hold of that idea early makes Unix/Linux seem a lot more easy to understand. Linux inherited most of that design philosophy. I'm glad someone mentioned it. – StephenG - Help Ukraine Apr 17 '18 at 20:55
  • Just want to add that implementing "null" as a char. device invites a lot of creativity; there's lots of possibilities there. As an executable it would be more limited. – figtrap Apr 17 '18 at 22:22
  • 2
    @PeterCordes, DOS "solved" the typing problem by making the magic filename NUL appear in every directory, i.e. all you have to type is > NUL. – Cristian Ciupitu Apr 18 '18 at 13:01
  • @figtrap Thanks! I've been thinking about the "metaphor" thing a bit - I agree it's maybe not the best word - can you think of a better one to use here? So far "abstraction" feels the most "right", but I'm happy to hear other suggestions. – mtraceur Apr 18 '18 at 16:42
  • @mtraceur actually I don't think anyone was concerned with such things, at least as part of the UNIX design, "file" is a term rather strictly defined in UNIX ... metaphors came later :) – figtrap Apr 19 '18 at 19:30
  • @CristianCiupitu I believe that in DOS, NUL is not a magic filename, it is a device. More properly, it's written "NUL:", just like C:, D:, LPT1:, etc. IOW, it exists globally, not "in every directory", and is in fact very much like /dev/null. – Lyle Apr 21 '18 at 20:00
  • 1
    @Lyle NUL, CON, etc act as files in exactly the same way (and for the same reason) as /dev/null and friends on Unix-likes. They also act as though they exist in every directory, and are apparently reserved with every extension as well; try yourself with a command like echo hello > C:\Temp\NUL.txt – IMSoP Apr 22 '18 at 16:34
  • 1
    @CristianCiupitu Actually, as the article you link explains, the existence in every directory is just because early versions of MS-DOS had a flat filesystem, so writing > NUL was ubiquitous. Once directories were added, that meant "NUL in the current directory", so if only the root \NUL was special, you'd end up with dozens of files called NUL from running old programs with some other current directory. Unix was, I believe, always tree-based, so didn't need to worry about this. – IMSoP Apr 22 '18 at 16:42
14

I think /dev/null is a character device (that behaves like an ordinary file) instead of a program for performance reasons.

If it were a program, it would require loading, starting, scheduling, running, and afterwards stopping and unloading. The simple C program you are describing would of course not consume a lot of resources, but I think it makes a significant difference when you consider a large number (say millions) of redirect/piping actions, since process management operations are costly at that scale: they involve context switches.

Another assumption: piping into a program requires memory to be allocated by the receiving program (even if the data is discarded immediately afterwards). So if you pipe into the tool, you have double the memory consumption: once in the sending program and again in the receiving program.
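To make that setup cost concrete, here is a rough C sketch (my illustration, not part of this answer; the null program name is hypothetical) of what a shell has to do for foo | null, compared with what foo >/dev/null needs:

#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* "foo > /dev/null": one open() and one dup2(), no extra process. */
static void redirect_to_devnull(void) {
    int fd = open("/dev/null", O_WRONLY);
    dup2(fd, STDOUT_FILENO);
    close(fd);
}

/* "foo | null": a pipe, a fork, an exec of a hypothetical 'null' binary,
 * a wait later on, plus a copy of every byte written through the pipe
 * into that second process. */
static void pipe_to_null_program(void) {
    int pfd[2];
    pipe(pfd);
    if (fork() == 0) {                 /* child becomes the data sink */
        dup2(pfd[0], STDIN_FILENO);
        close(pfd[0]);
        close(pfd[1]);
        execlp("null", "null", (char *)NULL);   /* hypothetical program */
        _exit(127);
    }
    dup2(pfd[1], STDOUT_FILENO);       /* parent writes into the pipe */
    close(pfd[0]);
    close(pfd[1]);
    /* ... do the real work here, then wait(NULL) for the sink to exit. */
}

int main(void) {
    redirect_to_devnull();             /* the cheap path */
    puts("this line is discarded by the kernel");
    (void)pipe_to_null_program;        /* the expensive path, shown above for comparison */
    return 0;
}

Every byte sent down the pipe still has to be copied into the kernel's pipe buffer and read back out by the sink process, whereas a write to /dev/null can return immediately.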

user5626466
  • 10
    It's not just the setup cost, it's that every write into a pipe requires memory copying, and a context switch to reading program. (Or at least a context switch when the pipe buffer is full. And the reader has to do another copy when it reads the data). This is not negligible on a single-core PDP-11 where Unix was designed! Memory bandwidth / copying is much cheaper today than it was then. A write system call to an FD open on /dev/null can return right away without even reading any data from the buffer. – Peter Cordes Apr 17 '18 at 02:41
  • @PeterCordes, my note is tangential, but it's possible that, paradoxically, memory writes today are more expensive than ever. An 8-core CPU potentially performs 16 integer operations in a clock time, while an end-to-end memory write would complete in e. g. 16 clocks (4GHz CPU, 250 MHz RAM). That's the factor of 256. RAM to the modern CPU is like an RL02 to the PDP-11 CPU, almost like a peripheral storage unit! :) Not as straightforward, naturally, but everything hitting the cache will get written out, and useless writes would deprive other computations of the ever important cache space. – kkm -still wary of SE promises Apr 19 '18 at 02:14
  • @kkm: Yes, wasting about 2x 128kiB of L3 cache footprint on a pipe buffer in the kernel and a read buffer in the null program would suck, but most multi-core CPUs don't run with all cores busy all the time, so the CPU time to run the null program is mostly free. On a system with all cores pegged, useless piping is a bigger deal. But no, a "hot" buffer can be rewritten many times without getting flushed to RAM, so we're mostly just competing for L3 bandwidth, not cache. Not great, especially on a SMT (hyperthreading) system where other logical core(s) on the same physical are competing... – Peter Cordes Apr 19 '18 at 03:18
  • .... But your memory calculation is very flawed. Modern CPUs have lots of memory parallelism, so even though latency to DRAM is something like 200-400 core clock cycles and L3>40, bandwidth is ~8 bytes / clock. (Surprisingly, single-threaded bandwidth to L3 or DRAM is worse on a many-core Xeon with quad-channel memory vs. a quad-core desktop, because it's limited by the max concurrency of requests one core can keep in flight. bandwidth = max_concurrency / latency: Why is Skylake so much better than Broadwell-E for single-threaded memory throughput?) – Peter Cordes Apr 19 '18 at 03:23
  • ... See also https://www.7-cpu.com/cpu/Haswell.html for Haswell numbers comparing quad-core vs. 18-core. Anyway, yes modern CPUs can get a ridiculous amount of work done per clock, if they aren't stuck waiting for memory. Your numbers appear to only be 2 ALU operations per clock, like maybe a Pentium from 1993, or a modern low-end dual-issue ARM. A Ryzen or Haswell potentially performs 4 scalar integer ALU ops + 2 memory ops per core per clock, or far more with SIMD. e.g. Skylake-AVX512 has (per core) 2-per-clock throughput on vpaddd zmm: 16 32-bit elements per instruction. – Peter Cordes Apr 19 '18 at 03:31
  • @kkm: TL:DR: cache misses (and thus things like lookup tables to avoid ALU work) get worse and worse as memory and even L3 cache latency stays near-constant in nanoseconds but gets worse in terms of core clock cycles. But bandwidth to L3 and DRAM per core clock cycle may have improved over the last 10 years, as DDR1/2/3/4 memory clock speeds have climbed a lot. My Skylake desktop has dual-channel DDR4-2666, with each channel doing 2x 8-byte transfers per clock, at a 1333MHz memory clock. Pipe buffers should stay hot in L3 if they're really being used heavily; write-back not write-through. – Peter Cordes Apr 19 '18 at 03:39
  • See also What Every Programmer Should Know About Memory?. (My 2017 update on Ulrich Drepper's very excellent original article from late Pentium 4 / early Core 2 days, when even a single-threaded workload could bottleneck on DRAM bandwidth, but now it takes multiple threads to max out the memory controllers.) Anyway, yes, cache space is very valuable, and depriving other cores of some of it does suck. – Peter Cordes Apr 19 '18 at 03:41
  • @kkm: Not trying to jump down your throat with this flood of comments; more like CPU performance is an interesting topic for me so I can't help but comment with more details. And see my profile pic :P – Peter Cordes Apr 19 '18 at 03:55
  • @PeterCordes: Why, thank you, I appreciate the info and links! I maybe dated on my CPU part (anyone pulling RL02 from their memory would by now :) ). The main flaw with my argument is, I believe, as you pointed, a buffer may be written over many times without being flushed to disk. And you are right, I should have better compared byte to byte in CPU/RAM throughput. If I may ask, how should I bring DRAM latencies into the picture? The figure of 250MHz (40ns) may be too conservative, but it is unlikely possible to achieve 2666 MHz either, that's only the burst speed, is it? – kkm -still wary of SE promises Apr 19 '18 at 15:38
  • @PeterCordes: Flushed to RAM immediately, not disk, what was I thinking! Probably that RL02 thing. Brain fart, sorry. – kkm -still wary of SE promises Apr 19 '18 at 17:57
  • @kkm: Right, DDR4-2666 is 2666 mega-transfers per second during a burst. Command overhead reduces throughput a little. (It's not 2666MHz, because the actual clock is only half that, and data is transferred on the rising and falling edge. That's what DDR = double-data-rate means.) DRAM latency is mostly column-access latency, not the time to actual transfer data once the right column and row are selected and data is transferring. http://www.crucial.com/usa/en/memory-performance-speed-latency, and "What Every Programmer Should Know About Memory?" has a detailed DRAM section. – Peter Cordes Apr 20 '18 at 00:12
  • @kkm: But don't forget the latency for a request to even make it from a core to a memory controller, which is like 36 cycles on a quad-core Haswell or 62 cycles on an 18-core Haswell https://www.7-cpu.com/cpu/Haswell.html. My answer on Ram real time latency says the same thing. Anyway, DRAM latency is determined by its clock, and how tight the CAS latency timings are, and cache-miss latency is latency inside the CPU + DRAM latency. But a single core can keep ~10 requests in flight (Intel CPUs have 10 Line Fill Buffers, and 16 superqueue L2<->L3) – Peter Cordes Apr 20 '18 at 00:18
  • @kkm: But really for this, DRAM latency isn't in the picture, normally just L3 bandwidth for the pipe buffer (not even latency if we don't hit max parallelism). DRAM latency comes in if the extra cache footprint causes extra misses in other tasks. But it's very hard to predict what impact exactly that will have, and how much of that latency out-of-order execution can hide. Related: http://users.elis.ugent.be/~leeckhou/papers/ispass06-eyerman.pdf examines the cost of branch-prediction misses compared with other stalls like cache misses that have to go all the way to DRAM, vs. I-cache miss. – Peter Cordes Apr 20 '18 at 01:00
  • @PeterCordes: Wow, thank you so much, I am speechless! I still do not have a complete picture, and would certainly like to ask you a couple more q's, if you do not mind. Maybe you can suggest which SE would be the best for it? Would it be too focused for SO--or do you feel it would be the right place? There is not a SE on computer hardware, at the least at this level of detail, AFAIK. But your answers would be super useful to other people too, I am certain, if they'd get more exposure. – kkm -still wary of SE promises Apr 20 '18 at 07:56
  • @kkm: go ahead and ask on SO, tag it with [performance] and maybe [cpu-architecture] and [cpu-cache] and/or [memory], and/or [x86] if it's about x86 hardware. Include a link to this comment thread for background on what you're asking. – Peter Cordes Apr 20 '18 at 08:00
  • @PeterCordes: Absolutely, thank you so much! I'll write a summary of your explanations and include your links, and then describe what I understand and what I do not. – kkm -still wary of SE promises Apr 20 '18 at 08:04
7

Aside from the "everything is a file" principle and the resulting ease of use everywhere, which most other answers are based on, there is also a performance issue, as @user5626466 mentions.

To show this in practice, we'll create a simple program called nullread.c:

#include <unistd.h>

/* Read from stdin into a 1 MiB buffer and discard the data,
 * until read() returns 0 (EOF) or an error. */
char buf[1024*1024];
int main() {
        while (read(0, buf, sizeof(buf)) > 0);
}

and compile it with gcc -O2 -Wall -W nullread.c -o nullread

(Note: we cannot use lseek(2) on pipes, so the only way to drain the pipe is to read from it until it is empty.)

% time dd if=/dev/zero bs=1M count=5000 |  ./nullread
5242880000 bytes (5,2 GB, 4,9 GiB) copied, 9,33127 s, 562 MB/s
dd if=/dev/zero bs=1M count=5000  0,06s user 5,66s system 61% cpu 9,340 total
./nullread  0,02s user 3,90s system 41% cpu 9,337 total

whereas with standard /dev/null file redirection we get much better speeds (due to the facts mentioned above: less context switching, the kernel just ignoring the data instead of copying it, etc.):

% time dd if=/dev/zero bs=1M count=5000 > /dev/null
5242880000 bytes (5,2 GB, 4,9 GiB) copied, 1,08947 s, 4,8 GB/s
dd if=/dev/zero bs=1M count=5000 > /dev/null  0,01s user 1,08s system 99% cpu 1,094 total

(this should be a comment there, but is too big for that and would be completely unreadable)

Matija Nalis
  • 27
  • What hardware did you test on? 4.8GB/s is pretty low compared to the 23GB/s I get on a Skylake i7-6700k (DDR4-2666, but the buffer should stay hot in L3 cache. So a good portion of the cost is system calls being expensive with Spectre + Meltdown mitigation enabled. That hurts doubly for piping, because pipe buffers are smaller than 1M, so that's more write / read system calls. Nearly 10x perf difference is worse than I expected, though. On my Skylake system it's 23GB/s vs. 3.3GB/s, running x86-64 Linux 4.15.8-1-ARCH, so that's a factor of 6.8. Wow, system calls are expensive now) – Peter Cordes Apr 20 '18 at 08:40
  • @PeterCordes It's an old low-end laptop (Acer Aspire E17), with 4x Intel(R) Celeron(R) CPU N2940 @ 1.83GHz running x86_64 Linux 4.9.82-1+deb9u3, so the lower score is quite expected – Matija Nalis Apr 20 '18 at 09:06
  • 1
    @PeterCordes 3GB/s with 64k pipe buffers suggests 2x 103124 syscalls per second... and that number of context switches, heh. On a server cpu, with 200000 syscalls per second, you might expect ~8% overhead from PTI, since there is very little working set. (The graph I'm referencing doesn't include the PCID optimization, but maybe that's not so significant for small working sets). So I'm not sure PTI has a big impact there? http://www.brendangregg.com/blog/2018-02-09/kpti-kaiser-meltdown-performance.html – sourcejedi Apr 20 '18 at 09:24
  • 1
    Oh interesting, so it's a Silvermont with 2MB of L2 cache, so your dd buffer + receive buffer don't fit; you're probably dealing with memory bandwidth instead of last-level cache bandwidth. You might get better bandwidth with 512k buffers or even 64k buffers. (According to strace on my desktop, write and read are returning 1048576, so I think that means we're only paying the user<->kernel cost of TLB invalidation + branch-prediction flush once per MiB, not per 64k, @sourcejedi. It's Spectre mitigation that has the most cost, I think) – Peter Cordes Apr 20 '18 at 09:34
  • Interesting! All the search results/blogs seems to be for the meltdown/pti news, I guess they're popular & also included the word "spectre". Would be very convenient if you can give me a pointer for the impact of the subsequent Spectre mitigation on syscalls. – sourcejedi Apr 20 '18 at 10:39
  • 1
    @sourcejedi: With Spectre mitigation enabled, the cost of a syscall that returns right away with ENOSYS is ~1800 cycles on Skylake with Spectre mitigation enabled, most of it being the wrmsr that invalidates the BPU, according to @BeeOnRope's testing. With mitigation disabled, the user->kernel->user round trip time is ~160 cycles. But if you are touching lots of memory, Meltdown mitigation is significant, too. Hugepages should help (fewer TLB entries need to be reloaded). – Peter Cordes Apr 21 '18 at 23:43
  • @PeterCordes Thanks! Circling back. In the pipe case, we see only 1 extra syscall per 1MB in the reader. In every case we used a writer with 2 syscalls per 1MB. On my system, reducing the block size below 64k does start to crater throughput, but that's not what we're doing. – sourcejedi Apr 22 '18 at 10:12
  • 1
    @PeterCordes On a single-core unix system, we would surely see 1 context switch per 64K, or whatever your pipe buffer was, and that would hurt... actually I also see the same number of context switches with 2 cpu cores ; it must also be counting a sleep/wake cycle for each 64k as a context switch (to a nominal "idle process"). Keeping the pipeline processes on the same cpu actually worked more than twice as fast. – sourcejedi Apr 22 '18 at 10:13
  • FYI, if I modify bs=16k count=320000 and reduce the buffer in nullread.c to 16k, /dev/null worsens to 3.3GB/s, but nullread.c improves to 1.2GB/s. At 64k both, it is 4.7GB/s for /dev/null and 1.3GB/s for nullread.c. At 4k, /dev/null drops to 1.5GB/s, and nullread.c to 683 MB/s. In any case, /dev/null retains much superior performance. – Matija Nalis Apr 22 '18 at 11:20
7

Your question is posed as if something, perhaps simplicity, would be gained by using a null program in lieu of a file. Perhaps we could get rid of the notion of "magic files" and instead have just "ordinary pipes".

But consider: a pipe is also a file. Pipes are normally not named, and so can only be manipulated through their file descriptors.

Consider this somewhat contrived example:

$ echo -e 'foo\nbar\nbaz' | grep foo
foo

Using Bash's process substitution we can accomplish the same thing through a more roundabout way:

$ grep foo <(echo -e 'foo\nbar\nbaz')
foo

Replace grep with echo and we can see under the covers:

$ echo foo <(echo -e 'foo\nbar\nbaz')
foo /dev/fd/63

The <(...) construct is just replaced with a filename, and grep thinks it's opening any old file; it just happens to be named /dev/fd/63. Here, /dev/fd is a magic directory that makes named pipes for every file descriptor possessed by the process accessing it.

We could make it less magic with mkfifo to make a named pipe that shows up in ls and everything, just like an ordinary file:

$ mkfifo foofifo
$ ls -l foofifo 
prw-rw-r-- 1 indigo indigo 0 Apr 19 22:01 foofifo
$ grep foo foofifo

Elsewhere:

$ echo -e 'foo\nbar\nbaz' > foofifo

and behold, grep will output foo.

I think once you realize that pipes, regular files, and special files like /dev/null are all just files, it's apparent that implementing a null program is the more complex approach. The kernel has to handle writes to a file either way, but in the case of /dev/null it can just drop the writes on the floor, whereas with a pipe it has to actually transfer the bytes to another program, which then has to actually read them.

  • @Lyle Yeah? Then why does echo print /dev/fd/63? – Phil Frost Apr 21 '18 at 20:30
  • Hm. Good point. Well, this is implemented by shells, so perhaps your shell is different from the old Bourne shell I grew up with. – Lyle Apr 21 '18 at 20:44
  • One difference is that echo doesn't read from stdin, while grep does, but I can't think how the shell would know that before it execs them. – Lyle Apr 21 '18 at 20:46
  • 1
    And strace does make this clearer for me: you have it exactly right, with bash. The '<(...)' construct is doing something quite different from <filename. Hm. I learned something. – Lyle Apr 21 '18 at 20:59
1

As others already pointed out, /dev/null is a program made of a handful of lines of code. It's just that these lines of code are part of the kernel.

To make it clearer, here's the Linux implementation: a character device calls functions when read or written to. Writing to /dev/null calls write_null, while reading calls read_null, registered here.

Literally a handful of lines of code: these functions do nothing. You'd need more lines of code than fingers on your hands only if you count functions other than read and write.
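For reference, the two functions are essentially the following (lightly trimmed from drivers/char/mem.c; the read reports EOF immediately, and the write claims to have consumed every byte while discarding it):

static ssize_t read_null(struct file *file, char __user *buf,
                         size_t count, loff_t *ppos)
{
        return 0;       /* nothing to read: immediate EOF */
}

static ssize_t write_null(struct file *file, const char __user *buf,
                          size_t count, loff_t *ppos)
{
        return count;   /* pretend everything was written, then discard it */
}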

  • Maybe I should have phrased it more precisely. I meant why implement it as a char device instead of a program. It would be a few lines either way, but the program implementation would be decidedly simpler. As the other answers have pointed out, there are quite a few benefits to this; efficiency and portability chief among them. – Ankur S Apr 23 '18 at 15:24
  • Sure. I just added this answer because seeing the actual implementation was fun (I discovered it recently myself), but the real reason is what others pointed out indeed. – Matthieu Moy Apr 23 '18 at 16:24
  • Me too! I recently started learning devices in linux and the answers were quite informative – Ankur S Apr 23 '18 at 16:35
0

I would argue that there is a security issue beyond historical paradigms and performance. Limiting the number of programs with privileged execution credentials, no matter how simple they are, is a fundamental tenet of system security. A replacement /dev/null would certainly be required to have such privileges due to its use by system services. Modern security frameworks do an excellent job of preventing exploits, but they aren't foolproof. A kernel-driven device accessed as a file is much more difficult to exploit.

NickW
  • This sounds like nonsense. Writing a bug-free kernel driver is no easier than writing a bug-free program that reads+discards its stdin. It doesn't need to be setuid or anything, so for both /dev/null or a proposed input-discarding program, the attack vector would be the same: get a script or program that runs as root to do something weird (like try to lseek in /dev/null or open it multiple times from the same process , or IDK what. Or invoke /bin/null with a weird environment, or whatever). – Peter Cordes Apr 21 '18 at 23:51
-2

I hope that you are also aware of /dev/chargen, /dev/zero, and others like them, including /dev/null.

Linux/Unix has a few of these, made available so that people can make good use of well-written code fragments.

Chargen is designed to generate a specific, repeating pattern of characters; it is quite fast, would push the limits of serial devices, and would help debug serial protocols that were written and then failed some test or other.

Zero is designed to populate an existing file or to output a whole lot of zeros.

/dev/null is just another tool with the same idea in mind.

Having all of these tools in your toolkit means that you have half a chance of making an existing program do something unique, using them as devices or as file replacements, whatever your specific need.

Let's set up a contest to see who can produce the most exciting result given only the few character devices in your version of Linux.