
When I first borrowed an account on a UNIX system in 1990, the file limit was an astonishing 1024, so I never really saw that as a problem.

Today, 30 years later, the (soft) limit is still a measly 1024.

I imagine the historical reason for 1024 was that file descriptors consumed some scarce resource - though I cannot really find evidence for that.

The limit on my laptop is (2^63-1):

$ cat /proc/sys/fs/file-max
9223372036854775807

which I find as astonishing today as 1024 was in 1990. The hard limit (ulimit -Hn) on my system restricts this further to 1048576.
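
For reference, the same numbers can be read programmatically. This is a minimal C sketch (assuming Linux, for the /proc path) that prints the soft and hard RLIMIT_NOFILE values next to the kernel-wide maximum:

/* Sketch: print the per-process soft/hard open-file limits and the
 * kernel-wide maximum (Linux-specific for the /proc part). */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
        printf("soft: %llu  hard: %llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);

    FILE *f = fopen("/proc/sys/fs/file-max", "r");
    if (f) {
        unsigned long long max;
        if (fscanf(f, "%llu", &max) == 1)
            printf("fs.file-max: %llu\n", max);
        fclose(f);
    }
    return 0;
}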

But why have a limit at all? Why not just let RAM be the limiting resource?

I ran this on Ubuntu 20.04 (from 2020) and HPUX B.11.11 (from 2000):

ulimit -n `ulimit -Hn`

On Ubuntu this raises the limit from 1024 to 1048576. On HPUX it raises it from 60 to 1024. In neither case is there any measurable difference in memory usage according to ps -edalf. If the scarce resource is not RAM, what is the scarce resource then?
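
The same experiment can be done from inside a program. Here is a small C sketch (assuming POSIX getrlimit/setrlimit) that raises the soft limit to the hard limit - the programmatic equivalent of ulimit -n `ulimit -Hn`:

/* Sketch: raise the soft RLIMIT_NOFILE to the hard limit from within
 * the process itself, as many long-running daemons do at startup. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    rl.rlim_cur = rl.rlim_max;          /* soft := hard */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    printf("RLIMIT_NOFILE soft limit is now %llu\n",
           (unsigned long long)rl.rlim_cur);
    return 0;
}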

I have never experienced the 1024 limit helping me or my users - on the contrary, it is the root cause of errors that my users cannot explain and therefore cannot solve themselves: given the often mysterious crashes, they do not immediately think of running ulimit -n 1048576 before starting their job.
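
What those users actually see is open() (or socket()) failing with EMFILE ("Too many open files"). A small C sketch that reproduces the error on purpose, assuming /dev/null is available:

/* Sketch: open /dev/null repeatedly until the soft limit is reached,
 * to show the EMFILE error that appears once all 1024 descriptors
 * (0-1023) are in use. */
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>

int main(void)
{
    int n = 0;
    for (;;) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0) {
            if (errno == EMFILE)
                printf("EMFILE after opening %d extra files\n", n);
            else
                perror("open");
            return 0;
        }
        n++;
    }
}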

I can see it is useful to limit the total memory size of a process, so if it runs amok, it will not take down the whole system. But I do not see how that applies to the file limit.

What is the situation where the limit of 1024 (and not just a general memory limit) would help back in 1990? And is there a similar situation today?

Ole Tange

3 Answers


@patbarron has still not posted his comments as an answer, and they are really excellent. So, for anyone looking for the answer, here it is.

He writes:

You can look at the source code from Seventh Edition, for example (minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/sys/h/user.h) to see how this was implemented originally. "NOFILE" is the maximum number of open files per process, and it affects the sizes of data structures that are allocated per-process. These structures take up memory whether they're actually used or not. Again, mostly of historical interest, as it's not done this way anymore, but that might provide some additional background on where this came from.

The other constant, "NFILE", is the maximum number of open files in the entire system (across all processes/users), and the per-process table of open files contains pointers into the "files" structure: minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/sys/conf/c.c. This is also a compile-time constant and sizes the system-wide open files table (which also consume memory whether they're actually used or not).

This explains that historically there was a reason: each process reserved room for NOFILE file descriptors, whether they were used or not. When RAM is scarce you want to avoid reserving memory you do not use. Not only is RAM cheaper today - the reservation is also no longer done this way.
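
To make the old scheme concrete, here is an illustrative C sketch (not the actual Seventh Edition source - the real declarations are in the user.h and c.c files linked above, and the constants below are placeholders): with compile-time NOFILE and NFILE, the tables exist in full whether or not the slots are ever used.

/* Illustrative sketch of the historical layout, not the real V7 code.
 * Both tables are sized at compile time, so their memory is committed
 * even when most slots stay empty. */
#define NOFILE 20                      /* per-process open-file slots   */
#define NFILE  100                     /* system-wide open-file slots   */

struct file {                          /* one entry per open file       */
    int  f_count;                      /* reference count               */
    long f_offset;                     /* current read/write offset     */
};

struct file file[NFILE];               /* system-wide table, fixed size */

struct user {                          /* per-process data              */
    struct file *u_ofile[NOFILE];      /* fixed array of NOFILE pointers */
    /* ...other per-process fields... */
};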

It confirms my observations: I have been unable to find a single reason to keep ulimit -n at 1024 instead of raising it to the maximum with ulimit -n $(ulimit -Hn). Memory is only consumed when the file descriptors are actually used.

Ole Tange

As far as I know, yes, the file-max kernel hard limit was due to the memory allocation strategy (the memory for the inode structure was allocated beforehand). This strategy was common, intuitive and efficient (back then), and was shared between DOS, Windows, Linux and other OSes.

Nowadays, I believe that the huge number you see is the theoretical maximum (2^64-1), and the "real" file-max is allocated at runtime and can be set via ulimit (ulimit -Hn and ulimit -Sn). So the "file-max" is just a sort of maximum value for ulimit, essentially meaningless - it means "whatever, until ulimit runs out of RAM".
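
On Linux, the runtime allocation can be observed in /proc/sys/fs/file-nr, which lists the number of allocated file handles, the number of allocated-but-unused handles, and the maximum. A small C sketch (Linux-specific, and only an illustration) that prints all three:

/* Sketch: /proc/sys/fs/file-nr shows allocated handles, allocated-but-
 * unused handles, and the maximum - i.e. file-max is a ceiling, not a
 * pre-allocated table. */
#include <stdio.h>

int main(void)
{
    unsigned long long allocated, unused, max;
    FILE *f = fopen("/proc/sys/fs/file-nr", "r");
    if (!f) {
        perror("fopen");
        return 1;
    }
    if (fscanf(f, "%llu %llu %llu", &allocated, &unused, &max) == 3)
        printf("allocated: %llu  unused: %llu  max: %llu\n",
               allocated, unused, max);
    fclose(f);
    return 0;
}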

LSerni
  • I like 2^64-1. But why have a limit anywhere lower today? Why not let RAM be the limiting resource? What is the reason for that? (My ulimit -Hn is 1M - and while that is not a problem today, it does not feel astonishing in the slightest.) – Ole Tange Dec 21 '20 at 23:59
  • Well, I think that RAM is the limiting resource. You could easily increase the limit, but it makes little sense to have it maxed out by default since it would never fit everyone anyway (and you can't devote all the RAM to file structures). So, the default is a compromise, as always. – LSerni Dec 22 '20 at 00:07
  • I have tried changing ulimit -Hn and no matter the size, the difference in RAM usage is so small I cannot measure it. In other words, I cannot at all see that RAM is the limiting resource. Can you change it so it eats up a considerable amount of RAM? If you can, please show us how you did. (Also, the number is 2^63-1). – Ole Tange Dec 22 '20 at 00:14
  • At least back in the times mentioned in this answer ... when your system has a total of 128KW of memory, then increasing the open file limit by even just a bit can really add up... – patbarron Dec 22 '20 at 02:22
  • @patbarron How does it add up? Assuming you do not actually use any more files, how does setting the limit higher add up? And why is a separate file limit preferable over a general limit on memory usage? I have updated the question with measurements from HPUX and Ubuntu: they show no change in memory usage. – Ole Tange Dec 22 '20 at 10:28
  • Oh, I'm not saying that the same reasons exist in modern systems today. But in the older systems where this idea came from, this is memory that's allocated for every process whether it was used or not - and when you have a system where you can find yourself scrounging for 30 or 40 bytes so you can fit your kernel into the available memory, it can be a problem. These days, not really a concern, and I'm not surprised that changing the limit in a modern system doesn't have an effect on memory use - the main reason the limit concept still exists is probably because POSIX specifies that it does. – patbarron Dec 22 '20 at 11:09
  • Also, in the older systems where this concept came from, NFILE was a kernel compile-time constant and could not be changed dynamically in any case. – patbarron Dec 22 '20 at 11:11
  • @patbarron With the kernel compile option it starts to make sense, but only if unused file descriptors would take up resources. And I have been unable to find evidence for that. – Ole Tange Dec 22 '20 at 11:18
  • @OleTange - you can look at the source code from Seventh Edition, for example (https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/sys/h/user.h) to see how this was implemented originally. "NOFILE" is the maximum number of open files per process, and it affects the sizes of data structures that are allocated per-process. These structures take up memory whether they're actually used or not. Again, mostly of historical interest, as it's not done this way anymore, but that might provide some additional background on where this came from. – patbarron Dec 22 '20 at 19:59
  • The other constant, "NFILE", is the maximum number of open files in the entire system (across all processes/users), and the per-process table of open files contains pointers into the "files" structure: https://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/sys/conf/c.c. This is also a compile-time constant and sizes the system-wide open files table (which also consume memory whether they're actually used or not). – patbarron Dec 22 '20 at 20:03
  • @patbarron You should convert that to an answer, because that explains why it used to make sense (fixed size memory allocation per process combined with much less memory available), and that it no longer does make any sense (because it is now dynamically allocated). If your answer also includes where the 1048576 limit comes from, it would be a perfect answer. – Ole Tange Dec 22 '20 at 20:20

The common function used in networking code to monitor file descriptors, select(), only handles file descriptors up to 1023 in many implementations. While this function is generally considered obsolete in new code, software that uses it will not function properly with higher-numbered file descriptors.
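
That 1023 comes from FD_SETSIZE (typically 1024), which fixes the capacity of fd_set. A small C sketch of the check such code has to make (the helper name watch_fd is mine, not a standard API):

/* Sketch: fd_set only has room for descriptors 0..FD_SETSIZE-1, so
 * FD_SET() on a higher-numbered descriptor is out of range - typically
 * silent memory corruption (undefined behaviour). */
#include <stdio.h>
#include <sys/select.h>

int watch_fd(fd_set *set, int fd)
{
    if (fd >= FD_SETSIZE) {            /* descriptor 1024 or above        */
        fprintf(stderr, "fd %d does not fit in an fd_set (FD_SETSIZE=%d)\n",
                fd, FD_SETSIZE);
        return -1;                     /* caller must fall back to poll() */
    }
    FD_SET(fd, set);
    return 0;
}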

File descriptors are known to the user process only as integer values, and functions that operate on sets of file descriptors were implemented by assuming a small fixed range of possible descriptors and iterating through the entire range, checking whether each one was marked for processing. This becomes extremely costly if the maximum file descriptor number is very large.
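
To illustrate that cost, here is a sketch of the classic select() loop (the handle callback and the watched set are placeholders, not from any particular program): the scan after the call touches every descriptor number up to the highest one in use, so the work grows with the highest descriptor number rather than with the number of descriptors that are actually ready.

/* Sketch: the classic select() pattern.  After select() returns, the
 * program tests every descriptor from 0 to maxfd, so the cost is
 * proportional to the highest descriptor number in use. */
#include <sys/select.h>

void poll_once(int maxfd, fd_set *watched, void (*handle)(int))
{
    fd_set ready = *watched;                    /* select() modifies the set */
    if (select(maxfd + 1, &ready, NULL, NULL, NULL) <= 0)
        return;                                 /* error or nothing ready    */

    for (int fd = 0; fd <= maxfd; fd++)         /* linear scan up to maxfd   */
        if (FD_ISSET(fd, &ready))
            handle(fd);                         /* process the ready fd      */
}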