
According to this page:

File names in Linux can contain any characters other than (1) a forward slash ( / ), which is reserved for use as the name of the root directory (i.e., the directory that contains all other directories and files) and as a directory separator, and (2) the null character (which is used to terminate segments of text). Spaces are permitted, although they are best avoided because they can be incompatible with legacy software in some cases.
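The two rules in that quote can be made concrete with a small Python sketch that checks a single path component, treating the name as raw bytes (as the kernel does). The function name `is_valid_component` is hypothetical. It deliberately ignores other real-world constraints such as the `NAME_MAX` length limit and the special entries `.` and `..`.

```python
def is_valid_component(name: bytes) -> bool:
    """Return True if `name` obeys the two rules quoted above.

    The name is treated as raw bytes: the only forbidden values are
    0x2F ('/', the directory separator) and 0x00 (NUL, the string terminator).
    Length limits and the '.'/'..' entries are deliberately not checked.
    """
    if not name:
        return False  # an empty component cannot name a file
    return b"/" not in name and b"\x00" not in name


# Newlines, spaces and non-ASCII bytes all pass; '/' and NUL do not.
for candidate in (b"report.txt", b"two\nlines", b"a b", b"caf\xc3\xa9", b"a/b", b"nul\x00"):
    print(candidate, is_valid_component(candidate))
```

Note how little the rule says: a newline is just another byte as far as this check is concerned.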

Great, both restrictions make a lot of sense. Since it is clearly possible to forbid certain characters in file names, why were newlines allowed? As far as I can tell, their only use is to complicate our scripts. Is there ever a valid reason to have a newline in a file name?
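A minimal illustration of the complication the question refers to, as a Python sketch with made-up file names and no filesystem access: in the common "one name per line" format, a single name containing a newline serializes to exactly the same text as two ordinary names, so no reader of that text can tell the two cases apart. The helper names are hypothetical.

```python
def to_lines(names):
    """Serialize names in the 'one name per line' format many scripts use."""
    return "".join(name + "\n" for name in names)


def from_lines(text):
    """Naively parse that format back by splitting on line boundaries."""
    return text.splitlines()


one_odd_file = ["notes\nimportant.txt"]       # a single name containing a newline
two_plain_files = ["notes", "important.txt"]  # two ordinary names

print(to_lines(one_odd_file) == to_lines(two_plain_files))  # True: the encodings collide
print(from_lines(to_lines(one_odd_file)))                   # ['notes', 'important.txt']
```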

terdon
  • Why shouldn't it be included? Sure it may complicate your scripts, but so will lots of other characters. – Zoredache Oct 04 '13 at 04:41
  • @Zoredache none of them do so as much as the newline (except perhaps the backslash), and none of them are so completely pointless. What in the world is the point of allowing these characters, given the complications they cause? – terdon Oct 04 '13 at 04:43
  • Seems like a simple case of the Robustness principle. Be conservative in what you send, be liberal in what you accept. Accepting everything gives the most flexibility to the user, future developers, applications and so on. – Zoredache Oct 04 '13 at 04:49
  • There's no good reason to allow newlines in filenames, but unfortunately we're stuck with them; it's too late to change now. And the Robustness Principle is subverted, because the presence of \n in filenames leads to more fragile scripts: most shell programmers find even spaces in filenames difficult to deal with, and most of the rest only know about find ... -print0 and xargs -0 (and don't realise, e.g., that you can tell bash's built-in read to use NUL as a delimiter with -d $'\0', or that many GNU tools have -0, -z or -Z options for handling NUL-terminated stdin) – cas Oct 04 '13 at 06:02
  • Worse, an extremely common file format (i.e. "one filename/item per line, separated by newlines") is made unreliable by the presence of newlines in filenames. There's not even a reliable way to convert that format to NUL-separated. All you can do is hope/assume that your users are relatively sane and haven't used \n in their filenames. – cas Oct 04 '13 at 06:06
  • It's not as simple as it seems. In general, file systems are implemented in kernel space. The kernel basically deals with byte sequences, and users are free to interpret those byte sequences in any way they like by choosing an encoding. Note that the encoding does not affect the kernel, so to decide whether particular characters are allowed, the kernel would have to know and understand the encoding in use. At the moment the kernel makes only two assumptions, about the bytes 0x00 and 0x2F, and that's all there is. See Understanding Unix file name encoding – Marco Oct 04 '13 at 06:39
  • Yes, I understand WHY newlines are legitimate characters in a filename. I just don't think there's any good reason for them to be allowed, and certainly no good reason to use them. It's possible to log in as root to run X and all the usual user GUI apps, but it's a bad idea to do that. Same with newlines in filenames: legit but stupid. – cas Oct 04 '13 at 08:33
  • @Marco I know that at the moment the kernel only assumes about \0 and / but my question is why not \n? Is there ever a legitimate reason to have \n in a file name? – terdon Oct 04 '13 at 14:43
  • Related: http://serverfault.com/questions/150740/linux-windows-unix-file-names-which-characters-are-allowed-which-are-unesc – slm Oct 04 '13 at 18:57
  • IMHO the "duplicate question" is about whether/where newlines are used. "How prevalent are new lines in filenames?" This question is mainly about why it is allowed. Related, yes, same no. Am I far off? – Runium Oct 06 '13 at 07:22
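The NUL-delimited approach cas mentions works because NUL is one of the two bytes a name can never contain. A minimal Python sketch of the same idea behind `find -print0 | xargs -0`: names are written NUL-terminated and split back on NUL, so a newline inside a name survives the round trip. The helper names are hypothetical, and the names are raw bytes to match what the kernel stores.

```python
def to_nul_terminated(names):
    """Serialize names NUL-terminated, the same layout `find -print0` emits."""
    return b"".join(name + b"\x00" for name in names)


def from_nul_terminated(data):
    """Parse NUL-terminated names; the empty field after the last NUL is dropped."""
    fields = data.split(b"\x00")
    if fields and fields[-1] == b"":
        fields.pop()
    return fields


names = [b"notes\nimportant.txt", b"plain.txt", b"with space"]
stream = to_nul_terminated(names)
print(from_nul_terminated(stream) == names)  # True: the embedded newline survives
```

This does not rescue data already written in the newline-separated format; as cas points out, that text is ambiguous before any conversion happens.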

1 Answer


NUL and / have their designated system functions. Other characters do not.

That is the basics of it; the rest is opinion, speculation and history. The points below are things I have heard or read, included only as background, not as a debate or argument:

  • By forbidding certain characters you add complexity to the file system itself, which amounts to compromising it.
  • Which bytes constitute a newline on various systems? <CR> vs <LF>, etc.
  • What if a remote system creates a file with a newline in its name on an NFS share?
  • What if the filename gets corrupted whilst the file contents are intact?
  • What if an application encodes information in the filename?

And on it goes:

  • Is it the system's job to fix bugs in user software?
  • Should a system, at its root level, protect users from themselves?
  • Should the way the various shells are implemented internally govern a decision as to what file names are considered legal?

The basic operating system doesn't set limitations. Information to and from the system consists of byte streams. If a byte has no special meaning, don't create overhead by adding checks that should be handled in user space.
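If a particular program does want to refuse awkward names, that policy can live in user space, which is the division of labour described here. A hypothetical Python sketch of such a check, applied before creating a file; the choice to reject every ASCII control character (which covers newline, ESC and DEL) is an assumption standing in for whatever policy a real program would pick, and `is_acceptable_name` / `create_checked` are illustrative names, not an existing API.

```python
import os
import tempfile


def is_acceptable_name(name: str) -> bool:
    """Hypothetical user-space policy: reject empty names, '/', and every
    ASCII control character (0x00-0x1F and DEL, 0x7F), including newline and ESC."""
    return bool(name) and "/" not in name and not any(
        ord(ch) < 0x20 or ord(ch) == 0x7F for ch in name
    )


def create_checked(directory: str, name: str) -> str:
    """Create an empty file only if `name` passes the policy above."""
    if not is_acceptable_name(name):
        raise ValueError(f"refusing to create {name!r}")
    path = os.path.join(directory, name)
    with open(path, "x"):  # 'x' fails if the file already exists
        pass
    return path


with tempfile.TemporaryDirectory() as d:
    print(create_checked(d, "report.txt"))
    try:
        create_checked(d, "bad\nname")
    except ValueError as exc:
        print(exc)  # refusing to create 'bad\nname'
```

The check is a single linear scan of the name, so its cost is small; the disagreement in the comments is about where such a check belongs, not whether it is expensive.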


Anyhow, the biggest issue would most likely be the long history during which newlines, and other control characters, have been allowed.

Another issue is what to forbid. You mention newlines, but this has been debated since the stone age of UNIX, and those discussions also covered other characters. Should * be forbidden? What about filenames starting with -? What about DEL and ESC? Should all control characters be forbidden? And so on and so forth.
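The candidate restrictions in that list are separate categories, each with its own trade-offs, which is part of why drawing a line is hard. A hypothetical Python sketch that reports which of the debated categories a given name falls into; the labels are illustrative, and treating `*?[` as the set of glob metacharacters covered by "should * be forbidden" is an assumption.

```python
def debated_traits(name: str) -> list[str]:
    """Return the labels of the debated categories that `name` falls into."""
    traits = []
    if any(ch in "*?[" for ch in name):
        traits.append("glob metacharacter")
    if name.startswith("-"):
        traits.append("leading dash (can be mistaken for an option)")
    if "\n" in name:
        traits.append("newline")
    if "\x1b" in name:
        traits.append("ESC")
    if "\x7f" in name:
        traits.append("DEL")
    # Any remaining C0 control character (0x00-0x1F) not already reported above.
    if any(ord(ch) < 0x20 and ch not in "\n\x1b" for ch in name):
        traits.append("other control character")
    return traits


for n in ["report.txt", "-rf", "*.log", "line\nbreak", "bell\x07"]:
    print(repr(n), debated_traits(n))
```

Some of these already have well-known workarounds (a leading dash is usually handled with `--` or a `./` prefix), which is part of why newlines end up singled out.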

Unfortunately, I cannot recall any quotes on this topic from the founding fathers or the code maintainers.

Runium
  • None of your other examples are anywhere near as problematic as \n. They can all be dealt with more easily. The overhead of checking for \n is a valid point but it's not such a big deal. I would say that the overhead is worth avoiding the headaches. I can think of no valid reason why a file name should ever contain \n. – terdon Oct 04 '13 at 14:45
  • "Should the way the various shells are implemented internally govern a decision as to what file names are considered legal?" is the key point. Having trouble dealing with filenames containing newlines is a problem specific to the Bourne shell. C programs have no such difficulty. The Unix kernel was written long before the Bourne shell, so I would guess the filesystem just wasn't originally designed with the shell in mind. – Matt Oct 04 '13 at 20:04
  • @Matt Most modern unices allow any character in file names except NUL and /, but older unices didn't do this. Did the original Unix (the one that predates the Bourne shell) allow exotic characters? – Gilles 'SO- stop being evil' Oct 04 '13 at 22:22
  • Ctrl-O can mess up your tty, and Ctrl-G can make you want to strangle someone. IMO all control characters are problematic. – cas Oct 04 '13 at 22:38
  • @terdon: Yes, I understand your stance, though I'm on the dark side. Anyhow, it's a very interesting topic that touches the core of a lot of other interesting topics. "Unfortunately" this isn't a debate forum, so I'll restrain myself from expanding on it any further. – Runium Oct 05 '13 at 17:56
  • I can't think of any valid reason that a filename should need to contain numbers. If someone wants to represent numbers, they could just as easily put the word "one" in their filename, for example. Besides, using numbers breaks my poorly-implemented program, which assumes that all characters are in the ASCII range between 65 and 122. ;) – dannysauer Oct 21 '13 at 23:02
  • In any event, a filesystem is a database. The keys in the database are intended to be useful to something which reads that database. If I want to write a program which stores some kind of useful data like a plain-text description of the file in the filename, that description may well contain newlines. It may be that it makes sense to be able to access that description with just a stat() rather than having to create a filehandle and open the file up, possibly for performance reasons. As it stands, "s/\///" is all I'd have to do to make this scheme work. – dannysauer Oct 21 '13 at 23:07
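A minimal Python sketch of the scheme dannysauer describes, which also illustrates Matt's point that programs calling the filesystem API directly have no trouble with newlines. The helper `description_to_name` is hypothetical; unlike the sed expression in the comment (which, without the g flag, removes only the first slash), it removes every '/' and also any NUL. It assumes the description fits within the filesystem's name length limit (commonly 255 bytes) and that it runs on a typical Linux filesystem that accepts newlines in names.

```python
import os
import tempfile


def description_to_name(description: str) -> str:
    # Delete the only two characters a file name cannot contain: '/' and NUL.
    # Every '/' is removed, not just the first one.
    return description.replace("/", "").replace("\x00", "")


with tempfile.TemporaryDirectory() as d:
    desc = "Quarterly report\nQ3/2013 draft"
    name = description_to_name(desc)
    path = os.path.join(d, name)
    with open(path, "w") as f:
        f.write("contents")
    # Reading the description back needs no open() of the file itself:
    # listing the directory returns the full name, newline included.
    (listed,) = os.listdir(d)
    print(repr(listed))           # 'Quarterly report\nQ32013 draft'
    print(os.stat(path).st_size)  # 8
```

The lossy step is the slash removal ("Q3/2013" becomes "Q32013"); the newline itself round-trips untouched.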