29

I understand and accept the premise that defensive[1] shell scripting is both prudent and, in the longer term, more sustainable.

Many of the answers to text-processing questions here follow this principle by building in contingencies for unorthodox filenames that might contain spaces, dashes and newlines.

How prevalent are newlines in filenames? Specifically:

  • Do any applications create filenames that include newlines by default?
  • Are there any situations where it would be desirable to create such filenames?
  • Or are they predominantly an instance of user error?

[1] Meaning planning for and managing the broadest possible range of scenarios and contingencies...

Question inspired by the (rather plaintive) comment on this question.

jasonwryan
  • 73,126
  • 5
    Short answer is bizarre filenames with newlines and/or unprintable characters are never good practice, sensible apps don't create them, and you only really see them if someone is trying to break your shell scripts or programs that do not handle such names correctly. I'll let other people provide more detailed answers with references and such. – jw013 Oct 23 '11 at 23:09

3 Answers

31

I've never seen a file name with a newline other than ones deliberately created to test applications that manipulate file names. File names containing newlines can appear because:

  • Some bug or user error (e.g. a bad copy-paste) resulted in an unintended file name.
  • Some filesystem corruption affected a file name.
  • Someone deliberately created a “strange” file name to exploit a security hole, where an application put more trust in the file names it was passed than it should have.

POSIX defines a filename as “a name consisting of 1 to {NAME_MAX} bytes used to name a file. The characters composing the name may be selected from the set of all character values excluding the slash character and the null byte. The filenames dot and dot-dot have special meaning.” There is no guarantee that every filesystem will accept “strange” file names (the only guaranteed characters are ASCII letters, digits, period, hyphen and underscore, i.e. A-Z, a-z, 0-9 and ._-, with hyphen forbidden in first position), but most native filesystems on modern unices do.
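To make the distinction between "allowed" and "portable" concrete, here is a minimal Python sketch of the two checks described above. The function names are hypothetical, chosen for illustration only; the sketch ignores the {NAME_MAX} length limit and the special meaning of dot and dot-dot.

```python
import re

# Portable filename character set as described above:
# A-Z, a-z, 0-9, period, underscore and hyphen,
# with the hyphen not allowed in first position.
_PORTABLE = re.compile(r"[A-Za-z0-9._][A-Za-z0-9._-]*")


def is_valid_filename(name: str) -> bool:
    """True if the name is non-empty and contains neither '/' nor NUL.

    Hypothetical helper: does not check the NAME_MAX length limit.
    """
    return bool(name) and "/" not in name and "\0" not in name


def is_portable_filename(name: str) -> bool:
    """True if the name uses only the guaranteed-portable characters."""
    return _PORTABLE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("notes_v2.txt", "my file", "-rf", "a\nb"):
        print(repr(candidate),
              is_valid_filename(candidate),
              is_portable_filename(candidate))
```

A name containing a newline passes the first check but fails the second, which is exactly the gap a script has to allow for.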

  • So spaces in filenames are not guaranteed to be portable? It would be helpful if you clarified that those last three characters are period, underscore, and hyphen. With the underlined link, it's hard to tell. – toxalot Mar 09 '14 at 23:20
  • 4
    @toxalot No, spaces are not guaranteed to be portable, nor , (used by RCS), : (used by X.org), ~ (used by many programs on backup files), … But they are supported by almost all modern systems. – Gilles 'SO- stop being evil' Mar 09 '14 at 23:41
26

When writing a paper, I often collect a bibliography of PDF files from various sources. Not all of these contain the correct metadata, which means I sometimes copy-paste the title of the paper from the PDF viewer into the filename. This often results in newlines within the file name, but has never been an issue with any tools I have used.

IMHO there is nothing 'defensive' about coding to a standard, and the standard states that newlines are allowed in filenames. If your script does not handle all file names allowed in the standard, then your script is broken.

sml
  • 613
4

I've never seen NORMAL users use newlines in filenames. It appears that their primary purpose is to (1) make it easy for attackers to subvert your system, and to (2) make it harder to write secure programs :-(. However, modern Unix-likes (such as Linux) allow them, so you have to prepare for them if you want a program that resists attack.

"Filenames and Pathnames in Shell: How to do it correctly" shows how to handle this correctly.
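As a rough illustration of the failure mode (my own sketch, not taken from the linked article), the Python snippet below creates one ordinary file and one whose name contains a newline, then compares splitting a listing on newlines with splitting it on NUL bytes. It assumes a Unix-like filesystem that accepts newlines in names; the file names are made up for the example.

```python
import os
import tempfile


def list_three_ways():
    """Return (real names, names parsed by line, names parsed by NUL)."""
    with tempfile.TemporaryDirectory() as tmp:
        # One ordinary name and one containing a newline (illustrative only).
        for name in ("plain.txt", "two\nlines.txt"):
            open(os.path.join(tmp, name), "w").close()

        names = sorted(os.listdir(tmp))

        # Newline-delimited listing: the odd name is split into two entries.
        by_line = "\n".join(names).split("\n")

        # NUL-delimited listing: NUL cannot occur in a filename,
        # so every entry survives the round trip intact.
        by_nul = "\0".join(names).split("\0")

        return names, by_line, by_nul


names, by_line, by_nul = list_three_ways()
print("actual files:     ", len(names))
print("parsed by newline:", len(by_line))
print("parsed by NUL:    ", len(by_nul))
```

The line-based parse reports an extra, nonexistent file, while the NUL-delimited parse matches the directory contents. This is the reason NUL is the usual choice of separator when file names are passed between programs.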

user45404
  • 181
  • I'm a normal user and I have newlines in my file names. The scenario described in @sml's answer has happened to me more than once. What interests me is how a newline in a file name can be used to "subvert the system". Do you have any sources explaining that? – Joseph R. Aug 18 '13 at 21:42
  • @JosephR. I can't think of a way to compromise a system, but you could use it as a DoS against applications that don't handle newlines (and crash instead) – strugee Oct 22 '13 at 01:20