
Following up on Fixing Unix/Linux/POSIX Filenames: Control Characters (such as Newline), Leading Dashes, and Other Problems*, is there some way to forbid the creation of files with problematic names on any commonly used Linux kernel/filesystem? This needs to be enforced at a low level, so that no amount of tinkering short of changing the configuration via root access would enable the creation of such filenames. For example, a mount option to tell the filesystem driver to accept only valid UTF-8 sequences with no control characters, no newlines, and possibly other characters such as hyphen at the start of the name would be ideal.

The use case is mainly security hardening, but also being able to handle filenames in code without a horrendous assembly of hacks (see article above for huge amounts of details).

*: Most of the points brought up so far in answers and comments have been addressed in this article. Please check it before posting.

l0b0

2 Answers


There has been at least one proposal for a module restricting filenames. It is/was a Linux security module (LSM) by David A. Wheeler, the author of "Fixing Unix/Linux/POSIX Filenames".

There's an LWN article about it here: Safename: restricting "dangerous" file names. The article links to the email with the patch. It's from May 2016; I haven't tried it personally, and I have no idea whether it got any traction or whether it can be ported to recent-ish kernels. But the suggestion has been floated in actual code.

ilkkachu
  • Yeah, that was mentioned in the article I linked. Unfortunately I don't have any of the expertise necessary to try this in my own kernel. I was basically hoping for a turnkey solution, something which might reasonably be made into a NixOS option like boot.kernelPackages.safename. – l0b0 Jan 07 '23 at 19:47

is there some way to forbid the creation of files with problematic names on any commonly used Linux kernel/filesystem?

Yes, there is.

With fanotify, you can let a userland program intercept file operations on any directory tree you want to watch. That can be used to implement exactly this kind of policy, such as rejecting forbidden file names.

With kernel eBPF probes you can probably do the same without incurring additional context switch overhead (and without implementing a file system or adding a kernel module). But I've never tried that.

Since the author of that article might have a particular opinion on which file names are problematic, one that other people, especially those writing operating systems and utilities, might not share, I don't think there is a ready-to-use tool that does exactly what you want. But it would be easy enough to code in just a few lines of code.
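As a rough illustration of the policy side only, here is a minimal Python sketch of a name check following the rules listed in the question (valid UTF-8, no control characters, no leading hyphen). The function name `is_safe_name` is hypothetical, the exact rule set is an assumption, and the sketch does not show how the check would be hooked into fanotify or eBPF.

```python
import unicodedata


def is_safe_name(raw: bytes) -> bool:
    """Check one path component (not a full path) against the question's rules.

    Assumed rules: valid UTF-8, no control characters, no leading hyphen.
    """
    if not raw:
        return False  # assumption: an empty name is never acceptable
    try:
        name = raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False  # not valid UTF-8
    if name.startswith("-"):
        return False  # could be mistaken for a command-line option
    # Unicode category "Cc" covers control characters such as newline, tab and DEL.
    return not any(unicodedata.category(ch) == "Cc" for ch in name)
```

Whatever mechanism intercepts the file operation would call a check like this on the final path component and deny the operation when it returns False.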

  • By the way, I am fully opposed to the point the linked article is trying to make: instead of removing the unnecessary assumptions from the bad software, it argues for adding unnecessary assumptions to the file system layer. IMHO, that's the wrong approach. If someone can't write a program that deals with a newline in a filename, it's not the rest of the world that has to do better. I think I disagree with the premise about what a file name is for (always a human-readable identifier for a file), but that doesn't change at all that your question is an interesting and relevant one! – Marcus Müller Jan 07 '23 at 02:06
  • Have you looked at the incredibly ugly hacks you have to use to loop over arbitrary filenames safely (or even worse, portably)? If we really treated filenames as binary blobs (with no NUL or forward slash) then they would be useless for the 99.999% use case, which is displaying them to end users. At that point we'd be better served by just using inodes for everything. – l0b0 Jan 07 '23 at 02:46
  • I'm afraid this answer is too far removed from a solution to be usable. The question is about how to do this with existing systems, not which technologies might be used to implement such a system. – l0b0 Jan 07 '23 at 02:50
  • Re: answer too removed from practice: I'm sorry if that's the case, but your literal question as stated was "is there some way to forbid the creation of files with problematic names on any commonly used Linux kernel/filesystem?" And I considered that indicative of this answer hitting the right spot! – Marcus Müller Jan 07 '23 at 09:49
  • Re: have you seen the trouble people have with looping over files? Yes. We have good solutions to that; it's just hard if you lay out your very particular rule set about what's portable or not, as the article does. Especially the find examples could have been solved with the fully POSIX -exec options safely. I don't buy the trouble with having lists of files in a file. That this file has to be newline-separated feels like a fully self-inflicted complication. – Marcus Müller Jan 07 '23 at 09:53
  • Edited the answer to more explicitly address the question you raised in your second comment about practical tools. – Marcus Müller Jan 07 '23 at 10:02
  • "easy enough to code in just a few lines of code" - Maybe if I were already a C+kernel+filesystem expert, but I'm none of those. – l0b0 Jan 07 '23 at 10:14
  • I feel like you've misinterpreted the point of the article. It's not about trying to impose a set of arbitrary rules for portability, it's about making sure that dealing with files is simple, since clearly even experts can't get file handling right. – l0b0 Jan 07 '23 at 10:18
  • as said, experts can. The proposed ruleset is in fact quite arbitrary. And, things aren't as bad as the article makes them out to be, which might just be owed to progress over time: find reliably has the options the author says might not be available. My ls displays me a list of files – with all possible file names displayed correctly, and escaped and quoted as necessary, for human copy and paste. If I keep a list of files, that's not for human consumption. Thinking additional restriction will not add more complications in practical usage and corner cases is what's naive about the article. – Marcus Müller Jan 07 '23 at 10:38
  • But maybe let's not stray too much from the actual answer that I've given (as opposed to the comments, which I purposefully separated from the answer); did my edit address the reason why you're not content with my answer? If not, what could I have written that would change that? – Marcus Müller Jan 07 '23 at 10:43
  • The downside of "experts can do it", is that then anyone who needs to do it, has to be an expert. Which makes for a kind of a high bar, when we're talking about something as trivial as filenames. Or, well, obviously not trivial, but something as common as filenames. I've never really grokked the idea that it should be possible to have newlines in filenames, or control characters. They seem rather unnecessary for naming things (really, if your names have multiple lines, it might be a time to rethink), and it's not like arbitrary binary data works, since slashes and NULs need special care. – ilkkachu Jan 07 '23 at 12:22
  • well, I for one never grokked the idea that a file name should be restricted from using newlines. We already have an excellent, because illegal everywhere, delimiter, the null character. But I think we're deep in the land of opinions here – and I fully support l0b0's freedom to attempt to solve this; hence my answer :) – Marcus Müller Jan 07 '23 at 12:30
  • If the commonly available toolset allowed us to handle null terminated filenames reliably and consistently it wouldn't be a problem. But while GNU tools gradually get this feature, POSIX has nothing to say on the matter. My own preference would be to address this omission rather than try to reduce the set of permitted characters in a filename. But (like ilkkachu?) I have never really liked having to cope with filenames that can contain newlines. Very very few informational utilities can handle it, which means that parsing their output requires an assumption of no newlines – Chris Davies Jan 07 '23 at 13:20
  • @roaima yep, same here. Especially since the thing about filenames is that they are meant for storing data on long-lived media – forbidding something today solves the problem, but in the very distant future only; and someone will have a very unpleasant surprise in 2043, when they dig up the tax data tapes and try to deal with file names using tools that assume restricted names. Standardizing useful tools solves the problem immediately, and future us needs these tools, either way. – Marcus Müller Jan 07 '23 at 13:23
  • Interesting point about long-lived media, and one which is addressed (indirectly, and assuming non-adversarial filenames) in the article linked. Basically, translate unsafe characters into safe characters at the filesystem level, so that the dangerous characters are never visible to system tools. That said, who in their right mind would use control characters in archived data filenames? – l0b0 Jan 07 '23 at 19:54
  • @l0b0 can't translate a file system where that ends up mapping multiple actual file names to a single "safe" name, or when the software using the files isn't expecting normalization. To illustrate, in a terrible, decades-of-trouble way, see the unicode file name normalization done by Mac OS: that breaks so many things. It's really not a cure, in my experience, it's a disease. "Who would do that?" If nobody did that, we would not have a problem. – Marcus Müller Jan 07 '23 at 20:59
  • @l0b0 problem is that you really shouldn't introduce file system changes to software systems that so far worked with the current abstraction. The article kind of naively deals with that in a "we're allowed to" way, but "we're allowed to" doesn't assess the damage done when you actually do. Where they go into "ok, and then we actually have to deal with it" territory, the author admits they have to modify low-level libraries and application software. My solution: Leave the file system alone and fix your userland software. Not that hard or big a problem to warrant breaking data. – Marcus Müller Jan 07 '23 at 21:09
  • Re. opening files on a different system in the future, POSIX already has that rather restricted list of portable filename characters, and Windows systems also restrict some funny characters. Meaning that one already needs to, and should think to, take a bit of care if portability is a concern. If the filename restrictions clashing with old media becomes an issue, I'd expect it wouldn't be impossible for a conversion tool to be created. – ilkkachu Jan 08 '23 at 12:07
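
The NUL-delimited approach raised in the comments above can be illustrated with a small Python sketch. It assumes a list of file names is stored as bytes with a NUL byte after each name, which works because NUL cannot appear in a file name. The helper names `encode_name_list` and `decode_name_list` are hypothetical and not part of any existing tool.

```python
import os


def encode_name_list(names):
    """Serialize file names as bytes, terminating each name with NUL."""
    return b"".join(os.fsencode(name) + b"\0" for name in names)


def decode_name_list(data):
    """Split NUL-terminated bytes back into the original file names."""
    # The final split element is empty because every name ends with NUL.
    return [os.fsdecode(chunk) for chunk in data.split(b"\0") if chunk]


names = ["plain.txt", "two\nlines.txt", "-starts-with-dash"]
restored = decode_name_list(encode_name_list(names))
```

Names containing newlines or leading dashes pass through unchanged, since only the NUL byte is treated as a separator.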