34

People say you shouldn't use spaces in Unix file naming. Are there good reasons to not use capital letters in file names (i.e., File_Name.txt vs. file_name.txt)? Or is this just a matter of personal preference?

DD343
  • 9
    There are some Unixy things that use filenames with capital letters... some examples include the Makefile, INSTALL, CHANGELOG and of course the venerable README. – Thomas Oct 29 '15 at 04:19
  • 1
    You can use caps but as a standard don't use it. Just use small letters and _ so file_name.txt is good. – Shabir A. Oct 28 '15 at 19:14
  • PSR-2 - the de-facto naming standard of the PHP world, which runs by majority on Linux uses camelCase http://www.php-fig.org/psr/psr-2/ – jdog Oct 29 '15 at 06:48

7 Answers

53

People say you shouldn't use spaces in Unix file naming.

People say a lot of things. There are some tools that may screw up, but hopefully they are few in number at this point in time, since spaces are a virus proliferated by giant consumer proprietary OS corporations and now impossible to avoid.

Spaces make specifying filenames on the command line, etc., awkward. That's about it. The only categorically prohibited characters on *nix systems are NUL (don't worry, it's not on your keyboard, or anyone else's) and /, since that is the path separator.[1] Other than that anything goes. Individual path elements (file names) are typically limited to 255 bytes (a possible complication if you are using extended character sets, since one character may take several bytes) and complete paths to 4 KiB.
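
Those rules are simple enough to check mechanically. Here is a minimal sketch, not a definitive validator: the function name is_valid_component is made up for illustration, and it assumes names are encoded as UTF-8 and that the 255-byte limit mentioned above applies (other file systems may differ).

```python
NAME_MAX_BYTES = 255  # assumed per-element limit; real file systems may differ


def is_valid_component(name: str) -> bool:
    """Return True if `name` could be one path element on a typical *nix system."""
    if name == "":
        # An empty string cannot name a file.
        return False
    if "/" in name or "\0" in name:
        # The only two categorically prohibited characters.
        return False
    # The limit is counted in bytes, not characters (UTF-8 assumed here).
    return len(name.encode("utf-8")) <= NAME_MAX_BYTES
```

Note that a name made of 128 accented letters already fails the check, because each of them takes two bytes in UTF-8.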

Or is this just a matter of personal preference

I would say it is. Most DEs seem to create a slew of capitalized directories in your $HOME (Downloads, Desktop, Documents -- the D is very popular), so there's nothing bizarre about it. There are also very commonplace traditional files with capitals in them, such as .Xclients and .Xauthority.

A value of capitalizing things at the beginning is that when listed lexicographically they'll come before lower case things -- at least, with many tools, and subject to locale.
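
To see the effect, here is a small sketch with an invented list of names. Plain sorted() compares code points, which is what the "C" locale does for ASCII names; sorting on str.casefold is only a rough stand-in for the case-insensitive collation of many natural-language locales, not an exact model of any of them.

```python
# Hypothetical directory listing, chosen only for illustration.
names = ["notes.txt", "Makefile", "README", "angle", "anOctagon", "build.sh"]

# Code point order: every "A"-"Z" sorts before every "a"-"z".
by_code_point = sorted(names)

# Folding case first approximates a case-insensitive, locale-style order.
case_insensitive = sorted(names, key=str.casefold)

print(by_code_point)
print(case_insensitive)
```

In the first ordering Makefile and README float to the top; in the second they end up mixed in with everything else.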

I'm a fan of camel case (aka. camelCase) and use it with filenames, e.g., /home/goldilocks/blueSuedeShoes -- never mind what's in there. Definitely a matter of personal preference but it has yet to cause me grief.

Java class files tend to contain capitals by nature, because Java class names do. And of course, let's not forget NetworkManager, even if some of us would prefer to.


1. POSIX recommends a much more restricted "Portable Filename Character Set" that doesn't include the space -- but it does include upper case! POSIX also specifies the more general restriction regarding "the slash character and the null byte" elsewhere in the same document. This reflects, or is reflected in, long standing conventional practices.
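
For anyone who wants to test names against that portable set, a minimal sketch follows. The helper name is_portable_name is hypothetical. The set itself (A-Z, a-z, 0-9, period, underscore, hyphen) is what POSIX lists; rejecting a leading hyphen reflects my understanding of the accompanying POSIX advice, so treat that part as an assumption.

```python
import re

# Portable Filename Character Set: letters of both cases, digits, ".", "_", "-".
_PORTABLE = re.compile(r"[A-Za-z0-9._-]+")


def is_portable_name(name: str) -> bool:
    """Return True if `name` uses only portable characters and has no leading hyphen."""
    return bool(_PORTABLE.fullmatch(name)) and not name.startswith("-")
```

So File_Name.txt passes, while "file name.txt" and -r do not.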

goldilocks
  • 7
    Mia: "Is that a fact?" Vincent: "No it's not, it's just what I heard." Mia: "Who told you this?" Vincent: "They." Mia: "They talk a lot don't they?" Vincent: "They certainly do." – corsiKa Oct 28 '15 at 21:35
  • 6
    “The value of capitalizing something at the beginning is that when listed lexicographically […], they'll come before everything else.”—Of course, this only works if most of the filenames are lowercase, giving you a reason to reserve caps (at least leading caps) for your READMEs and Makefiles and so on. – Blacklight Shining Oct 28 '15 at 21:46
  • 4
    On many keyboards, ctrl-space or ctrl-@ or alt-0 will type a NUL. – dubiousjim Oct 28 '15 at 21:53
  • 1
    Some third party tools will break - either by exploding or by subtly doing something wrong - when encountering spaces or non-ascii characters in file paths. Sucks, but it still happens, and it's quite possible to introduce such problems in your own software. One way to reduce this risk might be to have your home folder named something like "höme\t " so that if something relies on paths being ascii w/o whitespace, then you notice it immediately on your own system, not by chasing weird bugreports. – Peteris Oct 28 '15 at 22:51
  • You can certainly put slashes in an ext* Linux filename. You just have to escape them. – dodgethesteamroller Oct 29 '15 at 01:49
  • @dodgethesteamroller Since the question is file system and OS agnostic I went with the accepted standard for *nix systems generally, which is in fact specified by POSIX. – goldilocks Oct 29 '15 at 02:53
  • 2
    @dodgethesteamroller I believe you are flat-out mistaken about forward slash (or more precisely, the byte with value 0x2F) in ext*. In fact, I don't believe it will even get to the filesystem; the VFS layer will disallow it regardless of the backing store. – zwol Oct 29 '15 at 15:19
  • @dubiousjim I think probably you're thinking of specific applications which will do that; any alt/ctrl + key combination sends a scan code representing the combination from the keyboard to the OS. It's then up to the OS what to do with it; usually it translates it into a key code and passes that on to an application. So technically, keyboards don't send character byte values at all; those exist on a higher level. I was being a bit flippant about NUL not being there -- but there's a truth to it. – goldilocks Oct 29 '15 at 15:32
  • 5
    just don't use spaces in filenames and directory names. even if your system technically allows it, it will only cause you grief. Instead use "_" the underscore character. – SnakeDoc Oct 29 '15 at 17:20
  • @goldilocks The edit you rejected made extensive (and perhaps inappropriate) use of the term "the kernel", but I was not speaking of any particular implementation of Unix — all my proposed corrections are in fact specified by POSIX. You may have been confused by the use of the term 'character' in the standard; in context it means 'byte.' Please refer to the last two bullet points after "POSIX places only the following requirements on the encoded values of the characters in the portable character set" at http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html , and reconsider. – zwol Oct 29 '15 at 17:52
  • @goldilocks ... I guess I also shouldn't have put quite so much emphasis on the byte value 0x2F; POSIX does offer just enough wiggle room that a system based on (UTF-)EBCDIC (with / at 0x61) could be conforming provided it supported no ASCII-based locales. And I'd not be surprised to learn that IBM's Unix-as-z/OS-guest products do just that. But I think the correction from "character" to "byte" is very, very important since it is why you cannot use UTF-16 for pathnames. – zwol Oct 29 '15 at 17:58
  • @zwol I'm sure you noticed I had a link to the POSIX spec re: "portable character filename set" in the footnote, and that neither that document nor the one you've indicated refers to a kernel, only "an implementation". I think you understand the difference. The POSIX spec WRT file names does in fact refer to "the slash character" (presumably resolved to a byte value by the "Portable Character Set" document)... – goldilocks Oct 29 '15 at 18:07
  • <- Anyway, I've added an additional link to the broader "excluding the slash character and the null byte" spec. That's all a user needs for a "best practice" as per the question. If you are working on an implementation, then maybe you want to dig deeper. – goldilocks Oct 29 '15 at 18:08
  • A nice thing about lowercase names/directories is it makes navigation from the command prompt easier since you don't have to capitalize your directory. Same with a lack of spaces. – enderland Oct 29 '15 at 20:22
  • @zwol You are absolutely right. I thought that I had been able under some pathological circumstance to create a file on Linux with a slash in the name by backslash-escaping it, but I haven't been able to reproduce those conditions and I accept the explanation given why it's essentially impossible on a POSIX-compliant machine. I was thinking of OS X. – dodgethesteamroller Oct 30 '15 at 21:33
  • @dodgethesteamroller Not to keep nitpicking, but compliance does not mean forbidding the slash -- it means you don't have to permit the slash (whereas you do have to permit everything else), hence for portability it should be considered forbidden. Also, FYI, OSX is more POSIX compliant than linux. It's officially certified, whereas linux is just considered "mostly compliant" (but since "linux" isn't a complete operating system unto itself, it can't be so certified). – goldilocks Oct 30 '15 at 22:59
  • "they'll come before everything else" - in the "C" locale, but not in many natural-language locales... – Toby Speight Nov 02 '15 at 20:07
12

One reason to avoid caps in filenames is that sorting order in Unix is case sensitive, so files starting with a capital letter will appear out of order. That's the reason why Makefile is usually named using a capital M - it's one of the files you want to see first, without scrolling/skipping down through a-l.

This said, you can do much worse in terms of file names:

  • using spaces will break some badly-written programs and scripts which don't quote file names properly
  • starting a file name with a - may cause problems as many programs will see it as a command-line option instead of a file name (e.g. rm -r will not remove a file named -r).
  • starting a file name with a . will hide it from many utilities and shell globbing (e.g. rm * will not remove files like .config)
  • using special characters like |<>*? and even non-printable characters like newline is technically possible, but may break scripts/programs in the same way the space character does. The difference is that the space character is often used, so programmers tend to test their programs against it, while less popular characters often remain untested.
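
The first two failure modes in the list can be reproduced without touching any real files. The sketch below is illustrative only: naive_command and safer_command are made-up helpers, shlex.split stands in for the word splitting a POSIX shell would do, and the "--" end-of-options marker is a convention that many utilities (rm included) honour, though not all.

```python
import shlex


def naive_command(filename: str) -> str:
    """Build a command line the way a badly written script might: no quoting."""
    return "rm " + filename


def safer_command(filename: str) -> str:
    """Quote the name and place it after "--" so it is not read as an option."""
    return "rm -- " + shlex.quote(filename)


# shlex.split() mimics how a POSIX shell breaks a command line into words.
print(shlex.split(naive_command("my file.txt")))  # the name falls apart into two words
print(shlex.split(safer_command("my file.txt")))  # the name survives as one word
print(shlex.split(safer_command("-r")))           # "-r" now follows "--"
```

Nothing here is executed as a command; the point is only to show what the program on the receiving end would see as its arguments.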
  • 4
    This tends no longer to be true, sorting in modern locales tends to be case-insensitive nowadays and many tools and shell globbings honour the locale for sorting file names. – Stéphane Chazelas Oct 29 '15 at 13:43
  • 2
    Did you mean to say: rm * will not remove files like .config? – Wildcard Oct 29 '15 at 15:51
  • 1
    @Wildcard not really, but perhaps your example is more realistic than mine. My point was to show that filenames starting with a dot are immune to globbing even if the user specifies that dot explicitly. – Dmitry Grigoryev Oct 29 '15 at 16:15
  • 1
    @DmitryGrigoryev, no they aren't. Try ls -ald .??* in any directory that has dot files. – Bill Barth Oct 29 '15 at 18:52
  • rm .* will remove a file named .config. It will also, however, attempt to recurse into|remove the directories named . and .., provided you passed -r – Blacklight Shining Oct 29 '15 at 21:23
  • @BillBarth you're right, thanks. Though it's not what bash reference manual says: "dotglob: If set, Bash includes filenames beginning with a ‘.’ in the results of filename expansion.". I assumed it meant they are excluded if dotglob is not set. – Dmitry Grigoryev Oct 30 '15 at 08:20
  • @DmitryGrigoryev No, you're reading that wrong. Setting dotglob ignores leading dots when globbing. Try shopt -s dotglob; ls -aldtr ~/* vs. shopt -u dotglob; ls -aldtr ~/*. The former will show dotfiles and the latter will not. – Bill Barth Oct 30 '15 at 19:49
  • 1
    I believe that it would be more appropriate to say "If you choose to use capital letters in file names, you should bear in mind the fact that sorting order in Unix is (sometimes) case sensitive."  The user may want this behavior, and Makefile and README are perfect examples of that.  Note also that this effect is negligible if the letter isn't the first letter in the name, so it's not a big deal if you use camelCase.  Sure, you might be surprised to see anOctagon before angle, but at least they'd be together in the listing. – G-Man Says 'Reinstate Monica' Oct 31 '15 at 06:59
  • Starting a file name with a . makes it hidden, which is pretty useful in a lot of ways, including protection from accidental rm *. Files like .bashrc and .gitignore start with a dot for this very reason. – Nick Volynkin Nov 03 '15 at 07:04
  • @StéphaneChazelas GitHub sorting is still case sensitive. – Leponzo Jan 30 '20 at 03:35
6

If you are going to interface with a Windows environment you should avoid capitals because Windows will lowercase everything. This is more often a problem going the other way; a link to Page_2.html will find page_2.html in Windows, but will fail in Unix.

NL_Derek
  • 14
    That's not true. NTFS, VFAT, and exFAT are all case-insensitive but case-preserving, meaning they ignore case for purposes of lookup, but store case nonetheless. The same applies to HFS+, the default filesystem on OSX. NTFS even has a POSIX namespace which works exactly like all other Unices, i.e. very long filenames of un-interpreted octets, with only NUL and / prohibited. – Jörg W Mittag Oct 28 '15 at 23:20
  • It's more of a problem when you have files stored on a unix system shared to a windows box, e.g. via samba or nfs. Windows programmers tend to be lazy about upper/lower case in filenames because native windows filesystems are case-insensitive. So the program will try to open //sharename/lowercasefilename that it created earlier as //sharename/UpperCaseFilename and then be unable to open it because the lower case version doesn't exist. or vice-versa. – cas Oct 29 '15 at 00:17
  • there are, of course, samba options to deal with this issue. – cas Oct 29 '15 at 00:19
  • 5
    More to the point, "case-insensitive but case-preserving" is another way of saying "capable of silently overwriting file A because its name differs only in case from file B" (or vice versa, depending on which was saved later). In other words, if you're using a *nix shell to access an NTFS share, cat > Foo will overwrite file foo. This behavior is likely to be unexpected and confusing if you are used to case-preserving and case-sensitive filesystems such as ext*. – dodgethesteamroller Oct 29 '15 at 01:52
  • @dodgethesteamroller, ouch. yes. that's even badderer. – cas Oct 29 '15 at 06:54
  • 1
    @JörgWMittag Unless i'm mistaken, NTFS is not case-insensitive, it's just that windows works in mysterious ways. – Cthulhu Oct 29 '15 at 11:10
  • 1
    @Cthulhu: AFAIK, NTFS has four different namespaces in which you can create names for files. (I don't know whether a single file can have a name in more than one namespace, though.) A "DOS" namespace (8.3, case-insensitive), a "long" namespace (case-insensitive, case-preserving, UTF-16), a special namespace for "short long" names, i.e. names whose case should be preserved but that fit into 8.3, and a POSIX namespace (a stream of octets other than \0 and /, case-sensitive). At least that's how I remember it. But I agree that it's kind-of a mess. There are further restrictions in the … – Jörg W Mittag Oct 29 '15 at 11:34
  • 1
    … kernel, and even further restrictions in the API (actually, there are different APIs from different eras with different restrictions), there are restrictions due to compatibility with DOS and FAT, there are restrictions in the command interpreter, there are restrictions in the (graphical) shell, and there are restrictions in Explorer. And it's often impossible to reliably determine where a restriction is coming from. It's crazy. I once managed to create a file using the Explorer, which could not be opened, copied, moved, renamed or deleted using any tool I tried. It basically stayed on … – Jörg W Mittag Oct 29 '15 at 11:37
  • 1
    … my system through various Windows upgrades from XP all the way to 7, harddisk upgrades, computer upgrades, until I moved to OSX, when I abandoned that particular filesystem. – Jörg W Mittag Oct 29 '15 at 11:38
5

Since NL_Derek opened this can of worms, but didn't articulate it properly, I'll say this:

It's OK to use capital letters, but you should avoid creating files (in the same directory) that differ only by case, e.g., File_Name.txt and file_name.txt, because

  • If you somehow make the directory available to a Windows system, it will not be able to access both files.  It will probably be able to access only the one that appears first in the directory, regardless of which name you use.  (Except: it may give you access to them as FILENA~1.TXT and FILENA~2.TXT — type dir /x to see what short name (if any) goes with what long name.)
  • If the file system is actually a Windows file system (e.g., mounted from an exFAT or NTFS file system from an NFS server running Windows), the two names will (probably) not be allowed to coexist.  For example, if you do cmd1 > foo and cmd2 > Foo, you may end up with a single file, containing the output from cmd2.
  • Similarly, if you ever transfer the files to a Windows system, the two names will (probably) not be allowed to coexist.  For example, if you created an archive (e.g., zip) containing the two files, and extracted it on a Windows system, the second file would probably overwrite the first one.  Same thing if you transferred them to a Windows box with FTP or something similar.
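
If you want to find such pairs before they bite, here is a minimal sketch. The function name case_collisions is hypothetical, and str.casefold is only an approximation of the case folding a Windows file system applies, so treat the result as a warning list rather than a guarantee.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def case_collisions(names: Iterable[str]) -> List[List[str]]:
    """Group names that differ only by case; return the groups with more than one member."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        # Names with the same case-folded form would clash on a
        # case-insensitive file system.
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

To check a real directory, pass it the listing, e.g. case_collisions(os.listdir(".")) after import os.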
4

One reason to avoid caps is that bash's tab completion is case-sensitive (at least by default); this still trips me up every time I end up in front of a bash with default configuration. Sure, there are other popular shells, but this, combined with the fact that bash is the default login shell on many OSes, means that the default is oftentimes case-sensitive completion. Using all-lowercase filenames rather simplifies things here.

3

Apart from technical reasons, I have a practical aspect to this. Sticking to lowercase letters will ensure that searches are easier unless one is too fond of using grep -i or locate -i. Sometimes, even camelCase can be confusing if one has to use a string of like-case words as in storageNYCDCPrimary. So, I find it best to stick to lowercase and pepper them with underscores or hyphens for readability, like storage_nyc_dc_primary.
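
The acronym problem shows up as soon as you try to convert such names automatically. This is a rough sketch with a made-up helper, to_snake_case, using one common regex heuristic for word boundaries; it is not the only way to split camelCase and it cannot recover boundaries that the original name does not encode.

```python
import re

# Split at lower-to-upper boundaries, and before the last capital of an
# all-caps run when that capital starts a lowercase word.
_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def to_snake_case(name: str) -> str:
    """Insert underscores at the detected word boundaries, then lowercase."""
    return _BOUNDARY.sub("_", name).lower()


print(to_snake_case("blueSuedeShoes"))
print(to_snake_case("storageNYCDCPrimary"))
```

With this heuristic the run NYCDC stays glued together as nycdc, which is exactly the ambiguity described above: the boundary between NYC and DC simply is not in the name.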

  • 2
    snake_case is easy on the eyes - storageNycDcPrimary and StorageNycDcPrimary are both weird to read. – go2null Feb 06 '19 at 22:08
2

I do consider it is best practice to avoid using capitals and spaces in filenames.

Some will say they do not agree, but it is a matter of what I call religious beliefs: hard to discuss and agree on. Those who disagree say that most tools are now fixed to be capitals and spaces friendly. They are right, but that is not the question.

The right question is how much you need to use capitals and spaces in filenames. For me, except when I am programming in Java, the answer is that, nearly all the time, I do not need capitals and spaces in my filenames. I replace every space with an underscore (_) or a minus sign (-), and because of that I do not use camel case (aka. camelCase), contrary to followers of the other religion.

Many people called bullshit on me for doing and teaching that, and some of them still do. Others tripped on a tool that was not capital/space friendly and came to me saying that I was right and that they should have listened to me. Do whatever you want, and if you use capitals and spaces in filenames, I hope you will never trip on a badly written tool. If you do trip on such a tool, hopefully it will not be hard to fix and will not cost you or your business a lot of money or time. But if it ends up having bad repercussions, you will remember that someone told you in the past that using capitals and spaces in filenames is bad practice.

And one last thing: if you want to avoid all problems, use no special characters in filenames (only lower case letters, digits, underscores and minuses [1]). This unwanted character list also includes all non-ASCII characters (yes, French and other non-English people, and I am one of them, none of those: à, â, ä, ç, é, ..., ö, æ, œ, ...). This also extends to many other things, including logins and passwords. I will let you guess what happens when you put a quote or double quote (' or ") in a login or password that is handled by a bash script not written by an experienced sysadmin....

[1]: maybe we could extend that to ~, @, # and some others, but this is looking for trouble (and yes I know about emacs files...).
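
For those who want to apply this rule to existing names, here is a minimal sketch. The helper sanitize is hypothetical, and it makes two assumptions that are mine rather than part of the rule above: it keeps the period so extensions survive, and it strips accents via Unicode decomposition, which silently drops letters such as æ and œ that have no decomposition.

```python
import re
import unicodedata


def sanitize(name: str) -> str:
    """Reduce `name` to lowercase ASCII letters, digits, ".", "_" and "-"."""
    # Split accented letters into base letter + combining mark, then drop
    # everything that is not ASCII (the marks, and letters like ae/oe ligatures).
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace each run of remaining unwanted characters with one underscore.
    cleaned = re.sub(r"[^a-z0-9._-]+", "_", ascii_only.lower())
    return cleaned.strip("_")
```

Two different inputs can map to the same output, and a leading minus is left alone, so check for both before renaming anything in bulk.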

jfg956
  • 1
    The last thing is something that should be handled by the authenticating system, not the user coming up with the password. If the system limits the set of allowed characters in passwords, it is a bad system. – Blacklight Shining Nov 23 '15 at 00:41
  • Well, limiting characters in passwords is a subject for debate: li1, oO0, ... depending on the font, these are hard to communicate. Some would say that passwords should not be communicated, but a WiFi key is a sort of password that I communicate to my friends when they are at my place... – jfg956 Nov 23 '15 at 07:41
  • That's a conscious choice on your part to avoid using some characters, rather than a limitation built into the system (in this example, the Wi-Fi standards, AP and client implementations, etc). If you're using a string of randomly-selected characters as a password, you can improve readability by using (or encouraging the recipients to use) a monospace font, or by simply using more distinctive glyphs if you're handwriting them (seriffed lowercase L, uppercase I, and digit 1; smaller lowercase O, rounder uppercase O, slashed or dotted digit 0; etc). Alternatively, you could use a passphrase. – Blacklight Shining Nov 24 '15 at 05:01