Is it a good idea to use double dots or double minus signs as the delimiters? I'm trying to find a good naming convention for experimental scientific data. For example:

2017-12-11T19-45..JDoe-042..UO2(NO3)2-EtOAc_dist..150.3K..1.234mM.dat
2017-12-11T19-45--JDoe-042--UO2(NO3)2-EtOAc_dist--150.3K--1.234mM.dat

My reasons:

  1. To ensure compatibility across platforms, the only suitable characters are _ - . and their combinations;
  2. None of them can be used on their own in my case:
    • _ is reserved for the spaces; due to case-sensitive chemical formulas I cannot use camelCase.
    • - is often a part of internal lab codes, plus it's being used as a replacement for a colon : in time (modified ISO 8601 notation) and ratios;
    • . is a decimal mark.
  3. Among their combinations the most popular, it seems, is _-_. However, this is 3 characters and the filenames are already pretty lengthy (as one can tell from the examples), so I'd like to stick with two characters if possible.
  4. Visually I find it's hard to quickly tell the difference between __ and _, whereas -- vs - and . vs .. are quite distinguishable to me.
  5. I haven't included comma , (as it has been rightfully suggested in the comments, this is also a viable character to consider) as I think it's easy to confuse it with a single dot ., which is already primarily reserved for the numerical values with a decimal point.

According to several posts across the SE network, I would assume both -- and .. are totally acceptable, and I'm leaning towards choosing the double dot (..). However, I'm not certain, especially regarding how regular expressions or Python scripts handle such files and folders (I have very little experience with both, but I'm learning).

Disregarding the behavior of specialized software, would you say these delimiters are generally safe for common file systems and scripting languages?

andselisk
  • Why can't you use other characters, e.g. comma? – Salman A Dec 10 '17 at 18:33
  • @SalmanA Oh, I forgot to mention, I think a single dot . from the numerical values can be easily confused with a comma ,, so I dropped this option (must be reason 0 then:) ). Thanks for pointing this out! – andselisk Dec 10 '17 at 20:22
  • -- has been used historically as a separator, as in the GNU Arch version control system written by Tom Lord; personally, I've found the convention to be useful, and leveraged it myself, since. On the other hand, this was (like several of Arch's other naming conventions) considered pretty weird by a lot of people at the time. – Charles Duffy Dec 10 '17 at 21:27
  • @CharlesDuffy Could you please share a link where I could read about this convention? – andselisk Dec 10 '17 at 21:57
  • @andselisk, ...funny thing -- while a bunch of the other controversial naming decisions made in the Arch project got a wiki page, apparently this one wasn't considered odd enough to qualify. – Charles Duffy Dec 10 '17 at 22:08
  • @andselisk, ...with a bit more googling, though -- see "A few notes on GNU arch revision names" in http://www.enyo.de/fw/software/arch/get.html#9 – Charles Duffy Dec 10 '17 at 22:09
  • @CharlesDuffy That's a good one:) Thank you very much! – andselisk Dec 10 '17 at 22:11
  • Do you really still have software that does not handle spaces? I find 2017-12-11T19-45 JDoe-042 UO2(NO3)2-EtOAc_dist 150.3K 1.234mM.dat nice. Otherwise =, ~, + seem good too. – user285259 Dec 11 '17 at 23:15
  • @user285259 One of the reasons I'm still afraid of spaces is that they need to be escaped with %20 to make the name web-safe, and yes, I have encountered software that couldn't tolerate spaces, though it was a long time ago. Regarding =, ~, + I feel somewhat uncomfortable, as semantically they represent math operators, and I haven't seen many people using them as filename delimiters (maybe there is another reason for that). – andselisk Dec 12 '17 at 00:53
  • Oh, you use it on the web also. But you use a function to escape the name, right? So you also have ( translated to %28, and ) to %29? =) Actually, I don't really understand the problem with escaping the name: it is just translated for software, and this is not the form displayed to the user. – user285259 Dec 12 '17 at 05:01
  • @user285259 Actually, currently there is no use on the web, but I just wanted to be on the safe side (what if one day...). I see your point; maybe I'm just over-complicating things, plus I'm not a computer scientist, so my way of thinking might be lousy at times. I just remember this as a dogma that it's better not to have spaces in filenames:) – andselisk Dec 12 '17 at 05:23

1 Answer

One of the more scrutinized and second-guessed design decisions in Unix/Linux is a file system feature that is working in your favor: any character is allowed in a file/directory name except for NUL \0 (ASCII 000) and slash / (the latter being reserved for file paths).

POSIX-compliant and/or well-written programs and scripts will handle such lenience but, unfortunately, there are countless examples out there that don't. However, they tend to barf on a very particular set of characters, and those characters are not dots or dashes. (Spaces and newlines are two of the most troublesome.) In fact, dots and dashes are very widely used. Common tools, languages and regular expressions will handle them fine...

...with one teeeeny little exception. (Of course, right?) I don't see any indication that you plan on doing this, but it should be noted: avoid dashes at the beginning of a name. This is legal, of course, but too many programs handle such names improperly and interpret them as command-line options/flags. For example, if a script passes the filename to another script like this: some-script --my-dash-first-file ... then don't be surprised to see something like Unknown option '--my-dash-first-file'.
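A minimal Python sketch of two common workarounds, assuming a script that receives filenames as positional arguments. The helper name `safe_arg` and the parser below are hypothetical and only illustrate the idea: prefixing `./` means the name no longer starts with a dash, and a bare `--` tells argparse (and many, though not all, command-line tools) to stop looking for options.

```python
import argparse


def safe_arg(path: str) -> str:
    """Prefix './' so a relative name starting with '-' is not read as an option."""
    return "./" + path if path.startswith("-") else path


# Hypothetical command-line interface that takes filenames as positionals.
parser = argparse.ArgumentParser()
parser.add_argument("files", nargs="*")

# Workaround 1: make the name start with './' instead of '-'.
args = parser.parse_args([safe_arg("--my-dash-first-file")])
print(args.files)

# Workaround 2: a bare '--' marks the end of options for argparse.
args_after_marker = parser.parse_args(["--", "--my-dash-first-file"])
print(args_after_marker.files)
```

Names that merely contain dashes, such as the ones in the question, pass through `safe_arg` unchanged.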

TL;DR Your proposed schemes are safe if you avoid names that begin with a dash.
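Since the question mentions Python and regular expressions, here is a rough sketch of how such names might be split back into fields. The function name `split_fields` is made up for illustration, and it assumes every file name ends in a single-dot extension such as .dat. The one thing to watch in a regex is that an unescaped `.` matches any character, so the delimiter should be escaped.

```python
import re

DELIMITER = ".."  # or "--"; both are treated as plain text below


def split_fields(filename: str, delimiter: str = DELIMITER):
    """Return (fields, extension) for a name such as 'a..b..c.dat'."""
    # Strip the extension first: it follows the LAST single dot.
    stem, _, extension = filename.rpartition(".")
    # str.split treats the delimiter literally, so no escaping is needed.
    return stem.split(delimiter), extension


name = "2017-12-11T19-45..JDoe-042..UO2(NO3)2-EtOAc_dist..150.3K..1.234mM.dat"
fields, extension = split_fields(name)
print(fields, extension)

# In a regular expression a bare '.' matches any character, so the
# delimiter has to be escaped (re.escape does this) before use in a pattern.
stem = name.rpartition(".")[0]
regex_fields = re.split(re.escape(DELIMITER), stem)
print(regex_fields == fields)
```

One caveat under this scheme: a field that itself ends in `.` (or in `-` when using `--`) right before the delimiter would make the split ambiguous, so it is worth keeping such values out of the names.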

Additional word of caution: Though dots by themselves are common, especially to separate a file's base name from its "extension" (e.g. foo.txt), dots in pairs are usually seen alone, where they have special meaning: the parent directory of the current directory (..) or the preceding directory in a path (/foo/bar/../baz). So while this won't cause any technical issues, double dots in a name are a bit unconventional and may cause some users to do a double-take.
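To illustrate the distinction, a small Python sketch using POSIX-style path handling from the standard library. The /data directory is a made-up location; the point is only that `..` is special when it is an entire path component, not when it appears inside a name.

```python
import posixpath
from pathlib import PurePosixPath

# '..' as a whole path component means "parent directory" and is collapsed.
print(posixpath.normpath("/foo/bar/../baz"))

# '..' inside a file name is just two ordinary characters.
name = "2017-12-11T19-45..JDoe-042..UO2(NO3)2-EtOAc_dist..150.3K..1.234mM.dat"
path = posixpath.join("/data", name)  # hypothetical directory
normalized = posixpath.normpath(path)
print(normalized == path)

# pathlib still finds the extension after the last dot.
suffix = PurePosixPath(path).suffix
print(suffix)
```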


B Layer