7

Why does Unix allow files with a period at the end of the name? Is there any use for this?

For example:

filename.

I am asking because I have a simple function that echoes the extension of a file.

ext() {
  echo ${1##*.}
}

But knowing it will print nothing if the filename ends in a ., I wondered whether it would be more reliable to write:

ext() {
  extension=${1##*.}
  if [ -z "$extension" ]; then
    echo "$1"
  else
    echo "$extension"
  fi
}

Clearly this depends on what you are trying to accomplish, but if a . at the end of the file name were not allowed, I wouldn't have wondered anything in the first place.

j--
  • 437
  • You've got an answer from Michael. Two more notes w.r.t. your code. Given a name example.tar.gz you will strip all suffixes starting from the first dot, which will not get the single extension necessary to call the right program for processing. And if your file name has no (dot-separated) "extension(s)" your function will print the whole entered name, so you'd need an extra test to return an empty string in that case. – Janis Mar 08 '15 at 10:49
  • My comment was addressing generally names with more than one dot or with no dots at all. (If all your files are guaranteed to always have just one dot - which is an uncommon restriction if you understood Michael's answer - and if you're also not considering dot-files, i.e. files that start with a dot, you may be fine. Otherwise you should rethink the issue.) – Janis Mar 08 '15 at 11:10
  • 1
    If I want to name my files with full sentences, I would expect many names to end with .. – Paŭlo Ebermann Mar 08 '15 at 22:25

2 Answers

28

Unix filenames are just sequences of bytes, and can contain any byte except / and NUL in any position. There is no built-in concept of an "extension" as there is in Windows and its filesystems, and so no reason not to allow filenames to end (or start) with any character that can appear in them generally — a . is no more special than an x.
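As a quick illustration of the point above, here is a minimal sketch that creates a few unusual names in a scratch directory. It assumes `mktemp -d` is available (it is not specified by POSIX, though it is very common) and that the scratch directory lives on a native Unix filesystem; some non-native filesystems (FAT or SMB mounts, for instance) may reject some of these names. The variable names are only illustrative.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
dir=$(mktemp -d) || exit 1

created=0
# A trailing dot, a leading dot, an embedded space, and a lone backslash.
for name in 'filename.' '.hidden' 'two words' '\'; do
    : > "$dir/$name" && [ -f "$dir/$name" ] && created=$((created + 1))
done

echo "created $created files:"
ls -A "$dir"

rm -r "$dir"
```

None of these names needs special treatment from the kernel; only the shell quoting differs.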

Why does Unix allow files with a period at the end of the name? "A sequence of bytes" is a simple and non-exclusionary definition of a name when there's no motivating reason to count something out, which there wasn't. Making and applying a rule to exclude something specifically is more work.

Is there a use for it? If you want to make a file with that name, sure. Is there a use for a filename ending with x? I can't say I would generally make a filename with a . at the end, but both . and x are explicitly part of the portable filename character set that is required to be universally supported, and neither is special in any way, so if I had a use for it (maybe for a mechanically-generated encoding) then I could, and I could rely on it working.


In addition, the special filenames . (dot) and .. (dot-dot), which refer to the current and parent directories, are mandated by POSIX, and both end with a period. Any code dealing with filenames in general needs to handle those anyway.

Michael Homer
  • 76,565
  • 3
    Nitpick: "can contain any character except / and NUL" is more accurately "can contain any byte except 0x2F and 0x00" -- the difference matters when someone tries to create filenames encoded in a non-ASCII-superset encoding, which appears to work until you trip over an 0x2F or 0x00 that doesn't stand alone. (Having said that, you'd have to go pretty far out of your way to encounter this problem in practice; none of my usual choices of "awkward legacy character encoding" (Shift-JIS, Big5, and EBCDIC) can use 0x2F as part of a graphic character other than /.) – zwol Mar 08 '15 at 17:37
  • 1
    @zwol: You are of course right about the byte/character point. I have fixed that. POSIX actually mandates that "the single-byte encoding of the character is required to be the same across all locales and to not occur within a multi-byte character" and paths to be null-terminated strings, so the other case can't show up. That means that, e.g., UTF-16 is not a valid filesystem encoding on a Unix system. – Michael Homer Mar 08 '15 at 20:44
5

The real question is: why do any operating systems attach significance to the '.'? There's no technical reason to do so; it's just a convention that lets you guess the file type without checking the contents.

If you rename an MP3 file to .txt and try to open it in Windows, you will immediately see why that idea has drawbacks: you suddenly "can't" open the file correctly. Technically speaking, and setting aside speed considerations, the "best" way would probably be to determine the file type from its contents before deciding what to do with it, since extensions are easily changed by mistake and can cause issues.

The reason Linux doesn't care about a period in the name is the same reason a non-computer person doesn't: there's no inherent difference between a period and any other character, other than the fact that some programs happen to be coded to see that period and treat it specially.

Assuming you actually just want the extension (which is not what either of your snippets does), you could use this:

ext(){
    extension=
    [[ $1 =~ \. ]] && extension="${1##*.}"
    echo "$1 -> ${extension:-No extension}"
}

ext something.    # something. -> No extension
ext something.txt # something.txt -> txt
ext something     # something -> No extension
ext som.thing.mp3 # som.thing.mp3 -> mp3
ext .whatever     # .whatever -> whatever

*Note that last one: for a dot-file, everything after the leading dot is reported as the extension.
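If the dot-file case should count as "no extension" too, one possible variant is sketched below. The name `ext_strict` and the choice to signal "no extension" with a non-zero return status are assumptions made for illustration, not something from the question. It sticks to POSIX `case` patterns, so it should not need bash.

```shell
# Print the extension of the last path component, or return 1 if there is none.
# Note: plain sh has no "local", so "base" is a global variable here.
ext_strict() {
    base=${1##*/}                 # keep only the last path component
    case $base in
        .|..) return 1 ;;         # the directory entries themselves
        .*)   base=${base#.} ;;   # skip the leading dot of a hidden file
    esac
    case $base in
        *.)  return 1 ;;          # trailing dot: nothing follows it
        *.*) printf '%s\n' "${base##*.}" ;;
        *)   return 1 ;;          # no dot at all
    esac
}

ext_strict som.thing.mp3                      # prints: mp3
ext_strict .whatever || echo "No extension"   # prints: No extension
ext_strict something. || echo "No extension"  # prints: No extension
```

Returning a status instead of printing a placeholder lets the caller decide what "no extension" should mean.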

If you actually want to return the file name itself when there is no extension, like your code does, there's no reason to use the long, sh-style second snippet you have. You've written:

ext() {
  extension=${1##*.}
  if [ -z "$extension" ]; then
    echo "$1"
  else
    echo "$extension"
  fi
}

Which is actually just:

ext(){
 extension="${1##*.}"
 # This line is what your first snippet is doing: 
 # echo "$extension"
 # This line is what your second snippet is doing:
 [[ $extension ]] && echo "$extension" || echo "$1"
}

Which is actually just:

# First snippet
ext(){
 echo "${1##*.}"
}

# Second snippet
ext(){
 extension="${1##*.}"
 echo "${extension:-$1}"
}
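For names with more than one dot, it may help to see the four prefix/suffix-removal expansions side by side. This is only a small sketch; the variable names are made up for the example.

```shell
name=example.tar.gz

last_ext=${name##*.}   # longest prefix matching "*." removed  -> gz
all_ext=${name#*.}     # shortest prefix matching "*." removed -> tar.gz
stem=${name%.*}        # shortest suffix matching ".*" removed -> example.tar
root=${name%%.*}       # longest suffix matching ".*" removed  -> example

printf '%s\n' "$last_ext" "$all_ext" "$stem" "$root"
```

When the name contains no dot at all, none of these patterns match and each expansion yields the whole name unchanged, which is why a separate "is there a dot?" check is needed before trusting the result.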

Basically, you can't take anything for granted about what users can input. If you want to see what kind of file it actually is, try the file command, because parsing file names is not the only way to figure out the file type. You can even have a file in Linux whose name is simply: \
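The file approach mentioned above could look roughly like this. The wrapper name `content_type` is hypothetical, and the `-b` and `--mime-type` options are those of the widely used file(1) implementation; they are not part of the POSIX specification of file, so treat this as a sketch for systems where that implementation is installed.

```shell
# Print the MIME type that file(1) infers from the contents of "$1",
# or "unknown" when file(1) is not installed.
content_type() {
    if command -v file >/dev/null 2>&1; then
        file -b --mime-type -- "$1"
    else
        echo unknown
    fi
}

# A plain-text file with a misleading name.
tmp=$(mktemp -d) || exit 1
printf 'just some text\n' > "$tmp/song.mp3"
detected=$(content_type "$tmp/song.mp3")
echo "song.mp3 looks like: $detected"
rm -r "$tmp"
```

On a typical system this reports a text type despite the .mp3 suffix, since the name plays no part in the detection.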

Nate
  • 91
  • 4
    What is the rationale behind using a \ at the end or a $ ? There needn't be any, as they are valid characters. You're selecting "." specifically as if there's any actual difference. There isn't. – Nate Mar 08 '15 at 11:45
  • Correct, but the code echo ${1##*.} only cares about periods and if periods at the end of a filename were not allowed, I wouldn't have to think about the special case of a period ending a file name. – j-- Mar 08 '15 at 11:52
  • 4
    @JorgeBucaran And if your code was splitting a file name on any other arbitrary character, you'd have the same problem if it were at the end; no? You are just picking on .. – Boris the Spider Mar 08 '15 at 17:23
  • @BoristheSpider He's "picking on ." because using this as the separator for an extension is a common convention, and many scripts are written on this assumption. – Barmar Mar 11 '15 at 19:18