12

It seems that the file name length limit is 255 "characters" on Windows (NTFS), but 255 "bytes" on Linux (ext4, Btrfs). I am not sure what text encoding those file systems use for file names, but if it is UTF-8, one Asian character, such as Japanese, can take 3 or more bytes. So for English, 255 bytes means 255 characters, but for Japanese, 255 bytes can mean far fewer characters, and this limit could be problematic in some cases.
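To illustrate the difference, here is a small standalone C sketch (the example strings are arbitrary); strlen() counts bytes, and each of the Japanese characters below takes 3 bytes in UTF-8:

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *ascii = "filename";   /* 8 characters, 8 bytes in UTF-8  */
    const char *kanji = "ファイル名"; /* 5 characters, 15 bytes in UTF-8 */

    /* strlen() counts bytes, not characters, so a Japanese name
       spends the 255-byte budget three times as fast. */
    printf("%s: %zu bytes\n", ascii, strlen(ascii));
    printf("%s: %zu bytes\n", kanji, strlen(kanji));
    return 0;
}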

Aside from methods that are practically impossible for a general user, such as modifying the Linux file system or kernel, is there any practical way to raise the limit so that I would have a guaranteed 255-character file name capacity for Asian characters on Linux?

2 Answers

10

TL;DR: there is a way, but unless you're a kernel hacker who knows C very well, there is no practical way.


Detailed answer:

While glibc defines #define FILENAME_MAX 4096 on Linux, which limits path lengths to 4096 bytes, there is a hard 255-byte limit in the Linux VFS which all filesystems must conform to. The said limit is defined in /usr/include/linux/limits.h:

/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
#ifndef _LINUX_LIMITS_H
#define _LINUX_LIMITS_H

#define NR_OPEN 1024

#define NGROUPS_MAX    65536   /* supplemental group IDs are available */
#define ARG_MAX       131072   /* # bytes of args + environ for exec() */
#define LINK_MAX         127   /* # links a file may have */
#define MAX_CANON        255   /* size of the canonical input queue */
#define MAX_INPUT        255   /* size of the type-ahead buffer */
#define NAME_MAX         255   /* # chars in a file name */
#define PATH_MAX        4096   /* # chars in a path name including nul */
#define PIPE_BUF        4096   /* # bytes in atomic write to a pipe */
#define XATTR_NAME_MAX   255   /* # chars in an extended attribute name */
#define XATTR_SIZE_MAX 65536   /* size of an extended attribute value (64k) */
#define XATTR_LIST_MAX 65536   /* size of extended attribute namelist (64k) */

#define RTSIG_MAX 32

#endif
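You don't need to read kernel headers to see these values; a minimal userspace sketch using the POSIX pathconf() call queries the limits for whatever filesystem backs a given path:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* _PC_NAME_MAX reports the longest file name accepted by the
       filesystem backing "/"; on Linux this prints 255.
       A return of -1 would mean no fixed limit (or an error). */
    printf("NAME_MAX: %ld\n", pathconf("/", _PC_NAME_MAX));
    printf("PATH_MAX: %ld\n", pathconf("/", _PC_PATH_MAX));
    return 0;
}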

And here's a piece of code from linux/fs/libfs.c which returns an error in case you dare use a file name longer than 255 bytes:


/*
 * Lookup the data. This is trivial - if the dentry didn't already
 * exist, we know it is negative.  Set d_op to delete negative dentries.
 */
struct dentry *simple_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
{
    if (dentry->d_name.len > NAME_MAX)
        return ERR_PTR(-ENAMETOOLONG);
    if (!dentry->d_sb->s_d_op)
        d_set_d_op(dentry, &simple_dentry_operations);
    d_add(dentry, NULL);
    return NULL;
}
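You can see that check fire from userspace without touching any kernel code; a minimal sketch that tries to create a file whose name is one byte over NAME_MAX:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    char name[257];

    memset(name, 'a', 256);   /* 256 'a' bytes: one past NAME_MAX */
    name[256] = '\0';

    /* open() fails with ENAMETOOLONG before the filesystem is
       even consulted. */
    if (open(name, O_CREAT | O_WRONLY, 0644) == -1)
        perror("open");       /* prints: open: File name too long */
    return 0;
}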

So not only would you have to redefine this limit, you would also have to rewrite the filesystems' source code (and on-disk structures) to be able to use it. And then, outside of your own machine, you wouldn't be able to mount such a filesystem unless it used extensions to store very long file names (the way FAT32 does).

AdminBee
  • 22,803
  • 1
I guess that the 255-byte limit was defined when all Linux users were Europeans, thinking "It won't be necessary to have a file name longer than 255 characters". Now that Linux is used around the world, is there any plan to increase the file name byte-length limit to allow 255 Asian characters? Or would that even be technically possible (with regard to backward compatibility, etc.)? – Damn Vegetables Nov 13 '20 at 20:57
  • 1
Unix was developed at Bell Labs in the US when no one even thought about alphabets other than Latin. Then came locales, and later UTF-8/16, and suddenly it turned out that 255 bytes translated into rather few national characters. The way it looks, this limitation won't be lifted in my lifetime. We'll probably need POSIX 2.0 to lift the limit and rewrite quite a lot of code. – Artem S. Tashkinov Nov 13 '20 at 21:10
  • 1
Ah, Unix and Bell Labs... So it was a limitation from the 1970s, eh? Since I knew that Linux was created in the 1990s, I thought this length was decided in the 1990s. If it was the 1970s, no wonder they did not consider internationalisation; back then computers could barely handle Asian languages. NTFS seems to have been created in 1993, much later than that, and its file name length is "255 UTF-16 code units", so its designers must have expected Unicode and internationalisation. It is unfortunate that this limitation from the 1970s won't go away, but oh well. Thanks for the information. – Damn Vegetables Nov 14 '20 at 04:15
  • 2
    @DamnVegetables, back when the 255-byte limitation was set, different countries tended to use their own character encodings, so it could just as easily hold 255 ASCII characters or 255 JIS X 0201 kana. (And back then, a 255-byte filename was considered longer than any reasonable person would need, regardless of encoding -- most people had to deal with things like DOS's 8.3 limit, while the Mac's 31-character limit was seen as luxurious.) – Mark Nov 17 '20 at 00:19
  • 2
@DamnVegetables Bell Systems Unix had a 14-character filename limit. – RonJohn Mar 20 '23 at 11:54
9

In many cases, the 255-byte limit is baked into the on-disk format; see for example Ext4 which only provides 8 bits to encode the name length. Thus, even if you could work around the kernel APIs’ limits, you wouldn’t be able to store anything longer than 255 bytes anyway.
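That single byte is visible in the on-disk directory entry structure, shown here abridged from the kernel's fs/ext4/ext4.h:

/* Abridged from fs/ext4/ext4.h; EXT4_NAME_LEN is 255. */
struct ext4_dir_entry_2 {
    __le32  inode;                  /* inode number */
    __le16  rec_len;                /* directory entry length */
    __u8    name_len;               /* name length: one byte caps it at 255 */
    __u8    file_type;
    char    name[EXT4_NAME_LEN];    /* file name */
};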

You would therefore have to come up with a name storage extension (for example, VFAT-style using multiple directory entries to store names which are too long, or 4DOS-style using a separate file to store the long names), and then you’re effectively creating a new file system...
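For a sense of what such an extension looks like in practice, here is the VFAT long-name slot, abridged from the kernel's include/uapi/linux/msdos_fs.h: each 32-byte slot holds 13 UTF-16 code units of the long name, and as many slots as needed are chained in front of the ordinary 8.3 entry:

/* Abridged from include/uapi/linux/msdos_fs.h: one slot of a VFAT
   long file name, stored as an extra directory entry. */
struct msdos_dir_slot {
    __u8    id;                 /* sequence number for slot */
    __u8    name0_4[10];        /* first 5 characters in name */
    __u8    attr;               /* attribute byte */
    __u8    reserved;           /* always 0 */
    __u8    alias_checksum;     /* checksum for 8.3 alias */
    __u8    name5_10[12];       /* 6 more characters in name */
    __le16  start;              /* starting cluster number, 0 in long slots */
    __u8    name11_12[4];       /* last 2 characters in name */
};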

Stephen Kitt
  • 434,908