There are programs such as hexedit that let you edit every byte of a file. Since I keep hearing that in Unix "everything is a file", I tried to open a directory in it, expecting to see bytes referring to inodes or file names. Instead, I get:

hexedit: teste: not a file

However, I was happy to find that it does work with many device files (partitions, entire hard drives, pseudo-devices such as /dev/null, you name it).

Symbolic (soft) links are followed: the error is the same as for the link's target, except that the message names the link. In my test:

hexedit: testeln: not a file

The behaviour is the same even if the target is removed and the link is left dangling. But that isn't all.

If symbolic links are copied without an option that preserves them (such as rsync's link-preserving options), they become regular files. Fortunately, I had an old link like that:

00000000   49 6E 74 78  4C 4E 4B 01  2F 00 73 00  72 00 76 00  2F 00 73 00  61 00 6D 00  62 00 61 00  2F 00 73 00  IntxLNK./.s.r.v./.s.a.m.b.a./.s.
00000020   68 00 61 00  72 00 65 00                                                                                h.a.r.e.
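
The layout of that copied link can be read straight off the dump: an 8-byte header (IntxLNK followed by a 0x01 byte), then the original target path with each character stored in two bytes (UTF-16 little-endian, so every ASCII character is followed by 00). A small Python sketch that decodes it, assuming that layout holds for other files of this kind; the name decode_copied_link is made up for the example:

```python
# Bytes copied from the hexedit dump above.
DUMP = bytes.fromhex(
    "49 6E 74 78 4C 4E 4B 01 2F 00 73 00 72 00 76 00 2F 00 73 00 61 00 6D 00 "
    "62 00 61 00 2F 00 73 00 68 00 61 00 72 00 65 00"
)

MAGIC = b"IntxLNK\x01"  # the 8-byte header seen at offset 0 in the dump


def decode_copied_link(data: bytes) -> str:
    """Return the link target stored in a copied-link file.

    Assumes the layout seen in the dump: MAGIC, then a UTF-16LE path.
    """
    if not data.startswith(MAGIC):
        raise ValueError("not in the IntxLNK layout seen in the dump")
    return data[len(MAGIC):].decode("utf-16-le")


print(decode_copied_link(DUMP))  # /srv/samba/share
```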

For /dev/initctl, loop-control, snapshot and tty2 (even when the file is open and I have read/write permission), hexedit clears the screen and then prints this on the last line:

the long seek failed (-1 instead of 0), leaving :(

For /dev/log:

hexedit: log: No such device or address

Can I view directories this way? Why are symlinks followed automatically, and how can I change that? Why does hexedit behave like this only (as far as I know) for symlinks and directories? I also suspect that some odd files I found in /dev only seem empty, even when I use sudo.

slm
JMCF125


POSIX requires this behaviour for certain file types at the system-call level. The open function follows symbolic links unless the O_NOFOLLOW flag is set, which hexedit most probably does not do.
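
The effect of O_NOFOLLOW can be seen from a short program. A minimal sketch in Python, whose os.open() passes its flags straight to the open() system call; the file names are made up for the demo. O_NOFOLLOW is something a program passes to open(), not a setting you can give hexedit from outside. POSIX specifies ELOOP for this case; some BSDs report EMLINK instead.

```python
import errno
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "target")
    link = os.path.join(d, "link")
    with open(target, "w") as f:
        f.write("hello\n")
    os.symlink(target, link)

    # Plain open(): the kernel follows the link and we read the target's bytes.
    fd = os.open(link, os.O_RDONLY)
    try:
        followed = os.read(fd, 100)
    finally:
        os.close(fd)

    # open() with O_NOFOLLOW: the call fails instead of following the link.
    try:
        os.close(os.open(link, os.O_RDONLY | os.O_NOFOLLOW))
        refused = None
    except OSError as e:
        refused = e.errno

print(followed)                      # b'hello\n'
print(errno.errorcode.get(refused))  # ELOOP on Linux
```

O_NOFOLLOW only affects the last component of the path; symlinks in the directory part are still followed.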

For directories, read shall fail, returning -1 and setting errno to EISDIR. From http://pubs.opengroup.org/onlinepubs/9699919799/functions/read.html:

The fildes argument refers to a directory and the implementation does not allow the directory to be read using read() or pread(). The readdir() function should be used instead.
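
A quick way to see this on Linux: opening a directory read-only succeeds, but read() on the descriptor fails with EISDIR, while the directory API returns the entries. A sketch in Python (os.open and os.read wrap the system calls; os.listdir() is built on opendir()/readdir() on POSIX systems). The file names are arbitrary, and other Unix variants may behave differently.

```python
import errno
import os
import tempfile


def read_dir_as_file(path):
    """Try to read a directory's raw bytes with open() + read().

    Returns the bytes read, or the errno if read() is refused.
    """
    fd = os.open(path, os.O_RDONLY)  # opening a directory read-only is allowed
    try:
        return os.read(fd, 4096)
    except OSError as e:
        return e.errno
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "a"), "w").close()
    open(os.path.join(d, "b"), "w").close()
    result = read_dir_as_file(d)
    names = sorted(os.listdir(d))  # the readdir() route

print(result == errno.EISDIR)  # True on Linux
print(names)                   # ['a', 'b']
```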

As @goldilocks suggested, there are at least seven file types, each with different semantics:

  • regular files
  • symbolic links
  • directories
  • block devices
  • character devices
  • FIFOs and
  • sockets
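
A program can tell these types apart by calling lstat() (which, unlike stat(), does not follow a final symlink) and testing st_mode with the S_IS* macros. The Python sketch below creates one example of most types in a temporary directory (block devices are left out because creating one needs root). It also tries to open() the socket: on Linux this fails with ENXIO, "No such device or address", which is likely why hexedit printed that message for /dev/log, normally a Unix-domain socket.

```python
import errno
import os
import socket
import stat
import tempfile

# (test function, label) pairs for the seven types listed above.
TYPE_CHECKS = [
    (stat.S_ISREG, "regular file"),
    (stat.S_ISLNK, "symbolic link"),
    (stat.S_ISDIR, "directory"),
    (stat.S_ISBLK, "block device"),
    (stat.S_ISCHR, "character device"),
    (stat.S_ISFIFO, "FIFO"),
    (stat.S_ISSOCK, "socket"),
]


def file_type(path):
    """Classify path using lstat(), which does not follow a final symlink."""
    mode = os.lstat(path).st_mode
    for is_type, label in TYPE_CHECKS:
        if is_type(mode):
            return label
    return "unknown"


with tempfile.TemporaryDirectory() as d:
    file_path = os.path.join(d, "file")
    link_path = os.path.join(d, "link")
    fifo_path = os.path.join(d, "fifo")
    sock_path = os.path.join(d, "sock")

    open(file_path, "w").close()
    os.symlink(file_path, link_path)
    os.mkfifo(fifo_path)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(sock_path)  # binding creates the socket file

    types = {
        "file": file_type(file_path),
        "link": file_type(link_path),
        "dir": file_type(d),
        "fifo": file_type(fifo_path),
        "sock": file_type(sock_path),
        "null": file_type("/dev/null"),
    }

    # open() on a socket file fails; Linux reports ENXIO
    # ("No such device or address").
    try:
        os.close(os.open(sock_path, os.O_RDONLY))
        sock_open_errno = None
    except OSError as e:
        sock_open_errno = e.errno
    server.close()

for name, kind in types.items():
    print(f"{name:5} {kind}")
print(errno.errorcode.get(sock_open_errno))  # ENXIO on Linux
```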
Frank
  • Can I change that flag for certain programs? – JMCF125 Jan 11 '14 at 20:58
  • POSIX doesn't require EISDIR upon a read on a directory, Unix does (see the XSI flag in the link you posted). – Stéphane Chazelas Jan 11 '14 at 21:23
  • I've tried O_NOFOLLOW=1 hexedit somelink and export O_NOFOLLOW=1, but it doesn't work. Are you sure about that flag? – JMCF125 Feb 09 '14 at 21:05
  • @JMCF125 O_NOFOLLOW is not an environment variable but the symbolic name of a flag for the C function open. With strace you can see what a program is doing on the system level. Try strace hexedit . for yourself! – Frank Feb 14 '14 at 16:13
  • I've heard env-vars called flags before, so I assumed that was the case. – JMCF125 Feb 14 '14 at 18:35

You are partly right in your expectations. "Everything is a file" really means that even things that are not files, such as screens or microphones, get a place in the filesystem tree and can be accessed with some of the same system calls as regular files (open, write, etc.). Many of your examples are not really much like files. /dev/null, for example, is not an empty file but a very special "device driver", a small bit of kernel code that accepts read and write requests (returning EOF for reads, and discarding written data while reporting success). So it is not actually a file, but it behaves like one. Think also of the mount point for a disk: it is not a file in any sense, but a hook that gives you access to the whole file hierarchy on that disk.
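
The /dev/null behaviour described above is easy to observe: writes report success and reads hit end-of-file immediately. A minimal Python sketch:

```python
import os

fd = os.open("/dev/null", os.O_RDWR)
try:
    written = os.write(fd, b"discarded")  # the driver accepts all 9 bytes...
    data = os.read(fd, 100)               # ...and every read returns EOF at once
finally:
    os.close(fd)

print(written)  # 9
print(data)     # b''
```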

A directory, on the other hand, is indeed a "block of bytes", as @goldilocks put it: it takes up space on the disk, and it contains the data structure that maps file names to inodes. But, as you have discovered, it does not behave like a plain file. You're getting error messages because directories are too important to leave vulnerable to ordinary user commands. Imagine if you accidentally wrote

grep foo bar > subdir

where you meant to write

grep foo bar > subdir/newfile

... your directory subdir would be hosed, and with it the consistency of your filesystem, because all of its files would suddenly become orphaned inodes. So directories are protected in all sorts of ways. You're also not allowed to make a hard link to a directory, because multiple links to a directory would mean the filesystem is no longer a tree. In the old days (decades ago) root could do some of that, and all sorts of "directory editors" were floating around. I'm not sure to what extent this is still possible.
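
Both protections can be observed from a program. A sketch in Python, assuming a Linux system: the shell's > redirection opens its target with O_WRONLY | O_CREAT | O_TRUNC, which fails with EISDIR on a directory, and link() on a directory is refused with EPERM. The directory name mirrors the example above; everything is created inside a temporary directory.

```python
import errno
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    subdir = os.path.join(d, "subdir")
    os.mkdir(subdir)

    # Roughly what the shell does for "grep foo bar > subdir".
    try:
        os.close(os.open(subdir, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644))
        redirect_errno = None
    except OSError as e:
        redirect_errno = e.errno

    # A second hard link to the directory would break the tree structure.
    try:
        os.link(subdir, os.path.join(d, "alias"))
        link_errno = None
    except OSError as e:
        link_errno = e.errno

print(errno.errorcode.get(redirect_errno))  # EISDIR
print(errno.errorcode.get(link_errno))      # EPERM on Linux
```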

alexis
  • +1, awesome answer! But could there be a hex editor for directories that checks a directory's consistency before saving and warns the user about data loss when necessary? Also, what about soft links? The worst that can happen is that they get broken, and that is not much of a problem. Why can't they be edited? BTW, I just found one of those directory editors, called vidir. – JMCF125 Jan 11 '14 at 20:50
  • A directory editor would need to "understand" the directory's data structure, yes. But today's filesystems are robust enough that there's not much use for such editors. Also, as I said, the intent of the errors is to guard against accidental corruption by regular programs. – alexis Jan 11 '14 at 20:54
  • About symbolic links, I don't know the reasoning, to be honest. But their semantics are weird enough that it might just be a byproduct of how they work. ls is a regular userland program and it can fetch the content of the symlink if you ask it to, so the OS allows it. – alexis Jan 11 '14 at 20:57

Some Unix variants (including the original Unix, but not Linux) allow opening a directory like a regular file and reading from it. What you get depends on the filesystem's on-disk format, which makes this of very limited use.

Writing to a directory would be very dangerous. I think the original Unix allowed it for programs running as root, but modern Unices don't. If you could write to a directory, you could render the filesystem invalid, introduce loops, refer to blocks that don't exist, create setuid executables by changing their owner and permissions, and so on. This is why only code running inside the kernel is allowed to build directory contents.

Symbolic links are a different case. Opening and reading a symbolic link opens the target file, because that's what symbolic links are designed for: they are transparent to anything that operates on file content (such as an editor). It would be possible to have an interface where reading the content of a symbolic link uses open (with a special flag), read and close, but that would provide little benefit over the single readlink call. Writing a symbolic link with write calls would be more problematic, as each write would have to grow the storage that holds the target path; on the application side, it would also lose the benefit of symlink creation being atomic.
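
The contrast between reading through a link and reading the link itself can be sketched in Python (os.readlink wraps readlink()). The last part shows one common way to "edit" a link without write calls, by creating a new link under a temporary name and renaming it over the old one. This is a widespread idiom, not something the answer prescribes; the file names are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "target")
    link = os.path.join(d, "link")
    with open(target, "w") as f:
        f.write("file contents\n")
    os.symlink(target, link)

    # open() + read() on the link is transparent: we get the target's data.
    with open(link) as f:
        via_open = f.read()

    # readlink() returns the link's own content (the stored path) in one call.
    via_readlink = os.readlink(link)

    # No write() for links: create a replacement under a temporary name and
    # rename() it over the old link, which swaps the link itself atomically.
    new_target = os.path.join(d, "elsewhere")  # need not exist
    tmp = link + ".tmp"
    os.symlink(new_target, tmp)
    os.replace(tmp, link)  # os.replace() uses rename() on POSIX
    after_edit = os.readlink(link)

print(repr(via_open))            # 'file contents\n'
print(via_readlink == target)    # True
print(after_edit == new_target)  # True
```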