
I understand that "Everything is a file" is one of the major concepts of Unix, but sockets use different APIs that are provided by the kernel (like socket, sendto, recv, etc.), not like normal file system interfaces.

How does this "Everything is a file" apply here?

4 Answers


sockets use different APIs

That's not entirely true. There are some additional functions for use with sockets, but you can use, e.g., normal read() and write() on a socket fd.
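As a minimal sketch of that point (Python here only for brevity; `os.read` and `os.write` are thin wrappers over the read() and write() system calls, and the helper name `roundtrip_with_read_write` is made up for this illustration), a connected socket pair can be driven without ever calling send() or recv():

```python
import os
import socket


def roundtrip_with_read_write(payload: bytes) -> bytes:
    """Send payload across a socket pair using only generic fd calls."""
    # socketpair() returns two connected, unnamed stream sockets.
    left, right = socket.socketpair()
    try:
        # Plain write()/read() on the raw descriptors, no socket-specific API.
        os.write(left.fileno(), payload)
        return os.read(right.fileno(), 1024)
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    print(roundtrip_with_read_write(b"hello via write()"))
```

The socket-specific calls only become necessary when you need something read() and write() cannot express, such as per-call flags or a destination address.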

how does this "Everything is a file" apply here?

In the sense that a file descriptor is involved.

If your definition of "file" is a discrete sequence of bytes stored in a filesystem, then not everything is a file. However, if your definition of a file is more handle-like (a conduit for information, i.e., an I/O connection), then "everything is a file" starts to make more sense. These things inevitably do involve sequences of bytes, but where they come from or go to may differ contextually.

It's not really intended literally, however. A daemon is not a file, a daemon is a process; but if you are doing IPC, your method of relating to another process might well be mediated by file-style entities.

goldilocks
  • I would say an accurate rephrasing of "everything is a file" should be "all interfaces are through files". You interact with processes through files (stdin/out/err, /proc/$pid, etc). You interact with the network through files (sockets / file descriptors). You interact with the mouse through a file (/dev/mouse). – phemmer Nov 07 '14 at 18:20
  • I once cloned a socket handle by opening it out of /proc. – Joshua Nov 07 '14 at 23:16

"Everything is a file" is just an overstatement. It was novel in the 1970s and it was a primary distinguishing characteristic of UNIX. But it's just a marketing concept, not a real foundation of UNIX, because it's obviously not true. It's not beneficial or sensible to treat EVERYTHING as a file.

Is the CPU a file? Does your program read() the CPU to get a new instruction? Is RAM a file? Does your program read() the next byte?

Back then, there were operating systems that gave you one API for a floppy disk, a different API for a hard disk, a different API for magnetic tape, and a bunch of different APIs for different terminals and so on. IBM mainframe systems had different types of files on hard disks and gave you a different API for each one of them, believe it or not! So the UNIX "it is a file" approach, together with the "stdin/stdout/stderr" approach, brought a very elegant abstraction to both users and programmers.

With the network, this particular abstraction just didn't work out. And there's no harm, just slightly less overall elegance and coherence of the OS. But it works. Do you see a file called /dev/myinternetz/www/google/com/tcp/80 anywhere on your system today? Can you open() it, write() a query, and read() the answer in nice HTML? No? This is because this "is a file" abstraction was not very handy for interacting over the network. It wouldn't work too well in practice. Law of leaky abstractions in action.

kubanczyk
  • Fun fact: some versions of bash will allow you to open /dev/tcp/www.google.com/80. It's not an actual file though - bash is just faking it. – user253751 Nov 07 '14 at 22:51
  • @immibs: More to the point, it would be reasonably possible to make a filesystem that actually implements that. – Joshua Nov 07 '14 at 23:16
  • I suppose you could read /dev/mem or /dev/kmem if you wanted. – Jason C Nov 08 '14 at 04:04
  • Note that Plan 9 takes this further and indeed, network protocols are addressed through a pseudo filesystem to the effect of your /dev/myinternetz/www/google/com/tcp/80 example (with a different path of course). In addition, physical RAM does actually work very much like a file: you mmap RAM into your virtual address space just as you mmap a file into it (malloc is implemented upon this idea). – Vality Nov 08 '14 at 04:53
  • Plan 9 taking "everything is a file" to the extreme, in combination with "everything is network-transparent", has some pretty powerful implications. For example, there is no need for NAT: you can simply mount your router's TCP/IP stack (which is just a network-transparent file(system)) on your local machine and send packets directly from your router. – Jörg W Mittag Nov 08 '14 at 11:38
  • +1 The concept of using the file system interface is great, but UNIX and the rest of the POSIX systems do it wrong. Plan 9 is better in this aspect because almost everything is handled using text files. That's an important distinction. Having a binary file that can be handled only using special (and ugly) system calls is not very beneficial. – sakisk Nov 13 '14 at 20:43

Sockets are files. You can use read and write on a socket: they're equivalent to calling recv and send with flags=0. You close them with close. You can move them around with dup and friends if you need to shuffle file descriptors. You can set some flags with fcntl, and use stdio buffering after calling fdopen. The list goes on. Very importantly, you can call select and poll on any type of file, including sockets, so these functions allow a program to block until it receives input via any means simply by listing file descriptors.
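To illustrate the select point, here is a rough sketch (Python for brevity; the helper name `read_whatever_is_ready` and the labels are invented for this example) that waits on a pipe and a socket in a single call and then drains both with the same read():

```python
import os
import select
import socket


def read_whatever_is_ready(timeout: float = 1.0) -> dict:
    """Wait on a pipe and a socket at once and read from each ready fd."""
    pipe_r, pipe_w = os.pipe()
    sock_a, sock_b = socket.socketpair()
    try:
        # Put data into both channels before waiting.
        os.write(pipe_w, b"from pipe")
        sock_b.send(b"from socket")

        labels = {pipe_r: "pipe", sock_a.fileno(): "socket"}
        # select() only sees file descriptors; it does not care what kind.
        readable, _, _ = select.select(list(labels), [], [], timeout)

        # The same read() call works for every descriptor that came back.
        return {labels[fd]: os.read(fd, 1024) for fd in readable}
    finally:
        os.close(pipe_r)
        os.close(pipe_w)
        sock_a.close()
        sock_b.close()


if __name__ == "__main__":
    print(read_whatever_is_ready())
```

Nothing in the waiting loop knows which descriptor is the socket; that is the practical payoff of sockets being files.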

There are extra system calls for some socket types (recv and send, shutdown, etc.), like there is an extra system call for devices (ioctl).

Not all files have names, and those that do don't always live in the directory structure. Pipes created by pipe (e.g. in a shell pipeline) and sockets created by socketpair don't have names, but they're still files. Sockets created by socket have a name whose syntax depends on the domain. This name is passed in a struct sockaddr to bind and other functions. For a Unix (AF_UNIX) socket, the name is a struct sockaddr_un, which is a family and a string; depending on the string, this can be a file name (named sockets can be created with mknod on many Unix variants) or not (the abstract namespace). For an IPv4 (AF_INET) socket, the name is a struct sockaddr_in, containing a port number and IP address, plus the protocol from the socket call.
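A small sketch of how the name depends on the domain (Python for brevity; the file name `demo.sock`, the loopback address, and the helper `socket_names` are arbitrary choices for this example, not anything from the answer). Python hides the struct sockaddr behind a string for AF_UNIX and an (address, port) tuple for AF_INET:

```python
import os
import socket
import tempfile


def socket_names() -> dict:
    """Bind one AF_UNIX and one AF_INET socket and report their names."""
    names = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")  # hypothetical file name
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as unix_sock:
            unix_sock.bind(path)
            # AF_UNIX: the name is a path, and bind() created a directory entry.
            names["unix"] = unix_sock.getsockname()
            names["unix_path_exists"] = os.path.exists(path)

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as inet_sock:
        # Port 0 asks the kernel to pick any free port.
        inet_sock.bind(("127.0.0.1", 0))
        # AF_INET: the name is an address plus a port, not a path.
        names["inet"] = inet_sock.getsockname()
    return names


if __name__ == "__main__":
    print(socket_names())
```

Only the AF_UNIX name shows up in the directory tree; the AF_INET name lives in a different namespace entirely, yet both sockets are reached through ordinary file descriptors.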


If you stat a socket, you will see that it has an inode number and other characteristics of regular files, so I would classify it as a file on the filesystem. Example:

# file live
live: socket
# stat live
File: `live'
  Size: 0               Blocks: 0          IO Block: 4096   socket
Device: fc03h/64515d    Inode: 198817      Links: 1
Access: (0660/srw-rw----)  Uid: (23129/  icinga)   Gid: (23130/icinga-cmd)
Access: 2014-11-07 09:27:59.000000000 -0800
Modify: 2014-11-05 09:27:03.000000000 -0800
Change: 2014-11-05 09:27:03.000000000 -0800

11/17. Additional information for Linux (ext3): A socket has an inode (which is a 256-byte block on the disk) but does not have any data blocks (you can verify this by extracting the inode and examining the data block pointers, or by running debugfs 'stat', which shows a Blockcount of 0). So, it has file metadata (owner, group, permissions, etc.) but no data content on disk. This is identical to a regular empty file (touch /tmp/foo), which also has a blockcount of 0. In the first case, the "type" field in the inode shows "socket"; in the second case it shows "regular file."
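The same comparison can be made from a program instead of with stat and debugfs. This is only a sketch (Python for brevity; the paths and the helper names `describe` and `compare_socket_and_empty_file` are invented here), and it reads the metadata through stat() rather than inspecting the on-disk inode directly:

```python
import os
import socket
import stat
import tempfile


def describe(path: str) -> dict:
    """Return the stat() fields relevant to the socket-vs-file comparison."""
    st = os.stat(path)
    return {
        "is_socket": stat.S_ISSOCK(st.st_mode),
        "is_regular": stat.S_ISREG(st.st_mode),
        "inode": st.st_ino,
        "size": st.st_size,
    }


def compare_socket_and_empty_file() -> tuple:
    """Create a named socket and an empty file side by side and stat both."""
    with tempfile.TemporaryDirectory() as tmp:
        sock_path = os.path.join(tmp, "live.sock")   # hypothetical name
        file_path = os.path.join(tmp, "empty.txt")   # hypothetical name
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.bind(sock_path)           # creates the socket inode
            open(file_path, "w").close()   # same effect as `touch`
            return describe(sock_path), describe(file_path)


if __name__ == "__main__":
    sock_info, file_info = compare_socket_and_empty_file()
    print("socket:", sock_info)
    print("file:  ", file_info)
```

Both entries come back with an inode number; the only thing that differs in the mode bits is the file type.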

References: ext2 inode structure; stat, dumpe2fs, and debugfs commands.

Michael Martinez