
I have been bothered for a long while by some confusions among

  • internet domain socket provided by Linux,
  • transport protocols (TCP/UDP)'s socket and
  • transport protocols (TCP/UDP)'s port.

Replies to some related posts on SO contain many ambiguities and inconsistencies, which have made me even more confused.

  1. Both Linux and the transport protocols (TCP/UDP) have a concept called "socket". How do the two concepts differ? Is the internet domain socket (represented as a file?) provided by Linux a (faithful) implementation of the socket in the transport protocols (TCP/UDP)? (I guess yes, and if that is true, we can use the two terms interchangeably.)

  2. Conceptually, is it correct to think of a port in a transport protocol (TCP/UDP) as a tuple (IP address, transport protocol, port number) or just as a port number? (I guess a port is a tuple (IP address, transport protocol, port number), because I have been told several times that the same port number with a different IP address or a different transport protocol represents a different port. In that sense, port and socket (in transport protocols) seem to be an identical concept.) It seems the established name "port" means "port number" only, and I will explicitly use "port number" in the following to avoid unnecessary confusion.

  3. What are the relations between socket (in transport protocols) and tuple (IP address, transport protocol, port number)? Is there a bijective mapping between the set of sockets and the set of tuples (IP address, transport protocol, port number)? Must there be one or more sockets for each tuple (IP address, transport protocol, port number), and must there be one or more tuples (IP address, transport protocol, port number) for each socket? Can two sockets share the same tuple (IP address, transport protocol, port number)? Can two tuples (IP address, transport protocol, port number) share the same socket?

  4. I heard that two processes can share the same socket (which I understand in the same way that two processes can share a file, assuming Linux's internet domain socket and transport protocols (TCP/UDP)'s socket can be used interchangeably). Can two processes share the same tuple (IP address, transport protocol, port number)?

  5. I heard that two connections can't share the same socket (assuming Linux's internet domain socket and transport protocols (TCP/UDP)'s socket can be used interchangeably). Can two connections share the same tuple (IP address, transport protocol, port number)?

Thanks.

Tim
  • You likely mean UNIX domain socket and not internet domain socket when you talk about something presented as file. Internet domain sockets are UDP and TCP and are not represented as file. – Steffen Ullrich Feb 14 '19 at 21:27
  • @SteffenUllrich I avoid the discussion of Unix domain socket here, since its purpose is different from transport protocols' socket and port. Is an internet domain socket not represented as a file, but can be opened like a file and be represented by file descriptor? – Tim Feb 14 '19 at 21:30
  • The purposes of UNIX domain sockets and internet domain sockets are similar, as is the API. Internet domain sockets are not represented as a file on the file system. All sockets are file descriptors (which is different from a file), similar to regular files and also named and anonymous pipes. This is a basic philosophy in UNIX. But a socket cannot be opened like a file; instead it gets created, bound to a local address (maybe implicitly), followed by listen+accept or connect for TCP sockets. – Steffen Ullrich Feb 14 '19 at 21:32
  • not a file, except when it is cat </dev/tcp/towel.blinkenlights.nl/23. (Not a real file, just some bash syntax, but you can have file-systems that do this). – ctrl-alt-delor Feb 14 '19 at 22:23
  • It's rather meaningless to say a thing is a file or not. Most things can be exposed as an open file description in Linux, but lots of them are not filesystem objects stored on disk. It's better to think of open file descriptions as kernel objects, which in turn represent something. – 炸鱼薯条德里克 Feb 15 '19 at 01:14
  • Linux's internet domain socket and transport protocols (TCP/UDP)'s socket can't be used interchangeably, because the Linux kernel supports lots of different protocols on different layers, not just TCP and UDP, such as IP raw sockets or SCTP sockets. A process doesn't have a local address, just as a process doesn't have an owner/ACL/extended_attr (files have them). Sockets have such properties. Of course, raw sockets obviously don't have a port (the corresponding protocol doesn't have such a concept), or even an IP address, in the case of AF_PACKET. – 炸鱼薯条德里克 Feb 15 '19 at 02:16
  • @ctrl-alt-delor can you explain what you mean the "except" case? also see https://unix.stackexchange.com/questions/492742/does-linux-kernel-create-a-file-for-an-internet-domain-socket – Tim Feb 15 '19 at 13:35
  • @炸 "most things can be exposed as file description in linux" do you mean an open file descriptor is exposed as a file under /proc/<pid>/fd/? If that is true, isn't that contrary to that an internet domain socket doesn't have a file, while a Unix domain socket does? See https://unix.stackexchange.com/questions/492742/does-linux-kernel-create-a-file-for-an-internet-domain-socket – Tim Feb 15 '19 at 13:37
  • Read APUE, open file description vs file descriptor. Open file description is a much better term than file; you can't define "file". Network sockets and Unix domain sockets are all open file descriptions. A UDS might or might not be associated with something on the disk (a lot of conditions can affect this). An NS is never associated with anything on disk. – 炸鱼薯条德里克 Feb 15 '19 at 14:17
  • People may use the word "file" because we know exactly what we're talking about, but you don't, so you should not use that word. – 炸鱼薯条德里克 Feb 15 '19 at 14:19
  • @炸 Is /proc/<pid>/fd/123 not a file? – Tim Feb 15 '19 at 14:20
  • Define "file". I know that's a pathname, its existence indicates something. – 炸鱼薯条德里克 Feb 15 '19 at 14:23
  • File has two meanings: 1) Something with a name in a directory. 2) An ordinary file, a file of type f, that is not a directory, symlink, device, pipe, unix domain socket … In the "everything is a file" model, everything has a file descriptor once opened; everything can also have a file name, for example (/dev/tcp/towel.blinkenlights.nl/23). Most Unixes only follow the first, but are increasingly adding to the second. We now have /proc, and other magic file-systems. – ctrl-alt-delor Feb 15 '19 at 15:24
  • @炸 This one is for you https://unix.stackexchange.com/questions/500907/is-a-file-characterized-by-having-a-inode – Tim Feb 15 '19 at 16:01

2 Answers

  1. Sockets are an operating system API. This API lets applications on the same or different systems communicate over the TCP and UDP (and other) protocols. UNIX domain sockets (not internet domain sockets as you write) provide similar functionality for communicating with applications on the same system only. The concepts for both are similar: the API provides ways to create a socket, to bind, listen+accept and connect a socket, to read and write on it and to shut it down. For reading and writing, sockets behave like other file descriptors, such as those for regular files, named pipes and anonymous pipes. The file descriptor is created differently, though, and it supports some operations that, for example, regular files do not.
  2. A port number in TCP and UDP is a 16-bit integer, so it ranges from 0 to 65535, with 0 reserved. The word "port" is used as short for "port number". The tuple of IP address, port number and protocol describes the endpoint address. Calling that tuple a port instead will cause confusion when reading other literature.
  3. An unconnected (but already bound) socket represents only a single endpoint (ip,port,protocol). A connected socket represents a local endpoint and another (local or remote) endpoint, i.e. a connection. One cannot have multiple in-kernel sockets for the same connection but one can have multiple file descriptors for the same in-kernel socket. One can have the same endpoint in multiple connected sockets but not for the same connection, i.e. the other endpoint of the connection must be different. One can actually have multiple unconnected sockets representing the same endpoint but this is very unusual.
  4. Sockets can be shared between processes since sockets are file descriptors and file descriptors can be shared. Sharing is typically done by forking, i.e. the parent opens some file or socket and the child inherits it. But there are also ways to send a file descriptor/socket from one process to another. Sharing means that both can write and read but no data will be duplicated, i.e. if the parent reads some data these data are taken from the socket and cannot be also read by the child. But it is not possible that one process creates a new socket (instead of sharing an existing one) which represents exactly the same connection as an existing socket on the same system.
  5. Two sockets/connections can share the same port on one endpoint but they cannot share both endpoints, i.e. at least one of source IP, source port, destination IP, destination port or protocol needs to be different.
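Points 3 and 4 can be tried out with a small Python sketch. It is only an illustration under assumptions: it uses a loopback TCP connection, the function name `shared_descriptor_demo` is made up for this example, and `dup()` stands in for the descriptor sharing that `fork()` would give. It shows two file descriptors referring to one in-kernel socket, and that data read through one descriptor is not available through the other.

```python
import socket


def shared_descriptor_demo():
    """Two descriptors for one socket; data read via one is gone for the other."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()

    # dup() yields a second file descriptor for the same in-kernel socket.
    alias = server.dup()
    same_connection = (
        alias.fileno() != server.fileno()
        and alias.getsockname() == server.getsockname()
        and alias.getpeername() == server.getpeername()
    )

    client.sendall(b"hello")
    first = server.recv(5, socket.MSG_WAITALL)  # consumes all five bytes

    # Nothing is left for the other descriptor: a non-blocking read finds no data.
    alias.setblocking(False)
    try:
        leftover = alias.recv(5)
    except BlockingIOError:
        leftover = None

    for s in (alias, server, client, listener):
        s.close()
    return same_connection, first, leftover


if __name__ == "__main__":
    print(shared_descriptor_demo())
```

A child process created with `fork()` would be in the same position as `alias` here: it holds another descriptor for the same socket, not a second connection.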
  • (1) "A port is no such tuple, a connection is". I suspect you think of a tuple consisting of both sides. Pay attention to the tuple in my post. It is completely on one side. So uses of "tuple" in your reply all diverge from those in my post. (2) also you can drop Unix domain socket, because that will make your explanation more focused and clear. – Tim Feb 14 '19 at 21:34
  • @Tim: a port is an integer not a tuple. What you call port here is instead an address of an endpoint (ip and port). Please don't use established names with different meanings. – Steffen Ullrich Feb 14 '19 at 21:37
  • If a port is just an integer, how can a port number with a different IP address or a different transport protocol represent a different port (or something you may call differently)? (I have strong doubt on the clarity of the established name.) – Tim Feb 14 '19 at 21:38
  • "a different thing (port)?" - the different thing is the address of the endpoint which consists of IP address and port number. Different IP address with same port number is a different endpoint address. Think of IP address as a street and the port number being the number of the house in the street - different streets can use the same house numbers. – Steffen Ullrich Feb 14 '19 at 21:41
  • "port" is like a house, identified by city, street name and house number, and "port number" is like the house number of a house. I hope this will help you understand what I mean in my post. What a name refers to is more important than what name we call it. I am not saying the names I call them are better, but just clearer to me. – Tim Feb 14 '19 at 21:43
  • Again, please don't use established names with different meanings. It makes everything here confusing and you get confused when you read something somewhere else which follows established meanings and not your meanings. "port" is used short for "port number" when talking about TCP and UDP sockets. – Steffen Ullrich Feb 14 '19 at 21:45
  • Okay, I have changed my post to avoid using "port". I use "tuple (IP address, transport protocol, port number)" instead. Hope you can update your reply if it clarifies any misunderstanding. – Tim Feb 14 '19 at 22:03
  • @Tim: I've reworked the answer to match your question. – Steffen Ullrich Feb 14 '19 at 22:25
  • Thanks. I'd appreciate if you could also consider https://unix.stackexchange.com/questions/504157/how-many-connections-can-there-be-between-two-unix-domain-sockets – Tim Mar 03 '19 at 21:53

"port" is like a house, identified by city, street name and house number, and "port number" is like the house number of a house. I hope this will help you understand what I mean in my post.

I see. I agree, this is a useful concept to think about.

When we need to talk about some exact technical detail in the existing systems, the "port number" concept is easier to define. We can refer to the value of the port field in a TCP packet, or the sin_port field of struct sockaddr_in used with the UNIX socket API. In this type of discussion, we can let the reader look up the full story of how TCP packets are used (perhaps in the original RFC :-). Or how the socket API functions are used in a program, perhaps by looking at the man pages.
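As a small illustration of "the value of the port field in a TCP packet": a TCP header starts with two 16-bit fields in network byte order, the source port and the destination port. The sketch below reads them with Python's `struct` module. The helper name `tcp_ports` and the sample bytes are made up for this example.

```python
import struct


def tcp_ports(header: bytes):
    """Return (source port, destination port) from the start of a TCP header."""
    # "!HH": two unsigned 16-bit integers in network (big-endian) byte order.
    return struct.unpack("!HH", header[:4])


if __name__ == "__main__":
    # Hypothetical header: source port 0xC0DE, destination port 0x0050 (80),
    # followed by 16 zero bytes standing in for the rest of the header.
    sample = bytes.fromhex("c0de0050") + bytes(16)
    print(tcp_ports(sample))
```

The `sin_port` member of `struct sockaddr_in` holds the same kind of value, also in network byte order.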

It is natural for this concept to become abbreviated. We can naturally say "port 80" instead of "port number 80".

The original TCP standard, RFC 793, talks about the concepts of "port" and "port number" as distinct things. (The introduction also uses "port identifier" to mean the same as "port number").

The man pages on current Linux, for example, are not so careful to make this distinction. man 7 ip frequently uses "port" as an abbreviation for "port number".

The Linux man pages are a very prominent document that programmers refer to, so the term "port" becomes ambiguous. If you are worried that you might be mis-interpreted, talking about the "port number" + IP address is a perfectly good idea.

5. I heard that two connections can't share the same socket (assuming Linux's internet domain socket and transport protocols (TCP/UDP)'s socket can be used interchangeably). Can two connections share the same tuple (IP address, transport protocol, port number)?

A single listening port can receive connections from multiple different source ports.

In Linux TCP programming, the listener gets one socket for each such connection (and is listening on a socket which is not connected).
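This can be observed with a short Python sketch, assuming a loopback listener on a kernel-chosen port (the function name `accept_two_connections` is made up for this example). Each `accept()` returns a new socket; both accepted sockets report the same local (IP address, port number), and only the peer address differs.

```python
import socket


def accept_two_connections():
    """Accept two connections on one listening port and compare endpoints."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(2)
    listen_addr = listener.getsockname()

    clients = [socket.create_connection(listen_addr) for _ in range(2)]
    # One new connected socket per incoming connection.
    accepted = [listener.accept()[0] for _ in range(2)]

    local = [s.getsockname() for s in accepted]   # same for both
    remote = [s.getpeername() for s in accepted]  # differs in source port

    for s in clients + accepted + [listener]:
        s.close()
    return listen_addr, local, remote


if __name__ == "__main__":
    print(accept_two_connections())
```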

The socket API doesn't give you any way to do the opposite, i.e. make several outgoing connections using the same source port. The transport protocol standard might technically allow it; I am not sure.

Notice that if you could do this, you would not be able to make multiple connections to the same target port. There would be no way for the target system to distinguish the two connections. I guess they felt this specific limitation was too weird to deal with. Then the broader limitation gets enshrined in the original APIs, including UNIX sockets. And later on, if anyone tries to do it, they will risk finding corner cases somewhere which have never actually been tested :-).

The UDP transport protocol doesn't have any concept of connections, so the question does not apply to it. You can call connect() on a UDP socket if you like, but it is only for convenience.
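A minimal sketch of that convenience, assuming two UDP sockets on the loopback interface (the function name `udp_connect_demo` is made up for this example): after `connect()`, the sender can use `send()` without naming a destination each time.

```python
import socket


def udp_connect_demo():
    """Send one datagram through a 'connected' UDP socket."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    receiver.settimeout(2.0)         # avoid hanging if the datagram is lost

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # No packets are exchanged here; the kernel only records the default peer.
    sender.connect(receiver.getsockname())
    sender.send(b"ping")  # no destination argument needed after connect()

    data, source = receiver.recvfrom(16)
    from_sender = source == sender.getsockname()

    sender.close()
    receiver.close()
    return data, from_sender


if __name__ == "__main__":
    print(udp_connect_demo())
```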


If you write "socket" in a modern document, people will interpret it as referring to the socket concept in the UNIX socket API. The definition of "socket" in RFC 793 is different. That definition, below, is not used any more.

To allow for many processes within a single Host to use TCP communication facilities simultaneously, the TCP provides a set of addresses or ports within each host. Concatenated with the network and host addresses from the internet communication layer, this forms a socket. A pair of sockets uniquely identifies each connection. That is, a socket may be simultaneously used in multiple connections.

sourcejedi
  • Thanks. "The socket API doesn't give you any way to do the opposite, i.e. make several outgoing connections using the same source port", do you mean the socket API doesn't give you any way to make several outgoing connections using the same source socket? "The transport protocol standard might technically allow it; I am not sure", do you mean the transport protocol standard might allow several outgoing connections using the same source port? – Tim Feb 15 '19 at 16:57
  • @Tim Yes, I think I agree with both alternative phrasings. – sourcejedi Feb 15 '19 at 17:03
  • @Tim My main answer sounds a bit patronizing, sorry. E.g. I'm guessing you already got the "socket (in transport protocols)" definition from the RFC I quoted, directly or indirectly. It's so alien to how I normally see the word used though, if I just referred to "socket (in transport protocols)" I'm not sure any other reader would have known that meaning :-). I might find a way to edit my answer better some time. – sourcejedi Feb 15 '19 at 17:08
  • Is it correct that in Linux, two connections can't share a socket, while in TCP protocol (as in RFC 793 ), two connections can share a socket? In Linux the mapping from TCP sockets to (IP address, port number)'s is many to one, while in TCP protocol, the mapping is one to one (i.e. injective)? – Tim Feb 15 '19 at 18:37
  • @Tim Yes, that matches the definitions I see. With the disclaimer that if you don't say "(as in RFC 793)" and include a link, a large number of readers will be very confused by the TCP protocol part :-). – sourcejedi Feb 15 '19 at 18:50
  • Is RFC 793 up-to-date? In the most recent RFC for TCP, how is socket defined? – Tim Feb 15 '19 at 19:44
  • @Tim RFC 793 from 1981 is the most recent RFC that defines TCP. There is no replacement RFC. There are some amendments, similar to amended legislation :-). "A number of details in RFC 793 were corrected, modified, or clarified in RFC 1122. Familiarity with RFC 1122 and more recent TCP documents is imperative before any implementation of RFC 793 is attempted." There are a few more "Status: Verified" errata, that you could check through if you had a specific point that seemed strange. – sourcejedi Feb 15 '19 at 19:56
  • @Tim I don't know how relatively important the other RFCs are, that are listed in the "updated by" header. I do know there's at least one more topic you should know about if you want to implement TCP on the open internet, and be robust against known attacks. And if you want to transmit at modern rates over internet distances (high latency), you need a better congestion control strategy than the original TCP. Linux servers use TCP Cubic for this by default. – sourcejedi Feb 15 '19 at 20:11
  • Thanks. (1) Can I say that TCP's concept "socket" is the same as tuple (IP address, transport protocol, port number)? Sorry if you realize that I ask the same question again. (2) In the same topic as part 4 in my post, can two processes share the same TCP's socket? can two processes share the same Linux's socket? – Tim Feb 15 '19 at 21:30
  • Same answer again: yes. Same disclaimer: if you say that to anyone without linking RFC 793 they are likely not to understand what you mean. (Including if you say it to me after I've forgotten about this again :-). 2) Two processes can share the same Linux socket, because a Linux socket is an open file (POSIX "file description"), and Linux has two generic mechanisms to pass any open file to another process. The most common one is that when you call fork(), the child process inherits all the open files from the parent process :-). – sourcejedi Feb 15 '19 at 21:46
  • Thanks. (1) given that two processes can share a Linux's TCP socket, can each process create a connection on the socket, without knowing the other process also doing so, and therefore there are two connections on the same Linux's TCP socket? (2) can two processes share the same TCP's socket? – Tim Feb 15 '19 at 22:33
  • @Tim (2) two processes can have the same Linux TCP socket. That will have the same (IP address, transport protocol, port number) in both processes. Perhaps it depends what you mean by "share" though. E.g. if you run ncat -l localhost 31337 -e /bin/sh and connect to it with ncat localhost 31337, you are running a shell on a TCP socket (without job control), as usual you can run multiple background commands which "share" stdout, but if they write to it at the same time all the output is going to be mixed up :-). Maybe that is not "sharing nicely" :-). – sourcejedi Feb 15 '19 at 22:35
  • @Tim (1) no, you cannot connect() a Linux TCP socket more than once, even if the socket is shared between two processes. Errno EISCONN. – sourcejedi Feb 15 '19 at 22:37
  • Thanks. I'd appreciate if you could also consider https://unix.stackexchange.com/questions/504157/how-many-connections-can-there-be-between-two-unix-domain-sockets – Tim Mar 03 '19 at 21:53
  • Is SO_REUSEADDR Socket Option related to the two different definitions of socket? If you are willing, also see https://stackoverflow.com/questions/54976419/setting-so-reuseaddr-to-be-1-doesnt-allow-me-to-run-a-server-in-two-processes and https://stackoverflow.com/questions/54976846/does-reuseaddr-allow-to-reuse-the-port-in-this-scenario – Tim Mar 04 '19 at 12:00
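The EISCONN behaviour mentioned in the comments above can be checked with a short Python sketch. It is an illustration only, assuming a loopback listener on a kernel-chosen port; the function name `second_connect_errno` is made up for this example.

```python
import errno
import socket


def second_connect_errno():
    """Call connect() twice on one TCP socket and return the second call's errno."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    addr = listener.getsockname()

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(addr)  # first connect succeeds
    try:
        sock.connect(addr)  # the socket already represents a connection
        result = None
    except OSError as exc:
        result = exc.errno

    sock.close()
    listener.close()
    return result


if __name__ == "__main__":
    print(second_connect_errno() == errno.EISCONN)
```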