
My application reads data from Kafka and sends it to another service via HTTP. I was seeing slower outgoing traffic on one box compared to the others. I analysed a tcpdump capture of the traffic to that outgoing IP; here is the output from this box:

09:24:20.625288 IP (tos 0x0, ttl 64, id 16107, offset 0, flags [DF], proto TCP (6), length 7292)
    localIP.57854 > externalIp.http: Flags [.], cksum 0x03fb (incorrect -> 0x614e), seq 52963:60203, ack 464, win 2518, options [nop,nop,TS val 205440553 ecr 262205407], length 7240: HTTP
09:24:20.640851 IP (tos 0x0, ttl 64, id 16112, offset 0, flags [DF], proto TCP (6), length 2948)
    localIP.57854 > externalIp.http: Flags [.], cksum 0xf302 (incorrect -> 0xb2a7), seq 60203:63099, ack 464, win 2518, options [nop,nop,TS val 205440557 ecr 262205422], length 2896: HTTP
09:24:20.640897 IP (tos 0x0, ttl 64, id 16114, offset 0, flags [DF], proto TCP (6), length 2948)
    localIP.57854 > externalIp.http: Flags [.], cksum 0xf302 (incorrect -> 0x46c8), seq 63099:65995, ack 464, win 2518, options [nop,nop,TS val 205440557 ecr 262205422], length 2896: HTTP
09:24:20.640930 IP (tos 0x0, ttl 64, id 16116, offset 0, flags [DF], proto TCP (6), length 2948)
    localIP.57854 > externalIp.http: Flags [.], cksum 0xf302 (incorrect -> 0xedd7), seq 65995:68891, ack 464, win 2518, options [nop,nop,TS val 205440557 ecr 262205422], length 2896: HTTP
09:24:20.640940 IP (tos 0x0, ttl 64, id 16118, offset 0, flags [DF], proto TCP (6), length 2948)
    localIP.57854 > externalIp.http: Flags [.], cksum 0xf302 (incorrect -> 0x22df), seq 68891:71787, ack 464, win 2518, options [nop,nop,TS val 205440557 ecr 262205422], length 2896: HTTP
09:24:20.640973 IP (tos 0x0, ttl 64, id 16120, offset 0, flags [DF], proto TCP (6), length 2948)
    localIP.57854 > externalIp.http: Flags [.], cksum 0xf302 (incorrect -> 0x7fad), seq 71787:74683, ack 464, win 2518, options [nop,nop,TS val 205440557 ecr 262205422], length 2896: HTTP
09:24:20.641016 IP (tos 0x0, ttl 64, id 16122, offset 0, flags [DF], proto TCP (6), length 2948)
    localIP.57854 > externalIp.http: Flags [.], cksum 0xf302 (incorrect -> 0x19e9), seq 74683:77579, ack 464, win 2518, options [nop,nop,TS val 205440557 ecr 262205422], length 2896: HTTP
09:24:20.641027 IP (tos 0x0, ttl 64, id 16124, offset 0, flags [DF], proto TCP (6), length 2948)
    localIP.57854 > externalIp.http: Flags [.], cksum 0xf302 (incorrect -> 0xc26d), seq 77579:80475, ack 464, win 2518, options [nop,nop,TS val 205440557 ecr 262205422], length 2896: HTTP
09:24:20.644138 IP (tos 0x0, ttl 64, id 16132, offset 0, flags [DF], proto TCP (6), length 2223)
    localIP.57854 > externalIp.http: Flags [P.], cksum 0xf02d (incorrect -> 0x6078), seq 89163:91334, ack 464, win 2518, options [nop,nop,TS val 205440557 ecr 262205425], length 2171: HTTP
09:24:20.660631 IP (tos 0x0, ttl 64, id 16134, offset 0, flags [DF], proto TCP (6), length 775)
    localIP.57854 > externalIp.http: Flags [P.], cksum 0xea85 (incorrect -> 0x14c9), seq 90611:91334, ack 464, win 2518, options [nop,nop,TS val 205440562 ecr 262205426], length 723: HTTP

At the same time, on another box I see the following:

09:26:53.610483 IP (tos 0x0, ttl 64, id 27441, offset 0, flags [DF], proto TCP (6), length 14532)
    localIP.50978 > externalIp.http: Flags [.], cksum 0xcb4f (incorrect -> 0x3b5c), seq 151537:166017, ack 1390, win 1444, options [nop,nop,TS val 1613152807 ecr 262243666], length 14480: HTTP
09:26:53.610609 IP (tos 0x0, ttl 64, id 27451, offset 0, flags [DF], proto TCP (6), length 16713)
    localIP.50978 > externalIp.http: Flags [P.], cksum 0xd3d4 (incorrect -> 0xed92), seq 166017:182678, ack 1390, win 1444, options [nop,nop,TS val 1613152807 ecr 262243668], length 16661: HTTP
09:26:53.632437 IP (tos 0x0, ttl 64, id 53481, offset 0, flags [DF], proto TCP (6), length 52)
    localIP.51054 > externalIp.http: Flags [.], cksum 0x92bf (incorrect -> 0x5bcc), ack 464, win 1444, options [nop,nop,TS val 1613152812 ecr 262243674], length 0
09:26:53.638408 IP (tos 0x0, ttl 64, id 2460, offset 0, flags [DF], proto TCP (6), length 11636)
    localIP.50892 > externalIp.http: Flags [.], cksum 0xbfff (incorrect -> 0x9468), seq 91408:102992, ack 927, win 1444, options [nop,nop,TS val 1613152814 ecr 262243675], length 11584: HTTP, length: 11584

I see a big difference in the length field: in the first case it is quite small, while in the second case it is large and the whole data is transferred very quickly. How is this length field determined, and what factors affect it?

Saurabh
  • What do you mean by "in other box"? I can see that the sequence numbers are different in the pasted dumps. Maybe you want to know how the TCP/IP protocol determines how large the payload should be at any moment? – mrc02_kr Oct 09 '18 at 13:45
  • @mrc02_kr Yes, this is a comparison of two different hardware boxes: on one box the length is larger, while on the other it is smaller. I want to understand why; what factor affects it? – Saurabh Oct 11 '18 at 02:44
  • I think it's not specific to Linux but rather to the mechanisms of the TCP protocol and the network topology. Are these two hosts in the same local network, and are they communicating with the same remote host? Can you tell us more about the models of the network interfaces (this can be relevant because of TCP offload engines) and the operating systems you're using? – mrc02_kr Oct 11 '18 at 07:21
  • @mrc02_kr Yes, these two hosts are located in the same data center and are talking to the same remote host. Both have Debian 9 installed. – Saurabh Oct 14 '18 at 17:07

2 Answers


The difference in length you observe is due to TCP segmentation offload (TSO). Most newer network cards support this feature in hardware, to reduce the CPU cost of segmenting packets. tcpdump observes packets before segmentation takes place, so it sees packets far larger than the configured MTU (the actual packets on the wire are still limited by the MTU).

You can check whether TCP segmentation offload is enabled on your NIC using ethtool (for example, on the eth0 device):

# ethtool -k eth0 | grep 'tcp-segmentation-offload'
tcp-segmentation-offload: on
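If you want to check this from a script rather than by eye, here is a minimal Python sketch that parses the output of ethtool -k. The function names are hypothetical, it assumes ethtool is installed and prints lines in the "feature: on" / "feature: off [fixed]" form shown above, and the sample text is illustrative rather than captured from either box.

```python
import subprocess
from typing import Dict


def parse_ethtool_features(text: str) -> Dict[str, bool]:
    """Parse `ethtool -k <dev>` output into {feature_name: enabled}.

    Lines look like "tcp-segmentation-offload: on" or
    "large-receive-offload: off [fixed]"; the header line
    "Features for eth0:" has nothing after the colon and is skipped.
    """
    features: Dict[str, bool] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # header or blank line
        state = value.split()[0]  # drop suffixes such as "[fixed]"
        if state in ("on", "off"):
            features[name.strip()] = state == "on"  # strip() removes sub-feature tabs
    return features


def tso_enabled(dev: str = "eth0") -> bool:
    """Run ethtool on `dev` (eth0 is only an example name) and report TSO."""
    out = subprocess.run(["ethtool", "-k", dev], capture_output=True,
                         text=True, check=True).stdout
    return parse_ethtool_features(out).get("tcp-segmentation-offload", False)


# Illustrative sample in the same format as real ethtool -k output.
sample = """Features for eth0:
tx-checksumming: on
tcp-segmentation-offload: on
\ttx-tcp-segmentation: on
generic-receive-offload: on
large-receive-offload: off [fixed]
"""
print(parse_ethtool_features(sample)["tcp-segmentation-offload"])  # True
```

Keeping the parser separate from the subprocess call means it can be tested without root access or a real NIC.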

It can be disabled using ethtool -K eth0 tso off

Example of outgoing data seen with TSO enabled (lengths approach 64 KB, the maximum size of an IPv4 packet):

15:08:22.451667 IP 192.168.230.9.43736 > 192.168.157.102.22: Flags [.], seq 32023713:32088873, ack 19886, win 340, options [nop,nop,TS val 3241810413 ecr 3874669422], length 65160
15:08:22.452203 IP 192.168.230.9.43736 > 192.168.157.102.22: Flags [.], seq 32088873:32154033, ack 19886, win 340, options [nop,nop,TS val 3241810413 ecr 3874669423], length 65160

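With TSO disabled, the length is limited by the MTU (1500 here): 1448 bytes of payload plus a 20-byte IP header and a 32-byte TCP header (20 bytes plus 12 bytes of timestamp option) make exactly 1500 bytes.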

15:09:43.181882 IP 192.168.230.9.43738 > 192.168.157.102.22: Flags [.], seq 9881:11329, ack 4206, win 319, options [nop,nop,TS val 3241830596 ecr 3874750153], length 1448
15:09:43.181886 IP 192.168.230.9.43738 > 192.168.157.102.22: Flags [.], seq 11329:12777, ack 4206, win 319, options [nop,nop,TS val 3241830596 ecr 3874750153], length 1448
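The arithmetic behind both captures can be sketched in a few lines of Python. This is a simplified model, assuming IPv4 without IP options and the TCP timestamp option seen in the dumps; the function names are hypothetical. It derives the 1448-byte MSS from the 1500-byte MTU and then cuts the first TSO buffer above (seq 32023713:32088873, length 65160) into MSS-sized wire segments. It only models the sequence-number arithmetic; a real NIC also builds headers and checksums for every packet it emits.

```python
from typing import List, Tuple

IPV4_HEADER = 20           # assumption: no IP options
TCP_BASE_HEADER = 20
TCP_TIMESTAMP_OPTION = 12  # nop,nop,TS val/ecr as shown in the captures


def mss_for_mtu(mtu: int) -> int:
    """Payload bytes that fit in one packet of `mtu` bytes."""
    return mtu - IPV4_HEADER - TCP_BASE_HEADER - TCP_TIMESTAMP_OPTION


def tso_split(seq: int, length: int, mss: int) -> List[Tuple[int, int]]:
    """Model how one large TSO buffer is cut into MSS-sized wire segments.

    Returns (start, end) pairs in tcpdump's "seq start:end" notation.
    """
    segments = []
    end = seq + length
    while seq < end:
        nxt = min(seq + mss, end)  # the last segment may be shorter
        segments.append((seq, nxt))
        seq = nxt
    return segments


mss = mss_for_mtu(1500)
print(mss)  # 1448
wire = tso_split(32023713, 65160, mss)
print(len(wire), wire[0], wire[-1])
# 45 (32023713, 32025161) (32087425, 32088873)
```

So the single 65160-byte line that tcpdump shows with TSO enabled corresponds to 45 packets of 1448 bytes on the wire, the same size as the lines captured with TSO disabled.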

The variable payload length reflects how many MSS-sized segments are combined into one large buffer before the NIC splits it. This can vary with NIC resources and the traffic on the server at the time.
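Applying the same arithmetic to the captures in the question is a quick sanity check. This Python sketch assumes the same 1448-byte MSS as the example above; the question doesn't show the MSS directly, but the 52-byte gap between the IP length and the payload length (7292 vs 7240) matches those header sizes. Most lengths turn out to be exact multiples of 1448; the exceptions are the segments flagged [P.], which carry the end of a write. The helper names are hypothetical.

```python
from typing import List

MSS = 1448  # assumption: 1500-byte MTU, IPv4, TCP timestamp option

# Payload lengths copied from the two captures in the question.
BOX1 = [7240, 2896, 2896, 2896, 2896, 2896, 2896, 2896, 2171, 723]
BOX2 = [14480, 16661, 11584]


def describe(length: int, mss: int = MSS) -> str:
    """Express a payload length as whole MSS-sized segments plus a remainder."""
    full, rest = divmod(length, mss)
    text = f"{length} = {full} x {mss}"
    return text if rest == 0 else f"{text} + {rest}"


def max_segments_per_buffer(lengths: List[int], mss: int = MSS) -> int:
    """Largest number of full MSS segments carried in one captured buffer."""
    return max(n // mss for n in lengths)


for name, lengths in (("box 1", BOX1), ("box 2", BOX2)):
    print(name, "max full segments per buffer:", max_segments_per_buffer(lengths))
    for n in lengths:
        print("   ", describe(n))
```

On the slower box a single buffer holds at most 5 full segments, while on the other box buffers hold up to 11, which is the difference the question observes in the length field.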

VenkatC

From the tcpdump manual:

The general format of a TCP protocol line is:  
    src > dst: Flags [tcpflags], seq data-seqno, ack ackno, 
               win window, urg urgent, options [opts], length len   
Src and dst are the source and destination IP addresses and ports.
[...] Len is the length of payload data. 

In TCP, the payload length is expressed in bytes (I'm not 100% sure, but in the tcpdump source file print-tcp.c only the term bytes is used in the comment regarding the length field). The payload is the actual data inside your TCP segment, the data your application will use.
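With tcpdump -v, each record actually shows two lengths: the IP total length on the first line and the TCP payload length on the second. The payload length also equals the width of the seq start:end range. This Python sketch extracts all three from the first record in the question; the regular expressions are hypothetical and only assume the exact layout printed there.

```python
import re
from typing import Tuple

# First two-line `tcpdump -v` record from the question (addresses anonymised there).
record = (
    "09:24:20.625288 IP (tos 0x0, ttl 64, id 16107, offset 0, flags [DF], "
    "proto TCP (6), length 7292)\n"
    "    localIP.57854 > externalIp.http: Flags [.], cksum 0x03fb "
    "(incorrect -> 0x614e), seq 52963:60203, ack 464, win 2518, "
    "options [nop,nop,TS val 205440553 ecr 262205407], length 7240: HTTP"
)

IP_LEN = re.compile(r"proto TCP \(6\), length (\d+)\)")  # IP total length
SEQ = re.compile(r"seq (\d+):(\d+)")                     # seq start:end
TCP_LEN = re.compile(r"\], length (\d+)")                # payload, after options [...]


def lengths(rec: str) -> Tuple[int, int, int]:
    """Return (IP total length, seq range width, TCP payload length)."""
    ip_total = int(IP_LEN.search(rec).group(1))
    start, end = map(int, SEQ.search(rec).groups())
    payload = int(TCP_LEN.search(rec).group(1))
    return ip_total, end - start, payload


ip_total, seq_span, payload = lengths(record)
print(ip_total, seq_span, payload, ip_total - payload)  # 7292 7240 7240 52
```

The 52-byte difference is the 20-byte IP header plus the 32-byte TCP header (including the timestamp option), so the "length" tcpdump prints at the end of the line is purely application data.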

Kafka is a messaging application, so the amount of data your application sends probably varies with the number of messages it has to forward.

Your packets are not IP-fragmented in this case, as the [DF] (don't fragment) flag shows, but that doesn't matter: the useful data inside each TCP segment is "length" bytes long.
How the size is chosen depends on your TCP stack (most likely your OS kernel is responsible for that) and on how much data it needs to send.
It varies, and that is not a problem at all: TCP is flexible enough not to send 65,535 bytes when it only needs to send 100.

Kiwy
  • You've not considered jumbo frames. These packets are all >1500 bytes – Chris Davies Oct 08 '18 at 09:11
  • Yes, I know, but they are only used in very specific environments, mostly for storage, as most of the time they create more problems than they solve – Kiwy Oct 08 '18 at 09:19
  • @roaima Also, this is only marginally relevant; it doesn't change the general idea behind my answer. – Kiwy Oct 08 '18 at 09:30
  • Sure. Just slightly puzzled why we're seeing larger packets on the wire. – Chris Davies Oct 08 '18 at 09:32
  • @roaima Oh, I see what you mean. I will make explicit that a TCP segment can be up to 65,535 B and is reconstructed from fragmented Ethernet packets that could be smaller anyway. – Kiwy Oct 08 '18 at 09:43
  • Oh, OK. I was sure that tcpdump/tshark didn't reassemble fragmented packets when showing wire-level traffic. (I know it reassembles for higher-level protocols.) – Chris Davies Oct 08 '18 at 10:29
  • Re "...a stream of bytes varying in broadband used with the amount of message to send.": this is unclear and, depending on the actual meaning, perhaps ungrammatical. Why "broadband used"? Not all connections are uniformly broadband, and some of the world's remote TCP/IP interconnections use no broadband whatsoever. – agc Oct 08 '18 at 14:03
  • I would add MTU on the mix ;) – Rui F Ribeiro Oct 08 '18 at 16:57