17

Every few days I run into the following problem: my laptop (Debian testing) suddenly becomes unable to make TCP connections to the internet.

The following things continue working fine:

  • UDP (DNS), ICMP (ping) — I get instant response
  • TCP connections to other machines in the local network (e.g. I can ssh to a neighbouring laptop)
  • everything is ok for other machines in my LAN

But when I try to open TCP connections from my laptop to hosts on the internet, they time out (no response to SYN packets). Here's a typical curl output:

% curl -v google.com     
* About to connect() to google.com port 80 (#0)
*   Trying 173.194.39.105...
* Connection timed out
*   Trying 173.194.39.110...
* Connection timed out
*   Trying 173.194.39.97...
* Connection timed out
*   Trying 173.194.39.102...
* Timeout
*   Trying 173.194.39.98...
* Timeout
*   Trying 173.194.39.96...
* Timeout
*   Trying 173.194.39.103...
* Timeout
*   Trying 173.194.39.99...
* Timeout
*   Trying 173.194.39.101...
* Timeout
*   Trying 173.194.39.104...
* Timeout
*   Trying 173.194.39.100...
* Timeout
*   Trying 2a00:1450:400d:803::1009...
* Failed to connect to 2a00:1450:400d:803::1009: Network is unreachable
* Success
* couldn't connect to host
* Closing connection #0
curl: (7) Failed to connect to 2a00:1450:400d:803::1009: Network is unreachable

Restarting the connection and/or reloading the network card kernel module doesn't help. The only thing that helps is a reboot.

Clearly something is wrong with my system (everything else works fine), but I have no idea what exactly.

My setup is a wireless router that is connected to the ISP via PPPoE.

Any advice?

Answers to comments

What NIC is it?

12:00.0 Network controller: Broadcom Corporation BCM4313 802.11b/g/n Wireless LAN Controller (rev 01)
  Subsystem: Dell Inspiron M5010 / XPS 8300
  Flags: bus master, fast devsel, latency 0, IRQ 17
  Memory at fbb00000 (64-bit, non-prefetchable) [size=16K]
  Capabilities: [40] Power Management version 3
  Capabilities: [58] Vendor Specific Information: Len=78 <?>
  Capabilities: [48] MSI: Enable- Count=1/1 Maskable- 64bit+
  Capabilities: [d0] Express Endpoint, MSI 00
  Capabilities: [100] Advanced Error Reporting
  Capabilities: [13c] Virtual Channel
  Capabilities: [160] Device Serial Number 00-00-9d-ff-ff-aa-1c-65
  Capabilities: [16c] Power Budgeting <?>
  Kernel driver in use: brcmsmac

What is the state of your NIC when the problem occurs?

iptables-save prints nothing.

ip rule show:

0:  from all lookup local 
32766:  from all lookup main 
32767:  from all lookup default 

ip route show table all:

default via 192.168.1.1 dev wlan0 
192.168.1.0/24 dev wlan0  proto kernel  scope link  src 192.168.1.105 
broadcast 127.0.0.0 dev lo  table local  proto kernel  scope link  src 127.0.0.1 
local 127.0.0.0/8 dev lo  table local  proto kernel  scope host  src 127.0.0.1 
local 127.0.0.1 dev lo  table local  proto kernel  scope host  src 127.0.0.1 
broadcast 127.255.255.255 dev lo  table local  proto kernel  scope link  src 127.0.0.1 
broadcast 192.168.1.0 dev wlan0  table local  proto kernel  scope link  src 192.168.1.105 
local 192.168.1.105 dev wlan0  table local  proto kernel  scope host  src 192.168.1.105 
broadcast 192.168.1.255 dev wlan0  table local  proto kernel  scope link  src 192.168.1.105 
fe80::/64 dev wlan0  proto kernel  metric 256 
unreachable default dev lo  table unspec  proto kernel  metric 4294967295  error -101 hoplimit 255
local ::1 via :: dev lo  table local  proto none  metric 0 
local fe80::1e65:9dff:feaa:b1f1 via :: dev lo  table local  proto none  metric 0 
ff00::/8 dev wlan0  table local  metric 256 
unreachable default dev lo  table unspec  proto kernel  metric 4294967295  error -101 hoplimit 255

All of the above is the same when the machine works in normal mode.

ifconfig — I ran it, but somehow forgot to save before rebooting. Will have to wait till the next time the problem occurs. Sorry about that.

Any QoS in place?

Probably not — at least I haven't done anything specifically to enable it.

Have you tried sniffing the traffic actually sent on the interface?

I ran curl and tcpdump several times, and there were two patterns.

The first is just SYN packets without answers.

17:14:37.836917 IP (tos 0x0, ttl 64, id 4563, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.1.105.42030 > fra07s07-in-f102.1e100.net.http: Flags [S], cksum 0x27fc (incorrect -> 0xbea8), seq 3764607647, win 13600, options [mss 1360,sackOK,TS val 33770316 ecr 0,nop,wscale 4], length 0
17:14:38.836650 IP (tos 0x0, ttl 64, id 4564, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.1.105.42030 > fra07s07-in-f102.1e100.net.http: Flags [S], cksum 0x27fc (incorrect -> 0xbdae), seq 3764607647, win 13600, options [mss 1360,sackOK,TS val 33770566 ecr 0,nop,wscale 4], length 0
17:14:40.840649 IP (tos 0x0, ttl 64, id 4565, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.1.105.42030 > fra07s07-in-f102.1e100.net.http: Flags [S], cksum 0x27fc (incorrect -> 0xbbb9), seq 3764607647, win 13600, options [mss 1360,sackOK,TS val 33771067 ecr 0,nop,wscale 4], length 0

The second is this:

17:22:56.507827 IP (tos 0x0, ttl 64, id 41583, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.1.105.42036 > fra07s07-in-f102.1e100.net.http: Flags [S], cksum 0x27fc (incorrect -> 0x2244), seq 1564709704, win 13600, options [mss 1360,sackOK,TS val 33894944 ecr 0,nop,wscale 4], length 0
17:22:56.546763 IP (tos 0x58, ttl 54, id 65442, offset 0, flags [none], proto TCP (6), length 60)
    fra07s07-in-f102.1e100.net.http > 192.168.1.105.42036: Flags [S.], cksum 0x6b1e (correct), seq 1407776542, ack 1564709705, win 14180, options [mss 1430,sackOK,TS val 3721836586 ecr 33883552,nop,wscale 6], length 0
17:22:56.546799 IP (tos 0x58, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.105.42036 > fra07s07-in-f102.1e100.net.http: Flags [R], cksum 0xf301 (correct), seq 1564709705, win 0, length 0
17:22:58.511843 IP (tos 0x0, ttl 64, id 41584, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.1.105.42036 > fra07s07-in-f102.1e100.net.http: Flags [S], cksum 0x27fc (incorrect -> 0x204f), seq 1564709704, win 13600, options [mss 1360,sackOK,TS val 33895445 ecr 0,nop,wscale 4], length 0
17:22:58.555423 IP (tos 0x58, ttl 54, id 65443, offset 0, flags [none], proto TCP (6), length 60)
    fra07s07-in-f102.1e100.net.http > 192.168.1.105.42036: Flags [S.], cksum 0x3b03 (correct), seq 1439178112, ack 1564709705, win 14180, options [mss 1430,sackOK,TS val 3721838596 ecr 33883552,nop,wscale 6], length 0
17:22:58.555458 IP (tos 0x58, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.1.105.42036 > fra07s07-in-f102.1e100.net.http: Flags [R], cksum 0xf301 (correct), seq 1564709705, win 0, length 0

ethtool output

ethtool -k wlan0:

Features for wlan0:
rx-checksumming: off [fixed]
tx-checksumming: off
  tx-checksum-ipv4: off [fixed]
  tx-checksum-unneeded: off [fixed]
  tx-checksum-ip-generic: off [fixed]
  tx-checksum-ipv6: off [fixed]
  tx-checksum-fcoe-crc: off [fixed]
  tx-checksum-sctp: off [fixed]
scatter-gather: off
  tx-scatter-gather: off [fixed]
  tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
  tx-tcp-segmentation: off [fixed]
  tx-tcp-ecn-segmentation: off [fixed]
  tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: on [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]

iptables

# namei -l "$(command -v iptables)"
f: /sbin/iptables
drwxr-xr-x root root /
drwxr-xr-x root root sbin
lrwxrwxrwx root root iptables -> xtables-multi
-rwxr-xr-x root root   xtables-multi

# dpkg -S "$(command -v iptables)"
iptables: /sbin/iptables

# iptables -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
# iptables -t mangle -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
# iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
# iptables -t security -nvL
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

module info

# ethtool -i wlan0                   
driver: brcmsmac
version: 3.2.0-3-686-pae
firmware-version: N/A
bus-info: 0000:12:00.0
supports-statistics: no
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

# modinfo brcmsmac
filename:       /lib/modules/3.2.0-3-686-pae/kernel/drivers/net/wireless/brcm80211/brcmsmac/brcmsmac.ko
license:        Dual BSD/GPL
description:    Broadcom 802.11n wireless LAN driver.
author:         Broadcom Corporation
alias:          pci:v000014E4d00000576sv*sd*bc*sc*i*
alias:          pci:v000014E4d00004727sv*sd*bc*sc*i*
alias:          pci:v000014E4d00004353sv*sd*bc*sc*i*
alias:          pci:v000014E4d00004357sv*sd*bc*sc*i*
depends:        mac80211,brcmutil,cfg80211,cordic,crc8
intree:         Y
vermagic:       3.2.0-3-686-pae SMP mod_unload modversions 686 

There's no /sys/module/brcmsmac/parameters. Here's what I have there:

# tree /sys/module/brcmsmac
/sys/module/brcmsmac
├── drivers
│   └── pci:brcmsmac -> ../../../bus/pci/drivers/brcmsmac
├── holders
├── initstate
├── notes
├── refcnt
├── sections
│   └── __bug_table
└── uevent

Some sites actually work

As suggested by dr, I tried some other sites, and to my great surprise some of them indeed worked. Here are some hosts that worked:

  • rambler.ru
  • google.ru
  • ya.ru
  • opennet.ru
  • tut.by
  • ro-che.info
  • yahoo.com
  • ebay.com

And here are some that didn't:

  • vk.com
  • meta.ua
  • ukr.net
  • tenet.ua
  • prom.ua
  • reddit.com
  • github.com
  • stackexchange.com
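
To repeat this comparison whenever the problem shows up, a small loop over the hosts can help. This is only a sketch: the probe function name and the timeout values are arbitrary choices rather than anything from the thread, and it assumes curl is installed. Note that it reports OK only when the whole HTTP request completes, not merely the TCP handshake.

```shell
# probe HOST: fetch http://HOST/ with curl and report OK or FAIL.
# The function name and both timeouts are illustrative assumptions.
probe() {
    if curl -s -o /dev/null --connect-timeout 5 --max-time 15 "http://$1/"; then
        echo "OK   $1"
    else
        echo "FAIL $1"
    fi
}

# Example: for h in google.ru ya.ru vk.com github.com; do probe "$h"; done
```

Running the same loop before and after a reboot would show whether the set of failing hosts is stable.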

Network capture

I made a network capture and uploaded it here.

Roman Cheplyaka
  • Just out of curiosity: what is the state of your NIC when the problem occurs? (/sbin/ifconfig?) – yves Baumes Sep 30 '12 at 17:57
  • Have you tried sniffing the traffic actually sent on the interface (wireshark/tcpdump...)? What NIC is it? Is it wireless? What's the output of iptables-save, ip rule show and ip route show table all? Any QoS in place? – Stéphane Chazelas Sep 30 '12 at 22:07
  • Updated the post with answers to your questions. – Roman Cheplyaka Oct 03 '12 at 14:49
  • Did you install a package to gain BCM43XX compatibility or did you build the driver on your own? The only way I was able to get stable connectivity on Debian was by building the driver from source. – Mountainerd Oct 03 '12 at 15:21
  • I didn't build drivers from source. The module itself comes from the stock Debian kernel (package linux-image-3.2.0-3-686-pae), and the firmware comes from the firmware-brcm80211 package. Did you have problems similar to mine? I'd rather avoid building stuff by hand, unless it is some known issue. Also, why would a NIC module problem manifest itself on layer 4? – Roman Cheplyaka Oct 03 '12 at 15:32
  • @RomanCheplyaka Can you also post outputs of traceroute www.google.com and curl -v www.yahoo.com when the problem occurs. – Karlson Oct 03 '12 at 18:55
  • More than likely whatever is wrong is on your Wi-Fi base station, switch or router. If possible try tracing packets (or packet counts) there. If not, try swapping them with alternates. – bahamat Oct 03 '12 at 20:56
  • @bahamat: why do you think so? Why does it happen to only one host in the network and go away after a reboot? Also, I've had this wifi router for about four years, and until recently I hadn't seen this effect. (I think «recently» coincides with my upgrade of Debian from stable to testing several months ago.) – Roman Cheplyaka Oct 03 '12 at 21:00
  • @RomanCheplyaka: That's a good point. But I think that because you said TCP connections to the local network are working. The first place I'd start is to verify that packets are indeed on the wire outside of the trouble host. – bahamat Oct 03 '12 at 21:21
  • @RomanCheplyaka: What is the vendor and model name of your wifi-router? I use tp-link wr340gd with a default firmware. – Alex R Oct 09 '12 at 14:15
  • @dr01: interesting — mine is also tp-link, WR641G/642G, default firmware. – Roman Cheplyaka Oct 09 '12 at 14:39
  • Post the dmesg output relating to your NIC, gathered immediately after the problem occurs again. – Michael Hampton Oct 11 '12 at 02:26

4 Answers

5

In the capture you provided, the Timestamp Echo Reply (TSecr) in the SYN-ACK (the second packet) doesn't match the TSval in the SYN (the first packet); it is a few seconds behind.

Note also that the TSecr values sent by both 173.194.70.108 and 209.85.148.100 are all identical and unrelated to the TSval you send.
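
The comparison described above can be automated on tcpdump's text output. The following is a rough sketch, not a finished tool: check_tsecr is a made-up helper name, and the parsing assumes the `TS val N ecr M` option format and the `Flags [S]` / `Flags [S.]` markers that tcpdump prints in the question.

```shell
# check_tsecr: read tcpdump text output on stdin. For every SYN-ACK, compare
# its TSecr with the TSval of the last SYN seen for the same address pair.
# (Hypothetical helper; assumes tcpdump's "TS val N ecr M" option format.)
check_tsecr() {
    awk '
    match($0, /TS val [0-9]+ ecr [0-9]+/) {
        split(substr($0, RSTART, RLENGTH), ts, " ")   # ts[3] = TSval, ts[5] = TSecr
        src = ""; dst = ""
        for (i = 2; i < NF; i++) {
            if ($i == ">") { src = $(i - 1); dst = $(i + 1); sub(/:$/, "", dst); break }
        }
        if (index($0, "Flags [S],")) {
            sent[src " " dst] = ts[3] ""      # SYN: remember the TSval we sent
        } else if (index($0, "Flags [S.],")) {
            key = dst " " src                 # SYN-ACK travels in the reverse direction
            if (!(key in sent)) { next }
            verdict = (sent[key] == ts[5] "") ? "ok" : "MISMATCH"
            print verdict ": " src " echoed " ts[5] ", last SYN carried " sent[key]
        }
    }'
}

# Example (needs root): sudo tcpdump -l -ni wlan0 -v 'tcp port 80' | check_tsecr
```

On a healthy connection every line should read "ok"; the SYN/SYN-ACK pairs quoted in the question would be reported as a mismatch.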

It looks like something is meddling with the TCP timestamps. I have no idea what may be causing that, but it sounds like it is outside your machine. Does rebooting the router help in this instance?

I don't know whether that is what causes your machine to send a RST (the third packet), but your machine clearly doesn't like that SYN-ACK, and the timestamp is the only thing I can find wrong with it. The only other explanation I can think of is that it's not your machine sending the RST, but given the short time difference between the SYN-ACK and the RST I doubt it. Just in case: do you use virtual machines, containers or network namespaces on this machine?

You could try disabling TCP timestamps altogether to see if that helps:

sudo sysctl -w net.ipv4.tcp_timestamps=0
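
Before and after toggling it, it may help to confirm what the kernel is actually using. A minimal sketch, assuming the standard Linux procfs path; ts_state is a made-up helper name, and it only distinguishes 0 from non-zero values:

```shell
# ts_state [FILE]: describe the TCP timestamp setting in words.
# FILE defaults to the usual procfs location; the helper name is hypothetical.
ts_state() {
    f=${1:-/proc/sys/net/ipv4/tcp_timestamps}
    case "$(cat "$f")" in
        0) echo "TCP timestamps disabled" ;;
        *) echo "TCP timestamps enabled" ;;
    esac
}
```

The sysctl -w change does not survive a reboot. If disabling timestamps turns out to help, a line reading net.ipv4.tcp_timestamps = 0 in a file under /etc/sysctl.d/ should make it persistent on Debian.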

So, either those sites send bogus TSecr or there's something on the way there (any router on the way, or transparent proxy) that mangles either the outgoing TSVal or the incoming TSecr, or a proxy with a bogus TCP stack. Why one would mangle the tcp timestamps I can only speculate: bug, intrusion detection evasion, a too-smart/bogus traffic shaping algorithm. That's not something I've heard of before (but then I'm no expert on the subject).

How to investigate further:

  • See if the TP-Link router is to blame by resetting it to see if that helps, or, if possible, capture the traffic on its outside as well to see whether it mangles the timestamps.
  • Check whether there's a transparent proxy on the way by playing with TTLs, looking at the request headers received by web servers, or seeing how requests to dead websites behave.
  • Capture traffic on a remote web server to see whether it's the TSval or the TSecr that is mangled.
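
For the captures suggested above, it is enough to record the handshake packets. A sketch of a capture filter that could be used on either side of the router; handshake_filter is a made-up helper name, and the interface name in the example is taken from the question:

```shell
# handshake_filter HOST: print a pcap filter that matches only SYN, SYN-ACK
# and RST packets exchanged with HOST. (Hypothetical helper name.)
handshake_filter() {
    printf 'host %s and tcp[tcpflags] & (tcp-syn|tcp-rst) != 0\n' "$1"
}

# Example (needs root): sudo tcpdump -ni wlan0 -v "$(handshake_filter google.com)"
```

With -v, tcpdump prints the TCP options, so the TSval and TSecr of each packet can be compared between the two capture points.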
  • No, I didn't have any vms/containers running. I'll try your suggestions next time, thanks. – Roman Cheplyaka Oct 10 '12 at 21:36
  • Hm... Your suggestion about tcp_timestamps definitely solves my problem. No problems with Google or other websites at all after setting net.ipv4.tcp_timestamps to 0, and the whole bunch of problems comes back with net.ipv4.tcp_timestamps=1. But WHY? – Alex R Oct 12 '12 at 20:10
1

It says incorrect checksum above. Is there checksum offloading for that device? (I didn't know wireless devices could offload checksums.)

What does sudo ethtool -k wlan0 tell you? If there is offloading, you may want to try disabling it.
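
To pick out the features that could be switched off, the ethtool -k listing can be filtered. A small sketch (offloads_on is a made-up helper name) that keeps only the features reported as on and not marked [fixed]:

```shell
# offloads_on: read `ethtool -k` output on stdin and print the features that
# are "on" and not [fixed], i.e. the ones that can still be toggled.
# (Hypothetical helper name.)
offloads_on() {
    awk -F': ' '$2 ~ /^on[ \t]*$/ { sub(/^[ \t]+/, "", $1); print $1 }'
}

# Example: sudo ethtool -k wlan0 | offloads_on
# Then, for instance:  sudo ethtool -K wlan0 gro off
```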

You need to be root to call iptables-save. There's still some remote chance that something is mangling packets there. If iptables-save doesn't work, try:

iptables -nvL
iptables -t mangle -nvL
iptables -t nat -nvL
iptables -t security -nvL

In your network capture, does the destination MAC address match that of the router? Is there anything interesting when you compare UDP traffic with TCP traffic?

Also, where $dev is the kernel driver (module) (see ethtool -i wlan0) for your wireless adaptor, what do modinfo "$dev" and grep . /sys/module/"$dev"/parameters/* tell you?
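
One way to fill in $dev without typing the driver name by hand, as a sketch (driver_of is a made-up helper name; it relies on the driver: line that ethtool -i prints):

```shell
# driver_of: read `ethtool -i` output on stdin and print the driver name.
# (Hypothetical helper name.)
driver_of() {
    awk '$1 == "driver:" { print $2 }'
}

# Example:
#   dev=$(ethtool -i wlan0 | driver_of)
#   modinfo "$dev"
#   grep . /sys/module/"$dev"/parameters/*
```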

1

It seems I have exactly the same behavior on my laptop too. I don't know the reason, but from time to time I can't connect to google.com and some other external resources. Pings and DNS queries work perfectly. I have also found only one solution: a reboot.

I can add several observations:

  1. If I boot some other OS in VirtualBox (Windows, Arch Linux, Ubuntu), I can establish TCP connections to the problem hosts without any issues.
  2. Some hosts on the internet behave like google.com, but many others are accessible as normal using telnet or a web browser.
  3. I don't have a Wi-Fi adapter on my laptop; I only have an Ethernet link to the router.
  4. I tried chrooting into a Debian/Gentoo userspace; it doesn't help.
  5. I replaced my NIC with a new one; it doesn't help.

Some technical information about my box:

OS: latest Arch Linux, amd64

$ ethtool -i  eth0
driver: via-rhine
version: 1.5.0
firmware-version: 
bus-info: 0000:02:07.0
supports-statistics: no
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

$ uname -a
Linux eniac-2 3.5.4-1-ARCH #1 SMP PREEMPT Sat Sep 15 08:12:04 CEST 2012 x86_64 GNU/Linux

I suspect this behavior is caused by a subtle bug in some versions of the Linux kernel, but I don't know how to debug it, and because it doesn't reproduce reliably I'm stuck.

Alex R
0
/sbin/iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu

I had the same problem you describe until I added the above command to the iptables commands on my Internet gateway. It is included by default in the rp-pppoe package and others. But when you use a custom configuration and don't set it manually, the computers on the LAN behind the gateway will have the problems you describe.
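
For reference, the arithmetic behind the clamp can be sketched as follows. With --clamp-mss-to-pmtu the MSS in forwarded SYN packets is lowered to the path MTU minus 40 bytes for IPv4 (20 bytes of IP header plus 20 bytes of TCP header). The clamped_mss name is made up for illustration, and the 1492-byte MTU assumes the usual 8 bytes of PPPoE overhead on a 1500-byte Ethernet link:

```shell
# clamped_mss MTU: largest MSS that fits an IPv4 path with the given MTU
# (20-byte IPv4 header + 20-byte TCP header, no IP options assumed).
# Hypothetical helper name, for illustration only.
clamped_mss() {
    echo $(( $1 - 40 ))
}

clamped_mss 1492   # typical PPPoE link: prints 1452
clamped_mss 1500   # plain Ethernet: prints 1460
```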

chaos