I have a ZTE router running Linux kernel 4.1.25, on which I have installed an SNMP agent (mini-snmpd) to monitor traffic.

Because I was getting inaccurate data, I investigated a bit and found that ifconfig and /proc/net/dev both report wrong values, while the ZTE programs (cspd/httpd) report correct ones.

ifconfig returns this for eth4, which is the interface I'm plugged into:

RX bytes: 177365531 (161.1 MiB) TX bytes: 12507777123 (11.6 GiB)

The router's web page shows much larger and, I believe, correct values for this interface:

Bytes Received/Sent 100655800471/286284079917

If I run speedtest-cli from my computer, the web counters increase in line with the amount of data speedtest-cli reports as transferred (2 to 3 GB in/out). However, the ifconfig counters barely move: sometimes they increase by 300 MB, sometimes by 100 MB.


   Speedtest by Ookla
 Server: xxxxxx (id = 14979)
    ISP: xxxxxx
Latency:     1.93 ms   (0.12 ms jitter)

Download: 2339.50 Mbps (data used: 2.9 GB ) Upload: 2339.21 Mbps (data used: 2.1 GB ) Packet Loss: 0.0%
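The mismatch can be put into numbers by sampling /proc/net/dev before and after a transfer and comparing the delta with what speedtest-cli says it moved. A minimal Python sketch, assuming Python is available on the box (otherwise the same logic is easy to port); the function names are made up for illustration, only the /proc/net/dev field order (RX bytes first, TX bytes ninth) is standard:

```python
import time


def parse_proc_net_dev(text):
    """Return {iface: (rx_bytes, tx_bytes)} from the contents of /proc/net/dev."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain '|' but no ':'
        name, fields = line.split(":", 1)
        values = fields.split()
        if len(values) < 16:
            continue  # not a complete counter line
        # Field 0 is RX bytes, field 8 is TX bytes.
        counters[name.strip()] = (int(values[0]), int(values[8]))
    return counters


def counter_delta(before, after):
    """Per-interface (rx, tx) byte difference between two samples."""
    return {
        iface: (after[iface][0] - before[iface][0],
                after[iface][1] - before[iface][1])
        for iface in after
        if iface in before
    }


def measure(path="/proc/net/dev", interval=10.0):
    """Sample the counters twice, `interval` seconds apart, and return the delta."""
    with open(path) as f:
        before = parse_proc_net_dev(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_net_dev(f.read())
    return counter_delta(before, after)
```

Calling `measure(interval=30)` while a speedtest runs and looking at the `eth4` entry gives the number of bytes the kernel thinks it saw, which can then be compared with the "data used" figures above.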

Is there a reason why the kernel is not getting accurate data while the ZTE program is?

Edit:

Well, I believe I've found the cause, though not a solution, if there is one.

There are some tunables in /proc/zte/sys/ffe.

If I do: echo npu 0 > /proc/zte/sys/ffe/cmd

I start seeing good network graphs, but incoming traffic seems to be capped at about 1.7 Gbps (I was getting ~2.35 Gbps before).

If I do: echo ffe 0 > /proc/zte/sys/ffe/cmd

Download and upload speeds drop dramatically to about 300 Mbps each.

I've been looking into what npu and ffe stand for, and they appear to be hardware network accelerators (Network Processing Unit and Fast Forward Engine).

I still don't know why /proc/net/dev is not notified of the total traffic passing through the NPU while a userspace program is, or whether this can be fixed somehow. There are some no_delete, timer and flowctrl options that I'm going to play with to see whether any of them makes the net/dev file update.

Edit 2:

Well, after investigating a bit more, I used strace on the cspd process. When I request the eth stats from the router web page, I can see a write() of what looks like a call to the cspd function CmEthGetPortBasicStats(). Looking into the cspd binary with Ghidra, I see an fopen() of the file /dev/switch_dev, followed by an ioctl with value 0x38 (56) when it asks for basic stats, and other values for different things like setting speed, duplex, interface on/off... I've tried to write an ioctl program to retrieve the data from switch_dev. No success yet; I only receive a few bytes.
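For reference, the kind of probe described here could look like the Python sketch below. Only the device path and the request value 0x38 come from the decompilation; the buffer size, the idea that the port index goes in the first 4 bytes, and the function names are assumptions for illustration and would have to be checked against what cspd really passes.

```python
import fcntl
import os
import struct

SWITCH_DEV = "/dev/switch_dev"  # device path opened by cspd
IOCTL_BASIC_STATS = 0x38        # request value seen in the decompilation
BUF_SIZE = 256                  # ASSUMPTION: the real struct size is unknown


def read_basic_stats(port, path=SWITCH_DEV, request=IOCTL_BASIC_STATS,
                     size=BUF_SIZE):
    """Issue the stats ioctl and return the raw buffer the driver filled in."""
    buf = bytearray(size)
    # ASSUMPTION: the driver reads the port index from the first 4 bytes
    # (native byte order). The real input layout is not known.
    struct.pack_into("=I", buf, 0, port)
    fd = os.open(path, os.O_RDWR)
    try:
        # A mutable buffer is passed so the driver's reply is copied back.
        fcntl.ioctl(fd, request, buf, True)
    finally:
        os.close(fd)
    return bytes(buf)


def hexdump(data, width=16):
    """Format bytes as offset-prefixed hex lines for eyeballing counters."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        lines.append("%04x  %s" % (off, " ".join("%02x" % b for b in chunk)))
    return "\n".join(lines)
```

Dumping `hexdump(read_basic_stats(4))` before and after a transfer and looking for the bytes that change would be one way to locate the counters. If only a few bytes come back, the driver may expect a differently shaped argument (for example a pointer to a second buffer), which this sketch does not handle.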

Edit 3:

OK, so the cspd binary has two functions:

https://i.stack.imgur.com/4OltN.png

This one does exactly what I wanted: it retrieves the stats of all interfaces at once. It is called when I check the eth stats from the router web page.

https://i.stack.imgur.com/gfl1x.png

And this one does the same, but per interface. I've noticed that it is called when I run this command in the console: sendcmd 1 switch_mgr getPortStats [iface number]

[image: sendcmd]

So, despite not yet being able to interact directly with /dev/switch_dev, I've patched the mini-snmpd source code to generate a fake /proc/net/dev with data added from the sendcmd command. mini-snmpd parses this file instead of the original, which finally achieves what I was looking for:

[image: grafana1]

I know the solution is not perfect, as I'm replacing one ioctl() call with 5 popen("sendcmd 1 switch_mgr...") calls, but I don't see any noticeable overhead from that, and it provides correct numbers to mini-snmpd. Obviously this doesn't fix ifconfig, since the fake file is created at /var/tmp/dev-fake, which ifconfig does not read. Maybe I will compile a new static ifconfig later, but that is not a priority for me at this point.
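The fake-file idea can be illustrated in a few lines. This is a Python sketch of the technique, not the actual C patch to mini-snmpd: it copies /proc/net/dev and overwrites only the RX/TX byte fields for the interfaces it is given. How the correct counters are extracted from the sendcmd output is left out, because that output format is not shown here; `overrides` and both function names are hypothetical.

```python
import os


def patch_proc_net_dev(text, overrides):
    """Return /proc/net/dev contents with byte counters replaced.

    `overrides` maps an interface name to (rx_bytes, tx_bytes), e.g. values
    obtained from the vendor tool. Other fields and lines are kept as-is.
    """
    out = []
    for line in text.splitlines():
        if ":" in line:
            name, fields = line.split(":", 1)
            iface = name.strip()
            values = fields.split()
            if iface in overrides and len(values) >= 16:
                rx_bytes, tx_bytes = overrides[iface]
                values[0] = str(rx_bytes)  # field 0: RX bytes
                values[8] = str(tx_bytes)  # field 8: TX bytes
                line = "%6s: %s" % (iface, " ".join(values))
        out.append(line)
    return "\n".join(out) + "\n"


def write_fake(overrides, src="/proc/net/dev", dst="/var/tmp/dev-fake"):
    """Write the patched copy to `dst`, replacing it atomically."""
    with open(src) as f:
        patched = patch_proc_net_dev(f.read(), overrides)
    tmp = dst + ".tmp"
    with open(tmp, "w") as f:
        f.write(patched)
    # Rename so a reader never sees a half-written file.
    os.replace(tmp, dst)
```

Writing to a temporary file and renaming it is a small design choice that avoids the agent parsing the file while it is being rewritten. The patched lines are whitespace-separated rather than column-aligned, which is fine for a parser that splits on whitespace, though whether mini-snmpd's parser tolerates that would need checking.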

For me this is enough, but I would still appreciate help from anyone who can show me how to interact directly with /dev/switch_dev.

Alex
  • Any figures regarding dropped packets? Overruns? Is the ARP protocol enabled or disabled? What does tcpdump tell you? – MC68020 Aug 12 '22 at 15:33
  • The obvious answer is that the router's web UI is watching the device you're actually using, and eth4 is not that device. I'm not familiar with zte routers so I can't suggest what the right device is. – Sotto Voce Aug 12 '22 at 15:47
  • @SottoVoce In the config backup file, the interface LAN5, which is the one I am plugged into and the only one that is 10 Gbps (the other 4 are 1 Gbps), is referred to as eth4. Also, eth4 is the only one showing counters bigger than 1 GB in ifconfig.

    Ethernet Port LAN5 Status Up/2500Mbps/Full Duplex

    – Alex Aug 12 '22 at 17:37
  • @MC68020 In RX: errors:0 dropped:9 overruns:0; in TX all 0. ARP is enabled. I see tcpdump registering packets, and I can see the full handshake of an ssh session, but when I run speedtest-cli the tcpdump console does not move as fast as I believe it should. One thing I believe is happening is that the counters stop a few seconds after a transfer starts. In this graph (https://ibb.co/w6Dm95S), at 19:48 and 19:48:30 I ran two speedtest-cli tests; both gave me about 2350 Mbps up/down, transferring about 2 GB in/out each. At 19:49:20 I downloaded for some minutes at 70 MB/s, and only the initial spike is shown. – Alex Aug 12 '22 at 18:04
  • Out of interest and possibly unrelated: looking at your graph, the legend says ppp0. Do you actually have a ppp interface somewhere? – MC68020 Aug 12 '22 at 18:19
  • @MC68020, yes, it's a home router, and ppp0 is connected to the ISP. I took the picture of ppp0 instead of eth4 just to make clear that this is happening on all interfaces and is not a problem of me looking at the wrong iface. The graph of eth4 was similar to this one at the time. – Alex Aug 13 '22 at 01:11
  • So what's populating the items under /proc/zte? You're aware that the /proc filesystem exposes kernel internals using a filesystem API, right? – Sotto Voce Aug 13 '22 at 09:54
  • @SottoVoce Yeah, I guess it's some of the ZTE proprietary kernel modules loaded in the system that populate /proc/zte. There are a few, and they all depend on one called bspsomething.ko. I'll check this, but I have already gone through /proc/zte, and none of the changes I've made has been positive in terms of performance. It looks like the choice is either disabling the npu and getting good counters in /proc/net/dev with a performance penalty, or enabling it at the price of /proc/net/dev being unusable. – Alex Aug 13 '22 at 11:33
  • My interpretation is that the low-level router operations are implemented in a custom kernel module, which maintains its own set of devices. The ZTE software knows how to query the custom stack for traffic stats but the standard Linux tools only know the standard Linux network stack. – Sotto Voce Aug 13 '22 at 16:30

0 Answers