
I have an embedded Linux system built using Yocto. I recently upgraded from Dunfell (kernel 5.4) to Kirkstone (kernel 5.15) and started having problems with the serial port: large chunks of bytes are sometimes dropped while transmitting. Sometimes I can send 1M of data with no dropped bytes; other times I send the same amount and only 300 bytes are actually transmitted.

I previously had a problem on this system with similar symptoms. A few bytes would be dropped on TX, and I had to lower the baud rate significantly to achieve stability. The cause turned out to be that the imx_sdma module wasn't loaded. Once I included it, everything worked normally at higher baud rates.

So when this problem appeared, the first thing I checked was that the module was loaded, which it is. I also found that dropping the baud rate has no noticeable impact on the dropped bytes, so this seems to be a different problem.

I can rule out hardware issues, as it wasn't happening before the upgrade and it also happens on other systems.

There isn't a heavy processing load during these transfers. The processor is currently sitting at 0% utilization.

I should note there is no hardware or software flow control on this system. It's actually a TX only path. Some data loss or corruption might be expected in such a case, but not like this, and it was working fine before.
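
One way to separate a driver problem from a sending-side problem is to confirm that every byte was actually accepted by the driver and drained before the port is closed. The following is only a minimal sketch under assumptions, not the script used here: the helper names `write_all` and `send_and_drain` are hypothetical, and it assumes the port has already been opened and configured elsewhere. It retries on short writes (which can occur if the descriptor is non-blocking) and then calls `termios.tcdrain()` to wait until queued output has been transmitted.

```python
import os
import select
import termios


def write_all(fd, data):
    """Write every byte of data to fd, retrying on short writes.

    Returns the number of bytes the kernel accepted.
    """
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        try:
            # os.write may accept fewer bytes than requested.
            sent += os.write(fd, view[sent:])
        except BlockingIOError:
            # Non-blocking descriptor with a full buffer:
            # wait until it becomes writable, then retry.
            select.select([], [fd], [])
    return sent


def send_and_drain(fd, data):
    """Send data, then wait for the tty to finish transmitting it."""
    sent = write_all(fd, data)
    if os.isatty(fd):
        # Block until all queued output has left the driver.
        termios.tcdrain(fd)
    return sent
```

If the count returned by `send_and_drain` equals the payload length while the receiver still sees fewer bytes, the loss is happening below the application, which would point toward the driver or DMA path rather than the script.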

Does anyone have any ideas of what I should be looking at for a cause or solution?

Edit: One somewhat strange thing on the receive side: the device (/dev/ttymxc1) behaves as if it were non-blocking. I can stream bytes to it, but if I run cat /dev/ttymxc1 or dd if=/dev/ttymxc1, it returns immediately with no data, even when a lot of bytes should be coming in on that port. That makes me think there might be a receive issue.
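
A read that returns immediately with no data does not necessarily mean O_NONBLOCK is set. In non-canonical mode with VMIN=0 and VTIME=0, read() returns 0 right away when nothing is buffered, and cat treats that as end of file. Whether that applies to this port is an assumption; the sketch below only shows how the current termios read mode could be inspected. The function names `classify_read_mode` and `read_mode_of` are hypothetical.

```python
import termios


def classify_read_mode(lflag, vmin, vtime):
    """Classify how read() behaves for the given termios settings."""
    if lflag & termios.ICANON:
        return "canonical"   # read() waits for a complete line
    if vmin == 0 and vtime == 0:
        return "polling"     # read() returns at once, possibly with 0 bytes
    if vmin == 0:
        return "timed"       # read() waits up to vtime tenths of a second
    return "blocking"        # read() waits for at least one byte


def read_mode_of(fd):
    """Inspect an open tty descriptor and classify its read mode."""
    attrs = termios.tcgetattr(fd)
    lflag, cc = attrs[3], attrs[6]
    if lflag & termios.ICANON:
        # VMIN and VTIME are not meaningful in canonical mode.
        return classify_read_mode(lflag, 0, 0)
    return classify_read_mode(lflag, cc[termios.VMIN], cc[termios.VTIME])
```

The same information is shown by `stty -F /dev/ttymxc1 -a` in its `min` and `time` fields and the `icanon` flag. A result of "polling" would explain cat and dd returning immediately without implying that received bytes are being lost.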

xAptive
  • Have you looked at scheduling granularity? Look at this answer: https://unix.stackexchange.com/a/466723/73558 (My guess is that the task gets switched out. In theory, you shouldn't lose any data, as the FIFO should just spend more time empty. But I've seen weird things when FIFOs go empty...) – Popup Sep 05 '23 at 14:25
  • It could be a good idea to find a way to reliably replicate it. Also, is there any chance to downgrade the kernel? There are probably other pieces of software that changed; it doesn't have to be the kernel. – Eduardo Trápani Sep 05 '23 at 17:51
  • How did you try, using your application or simply dding to the device file? And no hints in the system log about FIFO overrun? – Philippos Sep 06 '23 at 10:41
  • @Popup Thanks, I will look into scheduling. I had considered the idea of task switching, and wished I had maybe gone with some sort of RTOS. But I didn't think of tweaking scheduling settings. – xAptive Sep 06 '23 at 13:28
  • @EduardoTrápani I can replicate it. My kernel options are very limited. I can go back to 5.10, but not all the way back to 5.4. But 5.10 may be worth trying. Of course it's not a good long term approach because I'll need to upgrade the kernel eventually anyway. But certainly there's more going on than just the kernel version. – xAptive Sep 06 '23 at 13:34
  • @Philippos I wrote a simple python script, a tx script on one side and an rx script on the other. It sends a variable number of bytes, and I compare counts on both sides to verify they are the same. But dd may be worth trying just to rule out any sort of python issue. Nothing in the system log about FIFO overrun. Would it be normal to see something like that? Does an absence of such a message suggest that isn't the issue? – xAptive Sep 06 '23 at 13:36
  • I'd bet on the UART driver module behaving buggily. I'd complain to Microchip, they are your hardware vendor. – Marcus Müller Sep 06 '23 at 14:28
  • @MarcusMüller Why Microchip? This smells like NXP to me. But they will not listen to complaints unless they are reproducible and well documented. – Philippos Sep 07 '23 at 05:25
  • @xAptive I'd expect an error message, yes, but I've never met this situation. Does your script give any hints like the numbers of lost bytes show some pattern? And if it's about i.MX, searching the forum https://community.nxp.com/t5/i-MX-Processors/bd-p/imx-processors may be worth a try. Are you using mainline or imx kernel? – Philippos Sep 07 '23 at 05:57
  • @Philippos you're right, I had a brain hiccup. i.MX is NXP :) – Marcus Müller Sep 07 '23 at 15:24
