I'm trying to troubleshoot an annoying shutdown problem with my Sun Ultra 24 Workstation running Devuan ASCII.
groucho@devuan:~$ inxi -b
System: Host: devuan Kernel: 4.9.0-8-amd64 x86_64 (64 bit) Desktop: Xfce 4.12.3
Distro: Devuan GNU/Linux ascii
Machine: Device: portable System: Sun Microsystems product: Ultra 24 v: 0.00.01
Mobo: Sun Microsystems model: Ultra 24 v: 50 BIOS: American Megatrends v: 1.56 date: 01/21/2011
--- snip ---
groucho@devuan:~$
Obviously it's not a portable system. It's just that this BIOS file was published post Jan 2010, date of Sun's demise.
groucho@devuan:~$ uname -a
Linux devuan 4.9.0-8-amd64 #1 SMP Debian 4.9.144-3.1 (2019-02-19) x86_64 GNU/Linux
groucho@devuan:~$
This seems to be a distribution-agnostic problem. It also happens on the same rig with an emergency TCore Linux I keep on a memory stick, accessible through F8 at boot time.
I don't know if it happens in MSOS installations as I don't have one, just a VM running XP for testing stuff.
The issue is basically this:
On shutdown, the machine will do one of two things:
- shut down properly
- freeze during the shutdown at this point ...
e1000e: EEE Tx LPI Timer
Preparing to enter sleep state S5
Reboot: Power Down
... with the fans blowing at full speed.
Originally this was a two-part problem: the first part was a reboot on shutdown issue but (apparently) that got fixed by disabling WoL and it has not happened again.
The second part occurs (like the first part did) in a totally unpredictable manner, and I have not been able to reproduce it or link it to anything in particular.
Besides disabling WoL (a hassle of sorts as it cannot be done via BIOS) I also disabled the Intel e1000e controller's EEE settings but to no avail.
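For reference, a minimal sketch of how these two settings are usually changed with ethtool. The interface name eth0 and the helper names run and disable_wol_eee are assumptions for illustration, not necessarily what was used on this machine. Setting DRY_RUN=1 only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: turn off Wake-on-LAN and Energy-Efficient Ethernet on one interface.
# The default interface name eth0 is an assumption; check `ip link` for the real one.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

disable_wol_eee() {
    iface="${1:-eth0}"
    run ethtool -s "$iface" wol d            # "d" disables all wake-up triggers
    run ethtool --set-eee "$iface" eee off   # switch Energy-Efficient Ethernet off
}
```

Calling `disable_wol_eee eth0` as root applies both settings. They usually do not survive a reboot, so the call has to be repeated at every boot, which is the hassle mentioned above.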
Neither unloading the e1000e driver module with a script at shutdown nor adding various reboot= options to the kernel command line (e.g. reboot=force, reboot=acpi, reboot=BIOS) has worked.
To get a glimpse of what was going on, I decided to shut down the rig using a script that would (hopefully) isolate each stage of the shutdown process and (maybe) give me some feedback at the terminal. It is much like what I did in my MS-DOS days, when I ran config.sys and autoexec.bat step by step to weed out start-up issues:
#!/bin/sh
# Shut down the system without the use of the shutdown helper
#
PATH=/sbin:/bin:/usr/sbin:/usr/bin
# SysRq keys: s = emergency sync, u = remount read-only, o = power off
for i in s u o; do echo "$i" | sudo tee /proc/sysrq-trigger; sleep 2; done
But no, after a number of shutdowns it eventually occurs again and this is what I see on screen:
s
u
sudo: unable to open log file: /var/log/sudo.log: read only file system
... with the fans blowing at full speed.
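The sudo message suggests that sudo was started again after the u step had already remounted the file systems read-only, so it could not write its log. A hedged variant of the same sequence, meant to be started once as root so that sudo is not invoked between steps, could look like the sketch below. The function name sysrq_sequence and the variables SYSRQ_TRIGGER and SYSRQ_DELAY are made up for illustration; the override exists only so the loop can be tried against a scratch file.

```shell
#!/bin/sh
# Sketch: write the s/u/o SysRq sequence from a single (root) shell.
# SYSRQ_TRIGGER and SYSRQ_DELAY are hypothetical knobs for dry testing.

sysrq_sequence() {
    trigger="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
    delay="${SYSRQ_DELAY:-2}"
    for key in s u o; do
        echo "sysrq: $key"          # progress marker on the terminal
        echo "$key" > "$trigger"    # s = sync, u = remount read-only, o = power off
        sleep "$delay"
    done
}
```

Saved in a script that ends with a call to `sysrq_sequence`, it would be launched with a single `sudo sh ./script-name`, so the only sudo invocation happens while /var/log is still writable. This does not address the freeze itself; it only removes the sudo noise from the picture.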
When I do not get a shutdown freeze, /var/log/messages always reads:
Mar 8 09:37:16 devuan kernel: [ 8831.030260] sysrq: SysRq : Emergency Sync
Mar 8 09:37:16 devuan kernel: [ 8831.051494] Emergency Sync complete
Mar 8 09:37:18 devuan kernel: [ 8833.038247] sysrq: SysRq : Emergency Remount R/O
Mar 8 09:37:18 devuan kernel: [ 8833.069992] EXT4-fs (sdb1): re-mounted. Opts: (null)
Mar 8 09:37:18 devuan kernel: [ 8833.139131] EXT4-fs (sdb6): re-mounted. Opts: (null)
But sometimes (not always) a shutdown freeze will write a long series of non-text bytes to the pertinent log files, specifically 0x00 (NUL, the string-terminating character), which seems to be the standard behaviour with ungraceful halts such as the one caused by the freeze.
This screws up the log files: the usual text editors will either show the file only up to that point (Leafpad) or outright refuse to open it (Pluma). You have to open it with a hex editor or with MC, which will actually show you everything and anything there is to see in a file.
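As an alternative to a hex editor, the NUL bytes can be filtered out on the fly with tr. This is a minimal sketch; the helper names strip_nuls and count_nuls are made up for illustration, and the original log file is left untouched.

```shell
#!/bin/sh
# Sketch: read a log file that contains NUL (0x00) bytes.

# Print the file with every NUL byte removed.
strip_nuls() {
    tr -d '\000' < "$1"
}

# Count the NUL bytes: delete everything except NUL, then count what is left.
count_nuls() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}
```

For example, `strip_nuls /var/log/messages | less` pages through the whole file, including whatever was logged after the damaged region, and `count_nuls /var/log/messages` gives a rough idea of how much padding the unclean halt left behind.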
So what I would need now is a way to look into that part of the shutdown process with a bit more granularity and see what brings about the shutdown freeze.
Then I may be able to reliably reproduce it.
It may be caused by the Ultra 24's ugly BIOS, or maybe it's an obscure bug in the kernel: the type that has gone unseen or been neglected by the maintainers because it did not affect enough installations, was never reported, was assigned and then dropped, or was simply never bothered with because support had ended.
I've seen examples of bugs being assigned (e.g. in LibreOffice), dropped by the assignee for lack of time, then passed on to someone else who for whatever reason did not take them up, only to end up unassigned for years.
Is there any other way to inspect the shutdown process to troubleshoot this further?