I have an old laptop (around 10 years old, maybe), on which I have a minimal install of Debian 10. I use it to download and store media files, which I play from other machines on my home network. I generally keep its lid closed and access it through ssh. I've had it doing this for around a year, and it generally runs smoothly, apart from a random crash once every month or so. Recently, though, it started crashing far more frequently: anywhere from once a week to within minutes or an hour of booting it and getting everything up and running, or even during boot.
I've run memtest86+ and a SMART test, and both reported no problems. I also checked the CPU core temperature, and it didn't seem to be the problem either. Like I said, this is an old laptop, so it may be that something has just reached its end of life, but I'd like to make sure that's the case...
What else should I be looking at to assess the reason(s) for these random crashes/shutdowns? I'm interested in figuring out whether this is a hardware or software problem, and how I can solve it or, alternatively, which parts of the computer are potentially still salvageable.
Also happy to dump whatever extra info is needed here :)
As per this comment, pasting the output of dmesg --level=alert,crit,err,warn:
[ 0.225970] ACPI BIOS Warning (bug): Incorrect checksum in table [ATKG] - 0xB0, should be 0x4A (20180810/tbprint-177)
[ 0.362067] core: PEBS disabled due to CPU errata
[ 0.363544] mtrr: your CPUs had inconsistent variable MTRR settings
[ 0.424461] Expanded resource Reserved due to conflict with PCI Bus 0000:00
[ 3.474163] Unstable clock detected, switching default tracing clock to "global"
If you want to keep using the local clock, then add:
"trace_clock=local"
on the kernel command line
[ 3.728460] ACPI Warning: SystemIO range 0x0000000000000828-0x000000000000082F conflicts with OpRegion 0x0000000000000800-0x000000000000084F (\PMIO) (20180810/utaddress-213)
[ 3.728473] ACPI Warning: SystemIO range 0x0000000000000530-0x000000000000053F conflicts with OpRegion 0x0000000000000500-0x000000000000053F (\GPIO) (20180810/utaddress-213)
[ 3.728481] ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052F conflicts with OpRegion 0x0000000000000500-0x000000000000053F (\GPIO) (20180810/utaddress-213)
[ 3.728488] lpc_ich: Resource conflict(s) found affecting gpio_ich
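To compare warnings like these across several boots, it can help to turn the dmesg text into records first. The following is only a minimal Python sketch, assuming the default "[ seconds] message" timestamp format shown above; parse_dmesg and matching are hypothetical helper names, not part of dmesg or any other tool.

```python
import re

# Matches the "[ seconds.micros] message" prefix that dmesg prints by default.
_STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def parse_dmesg(text):
    """Return a list of (seconds_since_boot, message) tuples.

    Lines without a timestamp are treated as continuations of the
    previous message, as in the "Unstable clock detected" entry.
    """
    records = []
    for line in text.splitlines():
        match = _STAMP.match(line)
        if match:
            records.append((float(match.group(1)), match.group(2).strip()))
        elif records and line.strip():
            stamp, message = records[-1]
            records[-1] = (stamp, message + " " + line.strip())
    return records


def matching(records, keyword):
    """Keep only records whose message contains keyword (case-insensitive)."""
    return [r for r in records if keyword.lower() in r[1].lower()]
```

Saving the command's output to a file after each boot and calling, for example, matching(parse_dmesg(text), "acpi") on its contents would list only the ACPI lines, which makes it easier to see whether a new message shows up before a crash.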
[…] dmesg for any clues? Have you run sensors to check whether your CPU is not overheating? – Artem S. Tashkinov Aug 28 '20 at 11:18
[…] sensors, yes, and that doesn't seem to be the problem. I'm checking dmesg now, but I'm a bit of a noob and am not sure what I should be looking for. Got any pointers, @ArtemS.Tashkinov? Or should I paste its output into the question body? – Marcy Aug 28 '20 at 11:32
[…] dmesg --level=alert,crit,err,warn to see only what's "bad" ;-) You may paste it into your question, yes. – Artem S. Tashkinov Aug 28 '20 at 11:39
[…] dmesg output. Perhaps you're looking at a HW failure but I've no idea how to diagnose it. Also, I presume you've rebooted/powered on just recently, so the errors are yet to appear. – Artem S. Tashkinov Aug 28 '20 at 11:42
[…] dmesg output because at the point of a crash the kernel is unable to log anything to the disk - at most errors could be seen on the device screen. – Artem S. Tashkinov Aug 28 '20 at 11:47
[…] setterm) and next time it happens make a photo of your screen. – Artem S. Tashkinov Aug 28 '20 at 11:54
[…] the computer completely shuts down - that surely indicates a HW failure. Even if the kernel or some app crashes, your system will keep on running in a broken state. – Artem S. Tashkinov Aug 28 '20 at 11:58
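Since a one-off sensors reading is only a snapshot, one way to check the overheating question raised in the comments is to log temperatures continuously and look at the last entries after the next sudden shutdown. Below is a minimal Python sketch, assuming the kernel exposes thermal zones under /sys/class/thermal (values there are in millidegrees Celsius); the function names and the temps.log file name are hypothetical choices for illustration.

```python
import glob
import os
import time


def read_temps(base="/sys/class/thermal"):
    """Return {zone_name: degrees_celsius} for every readable thermal zone."""
    temps = {}
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*", "temp"))):
        try:
            with open(path) as handle:
                millidegrees = int(handle.read().strip())
        except (OSError, ValueError):
            continue  # zone not readable right now; skip it
        temps[os.path.basename(os.path.dirname(path))] = millidegrees / 1000.0
    return temps


def log_once(log_path, base="/sys/class/thermal"):
    """Append one timestamped reading and force it onto the disk."""
    temps = read_temps(base)
    fields = " ".join(f"{zone}={value:.1f}" for zone, value in temps.items())
    with open(log_path, "a") as log:
        log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {fields}\n")
        log.flush()
        os.fsync(log.fileno())  # so the line survives an abrupt power-off
    return temps


def run(log_path="temps.log", interval=5.0):
    """Log forever; meant to be started in the background over ssh."""
    while True:
        log_once(log_path)
        time.sleep(interval)
```

Calling run() in a background session and reading the tail of the log after a crash would show whether the temperature was climbing just before the machine went down. The fsync call is there because, as noted in the comments, little else reaches the disk at the moment of a crash.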