Env: Linux TRANQUILITY 5.3.18-150300.59.49-preempt #1 SMP PREEMPT Mon Feb 7 14:40:20 UTC 2022 (77d9d02) x86_64 x86_64 x86_64 GNU/Linux
Distro: openSUSE Leap 15.3 with KDE Plasma 5
When I run the OS install from a USB stick I get as far as the partition check, where it bombs out with: /usr/bin/udevadm /dev/sdd1 could not be found (failed)
(I can't recall the precise error text without re-running the install process, but this is the gist of it.)
Last year (April 2021) I had a problem when I lost my /home data and had to take the non-RAIDed disk to a data recovery company. From memory, I think it turned out to be a GPT error, which was only expensive and not astronomical to fix (happy to pay to get my data safe). I had them transfer all the /home data from that disk to one of two 1TB WD Black disks I bought. I set up RAID-1 with the two new disks and this seemingly worked fine until recently.
Weird things started happening. I have no idea whether they're related, but they seem quite systemic, like:
- Firefox tabs started crashing immediately on certain sites, which I thought/think might be a Firefox bug
- Python stopped working to the extent that unrelated programs failed (Libre-Office mainly - keeps entering recovery) but FreeCAD (largely Python based) still worked
- Corel AftershotPro won't start because of a missing/incorrect version library (which is still there)
Lots of the fixes I tried seemed to come back to Python. In the end I removed Python, which uninstalled a lot of key OS stuff (particularly the GUI), and then I re-installed it (along with most of the items that had been auto-uninstalled). Even that didn't work, so I thought I'd reinstall the OS, thinking my /home data would be safe. As I mentioned, this doesn't appear to be the case, and it's stopping the reinstall I need to stabilise my computer.
lsblk
<snipped the loop devices>
sda 8:0 0 931.5G 0 disk
└─sda1 8:1 0 931.5G 0 part
  └─md127 9:127 0 931.5G 0 raid1
    └─md127p1 259:0 0 931.5G 0 part /home
sdb 8:16 0 1.8T 0 disk
└─sdb1 8:17 0 1.8T 0 part
  └─cr-auto-1 254:0 0 1.8T 0 crypt /china2
sdc 8:32 0 111.8G 0 disk
├─sdc1 8:33 0 148M 0 part
├─sdc2 8:34 0 100G 0 part /
├─sdc3 8:35 0 2G 0 part
└─sdc4 8:36 0 2G 0 part
sdd 8:48 0 931.5G 0 disk
sde 8:64 0 1.8T 0 disk
└─sde1 8:65 0 1.8T 0 part /chinaPhotos
Note that sda/sda1/md127/md127p1 is mounted here as /home. I'd guess sda and sdd are the contributing physical RAID devices, and sdd is the broken one?
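My guess rests only on the two disks having the same size (931.5G). A minimal sketch of that check, assuming the input is the two-column text that `lsblk -dn -o NAME,SIZE` prints; the function name `disks_of_size` is just something I made up for illustration:

```shell
# Print the names of whole disks whose reported size equals $1.
# Reads "NAME SIZE" lines on stdin (the shape of `lsblk -dn -o NAME,SIZE`).
# disks_of_size is a hypothetical helper name, not a standard tool.
disks_of_size() {
    # String comparison on the size column; no unit conversion is attempted.
    awk -v want="$1" '$2 == want { print $1 }'
}

# Assumed invocation on a live system
# (-d: whole disks only, -n: no header line, -o: choose columns):
#   lsblk -dn -o NAME,SIZE | disks_of_size 931.5G
```

Matching on size alone doesn't prove RAID membership, of course; it only narrows down the candidates.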
I have tried the following (which shows /home is /dev/md127p1):
df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 4.0M 0 4.0M 0% /dev
tmpfs 7.8G 243M 7.5G 4% /dev/shm
tmpfs 3.1G 335M 2.8G 11% /run
tmpfs 4.0M 0 4.0M 0% /sys/fs/cgroup
/dev/sdc2 100G 38G 63G 38% /
/dev/sdc2 100G 38G 63G 38% /.snapshots
/dev/sdc2 100G 38G 63G 38% /boot/grub2/i386-pc
/dev/sdc2 100G 38G 63G 38% /boot/grub2/x86_64-efi
/dev/sdc2 100G 38G 63G 38% /opt
/dev/sdc2 100G 38G 63G 38% /root
/dev/sdc2 100G 38G 63G 38% /srv
/dev/sdc2 100G 38G 63G 38% /tmp
/dev/sdc2 100G 38G 63G 38% /var
/dev/sdc2 100G 38G 63G 38% /usr/local
/dev/sdb1 1.8T 1.7T 94G 95% /chinaPhotos
/dev/md127p1 932G 402G 530G 44% /home
<snipped loop devices>
/dev/mapper/cr-auto-1 1.9T 1.5T 399G 79% /china2
tmpfs 1.6G 72K 1.6G 1% /run/user/1000
/dev/sdm 3.4M 3.3M 116K 97% /run/media/░░░░/SBEAM
/dev/sdn 1.6M 24K 1.6M 2% /run/media/░░░░/UPDATE
192.168.░░░.░░░:/volume1/DataBackup 2.7T 2.3T 492G 83% /░░░░░
<snipped loop devices>
Finally, I've tried
mdadm --detail /dev/md*
mdadm: /dev/md does not appear to be an md device
/dev/md127:
Version : 1.0
Creation Time : Sun Apr 25 17:58:55 2021
Raid Level : raid1
Array Size : 976761344 (931.51 GiB 1000.20 GB)
Used Dev Size : 976761344 (931.51 GiB 1000.20 GB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sun Mar 6 19:26:29 2022
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Consistency Policy : bitmap
Name : any:home
UUID : 75959fa2:f25b6088:7a9e9a80:c1f38480
Events : 3183874
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
- 0 0 1 removed
/dev/md127p1:
Version : 1.0
Creation Time : Sun Apr 25 17:58:55 2021
Raid Level : raid1
Array Size : 976760303 (931.51 GiB 1000.20 GB)
Used Dev Size : 976761344 (931.51 GiB 1000.20 GB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Sun Mar 6 19:26:29 2022
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Consistency Policy : bitmap
Name : any:home
UUID : 75959fa2:f25b6088:7a9e9a80:c1f38480
Events : 3183874
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
- 0 0 1 removed
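As I read that output, the array expects 2 devices but only 1 is active, which is what "degraded" means here. A small sketch of how that difference could be pulled out of the `mdadm --detail` text; the helper name `md_missing_members` is hypothetical, and it assumes the "Raid Devices" and "Active Devices" lines look like the ones pasted above:

```shell
# Print how many members an array is missing, given `mdadm --detail`
# text on stdin. md_missing_members is a hypothetical helper name.
md_missing_members() {
    # Split each line at the first " : " style separator, then compare
    # the expected member count with the active member count.
    awk -F' *: *' '
        /Raid Devices/   { raid = $2 }
        /Active Devices/ { active = $2 }
        END { print raid - active }
    '
}

# Assumed invocation on a live system (needs root):
#   mdadm --detail /dev/md127 | md_missing_members
```

If several arrays are piped in at once, only the last one's counts are reported.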
The install process identifies /dev/sdd as the problem but seems to hit the same problem as fdisk below.
fdisk /dev/sdd
Welcome to fdisk (util-linux 2.36.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
The primary GPT table is corrupt, but the backup appears OK, so that will be used.
and
fdisk /dev/sda
Welcome to fdisk (util-linux 2.36.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Command (m for help): v
No errors detected.
Header version: 1.0
Using 1 out of 128 partitions.
A total of 2014 free sectors is available in 1 segment.
The computer is mostly working OK and I've backed up most of /home's data, but updates are piling up and I can't reinstall safely.
In terms of hardware, I physically removed each of the RAID disks in turn (PC off first, obvs). Removing one of the disks makes no difference to the boot (all working on reboot), but removing the other stops the boot process, so I guess the second is the working RAID /dev/sda (/home) disk and the former is the 'broken' /dev/sdd one! I also tried swapping the broken one onto the working one's SATA cable in case the cable was the problem, but nada!
What further diagnostics/actions can I run to:
- confirm my supposition that this disk is the faulty one
- find out what's wrong with it
- format it or otherwise recover it and add it back into the RAID1 array (or should I get it replaced?)
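For reference, these are the read-only checks I understand might apply, written as a sketch that only prints the commands unless RUN=1 is set. The function name `raid_disk_checks` is made up, and I'm assuming smartmontools (`smartctl`) is installed and that the missing member would be a partition such as /dev/sdd1, which may not even exist while the primary GPT is unreadable:

```shell
# Print (default) or run (RUN=1, as root) read-only checks for a suspect
# RAID member. raid_disk_checks is a hypothetical helper name.
#   $1 = whole disk   (e.g. /dev/sdd)
#   $2 = member part. (e.g. /dev/sdd1; an assumption, it may be absent)
#   $3 = md array     (e.g. /dev/md127)
raid_disk_checks() {
    disk=$1
    member=$2
    array=$3
    for cmd in \
        "smartctl -H $disk" \
        "smartctl -a $disk" \
        "mdadm --examine $member" \
        "mdadm --detail $array" \
        "cat /proc/mdstat"
    do
        if [ "${RUN:-0}" = 1 ]; then
            $cmd            # word-split on purpose: simple paths only
        else
            echo "$cmd"     # dry run: just show what would be executed
        fi
    done
}

# Dry run:  raid_disk_checks /dev/sdd /dev/sdd1 /dev/md127
```

None of these write to the disk. I've deliberately left out anything that repairs the GPT or re-adds the disk to the array until I know whether the drive itself is healthy.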