Env: Linux TRANQUILITY 5.3.18-150300.59.49-preempt #1 SMP PREEMPT Mon Feb 7 14:40:20 UTC 2022 (77d9d02) x86_64 x86_64 x86_64 GNU/Linux. Also openSUSE Leap 15.3 with KDE Plasma 5.

When I run the OS install from a USB stick, I get as far as the partition check, where it fails with: /usr/bin/udevadm /dev/sdd1 could not be found (failed). (I can't recall the precise error text without re-running the install process, but this is the gist of it.)

Last year (April 2021) I lost my /home data and had to take the non-RAID disk to a data recovery company. From memory, I think it turned out to be a GPT error, which was expensive rather than astronomical to fix (I was happy to pay to get my data back safely). I had them transfer all the /home data from that disk to one of two 1TB WD Black disks I had bought. I set up RAID-1 with the two new disks, and this seemingly worked fine until recently.

Weird things started happening. I have no idea whether they're related, but they seem quite systemic:

  1. Firefox tabs started crashing immediately on certain sites, which I thought (and still think) might be a Firefox bug
  2. Python stopped working to the extent that unrelated programs failed (mainly LibreOffice, which keeps entering recovery), but FreeCAD (largely Python based) still worked
  3. Corel AfterShot Pro won't start because of a missing or wrong-version library (which is still there)

Lots of the fixes I tried seemed to come back to Python. In the end I removed Python, which uninstalled a lot of key OS components (particularly the GUI), and then re-installed it, along with most of the items that had been automatically uninstalled. Even that didn't work, so I decided to reinstall the OS, thinking my /home data would be safe. As I mentioned, that doesn't appear to be the case, and the installer failure is stopping the reinstall I need to stabilise my computer.

lsblk
<snipped the loop devices>
sda             8:0    0 931.5G  0 disk  
└─sda1          8:1    0 931.5G  0 part  
  └─md127       9:127  0 931.5G  0 raid1 
    └─md127p1 259:0    0 931.5G  0 part  /home
sdb             8:16   0   1.8T  0 disk  
└─sdb1          8:17   0   1.8T  0 part  
  └─cr-auto-1 254:0    0   1.8T  0 crypt /china2
sdc             8:32   0 111.8G  0 disk  
├─sdc1          8:33   0   148M  0 part  
├─sdc2          8:34   0   100G  0 part  /
├─sdc3          8:35   0     2G  0 part  
└─sdc4          8:36   0     2G  0 part  
sdd             8:48   0 931.5G  0 disk  
sde             8:64   0   1.8T  0 disk  
└─sde1          8:65   0   1.8T  0 part  /chinaPhotos

Note sda/sda1/md127/md127p1 is mounted here as /home

I'd guess sda & sdd are the contributing physical RAID devices and sdd is the broken one?

I have tried the following, which shows that /home is /dev/md127p1:

df -h
Filesystem                        Size  Used Avail Use% Mounted on
devtmpfs                          4.0M     0  4.0M   0% /dev
tmpfs                             7.8G  243M  7.5G   4% /dev/shm
tmpfs                             3.1G  335M  2.8G  11% /run
tmpfs                             4.0M     0  4.0M   0% /sys/fs/cgroup
/dev/sdc2                         100G   38G   63G  38% /
/dev/sdc2                         100G   38G   63G  38% /.snapshots
/dev/sdc2                         100G   38G   63G  38% /boot/grub2/i386-pc
/dev/sdc2                         100G   38G   63G  38% /boot/grub2/x86_64-efi
/dev/sdc2                         100G   38G   63G  38% /opt
/dev/sdc2                         100G   38G   63G  38% /root
/dev/sdc2                         100G   38G   63G  38% /srv
/dev/sdc2                         100G   38G   63G  38% /tmp
/dev/sdc2                         100G   38G   63G  38% /var
/dev/sdc2                         100G   38G   63G  38% /usr/local
/dev/sdb1                         1.8T  1.7T   94G  95% /chinaPhotos
/dev/md127p1                      932G  402G  530G  44% /home
<snipped loop devices>
/dev/mapper/cr-auto-1             1.9T  1.5T  399G  79% /china2
tmpfs                             1.6G   72K  1.6G   1% /run/user/1000
/dev/sdm                          3.4M  3.3M  116K  97% /run/media/░░░░/SBEAM
/dev/sdn                          1.6M   24K  1.6M   2% /run/media/░░░░/UPDATE
192.168.░░░.░░░:/volume1/DataBackup  2.7T  2.3T  492G  83% /░░░░░
<snipped loop devices>

Finally, I've tried

mdadm --detail /dev/md*  
mdadm: /dev/md does not appear to be an md device
/dev/md127:
           Version : 1.0
     Creation Time : Sun Apr 25 17:58:55 2021
        Raid Level : raid1
        Array Size : 976761344 (931.51 GiB 1000.20 GB)
     Used Dev Size : 976761344 (931.51 GiB 1000.20 GB)
      Raid Devices : 2
     Total Devices : 1
       Persistence : Superblock is persistent
     Intent Bitmap : Internal

       Update Time : Sun Mar  6 19:26:29 2022
             State : clean, degraded
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : any:home
              UUID : 75959fa2:f25b6088:7a9e9a80:c1f38480
            Events : 3183874

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       -       0        0        1      removed

/dev/md127p1:
           Version : 1.0
     Creation Time : Sun Apr 25 17:58:55 2021
        Raid Level : raid1
        Array Size : 976760303 (931.51 GiB 1000.20 GB)
     Used Dev Size : 976761344 (931.51 GiB 1000.20 GB)
      Raid Devices : 2
     Total Devices : 1
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sun Mar  6 19:26:29 2022
             State : clean, degraded
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : any:home
              UUID : 75959fa2:f25b6088:7a9e9a80:c1f38480
            Events : 3183874

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       -       0        0        1      removed

The install process identifies /dev/sdd as the problem, and seems to hit the same issue that fdisk reports below.

fdisk /dev/sdd

Welcome to fdisk (util-linux 2.36.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

The primary GPT table is corrupt, but the backup appears OK, so that will be used.

and

fdisk  /dev/sda

Welcome to fdisk (util-linux 2.36.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): v
No errors detected.
Header version: 1.0
Using 1 out of 128 partitions.
A total of 2014 free sectors is available in 1 segment.

The computer is mostly working OK and I've backed up most of /home's data, but updates are piling up and I can't reinstall safely.

In terms of hardware, I physically removed each of the RAID disks in turn (with the PC off first, obviously). Removing one of the disks makes no difference to the boot (everything works on reboot), but removing the other stops the boot process. So I guess the latter is the working RAID disk (/dev/sda, holding /home) and the former is the 'broken' /dev/sdd one. I also tried moving the broken disk onto the working disk's SATA cable in case the cable was the problem, but that made no difference.

What further diagnostics or actions can I run to:

  • confirm my supposition that this disk is the faulty one
  • find out what's wrong with it
  • format it or otherwise recover it and add it back into the RAID1 array (or should I get it replaced?)
Greg

1 Answer

The GPT error is very minor and you can ignore it. It appears that you removed sdd from the array yesterday. Run mdadm -E /dev/sdd1 and add its output to your question. Hopefully that still sees the RAID metadata on that drive and confirms that it was recently removed, in which case you can just use mdadm --re-add to put it back into the array.
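
As a rough sketch of that check, assuming the removed member really is /dev/sdd1 and the array is /dev/md127 as in the question, you could compare the event counter in the member's superblock with the one on the running array before re-adding. Note that the lsblk output in the question shows no sdd1, so the partition node may not exist until the kernel sees the partition table again. The names events_of and compare_events are made up for this example.

```shell
#!/bin/sh
# Sketch only: device names come from the question; verify them on your system.

# Print the first "Events" counter found in `mdadm -E` or `mdadm --detail`
# output supplied on stdin. Prints nothing if no such line is present.
events_of() {
    sed -n 's/^ *Events : *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Show the event counter of a (former) member next to that of the array.
# Needs root, because mdadm reads the raw devices.
compare_events() {
    member=$1   # e.g. /dev/sdd1 (assumed from the installer error)
    array=$2    # e.g. /dev/md127
    m=$(mdadm -E "$member" | events_of)
    a=$(mdadm --detail "$array" | events_of)
    echo "member events: ${m:-none found}, array events: ${a:-none found}"
}

# As root, something like:
#   compare_events /dev/sdd1 /dev/md127
#   mdadm /dev/md127 --re-add /dev/sdd1
#   cat /proc/mdstat        # watch the resync progress
```

Because the array has an internal write-intent bitmap, a successful --re-add should only need to resync the blocks that changed while the member was out. If mdadm refuses the --re-add, using --add instead would trigger a full resync.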

Before adding it back to the array though, you can check the health of the drive with smartctl -a /dev/sdd.
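
For the SMART check, here is a minimal sketch, assuming smartmontools is installed and the disk is an ATA drive that reports the usual attribute table. key_attrs is a made-up helper that picks out the raw values of three attributes that are commonly watched on failing disks.

```shell
#!/bin/sh
# Sketch only: /dev/sdd is the suspect disk from the question.

# Filter `smartctl -A` output on stdin down to three attributes and print
# "NAME RAW_VALUE" for each one that is present (RAW_VALUE is column 10).
key_attrs() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ { print $2, $10 }'
}

# As root, something like:
#   smartctl -H /dev/sdd               # overall health self-assessment
#   smartctl -A /dev/sdd | key_attrs   # non-zero raw values are a bad sign
#   smartctl -t short /dev/sdd         # start a short self-test
#   smartctl -l selftest /dev/sdd      # read the result a few minutes later
```

If any of those raw values are non-zero, or the self-test logs a read failure, replacing the disk is probably safer than putting it back into the array.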

As for the other strange symptoms, you might want to run memtest86 to check your RAM.

psusi