
I just did a fresh install of Fedora 22 on a system with an existing RAID-5 array (five drives). The kernel reported device errors overnight, the 3 TB XFS filesystem was unmounted, and now, after a reboot, the array won't assemble.

This is the result of trying to assemble the array:

mdadm --assemble /dev/md0 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 
mdadm: /dev/md0 assembled from 2 drives and 1 spare - not enough to start the array.  

Below is the output of 'mdadm --examine' for each of the 5 partitions. I don't know enough to understand the difference between the event counters and the 'array state' (which is not the same across all devices).

I know not to use '--create', but I hesitate to try '--force' without having someone looking over my shoulder.

Is this array lost? If possibly not, what steps should I take?

/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 65f056dc:780db9b5:023c0144:77f12f74
           Name : odin.hudaceks.home:1  (local to host odin.hudaceks.home)
  Creation Time : Thu Sep 18 16:30:47 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1953101824 (931.31 GiB 999.99 GB)
     Array Size : 2929651200 (2793.93 GiB 2999.96 GB)
  Used Dev Size : 1953100800 (931.31 GiB 999.99 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1024 sectors
          State : clean
    Device UUID : 3f08354b:c076cddc:99b85968:a8928ea8

    Update Time : Sun Aug  2 22:25:33 2015
       Checksum : 9db3229f - correct
         Events : 6078

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 65f056dc:780db9b5:023c0144:77f12f74
           Name : odin.hudaceks.home:1  (local to host odin.hudaceks.home)
  Creation Time : Thu Sep 18 16:30:47 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1953101824 (931.31 GiB 999.99 GB)
     Array Size : 2929651200 (2793.93 GiB 2999.96 GB)
  Used Dev Size : 1953100800 (931.31 GiB 999.99 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1024 sectors
          State : clean
    Device UUID : addd6f2b:fb4c33a6:2a8b152e:e716eba7

    Update Time : Sun Aug  2 22:25:33 2015
       Checksum : c6c2519 - correct
         Events : 6078

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 1
   Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)

/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 65f056dc:780db9b5:023c0144:77f12f74
           Name : odin.hudaceks.home:1  (local to host odin.hudaceks.home)
  Creation Time : Thu Sep 18 16:30:47 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1953101824 (931.31 GiB 999.99 GB)
     Array Size : 2929651200 (2793.93 GiB 2999.96 GB)
  Used Dev Size : 1953100800 (931.31 GiB 999.99 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1024 sectors
          State : clean
    Device UUID : d92883c5:0e3ded13:75b11223:f0570e0a

    Update Time : Sun Aug  2 22:21:47 2015
       Checksum : 6b57c6ce - correct
         Events : 6073

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 65f056dc:780db9b5:023c0144:77f12f74
           Name : odin.hudaceks.home:1  (local to host odin.hudaceks.home)
  Creation Time : Thu Sep 18 16:30:47 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1953101824 (931.31 GiB 999.99 GB)
     Array Size : 2929651200 (2793.93 GiB 2999.96 GB)
  Used Dev Size : 1953100800 (931.31 GiB 999.99 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1024 sectors
          State : active
    Device UUID : 42eeb231:ccb57477:357d0c47:d99b159d

    Update Time : Sun Aug  2 22:21:51 2015
       Checksum : f21014a5 - correct
         Events : 6074

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)

/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 65f056dc:780db9b5:023c0144:77f12f74
           Name : odin.hudaceks.home:1  (local to host odin.hudaceks.home)
  Creation Time : Thu Sep 18 16:30:47 2014
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 1953101824 (931.31 GiB 999.99 GB)
     Array Size : 2929651200 (2793.93 GiB 2999.96 GB)
  Used Dev Size : 1953100800 (931.31 GiB 999.99 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=1024 sectors
          State : clean
    Device UUID : 93469bf1:9f571d4b:dab66eb4:08c45766

    Update Time : Sun Aug  2 22:25:33 2015
       Checksum : bc477178 - correct
         Events : 6078

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : spare
   Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)

EDIT 1: added info about the controller.

04:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
    Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller
    Flags: bus master, fast devsel, latency 0, IRQ 30
    I/O ports at d040 [size=8]
    I/O ports at d030 [size=4]
    I/O ports at d020 [size=8]
    I/O ports at d010 [size=4]
    I/O ports at d000 [size=16]
    Memory at fe510000 (32-bit, non-prefetchable) [size=2K]
    Expansion ROM at fe500000 [disabled] [size=64K]
    Capabilities: [40] Power Management version 3
    Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
    Capabilities: [70] Express Legacy Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Kernel driver in use: ahci

1 Answer


You have a spare /dev/sdg1, is that correct? If it never was more than a spare, it should hold no data and thus be useless to your recovery efforts.

/dev/sde1 failed, and shortly afterwards so did /dev/sdf1. The big question is: why? Are these disks actually bad? Did you check SMART and run a self-test? Or was it a controller, cable, or power issue that you have since fixed?
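To make the failure order explicit: the Events counter in each member's superblock goes up whenever the array metadata is updated, and a member that has dropped out stops receiving updates, so its counter lags behind. The Array State line is that member's last recorded view of the array, which is why the two stale members still show AAAA. Below is a minimal Python sketch of that reading. The input is an abbreviated copy of the --examine output from the question; the function names parse_examine and dropout_order are made up for this illustration and are not part of mdadm.

```python
import re

# Abbreviated `mdadm --examine` output, values copied from the question.
EXAMINE = """\
/dev/sdc1:
         Events : 6078
    Device Role : Active device 0
    Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd1:
         Events : 6078
    Device Role : Active device 1
    Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde1:
         Events : 6073
    Device Role : Active device 2
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf1:
         Events : 6074
    Device Role : Active device 3
    Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg1:
         Events : 6078
    Device Role : spare
    Array State : AA.. ('A' == active, '.' == missing, 'R' == replacing)
"""


def parse_examine(text):
    """Return {device: {"events": int, "role": str, "state": str}}."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"(/dev/\S+):", line)
        if header:
            # A line such as "/dev/sdc1:" starts a new member section.
            current = devices.setdefault(header.group(1), {})
            continue
        if current is None or " : " not in line:
            continue
        key, value = (part.strip() for part in line.split(" : ", 1))
        if key == "Events":
            current["events"] = int(value)
        elif key == "Device Role":
            current["role"] = value
        elif key == "Array State":
            # Keep only the flag string, e.g. "AA..", not the legend.
            current["state"] = value.split()[0]
    return devices


def dropout_order(devices):
    """List members whose event count lags the newest, oldest first."""
    newest = max(info["events"] for info in devices.values())
    stale = [(info["events"], name)
             for name, info in devices.items()
             if info["events"] < newest]
    return [name for _, name in sorted(stale)]


members = parse_examine(EXAMINE)
print(dropout_order(members))
```

On the data above this lists /dev/sde1 (6073) before /dev/sdf1 (6074), which matches the order described here. The remaining three members agree at 6078, and one of them is only a spare.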

If you want to play it safe, use this method:

https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file

And then assemble --force using /dev/sdc1 /dev/sdd1 /dev/sdf1 (the two intact disks plus the one that failed last). If it runs, and the filesystem on the RAID was in use at the time of failure, it will look as it would after a power loss and may need fsck (all the more reason to do this on a copy-on-write layer, so you can undo if things go wrong).
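The choice of those three members can be sketched in a few lines of Python. A RAID-5 with 4 raid devices needs at least 3 of them to start degraded, so the sketch drops the spare, keeps the 3 active members with the highest event counts, and builds the command line. It only prints the command and does not run it. The roles and event counts are copied from the --examine output in the question; the function name pick_members is made up for this illustration, and with the overlay method you would pass the overlay devices rather than the raw partitions.

```python
import shlex

# role is the "Active device N" slot from --examine, or None for the spare.
MEMBERS = {
    "/dev/sdc1": (0, 6078),
    "/dev/sdd1": (1, 6078),
    "/dev/sde1": (2, 6073),
    "/dev/sdf1": (3, 6074),
    "/dev/sdg1": (None, 6078),
}
RAID_DEVICES = 4  # "Raid Devices : 4" in the question


def pick_members(members, raid_devices):
    """Pick the raid_devices - 1 freshest active members, ordered by slot."""
    # A spare holds no array data, so it cannot help a forced assembly.
    active = {dev: info for dev, info in members.items() if info[0] is not None}
    by_events = sorted(active, key=lambda dev: active[dev][1], reverse=True)
    chosen = by_events[: raid_devices - 1]
    return sorted(chosen, key=lambda dev: active[dev][0])


chosen = pick_members(MEMBERS, RAID_DEVICES)
command = ["mdadm", "--assemble", "--force", "/dev/md0", *chosen]
print(shlex.join(command))
```

This leaves out /dev/sde1, the member that failed first and therefore holds the oldest data.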

frostschutz
  • Thanks for the quick reply. I can't run smart tests against these disks. They are connected to a Startech controller - I'll update the post with the HW info for the controller. – Bill G.H. Aug 03 '15 at 14:43
  • Update: that's a fantastic page! thanks. Going through the steps now. I'll post back after surface scan finishes and I've followed your recipe... – Bill G.H. Aug 03 '15 at 14:51
  • Update: surface test failed, on all drives. Digging in a bit, it looks like a kernel bug triggered by the hardware: Marvell controllers don't behave. It's been worked on since 2012, and a serious patch went into the kernel in May 2014. Now trying to find out if my 88SE9128 is included in the quirks list. Kernel bug is here for others in the same boat. FYI, the Startech SATA controllers appear to all be Marvell-based, such as my PEXSAT31E1. I'll have to get this sorted before I can work on the RAID issues :-$ – Bill G.H. Aug 03 '15 at 21:28