IBM x3650 M4 – RAID5 HDDs fail with preserved cache retained
Recently one of my older IBM x3650 M4 servers had a failed hard drive in a RAID5 volume on which Hyper-V hosts its virtual machines.
RAID5 can tolerate one failed drive, but a few days later the RAID controller set a second drive offline and the server shut down.
This can happen during read-intensive drive operations when Protection Information (PI) is enabled for the virtual drive.
This behavior has been corrected in ServeRAID M5100 Series SAS/SATA Controller firmware version 23.22.0-0024.
In my case, however, MegaRAID Storage Manager shows Data Protection as incapable, so I have no clue why exactly the drive was set offline.
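To illustrate why RAID5 survives one failed drive but not two, here is a minimal sketch of XOR parity on a single stripe. It assumes a hypothetical four-drive set with made-up block contents, and it ignores that real controllers rotate the parity block across drives; it is not how the ServeRAID firmware is implemented.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical four-drive RAID5 set:
# three data blocks plus one parity block (illustrative values only).
data = [b"AAAA", b"BBBB", b"CCCC"]
stripe = data + [xor_blocks(data)]

def rebuild(stripe, missing):
    """Recompute the block at index `missing` from all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != missing]
    return xor_blocks(survivors)

# Any single lost block (data or parity) can be recomputed from the rest.
recovered = [rebuild(stripe, i) for i in range(len(stripe))]
single_loss_ok = recovered == stripe

# With blocks 0 and 1 both lost, the survivors only yield the XOR of the
# two missing blocks, not either block on its own.
two_loss_result = xor_blocks([stripe[2], stripe[3]])
```

With one drive failed and a second one offline, the controller is in the second situation, which is why the whole virtual drive goes offline instead of running degraded.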
After the shutdown, the server didn’t boot anymore.
At the boot screen, the following message appeared:
There are offline or missing virtual drives with preserved cache. Please check the cables and ensure that all drives are present. Press any key to enter the configuration utility.
Enter Your Input Here:*
Preserved cache is data that remains in the controller cache after a drive goes offline or missing and that has not yet been written to a drive.
As you can see in the second screenshot below, you will not be able to start your operating system as long as the preserved cache persists.
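The idea of preserved cache can be sketched with a toy write-back cache. This is only a conceptual model under my own assumptions: the class `ToyController`, its methods, and the name `VD1` are hypothetical and are not part of any MegaRAID or ServeRAID tool.

```python
class ToyController:
    """Toy write-back cache; NOT the real controller firmware logic."""

    def __init__(self):
        self.dirty = {}       # virtual drive name -> blocks not yet written
        self.offline = set()  # virtual drives that are offline or missing

    def write(self, vd, block):
        """Accept a write into the cache; it is flushed to disk later."""
        self.dirty.setdefault(vd, []).append(block)

    def flush(self):
        """Write dirty blocks out, but only for reachable virtual drives."""
        for vd in list(self.dirty):
            if vd not in self.offline:
                del self.dirty[vd]

    def preserved_cache(self):
        """Dirty data stuck in the cache because its virtual drive is gone."""
        return {vd: b for vd, b in self.dirty.items() if vd in self.offline}

    def can_boot(self):
        """Mirror the behavior described above: no boot while cache is preserved."""
        return not self.preserved_cache()


ctrl = ToyController()
ctrl.write("VD1", b"pending write")
ctrl.offline.add("VD1")          # virtual drive goes offline with dirty data
ctrl.flush()                     # nothing can be written, data stays pinned
blocked = not ctrl.can_boot()

ctrl.offline.discard("VD1")      # drive is brought online again
ctrl.flush()                     # pinned data can finally be written
bootable = ctrl.can_boot()
```

In this model the boot block disappears either when the virtual drive comes back and the data is flushed, or when the pinned data is thrown away, which is the option I wanted to avoid.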
In order to boot the server, I first need to bring the offline drive online again, and of course I should also replace the failed drive later.
To bring the drive that was marked offline back online, I performed the following steps.
Pressing Enter and Escape at the Enter Your Input Here prompt in the screenshot above starts the MegaRAID BIOS Config Utility, as shown below.
Here I clicked on Cancel because I don’t want to lose any data.
Click on OK.
In the Logical View of the MegaRAID utility I could see my failed drive in red and, in yellow, the drive that had been set offline.
The second virtual volume (RAID5) has one failed drive and one drive set offline.
Under Logical View, click on the drive that was set offline.
Select Make Online and click on OK.
Click on Yes.
Click on Back.
The virtual drives on my RAID5 array are now in the degraded state because one drive is still failed.
Now I exit the MegaRAID BIOS Config Utility and try to reboot the server.
Because of the failed drive, the virtual drive on my RAID5 volume is still in degraded mode.
After replacing the failed drive and completing the rebuild, the virtual drive is back in the optimal state.
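The states the virtual drive went through can be summarized in a small sketch. The function name `raid5_state`, the drive state strings, and the four-drive layout are my own assumptions for illustration; the utility's real state handling is more detailed.

```python
def raid5_state(drive_states):
    """Derive a RAID5 virtual drive state from its member drive states.

    A drive counts as unavailable while it is failed, offline, or still
    rebuilding. RAID5 tolerates exactly one unavailable drive.
    """
    unavailable = sum(
        1 for state in drive_states if state in ("failed", "offline", "rebuild")
    )
    if unavailable == 0:
        return "optimal"
    if unavailable == 1:
        return "degraded"
    return "offline"


# Hypothetical four-drive array following the sequence in this post.
before_fix = raid5_state(["failed", "offline", "online", "online"])
after_make_online = raid5_state(["failed", "online", "online", "online"])
during_rebuild = raid5_state(["rebuild", "online", "online", "online"])
after_rebuild = raid5_state(["online", "online", "online", "online"])
```

This is why Make Online alone was enough to get the server booting again, while the array stayed degraded until the replacement drive had finished rebuilding.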
ServeRAID M5100 Series: Drive(s) in Protection Information Virtual Disk Marked Offline – Servers