It can happen at any time: suddenly you get the message "DegradedArray event on /dev/md1".
What now?
In most cases, the message indicates a failed disk in a RAID array, and it is time to identify the affected disk and take appropriate action.
Here is an example of the message you may get when a RAID disk has failed:
This is an automatically generated mail message from mdadm
running on server1
A DegradedArray event had been detected on md device /dev/md/1.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [raid0] [raid1] [linear] [multipath] [raid6] [raid5] [raid4] [raid10]
md0 : active raid1 sdb1[1] sda1[0]
1023936 blocks super 1.0 [2/2] [UU]
md1 : active raid1 sdb3[2](F) sda3[0]
1948488512 blocks super 1.0 [2/1] [U_]
unused devices: <none>
From the message you can see which RAID array is affected; here it is /dev/md/1.
Now let's take a closer look at the content of the md1 block.
md1 is active and uses RAID1. The array is built from sdb3 and sda3.
[U_] means that one of the drives is not in sync. In this case sdb3 is the failed one, because it is marked with (F).
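If you just want to spot failed devices quickly, you can also filter /proc/mdstat for the (F) marker or check the state lines of the array. A small sketch; the device and array names are taken from this example and may differ on your system:
root@server:~# grep '(F)' /proc/mdstat
root@server:~# mdadm --detail /dev/md1 | grep -E 'State :|Failed Devices'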
If you like, you can get more details about the degraded array with this command:
root@server:~# mdadm --detail /dev/md1
/dev/md1:
Version : 1.0
Creation Time : Tue Aug 5 11:11:40 2016
Raid Level : raid1
Array Size : 1948488512 (1858.22 GiB 1995.25 GB)
Used Dev Size : 1948488512 (1858.22 GiB 1995.25 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Thu Aug 11 10:20:10 2016
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 1
Spare Devices : 0
Name : h2328146:1
UUID : 90e3e3fa:9ef283af:0fd043f0:ee01cc22
Events : 558815
Number Major Minor RaidDevice State
0 8 3 0 active sync /dev/sda3
1 0 0 1 removed
2 8 19 - faulty spare /dev/sdb3
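In this example the kernel has already marked sdb3 as faulty, as shown by the (F) in /proc/mdstat and the "faulty" state above. If a drive is misbehaving but has not yet been marked as failed, it can be set faulty manually before the removal. A short sketch, assuming the same device names as in this example:
root@server:~# mdadm --manage /dev/md1 --fail /dev/sdb3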
Regardless of whether the disk sdb3 is to be replaced or simply re-enabled, it must first be removed from the array md1.
For this, the following command is executed:
root@server:~# mdadm --remove /dev/md1 /dev/sdb3
mdadm: hot removed /dev/sdb3 from /dev/md1
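Before pulling the disk out of the server, it helps to identify the physical drive, for example by its serial number. One possible way, assuming the smartmontools package is installed:
root@server:~# smartctl -i /dev/sdb
The serial number shown there can then be matched against the label on the drive.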
Now the failed disk can be replaced, or the existing one can simply be re-enabled. If a brand-new disk is installed, it first needs the same partition layout as the remaining disk; see the sketch below.
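A minimal sketch for copying the partition layout from the healthy disk sda to the new disk sdb. This assumes classic MBR partition tables, as used in this example; for GPT disks the sgdisk tool offers a comparable replicate function. Double-check the device names, as the command overwrites the partition table of the second disk:
root@server:~# sfdisk -d /dev/sda | sfdisk /dev/sdb
Whether the disk was replaced or only re-enabled, the partition is then re-added to the array with the following command.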
root@server:~# mdadm --add /dev/md1 /dev/sdb3
mdadm: re-added /dev/sdb3
After the disk has been re-added, the synchronization starts immediately. You can watch the status of the synchronization by executing the following command:
root@server:~# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdb3[2] sda3[0]
1948488512 blocks [2/1] [_U]
[>....................] recovery = 0.1% (2849024/1948488512) finish=455.9min speed=127898K/sec
md0 : active raid1 sdb1[1] sda1[0]
1023936 blocks super 1.0 [2/2] [UU]
unused devices: <none>
You can see the synchronization progress and an estimate of how long it will take to finish.
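If you want to keep an eye on the rebuild, you can refresh the output periodically with watch. The rebuild speed is also limited by the kernel; raising the minimum limit can speed up the recovery at the cost of normal I/O performance. The value below is only an example:
root@server:~# watch -n 10 cat /proc/mdstat
root@server:~# sysctl -w dev.raid.speed_limit_min=50000
Once the recovery has finished, mdadm --detail /dev/md1 should report the array state as clean again.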