RAID-10: Production-Grade Setup Guide


โš ๏ธ CRITICAL WARNING

THIS TUTORIAL DESTROYS DATA ON SPECIFIED DISKS

  • Use virtual disks or dedicated physical disks only

  • Never use system disks

  • Verify with lsblk before every command

  • Test in VM first

lsblk -o NAME,SIZE,TYPE,MOUNTPOINT

💡 Understanding RAID-10: Two Different Approaches

What Does "RAID-10" Actually Mean?

RAID-10 literally means RAID 1+0: mirror first, then stripe.

Traditional RAID-1+0 (Nested):
Step 1: Create RAID-1 mirrors → Step 2: Stripe those mirrors

mdadm --level=10 (Single Array):
Step 1: Create one array → Step 2: mdadm handles mirroring via layout

The two are built and managed differently, even though a 4-disk near=2 array ends up with an equivalent data layout. Here's why it matters:


Method 1: mdadm --level=10 (What This Tutorial Uses)

Single array with automatic mirror placement.

๐Ÿ” CORRECTED: How Mirroring Actually Works with near=2

With 4 disks and near=2 layout:

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Disk 0  โ”‚ Disk 1  โ”‚ Disk 2  โ”‚ Disk 3  โ”‚
โ”‚ (sda1)  โ”‚ (sdb1)  โ”‚ (sdc1)  โ”‚ (sdd1)  โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚   A1    โ”‚   A1    โ”‚   B1    โ”‚   B1    โ”‚  โ† Mirrors horizontally (0โ†”1, 2โ†”3)
โ”‚   A2    โ”‚   A2    โ”‚   B2    โ”‚   B2    โ”‚
โ”‚   A3    โ”‚   A3    โ”‚   B3    โ”‚   B3    โ”‚
โ”‚   A4    โ”‚   A4    โ”‚   B4    โ”‚   B4    โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

How mdadm actually decides mirror pairs:

CORRECT Mirror Pairing:

  • Mirror Pair 1 → Disk 0 ↔ Disk 1 → /dev/sda1 ↔ /dev/sdb1

  • Mirror Pair 2 → Disk 2 ↔ Disk 3 → /dev/sdc1 ↔ /dev/sdd1

Explanation:

  • near=2 means each data block is stored twice ("2 copies")

  • mdadm places the two copies on adjacent disk positions (0↔1, 2↔3)

  • The "A" chunks are mirrored on positions 0 and 1, the "B" chunks on positions 2 and 3

  • Consecutive chunks alternate between the two pairs; that alternation is the striping, and it is where the performance comes from

Understanding set-A and set-B Labels:

When you see this in mdadm --detail:

Number   Major   Minor   RaidDevice State
   0       8        1        0      active sync set-A   /dev/sda1
   1       8       17        1      active sync set-B   /dev/sdb1
   2       8       33        2      active sync set-A   /dev/sdc1
   3       8       49        3      active sync set-B   /dev/sdd1

What set-A and set-B actually mean:

  • set-A = First position in each mirror pair (positions 0 and 2)

  • set-B = Second position in each mirror pair (positions 1 and 3)

  • Each set holds one complete copy of the data, so the labels indicate striping roles, NOT mirror partners

  • The actual mirrors are: 0↔1 and 2↔3

Failure Tolerance:

✅ Can survive:

  • Loss of Disk 0 OR Disk 1 (not both) → Mirror Pair 1 survives

  • Loss of Disk 2 OR Disk 3 (not both) → Mirror Pair 2 survives

  • Loss of Disk 0 AND Disk 2 → ✅ Both pairs still have one disk

  • Loss of Disk 0 AND Disk 3 → ✅ Both pairs still have one disk

  • Loss of Disk 1 AND Disk 2 → ✅ Both pairs still have one disk

  • Loss of Disk 1 AND Disk 3 → ✅ Both pairs still have one disk

❌ Will fail:

  • Loss of Disk 0 AND Disk 1 → ❌ Mirror Pair 1 completely lost

  • Loss of Disk 2 AND Disk 3 → ❌ Mirror Pair 2 completely lost

Summary: The array always survives one disk failure. It survives two failures only if the disks belong to different mirror pairs, which is 4 of the 6 possible two-disk combinations.
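
The list above can be checked mechanically. The following is a minimal sketch, assuming the adjacent pairing (0↔1, 2↔3) described in this section; the function names are made up for illustration and nothing here touches a real array.

```shell
#!/bin/sh
# Sketch: which two-disk failures does a 4-disk near=2 array survive?
# Assumption: mirror partners are the adjacent positions (0,1) and (2,3).

# survives_loss A B -> exit status 0 if the array survives losing positions A and B
survives_loss() {
    # Two positions share a mirror pair when their integer halves are equal.
    [ $(( $1 / 2 )) -ne $(( $2 / 2 )) ]
}

# count_survivable -> prints every two-disk combination and a final tally
count_survivable() (
    ok=0; total=0
    for a in 0 1 2 3; do
        for b in 0 1 2 3; do
            [ "$a" -lt "$b" ] || continue
            total=$((total + 1))
            if survives_loss "$a" "$b"; then
                ok=$((ok + 1)); echo "lose $a and $b: survives"
            else
                echo "lose $a and $b: ARRAY FAILS"
            fi
        done
    done
    echo "$ok of $total combinations survive"
)

count_survivable
```

The last line it prints is "4 of 6 combinations survive", which matches the two ❌ cases listed above.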


Method 2: True RAID-1+0 (Nested Arrays)

User-controlled mirror pairs + explicit striping.

Step 1: Create Two RAID-1 Mirrors
┌─────────┬─────────┐         ┌─────────┬─────────┐
│  sda1   │  sdc1   │         │  sdb1   │  sdd1   │
│   A1    │   A1    │         │   B1    │   B1    │
│   A2    │   A2    │         │   B2    │   B2    │
│   A3    │   A3    │         │   B3    │   B3    │
└─────────┴─────────┘         └─────────┴─────────┘
    md1 (RAID-1)                  md2 (RAID-1)

Step 2: Stripe the Mirrors
         ┌──────────────┐
         │ md0 (RAID-0) │
         │  Stripes:    │
         │  md1 + md2   │
         └──────────────┘

Commands (nested approach):

# Create two RAID-1 mirrors
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdc1
mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdd1

# Stripe them
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/md1 /dev/md2

Failure tolerance:

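  • ✅ Survives the loss of one disk from each mirror

  • ❌ Fails if both disks of the same mirror are lost (sda1 AND sdc1, or sdb1 AND sdd1), exactly like the same-pair case of a single --level=10 array

  • You chose which disks mirror each other

  • Lose sda1 AND sdb1? ✅ Still works (different mirrors)

  • More control, more complexity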


Method Comparison Table

Feature | mdadm --level=10 | True RAID-1+0 (Nested)
Command Complexity | Simple (one command) | Complex (three commands)
Mirror Control | Automatic (adjacent pairs 0↔1, 2↔3, determined by device order) | Manual (you choose pairs)
Best Case | Survive 2 disk failures (different pairs) | Survive 2 disk failures (different mirrors)
Worst Case | ❌ Fail with 2 disk losses (same pair) | ❌ Fail with 2 disk losses (same mirror)
Guarantee | Any 1 disk; 2 disks in 4 of 6 combinations | Any 1 disk; 2 disks in 4 of 6 combinations
Performance | Excellent | Excellent
Management | Single array | Multiple arrays
Use Case | Testing, general purpose, most production | Setups that need each mirror managed as its own array

Which Method Does This Tutorial Use?

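This tutorial uses mdadm --level=10 because:

✅ Simpler to learn and demonstrate
✅ Good for understanding RAID-10 concepts
✅ Sufficient for most use cases
✅ Single array management

⚠️ When to consider nested RAID-1+0 instead:

  • You want each mirror to be a separate md device that can be monitored, stopped, or repaired on its own

  • You want the mirror pairing spelled out explicitly, for example to split each mirror across controllers or across disk batches (same-batch disks carry a higher correlated failure risk)

  • Note: neither method survives every possible two-disk failure. For financial transaction databases, VM storage with strict uptime targets, or any system where any two disk failures must be survivable, 2-copy RAID-10 alone is not enough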


Math: Usable Capacity

Raw capacity    = 4 disks × 2GB = 8GB
Mirror overhead = 50% (everything duplicated)
Usable capacity = 8GB ÷ 2 = 4GB

Formula: (Total Disks ÷ 2) × Disk Size

This applies to BOTH methods: you always lose 50% to mirroring.
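
As a worked example of the formula, here is a small sketch in shell arithmetic. The function name usable_capacity_gb and its optional third argument (number of copies, default 2) are illustrative assumptions, not part of mdadm.

```shell
#!/bin/sh
# Sketch: usable capacity of a RAID-10 array with equal-sized disks.
# usable_capacity_gb DISKS SIZE_GB [COPIES]   (COPIES defaults to 2, as in near=2)
usable_capacity_gb() (
    disks=$1; size=$2; copies=${3:-2}
    # Raw capacity divided by the number of copies kept of each block.
    echo $(( disks * size / copies ))
)

usable_capacity_gb 4 2    # the tutorial's array: prints 4
usable_capacity_gb 6 2    # six 2GB disks: prints 6
```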


๐Ÿ” Step 1: Verify Available Disks

lsblk -o NAME,SIZE,TYPE,MOUNTPOINT,FSTYPE

Output:

NAME             SIZE TYPE MOUNTPOINT FSTYPE
sda                2G disk 
sdb                2G disk 
sdc                2G disk 
sdd                2G disk 
sde                2G disk 
sdf                2G disk 
vda               20G disk

Requirements:

  • Minimum 4 disks for the layout used in this tutorial (an even number, so every disk has a mirror partner)

  • Same size strongly recommended

  • Not mounted anywhere

  • Not your system disk!

Example for this tutorial:

  • /dev/sda through /dev/sdd → RAID-10 array (4 disks)

  • /dev/sde, /dev/sdf → Hot spares


🔧 Step 2: Partition Disks Properly

Why Partitioning Matters

Don't skip this! Using raw disks (/dev/sda) instead of partitions (/dev/sda1) causes:

  • Boot loader conflicts

  • Disk identification issues

  • Problems with disk replacement

Create Partition on First Disk

sudo fdisk /dev/sda

Inside fdisk (type exactly):

Command: n         ← Create new partition
Type: p            ← Primary partition
Number: 1          ← Partition number 1
First sector: [ENTER]   ← Use default (starts at beginning)
Last sector: [ENTER]    ← Use default (uses entire disk)

Command: t         ← Change partition type
Hex code: fd       ← Linux RAID autodetect (legacy but works)
Command: w         ← Write changes and exit

Note: With the version 1.2 metadata used in this tutorial, mdadm identifies array members by their RAID superblock, not by the partition type, so other type codes such as 83 (Linux) also work. Keeping fd makes it obvious to other admins and tools that the partition belongs to a RAID array.

Critical: Force Kernel to Update

This step prevents "partition doesn't exist" errors:

sudo partprobe /dev/sda 2>/dev/null || true
sudo sync
sleep 1

What this does:

  • partprobe → Tells kernel to re-read partition table

  • sync → Flushes pending writes to disk

  • sleep 1 → Gives the kernel and udev time to create the /dev/sda1 device node

Repeat for All Disks

# Disk sdb
sudo fdisk /dev/sdb
# (n, p, 1, ENTER, ENTER, t, fd, w)
sudo partprobe /dev/sdb 2>/dev/null || true
sudo sync && sleep 1

# Disk sdc
sudo fdisk /dev/sdc
# (n, p, 1, ENTER, ENTER, t, fd, w)
sudo partprobe /dev/sdc 2>/dev/null || true
sudo sync && sleep 1

# Disk sdd
sudo fdisk /dev/sdd
# (n, p, 1, ENTER, ENTER, t, fd, w)
sudo partprobe /dev/sdd 2>/dev/null || true
sudo sync && sleep 1

# Spare disk sde
sudo fdisk /dev/sde
# (n, p, 1, ENTER, ENTER, t, fd, w)
sudo partprobe /dev/sde 2>/dev/null || true
sudo sync && sleep 1

# Spare disk sdf
sudo fdisk /dev/sdf
# (n, p, 1, ENTER, ENTER, t, fd, w)
sudo partprobe /dev/sdf 2>/dev/null || true
sudo sync && sleep 1
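
The repeated blocks above can also be written as a loop. The sketch below swaps interactive fdisk for sfdisk's script input, which is a different tool from the one this tutorial uses; treat the sfdisk script (a DOS label with one full-size partition of type fd) as an assumption to verify on a VM first. It is a dry run by default and only prints what it would do; RUN and DISKS are made-up variable names.

```shell
#!/bin/sh
# Sketch: partition several disks in a loop instead of repeating fdisk by hand.
# Dry run by default. Set RUN=1 to execute. DESTRUCTIVE when executed.
DISKS="${DISKS:-sdb sdc sdd sde sdf}"

partition_disk() {
    dev="/dev/$1"
    if [ "${RUN:-0}" = "1" ]; then
        # One partition spanning the disk, type fd, on a new DOS (MBR) label.
        printf 'label: dos\ntype=fd\n' | sudo sfdisk "$dev"
        sudo partprobe "$dev" 2>/dev/null || true
        sudo udevadm settle      # wait for udev to create the partition node
    else
        echo "would partition $dev (one full-size partition, type fd)"
    fi
}

for d in $DISKS; do
    partition_disk "$d"
done
```

Run it once as shown to review the device list against lsblk, then rerun with RUN=1 only when every name is a disk you intend to wipe.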

Verify Partitions Exist

lsblk -o NAME,SIZE,TYPE,FSTYPE
ls -la /dev/sd{a,b,c,d,e,f}1

Expected output:

root@rhel:~# lsblk -o NAME,SIZE,TYPE,FSTYPE
NAME             SIZE TYPE FSTYPE
sda                2G disk 
└─sda1             2G part 
sdb                2G disk 
└─sdb1             2G part 
sdc                2G disk 
└─sdc1             2G part 
sdd                2G disk 
└─sdd1             2G part 
sde                2G disk 
└─sde1             2G part 
sdf                2G disk 
└─sdf1             2G part 


root@rhel:~# ls -la /dev/sd{a,b,c,d,e,f}1
brw-rw----. 1 root disk 8,  1 Oct 30 11:21 /dev/sda1
brw-rw----. 1 root disk 8, 17 Oct 30 11:21 /dev/sdb1
brw-rw----. 1 root disk 8, 33 Oct 30 11:21 /dev/sdc1
brw-rw----. 1 root disk 8, 49 Oct 30 11:21 /dev/sdd1
brw-rw----. 1 root disk 8, 65 Oct 30 11:21 /dev/sde1
brw-rw----. 1 root disk 8, 81 Oct 30 11:21 /dev/sdf1
root@rhel:~#

If partitions don't appear: Run partprobe and sync again.


📦 Step 3: Install mdadm

Debian/Ubuntu:

sudo apt update
sudo apt install mdadm -y

RHEL/CentOS/Rocky/AlmaLinux:

sudo dnf install mdadm -y

Verify installation:

mdadm --version

Expected: mdadm - v4.x - ...


๐Ÿ—๏ธ Step 4: Create RAID-10 Array

The Critical Command

sudo mdadm --create --verbose /dev/md0 \
  --level=10 \
  --raid-devices=4 \
  --bitmap=internal \
  /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

Understanding Each Parameter

Parameter | Meaning | Why It Matters
--create /dev/md0 | Create new array named md0 | Standard naming convention
--level=10 | RAID-10 (striped mirrors) | Balance of speed + safety
--raid-devices=4 | Use exactly 4 disks | Minimum for classic RAID-10
--bitmap=internal | Track changed blocks | Prevents full resync after crash
Device order | sda1 sdb1 sdc1 sdd1 | Creates pairs: 0↔1 and 2↔3

Why --bitmap=internal Is Strongly Recommended

Without bitmap:

Power loss → Unclean shutdown → mdadm doesn't know which blocks changed
Result: Full array rescan (hours or days)

With bitmap:

Power loss → mdadm checks bitmap → Only resync changed blocks
Result: Minutes of recovery time

Trade-offs:

  • Overhead: ~1MB per 256GB of array size

  • Slight write penalty: ~1-3% (negligible)

  • In production, always use --bitmap=internal

What Happens Next

Prompt:

mdadm: layout defaults to n2
Continue creating array?

Type: y then press ENTER

What n2 means:

  • n = "near" layout (the copies of each chunk sit at the same offset on adjacent devices)

  • 2 = 2 copies of each block

  • Result: Positions (0,1) mirror each other, (2,3) mirror each other

Monitor Initial Synchronization

watch -n 2 cat /proc/mdstat

During sync:

md0 : active raid10 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      4190208 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      [===>.................]  resync = 15.3% (642560/4190208) finish=1.2min speed=45000K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

What to look for:

  • [4/4] [UUUU] → All 4 disks active

  • bitmap: 1/1 pages → Write-intent bitmap working

  • resync = 15.3% → Initial sync in progress

Press Ctrl+C when resync reaches 100%

Or wait automatically:

while grep -Eq 'resync|recovery' /proc/mdstat; do sleep 2; done
echo "✓ Sync complete"
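
If you want the percentage itself (for a log line or a status script) rather than a watch screen, the progress line can be parsed. This is a minimal sketch that assumes the /proc/mdstat format shown above; sync_progress is a hypothetical helper name, and it reads text on stdin so it can be tried against a saved sample.

```shell
#!/bin/sh
# Sketch: extract "<action> <percent>" from /proc/mdstat-style text on stdin.
sync_progress() {
    awk '{
        for (i = 1; i < NF; i++)
            if (($i == "resync" || $i == "recovery" || $i == "reshape" || $i == "check") \
                && $(i + 1) == "=") {
                pct = $(i + 2)
                sub(/%$/, "", pct)      # strip the trailing percent sign
                print $i, pct
            }
    }'
}

# Sample taken from the output shown above; prints: resync 15.3
sync_progress <<'EOF'
md0 : active raid10 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      4190208 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      [===>.................]  resync = 15.3% (642560/4190208) finish=1.2min speed=45000K/sec
EOF
```

On a live system the same function would be fed with sync_progress < /proc/mdstat; it prints nothing when no operation is running.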

Verify Array Configuration

sudo mdadm --detail /dev/md0

Expected output:

root@rhel:~# sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Thu Oct 30 11:23:43 2025
        Raid Level : raid10
        Array Size : 4188160 (3.99 GiB 4.29 GB)
     Used Dev Size : 2094080 (2045.00 MiB 2144.34 MB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Thu Oct 30 11:23:47 2025
             State : clean 
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : near=2
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : rhel:0  (local to host rhel)
              UUID : eec6fc91:3e2b911f:37dd1dda:0b661777
            Events : 17

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync set-A   /dev/sda1
       1       8       17        1      active sync set-B   /dev/sdb1
       2       8       33        2      active sync set-A   /dev/sdc1
       3       8       49        3      active sync set-B   /dev/sdd1

๐Ÿ” Understanding the Mirror Pairs

RaidDevice Position    Disk        Mirror Partner
       0               /dev/sda1   โ”
       1               /dev/sdb1   โ”˜ Mirror each other (Pair 1)

       2               /dev/sdc1   โ”
       3               /dev/sdd1   โ”˜ Mirror each other (Pair 2)

Note: The set-A / set-B labels indicate striping positions, not mirror partners.
What matters: Position mapping (0โ†”1, 2โ†”3) โ€” that's how mirroring works in RAID-10 near=2.

Quick Mirror Pair Verification (Run Anytime)

Want to see mirror pairs instantly? Use this:

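#!/bin/bash
# Lists each active member with its mirror pair, assuming a near=2 layout.
echo "Mirror Pairs (RAID-10 near=2):"
sudo mdadm --detail /dev/md0 2>/dev/null | awk '
  # Device rows look like: Number Major Minor RaidDevice State... Device
  /^[[:space:]]*[0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9]+[[:space:]]+[0-9]+[[:space:]]/ {
    slot = $4; dev = $NF; pair = int(slot / 2)   # RaidDevice, not Number
    role = (slot % 2 == 0) ? "set-A" : "set-B"
    printf "  %-8s → Pair %d (position %d, %s)\n", dev, pair, slot, role
  }
' | sort -n -k6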

Expected output:

Mirror Pairs (RAID-10 near=2):
  /dev/sda1 → Pair 0 (position 0, set-A)
  /dev/sdb1 → Pair 0 (position 1, set-B)
  /dev/sdc1 → Pair 1 (position 2, set-A)
  /dev/sdd1 → Pair 1 (position 3, set-B)

What this shows:

  • Pair 0 = /dev/sda1 and /dev/sdb1 mirror each other

  • Pair 1 = /dev/sdc1 and /dev/sdd1 mirror each other

  • Positions 0&1 are one mirror group, positions 2&3 are another

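✅ This matches the horizontal (0↔1, 2↔3) pairing of near=2, not a vertical (0↔2, 1↔3) one. Keep in mind that the script derives the pairs from the slot numbers; it does not read them from the disks.


💾 Step 5: Make Configuration Persistent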

Why This Step Matters

Without saving the configuration:

  • Array won't assemble automatically after reboot

  • You'll have to manually reassemble with mdadm --assemble

  • System might not boot if it expects the array

Save Array Configuration

Debian/Ubuntu:

sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf

RHEL/CentOS/Rocky/AlmaLinux:

sudo mdadm --detail --scan | sudo tee -a /etc/mdadm.conf

Verify it saved correctly

# Debian/Ubuntu
sudo grep md0 /etc/mdadm/mdadm.conf

# RHEL/CentOS
sudo grep md0 /etc/mdadm.conf

Expected output:

root@rhel:~# sudo mdadm --detail --scan  | sudo tee -a /etc/mdadm.conf
ARRAY /dev/md0 metadata=1.2 UUID=eec6fc91:3e2b911f:37dd1dda:0b661777

Update Boot System (Critical!)

Debian/Ubuntu:

sudo update-initramfs -u

RHEL/CentOS:

sudo dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)

What this does:

  • Embeds mdadm configuration into boot image

  • Ensures array assembles before root filesystem mounts

  • Required for arrays that contain system files


📂 Step 6: Create Optimized Filesystem

Why Alignment Matters

Misaligned filesystem = up to 20-30% performance loss, depending on the workload

Without Alignment:
Write 1MB file → Crosses chunk boundaries → Extra reads → Slower

With Alignment:
Write 1MB file → Fits within chunks → Direct writes → Faster

Calculate Alignment Parameters

For RAID-10 with 512K chunks:

Filesystem block size = 4KB (4096 bytes)
RAID chunk size = 512KB (524288 bytes)

Stride = Chunk Size ÷ Block Size
       = 524288 ÷ 4096
       = 128 blocks

Stripe-width = Stride × Number of Data Disks
             = 128 × 2
             = 256 blocks

Why × 2?
RAID-10 with 4 disks stores unique data on the equivalent of 2 disks (the other 2 hold the mirror copies).

Terminology: "data disks" here means the number of disks' worth of unique data (total disks ÷ copies), not two specific physical disks.
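
The calculation above generalizes to other chunk sizes and disk counts. Here is a minimal sketch; fs_alignment is an illustrative name, and the output is formatted so it can be pasted after mkfs.ext4 -E.

```shell
#!/bin/sh
# Sketch: compute ext4 stride and stripe-width for a RAID-10 array.
# fs_alignment CHUNK_KB BLOCK_BYTES DISKS COPIES
fs_alignment() (
    chunk_kb=$1; block=$2; disks=$3; copies=$4
    stride=$(( chunk_kb * 1024 / block ))     # filesystem blocks per RAID chunk
    data_disks=$(( disks / copies ))          # disks' worth of unique data
    echo "stride=$stride,stripe-width=$(( stride * data_disks ))"
)

fs_alignment 512 4096 4 2    # this tutorial: prints stride=128,stripe-width=256
fs_alignment 64 4096 6 2     # 64K chunks, 6 disks: prints stride=16,stripe-width=48
```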

Create Filesystem with Proper Alignment

sudo mkfs.ext4 \
  -L SPEED_RAID10 \
  -b 4096 \
  -E stride=128,stripe-width=256 \
  /dev/md0

Parameter breakdown:

Parameter | Value | Purpose
-L SPEED_RAID10 | Label | Easy to identify in lsblk -f and blkid output
-b 4096 | 4K blocks | Matches modern 4K sector disks
-E stride=128 | 128 blocks | Aligns writes to chunk boundaries
-E stripe-width=256 | 256 blocks | Aligns to full stripe across data disks

Expected output:

root@rhel:~# sudo mkfs.ext4 \
  -L SPEED_RAID10 \
  -b 4096 \
  -E stride=128,stripe-width=256 \
  /dev/md0
mke2fs 1.47.1 (20-May-2024)
/dev/md0 contains a ext4 file system labelled 'SPEED_RAID10'
    last mounted on /mnt/raid10 on Wed Oct 29 18:03:04 2025
Proceed anyway? (y,N) y
Discarding device blocks: done                            
Creating filesystem with 1047040 4k blocks and 262144 inodes
Filesystem UUID: 5979f165-f4e5-45c5-8603-f07f6810a62c
Superblock backups stored on blocks: 
    32768, 98304, 163840, 229376, 294912, 819200, 884736

Allocating group tables: done                            
Writing inode tables: done                            
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done 

root@rhel:~#

Mount the Array

sudo mkdir -p /mnt/raid10
sudo mount -o noatime,nodiratime /dev/md0 /mnt/raid10

Mount options explained:

  • noatime → Don't update file access times (reduces writes)

  • nodiratime → Don't update directory access times (already implied by noatime on Linux; listed here for explicitness)

  • Performance impact: roughly 5-15% faster in read-heavy workloads, depending on the workload

Verify Mount

df -hT /mnt/raid10
mount | grep raid10

Output:

root@rhel:~# df -hT /mnt/raid10
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/md0       ext4  3.9G   24K  3.7G   1% /mnt/raid10
root@rhel:~# mount | grep raid10
/dev/md0 on /mnt/raid10 type ext4 (rw,noatime,nodiratime,seclabel,stripe=256)
root@rhel:~#

✅ Check for stripe=256: it confirms alignment is active.


⚡ Step 7: Optimize Rebuild Performance

Why This Matters

Default rebuild speed = too slow for modern hardware

Default: 200 MB/s max (speed_limit_max = 200000 KB/s)
Modern SSD: Can handle 500+ MB/s
Result: Rebuild can take 2.5× longer than necessary

During rebuild, array is vulnerable: faster rebuild = safer.

Set Permanent Rebuild Speeds

sudo tee /etc/sysctl.d/99-raid.conf > /dev/null <<EOF
# RAID rebuild speed optimization
dev.raid.speed_limit_min = 50000
dev.raid.speed_limit_max = 500000
EOF

Apply immediately:

sudo sysctl -p /etc/sysctl.d/99-raid.conf

Expected output:

dev.raid.speed_limit_min = 50000
dev.raid.speed_limit_max = 500000

My Output (from a run where I had set the minimum to 100000, the SATA SSD value from the table below):

root@rhel:~# sudo sysctl -p /etc/sysctl.d/99-raid.conf
dev.raid.speed_limit_min = 100000
dev.raid.speed_limit_max = 500000
root@rhel:~#

Verify:

cat /proc/sys/dev/raid/speed_limit_min
cat /proc/sys/dev/raid/speed_limit_max

Understanding the Values

Hardware | Min (KB/s) | Max (KB/s) | Why
HDD | 50000 | 200000 | Avoid starving apps during rebuild
SATA SSD | 100000 | 500000 | Can handle full speed safely
NVMe SSD | 200000 | 1000000 | Only if system is mostly idle

Explanation:

  • speed_limit_min = Rebuild rate the kernel tries to maintain even when other I/O is active

  • speed_limit_max = Cap to prevent I/O starvation

  • Higher values = faster rebuild BUT less responsive system

⚠️ Don't set too high: Rebuild will starve normal I/O operations.

Note: Values like 1000000-2000000 (1-2 GB/s) shown in some examples are too aggressive for most systems.
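
To see what these limits mean in wall-clock terms, the rebuild time can be estimated from the member size and the speed cap. This is a rough sketch that assumes the rebuild actually runs at the given rate for its whole duration, which real workloads rarely allow; rebuild_minutes is a made-up helper name.

```shell
#!/bin/sh
# Sketch: best-case rebuild time for one RAID member.
# rebuild_minutes MEMBER_SIZE_KB SPEED_KB_PER_SEC -> whole minutes, rounded up
rebuild_minutes() (
    size_kb=$1; speed=$2
    secs=$(( (size_kb + speed - 1) / speed ))   # ceiling division
    echo $(( (secs + 59) / 60 ))
)

# A 4 TB member (4000000000 KB) at the default and at the raised cap:
rebuild_minutes 4000000000 200000    # prints 334
rebuild_minutes 4000000000 500000    # prints 134
```

The member size to plug in is the "Used Dev Size" value from mdadm --detail, since a rebuild copies one member's worth of data.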


๐Ÿ” Step 8: Verify Array Health

sudo mdadm --detail /dev/md0

Healthy array checklist:

root@rhel:~# sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Thu Oct 30 11:23:43 2025
        Raid Level : raid10
        Array Size : 4188160 (3.99 GiB 4.29 GB)
     Used Dev Size : 2094080 (2045.00 MiB 2144.34 MB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Thu Oct 30 12:45:21 2025
             State : clean 
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : near=2
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : rhel:0  (local to host rhel)
              UUID : eec6fc91:3e2b911f:37dd1dda:0b661777
            Events : 17

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync set-A   /dev/sda1
       1       8       17        1      active sync set-B   /dev/sdb1
       2       8       33        2      active sync set-A   /dev/sdc1
       3       8       49        3      active sync set-B   /dev/sdd1
root@rhel:~#

Check bitmap location:

cat /sys/block/md0/md/bitmap/location

Expected: +8 or +1024 (not none)

If shows none: Bitmap is disabled. You can add one to the existing array with sudo mdadm --grow --bitmap=internal /dev/md0; there is no need to recreate the array.


🧪 Step 9: Test With Data

Create Test Files

# Small text file
echo "RAID-10 Performance Test" | sudo tee /mnt/raid10/test.txt

# Large file (100MB with progress)
sudo dd if=/dev/zero of=/mnt/raid10/speedtest.dat \
  bs=1M count=100 oflag=direct status=progress

What oflag=direct does:

  • Bypasses OS cache

  • Forces direct writes to disk

  • Shows true RAID performance

Expected output:

root@rhel:~# echo "RAID-10 Performance Test" | sudo tee /mnt/raid10/test.txt
RAID-10 Performance Test
root@rhel:~# sudo dd if=/dev/zero of=/mnt/raid10/speedtest.dat \
  bs=1M count=100 oflag=direct status=progress
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.300927 s, 348 MB/s
root@rhel:~#

Verify Files

ls -lh /mnt/raid10/
cat /mnt/raid10/test.txt

Output:

root@rhel:~# ls -lh /mnt/raid10/
cat /mnt/raid10/test.txt
total 101M
drwx------. 2 root root  16K Oct 30 12:44 lost+found
-rw-r--r--. 1 root root 100M Oct 30 12:50 speedtest.dat
-rw-r--r--. 1 root root   25 Oct 30 12:49 test.txt

RAID-10 Performance Test
root@rhel:~#

Measure Performance

# Write speed
sudo dd if=/dev/zero of=/mnt/raid10/write_test \
  bs=1M count=500 oflag=direct status=progress

# Read speed
sudo dd if=/mnt/raid10/write_test of=/dev/null \
  bs=1M iflag=direct status=progress

Expected (RAID-10 with 4 disks):

  • Write: 1.5-2× single disk speed

  • Read: 2-3× single disk speed

Cleanup

sudo rm /mnt/raid10/speedtest.dat /mnt/raid10/write_test

💥 Step 10: Simulate Disk Failure

Mark Disk as Failed

sudo mdadm --manage /dev/md0 --fail /dev/sda1

What happens:

  • mdadm marks disk as failed immediately

  • Mirror partner (/dev/sdb1) continues serving data

  • Array enters "degraded" state

Check Array Status

cat /proc/mdstat

Output:

root@rhel:~# cat /proc/mdstat
Personalities : [raid10] 
md0 : active raid10 sdd1[3] sdc1[2] sdb1[1] sda1[0](F)
      4188160 blocks super 1.2 512K chunks 2 near-copies [4/3] [_UUU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>
root@rhel:~#

Indicators:

  • sda1[0](F) → Failed disk

  • [4/3] → 4 total, 3 working

  • [_UUU] → Position 0 failed, others OK
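
The status string can be decoded position by position. The sketch below assumes the near=2 adjacent pairing used throughout this tutorial and prints, for each failed position, the mirror partner that must stay healthy; failed_positions is an illustrative name.

```shell
#!/bin/sh
# Sketch: decode an mdstat status string such as "_UUU".
# Assumption: near=2, so the partner of position p is p+1 (p even) or p-1 (p odd).
failed_positions() (
    status=$1; pos=0
    while [ -n "$status" ]; do
        rest=${status#?}              # status without its first character
        first=${status%"$rest"}       # the first character on its own
        if [ "$first" = "_" ]; then
            if [ $(( pos % 2 )) -eq 0 ]; then
                partner=$(( pos + 1 ))
            else
                partner=$(( pos - 1 ))
            fi
            echo "position $pos failed, mirror partner is position $partner"
        fi
        status=$rest; pos=$(( pos + 1 ))
    done
)

failed_positions "_UUU"    # prints: position 0 failed, mirror partner is position 1
```

For the failure simulated here, that partner is position 1 (/dev/sdb1): losing it before the rebuild finishes would take the array down.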

Verify Data Is Still Accessible

cat /mnt/raid10/test.txt
ls -la /mnt/raid10/

Output:

root@rhel:~# cat /mnt/raid10/test.txt
RAID-10 Performance Test

root@rhel:~# ls -la /mnt/raid10/
total 24
drwxr-xr-x. 3 root root  4096 Oct 30 12:51 .
drwxr-xr-x. 3 root root    20 Oct 29 18:02 ..
drwx------. 2 root root 16384 Oct 30 12:44 lost+found
-rw-r--r--. 1 root root    25 Oct 30 12:49 test.txt
root@rhel:~#

✅ Still works! Data served from mirror (/dev/sdb1).

Check Detailed Status

sudo mdadm --detail /dev/md0

Shows:

root@rhel:~# sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Thu Oct 30 11:23:43 2025
        Raid Level : raid10
        Array Size : 4188160 (3.99 GiB 4.29 GB)
     Used Dev Size : 2094080 (2045.00 MiB 2144.34 MB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Thu Oct 30 12:51:39 2025
             State : clean, degraded 
    Active Devices : 3
   Working Devices : 3
    Failed Devices : 1
     Spare Devices : 0

            Layout : near=2
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : rhel:0  (local to host rhel)
              UUID : eec6fc91:3e2b911f:37dd1dda:0b661777
            Events : 21

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1       8       17        1      active sync set-B   /dev/sdb1
       2       8       33        2      active sync set-A   /dev/sdc1
       3       8       49        3      active sync set-B   /dev/sdd1

       0       8        1        -      faulty   /dev/sda1

Remove Failed Disk

sudo mdadm --manage /dev/md0 --remove /dev/sda1

Output:

mdadm: hot removed /dev/sda1 from /dev/md0

Verify removal:

sudo mdadm --detail /dev/md0 | grep State

Shows:

State : clean, degraded

🔧 Step 11: Replace Failed Disk

Add Replacement Disk

sudo mdadm --manage /dev/md0 --add /dev/sde1

Output:

mdadm: added /dev/sde1

What happens:

  • mdadm detects array is degraded

  • Automatically starts rebuilding to /dev/sde1

  • /dev/sde1 becomes active member after rebuild

Monitor Rebuild Progress

watch -n 2 'cat /proc/mdstat'

During rebuild:

md0 : active raid10 sde1[4] sdd1[3] sdc1[2] sdb1[1]
      4190208 blocks super 1.2 512K chunks 2 near-copies [4/3] [_UUU]
      [====>................]  recovery = 23.5% (986112/4190208) finish=0.8min speed=45000K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

Progress indicators:

  • sde1[4] → New disk (device number 4, rebuilding into position 0)

  • [4/3] → 4 total, 3 fully synced (rebuild in progress)

  • recovery = 23.5% → Current progress

  • finish=0.8min → Estimated time remaining

  • speed=45000K/sec → Current rebuild speed

Press Ctrl+C when complete.

Verify Recovery Complete

sudo mdadm --detail /dev/md0

Should show:

root@rhel:~# sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Thu Oct 30 11:23:43 2025
        Raid Level : raid10
        Array Size : 4188160 (3.99 GiB 4.29 GB)
     Used Dev Size : 2094080 (2045.00 MiB 2144.34 MB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Thu Oct 30 12:53:47 2025
             State : clean 
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : near=2
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : rhel:0  (local to host rhel)
              UUID : eec6fc91:3e2b911f:37dd1dda:0b661777
            Events : 41

    Number   Major   Minor   RaidDevice State
       4       8       65        0      active sync set-A   /dev/sde1
       1       8       17        1      active sync set-B   /dev/sdb1
       2       8       33        2      active sync set-A   /dev/sdc1
       3       8       49        3      active sync set-B   /dev/sdd1

✅ Note: /dev/sde1 took position 0 (where /dev/sda1 was).


🏆 Step 12: Add Hot Spare

What Is a Hot Spare?

Hot spare = standby disk that automatically activates on failure

Normal operation:    Disk fails:           Auto-rebuild:
┌─────┐              ┌─────┐              ┌─────┐
│sde1 │              │sde1 │ ✗            │spare│ → activated
│sdb1 │              │sdb1 │              │sdb1 │ ← rebuild source
│sdc1 │              │sdc1 │              │sdc1 │
│sdd1 │              │sdd1 │              │sdd1 │
│spare│ (idle)       │spare│ → activates  └─────┘
└─────┘              └─────┘

Benefits:

  • ✅ No downtime while waiting for a replacement disk

  • ✅ Rebuild starts immediately (no human intervention)

  • ✅ Array stays degraded only for the duration of the rebuild

Add Spare Disk

sudo mdadm --manage /dev/md0 --add-spare /dev/sdf1

Output:

mdadm: added /dev/sdf1

Verify Spare Added

sudo mdadm --detail /dev/md0 | tail -10

Output:

  root@rhel:~# sudo mdadm --detail /dev/md0 | tail -10
              UUID : eec6fc91:3e2b911f:37dd1dda:0b661777
            Events : 42

    Number   Major   Minor   RaidDevice State
       4       8       65        0      active sync set-A   /dev/sde1
       1       8       17        1      active sync set-B   /dev/sdb1
       2       8       33        2      active sync set-A   /dev/sdc1
       3       8       49        3      active sync set-B   /dev/sdd1

       5       8       81        -      spare   /dev/sdf1
root@rhel:~#

Look for: spare in State column

Test Automatic Failover

Simulate another failure:

sudo mdadm --manage /dev/md0 --fail /dev/sde1

Check immediately:

cat /proc/mdstat

Output (progressing rapidly):

md0 : active raid10 sdf1[5] sde1[4](F) sdd1[3] sdc1[2] sdb1[1]
      4188160 blocks super 1.2 512K chunks 2 near-copies [4/3] [_UUU]
      [======>..............]  recovery = 32.3% (677056/2094080) finish=0.0min speed=677056K/sec
      bitmap: 0/1 pages [0KB], 65536KB chunk

What happened:

  1. ⚠️ /dev/sde1 marked as failed

  2. ⚡ /dev/sdf1 (spare) automatically activated

  3. 🔄 Rebuild started immediately (no manual intervention!)

After rebuild completes:

md0 : active raid10 sdf1[5] sde1[4](F) sdd1[3] sdc1[2] sdb1[1]
      4188160 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

Verify Spare Activation

sudo mdadm --detail /dev/md0 | tail -10

Output:

              UUID : de5845c1:f2b6a4ab:87bc4816:6ba93b9d
            Events : 59

    Number   Major   Minor   RaidDevice State
       5       8       81        0      active sync set-A   /dev/sdf1  ← Spare became active!
       1       8       17        1      active sync set-B   /dev/sdb1
       2       8       33        2      active sync set-A   /dev/sdc1
       3       8       49        3      active sync set-B   /dev/sdd1

       4       8       65        -      faulty   /dev/sde1  ← Old disk failed

Remove failed disk

sudo mdadm --manage /dev/md0 --remove /dev/sde1

✅ This is why hot spares are critical in production.


📊 Step 13: Set Up Monitoring

Why Monitoring Matters

RAID arrays fail silently:

  • Disk starts having errors → No immediate notification

  • Second disk fails → Data loss

  • Bitrot corrupts data gradually → Undetected until too late

Proper monitoring prevents disasters.

In production systems, you need to know when disks fail BEFORE you lose data.

Check RAID Health Regularly

# Quick status check
cat /proc/mdstat

# Detailed health report
sudo mdadm --detail /dev/md0

# List all RAID arrays
sudo mdadm --detail --scan

Set Up Automated Daily Checks (Optional)

# Edit crontab
sudo crontab -e

# Add this line (records the array state every day at 2 AM)
0 2 * * * /usr/sbin/mdadm --detail /dev/md0 > /var/log/raid-check.log 2>&1
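
A log file only helps if someone reads it. As a complement, here is a minimal sketch of a check that prints only arrays whose [total/working] counts differ, assuming the /proc/mdstat format shown earlier in this guide. degraded_arrays is a hypothetical helper name; it reads from stdin so it can be tried on sample text before being pointed at /proc/mdstat.

```shell
#!/bin/sh
# Sketch: list degraded arrays from /proc/mdstat-style text on stdin.
# Prints "<array> <working>/<total>" for every array that is missing members.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                 # remember the current array
        match($0, /\[[0-9]+\/[0-9]+\]/) {           # the [total/working] field
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[1] != n[2]) print name, n[2] "/" n[1]
        }'
}

# Sample from the simulated failure in Step 10; prints: md0 3/4
degraded_arrays <<'EOF'
Personalities : [raid10]
md0 : active raid10 sdd1[3] sdc1[2] sdb1[1] sda1[0](F)
      4188160 blocks super 1.2 512K chunks 2 near-copies [4/3] [_UUU]
      bitmap: 1/1 pages [4KB], 65536KB chunk
EOF
```

In a cron job you could run degraded_arrays < /proc/mdstat and send a notification whenever the output is non-empty; how the notification is delivered is left open here.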

Check Disk Health with SMART

# Install smartmontools if not already installed
sudo apt install smartmontools -y  # Debian/Ubuntu
sudo dnf install smartmontools -y  # RHEL/CentOS

# Check individual disk health
sudo smartctl -a /dev/sda
sudo smartctl -a /dev/sdb
sudo smartctl -a /dev/sdc
sudo smartctl -a /dev/sdd
sudo smartctl -a /dev/sde
sudo smartctl -a /dev/sdf

Look for:

  • SMART Health Status: OK = Good

  • Reallocated_Sector_Ct = Should be 0 or very low

  • Current_Pending_Sector = Should be 0

💡 Note for Virtual Machines

SMART usually doesn't work on virtual drives, because most hypervisors do not pass SMART data through to the guest. For VMs, you can:

  1. Monitor the host's physical disks (from the hypervisor, not the VM)

  2. Use software-level checks inside the VM:

# Non-destructive read-only test (safe, shown with -s for progress)
sudo badblocks -sv /dev/sda
sudo badblocks -sv /dev/sdb
sudo badblocks -sv /dev/sdc
sudo badblocks -sv /dev/sdd
sudo badblocks -sv /dev/sde
sudo badblocks -sv /dev/sdf

Expected output (healthy disk):

Checking blocks 0 to 2097151
Checking for bad blocks (read-only test): done                                                 
Pass completed, 0 bad blocks found. (0/0/0 errors)

⚠️ Warning: Never use badblocks -w (write test) on disks that hold data. It overwrites every block.


Step 14: Configure Auto-Mount at Boot

Why This Is Critical

Without auto-mount:

  • Array exists but isn't usable after reboot

  • Applications can't access data

  • Manual intervention required every boot

Get Filesystem UUID

sudo blkid -s UUID -o value /dev/md0

Example output:

a1b2c3d4-e5f6-7890-abcd-ef1234567890

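Copy this UUID; you'll need it next. It identifies the ext4 filesystem on the array and is not the same as the array UUID shown by mdadm --detail.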

Add to fstab

sudo nano /etc/fstab

Add this line at the end (replace with your UUID):

UUID=a1b2c3d4-e5f6-7890-abcd-ef1234567890  /mnt/raid10  ext4  defaults,noatime,nodiratime,nofail  0  2

Understanding Each Field

| Field | Value | Purpose |
| --- | --- | --- |
| UUID=... | Filesystem UUID from blkid | Identifies the filesystem uniquely |
| /mnt/raid10 | Mount point | Where the array appears |
| ext4 | Filesystem type | Tells the kernel how to read it |
| defaults,noatime,nodiratime | Mount options | Performance optimization |
| nofail | CRITICAL! | System boots even if the array fails |
| 0 | Dump frequency | 0 = don't back up with dump |
| 2 | fsck order | 2 = check after root filesystem |
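
To avoid typos when copying the UUID by hand, the entry can be generated from the fields in the table. This is a small sketch; the function name fstab_line is hypothetical, and the options are the ones this guide uses.

```shell
# fstab_line UUID MOUNTPOINT: print an fstab entry with this guide's options.
fstab_line() {
    local uuid="$1" mountpoint="$2"
    printf 'UUID=%s  %s  ext4  defaults,noatime,nodiratime,nofail  0  2\n' "$uuid" "$mountpoint"
}

# Typical use (commented out; appends to the real fstab):
# uuid=$(sudo blkid -s UUID -o value /dev/md0)
# fstab_line "$uuid" /mnt/raid10 | sudo tee -a /etc/fstab
# sudo findmnt --verify    # fstab sanity check from util-linux, if your version has it
```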

Understanding nofail (Critical!)

Without nofail:

Boot → Wait for RAID → RAID doesn't assemble → Boot stalls (on systemd, it then drops to emergency mode)
Result: No normal boot; you must repair it from the emergency shell or rescue mode

With nofail:

Boot → Wait for RAID → RAID doesn't assemble → Continue booting anyway
Result: System accessible, you can fix RAID issue

✅ Always use nofail for non-root RAID arrays.
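
Because a missing nofail only shows up at the next failed boot, it is worth checking the entry ahead of time. The sketch below looks up a mount point in an fstab file and reports whether its option list contains nofail. The function name fstab_has_nofail is made up for this example.

```shell
# fstab_has_nofail MOUNTPOINT [FSTAB]: succeed if the entry for MOUNTPOINT
# lists nofail among its mount options (4th field).
fstab_has_nofail() {
    local mountpoint="$1" fstab="${2:-/etc/fstab}"
    awk -v mp="$mountpoint" '
        /^[[:space:]]*#/ { next }            # skip comment lines
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "nofail") found = 1
        }
        END { exit !found }
    ' "$fstab"
}

# Typical use:
# fstab_has_nofail /mnt/raid10 || echo "WARNING: /mnt/raid10 is missing nofail"
```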

Test Auto-Mount Without Rebooting

# Unmount array
sudo umount /mnt/raid10

# Test fstab entry
sudo mount -a

# Verify it mounted
df -h | grep raid10

Output:

root@rhel:~# df -h | grep raid10
/dev/md0               3.9G   28K  3.7G   1% /mnt/raid10
root@rhel:~#

If error occurs: Check fstab syntax, verify UUID matches.

Test Reboot (Optional)

sudo reboot

After reboot:

df -h | grep raid10
cat /mnt/raid10/test.txt

Output:

/dev/md0               3.9G   28K  3.7G   1% /mnt/raid10
RAID-10 Performance Test

✅ Should work automatically.


🧹 Step 15: Complete Cleanup (Lab Only)

⚠️ WARNING: THIS DESTROYS THE ARRAY AND ALL DATA

Only do this in test/lab environments!

Stop Using the Array

# Unmount filesystem
sudo umount /mnt/raid10

# Remove from fstab
sudo sed -i '/raid10/d' /etc/fstab

Stop the Array

sudo mdadm --stop /dev/md0

Output:

mdadm: stopped /dev/md0

Erase RAID Metadata (Critical!)

Why this is necessary:

  • mdadm stores its superblock (metadata) near the start of each member partition (with the default 1.2 format)

  • Without zeroing: Old metadata confuses new arrays

  • System might try to auto-assemble old array

sudo mdadm --zero-superblock /dev/sda1
sudo mdadm --zero-superblock /dev/sdb1
sudo mdadm --zero-superblock /dev/sdc1
sudo mdadm --zero-superblock /dev/sdd1
sudo mdadm --zero-superblock /dev/sde1
sudo mdadm --zero-superblock /dev/sdf1

Note: This command gives no output on success. It only outputs errors.

Remove Partitions

for disk in sda sdb sdc sdd sde sdf; do
    echo -e "d\nw" | sudo fdisk /dev/$disk
    sudo partprobe /dev/$disk 2>/dev/null || true
done

What this does:

  • d → Delete partition

  • w → Write changes

  • Repeats for all disks
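
Since the loop above deletes partitions without asking, a guard that refuses to touch anything still mounted is a cheap safety net. This is a sketch under the assumption that lsblk -nr -o NAME,MOUNTPOINT prints one line per device with the mount point as an optional second field; the function name any_mounted is hypothetical.

```shell
# any_mounted: read `lsblk -nr -o NAME,MOUNTPOINT` output on stdin and
# succeed if any listed device, partition, or child device has a mount point.
any_mounted() {
    awk 'NF >= 2 { found = 1 } END { exit !found }'
}

# Typical use before a destructive step (commented out):
# for disk in sda sdb sdc sdd sde sdf; do
#     if lsblk -nr -o NAME,MOUNTPOINT "/dev/$disk" | any_mounted; then
#         echo "Skipping /dev/$disk: something on it is still mounted"
#         continue
#     fi
#     # ...destructive commands for /dev/$disk go here...
# done
```

lsblk also lists child devices, so a still-mounted /dev/md0 built on the disk is caught as well.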

Remove Array Configuration

Debian/Ubuntu:

sudo sed -i '/md0/d' /etc/mdadm/mdadm.conf
sudo update-initramfs -u

RHEL/CentOS:

sudo sed -i '/md0/d' /etc/mdadm.conf
sudo dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)

Verify Complete Cleanup

# No RAID arrays
cat /proc/mdstat

# Disks are clean
lsblk -o NAME,SIZE,TYPE,FSTYPE

# No RAID metadata
sudo mdadm --examine /dev/sda 2>&1 | grep -i "no md"

Expected output:

Personalities : [raid10] 
unused devices: <none>

NAME             SIZE TYPE FSTYPE
sda                2G disk 
sdb                2G disk 
sdc                2G disk 
sdd                2G disk 
sde                2G disk 
sdf                2G disk

📚 Quick Reference Commands

Daily Operations

# Check array status
cat /proc/mdstat
sudo mdadm --detail /dev/md0

# Check array health
sudo mdadm --detail /dev/md0 | grep -E 'State|Active|Failed'

# View current resync/rebuild speed (KiB/s)
cat /sys/block/md0/md/sync_speed

# View per-array rebuild speed limits
cat /sys/block/md0/md/sync_speed_min
cat /sys/block/md0/md/sync_speed_max

Disk Management

# Mark disk as failed
sudo mdadm --manage /dev/md0 --fail /dev/sda1

# Remove failed disk
sudo mdadm --manage /dev/md0 --remove /dev/sda1

# Add replacement disk
sudo mdadm --manage /dev/md0 --add /dev/sde1

# Add hot spare
sudo mdadm --manage /dev/md0 --add-spare /dev/sdf1

Maintenance Commands

# Start manual scrub (integrity check)
echo check | sudo tee /sys/block/md0/md/sync_action

# Check scrub progress
cat /proc/mdstat

# View mismatch count (should be 0)
cat /sys/block/md0/md/mismatch_cnt

# Stop scrub (if needed)
echo idle | sudo tee /sys/block/md0/md/sync_action
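
The two sysfs files used above can be combined into a single pass/fail check, for example at the end of a monthly scrub job. This is a minimal sketch; the function name scrub_status is made up, and the directory argument exists only so the check can be pointed at another array (or at test files).

```shell
# scrub_status [MD_SYSFS_DIR]: print the current sync action and mismatch count,
# and succeed only when the array is idle with zero mismatches.
scrub_status() {
    local md_dir="${1:-/sys/block/md0/md}"
    local action mismatches
    action=$(cat "$md_dir/sync_action")
    mismatches=$(cat "$md_dir/mismatch_cnt")
    echo "sync_action=$action mismatch_cnt=$mismatches"
    [ "$action" = "idle" ] && [ "$mismatches" -eq 0 ]
}

# Typical use after a check has finished:
# scrub_status || echo "Scrub still running or mismatches found"
```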

Performance Testing

# Write speed test 
sudo dd if=/dev/zero of=/mnt/raid10/write_test \
  bs=1M count=1000 oflag=direct status=progress

# Read speed test
sudo dd if=/mnt/raid10/write_test of=/dev/null \
  bs=1M iflag=direct status=progress

# Random I/O test (requires fio)
sudo fio --name=randwrite --ioengine=libaio --iodepth=16 \
  --rw=randwrite --bs=4k --direct=1 --size=1G \
  --numjobs=4 --runtime=60 --group_reporting \
  --filename=/mnt/raid10/fiotest

# Cleanup
sudo rm /mnt/raid10/write_test /mnt/raid10/fiotest

🎯 Production Deployment Checklist

Before putting RAID-10 into production, verify:

Hardware

  • [ ] All disks are same size and model

  • [ ] Disks are from different manufacturing batches

  • [ ] SMART monitoring enabled on all disks

  • [ ] Hardware RAID controller (if used) configured correctly

  • [ ] UPS power protection in place

Configuration

  • [ ] Array created with --bitmap=internal

  • [ ] Bitmap visible in mdadm --detail output

  • [ ] Filesystem created with proper stride and stripe-width

  • [ ] Mount options include noatime,nodiratime,nofail

  • [ ] Rebuild speed limits configured in /etc/sysctl.d/

Persistence

  • [ ] Array configuration saved in /etc/mdadm/mdadm.conf (Debian/Ubuntu) or /etc/mdadm.conf (RHEL/CentOS)

  • [ ] Initramfs/dracut updated with new config

  • [ ] fstab entry uses UUID (not /dev/md0)

  • [ ] fstab includes nofail option

Monitoring

  • [ ] Monthly scrub scheduled (/etc/cron.monthly/raid-check)

  • [ ] Daily health checks scheduled (crontab)

  • [ ] Email alerts configured (mdadm daemon or custom script)

  • [ ] Logging to /var/log/raid-health.log working

Redundancy

  • [ ] At least one hot spare added

  • [ ] Spare disk(s) tested (simulate failure)

  • [ ] Automatic failover verified

  • [ ] Replacement disk procedure documented

Testing

  • [ ] Single disk failure tested

  • [ ] Data verified accessible during degraded state

  • [ ] Rebuild process tested and timed

  • [ ] Hot spare activation tested

  • [ ] System reboot tested (auto-assembly)

  • [ ] Performance benchmarks recorded

Backup

  • [ ] RAID is NOT a backup!

  • [ ] Regular backups to external system configured

  • [ ] Backup restore procedure tested

  • [ ] Recovery time objective (RTO) documented


⚠️ Common Mistakes and How to Avoid Them

Mistake 1: "RAID is my backup"

Why it's wrong:

RAID protects against: Disk failure
RAID does NOT protect against: Accidental deletion, ransomware, 
  corruption, fire, theft, user error

Right:

RAID = Availability (keeps system running)
Backup = Data protection (recovers from disasters)

You need BOTH!

Mistake 2: Forgetting --bitmap=internal

Impact:

  • Unclean shutdown → Full array resync (hours/days)

  • Extended vulnerability window

  • Poor performance during recovery

✅ Always specify --bitmap=internal when creating the array

Mistake 3: No hot spare

Without spare:

Disk fails → You get paged → Drive to datacenter → Replace disk → 
  Start rebuild (30 minutes to hours elapsed)

With spare:

Disk fails → Spare activates immediately → Rebuild starts (30 seconds elapsed)

Mistake 4: Skipping filesystem alignment

Performance loss: roughly 20-30% slower without proper stride/stripe-width (the exact figure varies with workload)

✅ Always calculate and specify alignment parameters
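
The calculation is simple enough to script. The sketch below follows this guide's formulas (stride = chunk size ÷ block size, stripe width = stride × number of data disks) and prints the result in the form mkfs.ext4 -E accepts. The function name ext4_alignment is hypothetical, and the example values (512K chunk, 4K block, 4 disks, 2 copies) are assumptions that must match your actual array.

```shell
# ext4_alignment CHUNK_KIB BLOCK_KIB RAID_DEVICES [COPIES]
# Prints "stride=N,stripe_width=M" for use with mkfs.ext4 -E.
ext4_alignment() {
    local chunk_kib="$1" block_kib="$2" raid_devices="$3" copies="${4:-2}"
    local stride=$(( chunk_kib / block_kib ))          # filesystem blocks per chunk
    local data_disks=$(( raid_devices / copies ))      # disks holding unique data
    echo "stride=${stride},stripe_width=$(( stride * data_disks ))"
}

# Typical use (commented out; mkfs destroys existing data on /dev/md0):
# sudo mkfs.ext4 -E "$(ext4_alignment 512 4 4)" /dev/md0
```

For a 512K chunk, 4K blocks, and 4 disks this gives a stride of 128 and a stripe width of 256.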

Mistake 5: NOT using nofail in fstab

Without nofail: System won't boot if RAID fails

✅ Always include nofail for non-root arrays

Mistake 6: Same-batch disks

Problem:

  • Disks from the same manufacturing batch tend to wear out and fail at around the same time

  • Higher chance of losing both mirrors simultaneously

Solution:

  • Buy disks from different vendors/batches

  • Stagger disk purchases over time


Advanced Topics

Understanding RAID-10 Layouts

This tutorial uses near=2 (default), but mdadm supports three layouts:

Layout 1: near=2 (Default - What We Use)

Disk 0: [A1][A2][A3][A4]  ←┐
Disk 1: [A1][A2][A3][A4]  ←┘ Mirror pair (0↔1)

Disk 2: [B1][B2][B3][B4]  ←┐
Disk 3: [B1][B2][B3][B4]  ←┘ Mirror pair (2↔3)

Characteristics:

  • ✅ Good read performance (reads can be served by either disk in a pair)

  • ✅ Good write performance

  • ✅ Simple to understand

  • ✅ Recommended for most use cases

Layout 2: far=2

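Chunks are numbered 1-8; "|" separates the first and second half of each disk.

Disk 0: [1][5] ... | [4][8] ...
Disk 1: [2][6] ... | [1][5] ...
Disk 2: [3][7] ... | [2][6] ...  ← Second copy sits in the far half, shifted by one disk
Disk 3: [4][8] ... | [3][7] ...

Characteristics:

  • ✅ Best sequential read performance (all disks contribute, as in RAID-0)

  • ⚠️ Slower writes (the two copies are far apart on each disk, so HDDs seek more)

  • Use case: Read-heavy workloads (media streaming)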

To use:

mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=4 ...

Layout 3: offset=2

Disk 0: [1][4][5][8]
Disk 1: [2][1][6][5]
Disk 2: [3][2][7][6]  ← Each stripe is repeated on the next row,
Disk 3: [4][3][8][7]     shifted by one disk

Characteristics:

  • Balance between near and far

  • Rarely used in practice

Recommendation: Stick with near=2 (default) unless you have specific sequential read requirements.
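
For the near layout, the placement rule can be modeled in a few lines: the copies of each chunk occupy consecutive positions, and positions wrap around the disks. The sketch below is only a model of that rule as described in this guide, not the kernel's code; the function name near_copy_disks is made up.

```shell
# near_copy_disks CHUNK [DEVICES] [COPIES]: print the disk positions that hold
# the copies of a given chunk under the near layout (chunks counted from 0).
near_copy_disks() {
    local chunk="$1" devices="${2:-4}" copies="${3:-2}"
    local k=0 pos out=""
    while [ "$k" -lt "$copies" ]; do
        pos=$(( chunk * copies + k ))          # linear position of copy k
        out="$out$(( pos % devices )) "        # wrap positions around the disks
        k=$(( k + 1 ))
    done
    echo "${out% }"
}

# near_copy_disks 0   # prints "0 1": first chunk lives on the 0↔1 pair
# near_copy_disks 1   # prints "2 3": next chunk lives on the 2↔3 pair
```

With 4 disks and 2 copies, every even chunk lands on disks 0 and 1 and every odd chunk on disks 2 and 3, which is the adjacent pairing described above.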


🚀 Performance Optimization

SSD-Specific Optimizations

For SSDs, add the discard option:

sudo mount -o noatime,nodiratime,discard /dev/md0 /mnt/raid10

Or in fstab:

UUID=... /mnt/raid10 ext4 defaults,noatime,nodiratime,discard,nofail 0 2

What discard does:

  • ✅ Enables TRIM support

  • ✅ Tells SSD which blocks are free

  • ✅ Maintains long-term performance

  • ✅ Helps SSD longevity by reducing write amplification


📖 Troubleshooting Guide

Problem: Array won't assemble after reboot

Symptoms:

cat /proc/mdstat
# Shows: Personalities : [raid10]
#        unused devices: <none>

Solutions:

  1. Check if disks are detected:
lsblk -o NAME,SIZE,TYPE,FSTYPE
# Verify sd{a,b,c,d}1 exist
  2. Try manual assembly:
sudo mdadm --assemble --scan --verbose
  3. Check configuration (the file is /etc/mdadm.conf on RHEL/CentOS):
sudo grep md0 /etc/mdadm/mdadm.conf
# Should show ARRAY /dev/md0 ...
  4. Assemble explicitly from the member partitions:
sudo mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

Problem: Slow rebuild speed

Symptoms:

cat /proc/mdstat
# Shows: speed=10000K/sec (very slow)

Solutions:

  1. Check speed limits:
cat /proc/sys/dev/raid/speed_limit_min
cat /proc/sys/dev/raid/speed_limit_max
  2. Increase limits (values are in KiB/s and are lost on reboot):
echo 50000 | sudo tee /proc/sys/dev/raid/speed_limit_min
echo 500000 | sudo tee /proc/sys/dev/raid/speed_limit_max
  3. Check I/O load:
iostat -x 2
# If disks are busy with other I/O, rebuild will be slow
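
Values written to /proc apply only until the next reboot. To keep them, the same limits can be stored as sysctl settings. The sketch below generates the two lines; the function name raid_sysctl_conf and the file name 90-raid-rebuild.conf are assumptions made for this example, and the numbers are this guide's general-purpose recommendation.

```shell
# raid_sysctl_conf MIN_KIB MAX_KIB: print sysctl settings for md rebuild speed limits.
raid_sysctl_conf() {
    local min_kib="$1" max_kib="$2"
    printf 'dev.raid.speed_limit_min = %s\ndev.raid.speed_limit_max = %s\n' "$min_kib" "$max_kib"
}

# Typical use (commented out; writes a system config file):
# raid_sysctl_conf 50000 500000 | sudo tee /etc/sysctl.d/90-raid-rebuild.conf
# sudo sysctl --system    # reload all sysctl configuration files
```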

Problem: Mismatch count increasing

Symptoms:

cat /sys/block/md0/md/mismatch_cnt
# Shows: 42 (non-zero)

This indicates:

  • Possible bitrot (data corruption)

  • Failing disk

  • Memory errors

  • Bad SATA cable

Solutions:

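  1. Run repair (md cannot tell which copy is correct; on mirrored arrays it overwrites the other copies with one of them, so check backups first):
echo repair | sudo tee /sys/block/md0/md/sync_action
  2. Check SMART status:
sudo smartctl -a /dev/sda
sudo smartctl -a /dev/sdb
# Look for reallocated sectors, pending sectors
  3. Test individual disks:
sudo badblocks -sv /dev/sda1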

Problem: Array degraded but no failed disk shown

Check detailed status:

sudo mdadm --detail /dev/md0
cat /sys/block/md*/md/sync_action

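Possible causes:

  • A member dropped out and now shows as "removed" in mdadm --detail (cable, controller, or timeout problem)

  • Bitmap corruption

  • Cache coherency issues

Solution (unmount first; --force can assemble from out-of-date members, so make sure backups are current):

sudo umount /mnt/raid10
sudo mdadm --stop /dev/md0
sudo mdadm --assemble /dev/md0 --force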

Final Notes

What You've Learned

✅ Conceptual understanding:

  • CORRECTED: RAID-10 mirror pairs work as 0↔1 and 2↔3 with near=2

  • set-A and set-B each hold one complete copy of the data; they do not name mirror partners

  • Failure tolerance patterns (can lose 2 disks if from different pairs)

  • Difference between mdadm --level=10 and true nested RAID-1+0

✅ Practical skills:

  • Creating production-grade RAID-10

  • Proper filesystem alignment

  • Monitoring and maintenance

  • Disaster recovery procedures

✅ Best practices:

  • Write-intent bitmaps (--bitmap=internal)

  • Hot spare configuration

  • Auto-mount with failsafe options (nofail)

  • Regular integrity checks


Next Steps for Production

  1. Implement monitoring alerts:

    • Configure email notifications

    • Set up Nagios/Zabbix checks

    • Create runbooks for failures

  2. Document your setup:

    • Hardware inventory

    • Disk serial numbers

    • Recovery procedures

    • Contact information

  3. Test disaster scenarios:

    • Multiple disk failures

    • Power loss during rebuild

    • Full array recovery from scratch

  4. Establish backup system:

    • Regular backups to external storage

    • Test restore procedures

    • Document retention policies


🎓 Key Takeaways

Remember These Critical Points:

  1. Mirror Pairing in mdadm --level=10 with near=2:

    • ✅ CORRECT: Adjacent pairs (0↔1, 2↔3)

    • ❌ WRONG: Vertical pairs (0↔2, 1↔3)

    • The set-A/set-B labels mark which complete copy of the data a disk belongs to, not mirror partners

  2. mdadm --level=10 ≠ True RAID-1+0:

    • mdadm version: Automatic adjacent mirrors (probabilistic failure tolerance)

    • Nested version: User-defined mirrors (guaranteed failure tolerance)

    • Choose nested for mission-critical systems

  3. Always use --bitmap=internal:

    • Prevents hours-long resyncs after power loss

    • Only ~1MB overhead per 256GB

    • Mandatory for production

  4. Filesystem alignment matters:

    • Calculate stride = chunk_size ÷ block_size

    • Calculate stripe-width = stride × number_of_data_disks

    • Impact: 20-30% performance difference

  5. RAID is NOT backup:

    • RAID = Availability (protects against disk failure)

    • Backup = Data protection (protects against everything else)

    • Always have external backups

  6. Always use nofail in fstab:

    • Without it: System won't boot if RAID fails

    • With it: System boots, you can fix the issue

    • Critical for non-root arrays

  7. Hot spares save downtime:

    • Automatic failover

    • Immediate rebuild

    • Essential for 24/7 systems

  8. Reasonable rebuild speeds:

    • HDDs: 100-200 MB/s

    • SATA SSDs: 300-500 MB/s

    • NVMe SSDs: 500 MB/s - 1 GB/s

    • Don't set too high (will starve normal I/O)


🔧 Teaching the Contradictions - What Was Fixed

Contradiction #1: Mirror Pairing (MAJOR FIX)

Original guide said:

Mirror Set 1 → Disk 0 ↔ Disk 2 (vertical pairing)
Mirror Set 2 → Disk 1 ↔ Disk 3 (vertical pairing)

Reality with near=2:

Mirror Pair 1 → Disk 0 ↔ Disk 1 (horizontal/adjacent pairing)
Mirror Pair 2 → Disk 2 ↔ Disk 3 (horizontal/adjacent pairing)

How to verify yourself:

# After creating array, fail a disk and check what happens
sudo mdadm --manage /dev/md0 --fail /dev/sda1  # Fail position 0
cat /proc/mdstat
# You'll see [_UUU] - position 0 down, others working
# Data served from position 1 (sdb1), NOT position 2

Visual proof:

If 0↔2 were mirrors (wrong):
  Fail sda1 → Data served from sdc1

Reality (0↔1 are mirrors):
  Fail sda1 → Data served from sdb1 ✓

Contradiction #2: set-A and set-B Meaning

Original guide implied:

set-A = Mirror Set 1
set-B = Mirror Set 2

Actually means:

set-A = First position in each mirror pair (0, 2)
set-B = Second position in each mirror pair (1, 3)

Each set holds one complete copy of the data. The labels do not identify mirror pairs!

How to understand it:

mdadm --detail output:
  Position 0: set-A  ┐
  Position 1: set-B  ┘ These mirror each other
  Position 2: set-A  ┐
  Position 3: set-B  ┘ These mirror each other

set-A (positions 0 and 2) and set-B (positions 1 and 3) each hold one
complete copy of the data, striped across that set's disks.
The labels do not say which disks mirror each other.

Contradiction #3: Mistake 5 (nofail)

Original guide said:

Mistake 5: Using nofail in fstab
(Then immediately contradicted itself)

Corrected:

Mistake 5: NOT using nofail in fstab

✅ Always USE nofail for non-root RAID arrays

Why this matters:

# Without nofail in fstab:
UUID=... /mnt/raid10 ext4 defaults,noatime,nodiratime 0 2
# → Boot stalls and drops to emergency mode if the array fails to mount

# With nofail:
UUID=... /mnt/raid10 ext4 defaults,noatime,nodiratime,nofail 0 2
# → System boots even if array fails, you can investigate

Contradiction #4: Stripe-Width Terminology

Original guide said:

Stripe-width = Stride × Number of Stripe Groups
Number of stripe groups = 2 for RAID-10 with 4 disks

Corrected terminology:

Stripe-width = Stride × Number of Data Disks
Number of data disks = 2 for RAID-10 with 4 disks

(The other 2 disks hold mirrors, not unique data)

Why "data disks" is clearer:

  • RAID-10 with 4 disks: 2 store data, 2 store mirrors

  • Stripe-width should span all unique data

  • "Stripe groups" is non-standard terminology


Contradiction #5: Rebuild Speed Values

Original guide recommended:

dev.raid.speed_limit_min = 10000   (10 MB/s)
dev.raid.speed_limit_max = 500000  (500 MB/s)

But showed example output:

dev.raid.speed_limit_min = 1000000  (1000 MB/s)
dev.raid.speed_limit_max = 2000000  (2000 MB/s)

Why the example was wrong:

  • 1-2 GB/s is too aggressive for most hardware

  • Will starve normal I/O operations

  • Only suitable for high-end NVMe arrays with no other workload

Corrected recommendation:

# General purpose (balanced)
dev.raid.speed_limit_min = 50000   (50 MB/s)
dev.raid.speed_limit_max = 500000  (500 MB/s)

# Adjust based on hardware:
- HDDs: 100000-200000
- SATA SSDs: 300000-500000
- NVMe (idle system): 500000-1000000

Contradiction #6: Partition Type (Minor)

Original guide used:

Hex code: fd  (Linux RAID autodetect)

Note added:

  • This is legacy but still works

  • Modern systems can use 83 (Linux) or 8e (Linux LVM)

  • mdadm 3.0+ doesn't require autodetect type

  • Not wrong, just slightly outdated

Both work fine:

# Legacy (still works)
fdisk: t → fd

# Modern (also works)
fdisk: t → 83

🎯 Quick Summary of All Fixes

| Issue | Original | Corrected |
| --- | --- | --- |
| Mirror pairs | 0↔2, 1↔3 (vertical) | 0↔1, 2↔3 (horizontal) |
| set-A/set-B | Mirror identifiers | Labels for each complete copy, not mirror pairs |
| Mistake 5 | "Using nofail" (contradictory) | "NOT using nofail" |
| Stripe-width term | "Stripe groups" | "Data disks" |
| Rebuild speed | Example showed 1-2 GB/s | Use 50-500 MB/s |
| Partition type | Only mentioned fd | Added note about 83/8e |

🧪 How to Verify the Corrections Yourself

Test 1: Verify Mirror Pairing

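# Create array (then create the filesystem, mount it at /mnt/raid10, and write test.txt as in the earlier steps)
sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 \
  --bitmap=internal /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

# Fail position 0
sudo mdadm --manage /dev/md0 --fail /dev/sda1

# Watch per-disk activity in one terminal...
iostat -x 1 5
# ...while reading the file in another (iflag=direct bypasses the page cache)
sudo dd if=/mnt/raid10/test.txt of=/dev/null iflag=direct
# Chunks that lived on the 0↔1 pair can now only come from sdb (position 1); sda shows no reads.
# Chunks on the 2↔3 pair are still read from sdc or sdd, so activity there is expected.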

Test 2: Verify Failure Tolerance

# Start fresh
sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 \
  --bitmap=internal /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

# Test 1: Fail disks from same pair
sudo mdadm --manage /dev/md0 --fail /dev/sda1 /dev/sdb1
cat /proc/mdstat
# Result: pair 1 has no surviving copy. Depending on the kernel, md either refuses
# to fail the last copy ("Device or resource busy") or marks the array as failed

# Stop the array and recreate it with the --create command above, then
# test 2: Fail disks from different pairs
sudo mdadm --manage /dev/md0 --fail /dev/sda1 /dev/sdc1
cat /proc/mdstat
# Result: Array SURVIVES (each pair still has one disk)

Test 3: Verify nofail Behavior

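# Add to fstab WITHOUT nofail
UUID=... /mnt/raid10 ext4 defaults 0 2

# Unmount and stop the array so its device disappears
sudo umount /mnt/raid10
sudo mdadm --stop /dev/md0

# Try to mount everything
sudo systemctl daemon-reload
sudo mount -a
# Result: mount reports an error because it cannot find the UUID.
# At boot, systemd would wait for the device, time out, and drop to emergency mode

# Now add nofail
UUID=... /mnt/raid10 ext4 defaults,nofail 0 2

# Try again
sudo mount -a
# Result: the missing device is skipped without an error; at boot, startup continues normally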

📖 Additional Resources

Official Documentation

  • Understanding RAID levels and their trade-offs

  • Linux kernel md driver architecture

  • Filesystem alignment for RAID arrays

  • Backup strategies for RAID systems

Community Support


✅ Final Checklist

Before considering this guide complete, verify:

Understanding:

  • [ ] I understand how near=2 creates mirror pairs (0↔1, 2↔3)

  • [ ] I know the difference between mdadm --level=10 and nested RAID-1+0

  • [ ] I understand what set-A and set-B actually mean

  • [ ] I can calculate filesystem alignment parameters

  • [ ] I know why nofail is critical in fstab

Practical Skills:

  • [ ] I can create a RAID-10 array with proper parameters

  • [ ] I can simulate and recover from disk failures

  • [ ] I can configure hot spares

  • [ ] I can set up monitoring and alerts

  • [ ] I can configure auto-mount correctly

Production Readiness:

  • [ ] I have tested failure scenarios

  • [ ] I have backup systems in place

  • [ ] I have documented my configuration

  • [ ] I have monitoring alerts configured

  • [ ] I understand this is NOT a backup solution


🎉 Congratulations!

You now have a corrected, production-ready understanding of RAID-10 with mdadm.

Key achievements:

  • ✅ Understand true mirror pairing behavior

  • ✅ Can build optimized RAID-10 arrays

  • ✅ Know how to handle failures and recoveries

  • ✅ Understand the difference between RAID and backup

  • ✅ Can deploy this knowledge in production

Remember: RAID provides availability, not data protection. Always maintain proper backups!


📞 Questions or Issues?

If you encounter problems:

  1. Check the Troubleshooting Guide (above)

  2. Review the Quick Reference Commands

  3. Verify array status: sudo mdadm --detail /dev/md0

  4. Check system logs: dmesg | grep -i raid or journalctl -xe

  5. Consult community resources (listed above)

Stay safe, keep backups, and happy RAID-ing! 🚀
