RAID is an array of multiple independent hard disk drives that provides high performance and fault tolerance. The RAID array appears to the host computer as a single storage unit or as multiple logical units. Data throughput improves because several disks can be accessed simultaneously. RAID systems also improve data storage availability and fault tolerance. Data loss caused by a hard drive failure can be recovered by reconstructing missing data from the remaining data or parity drives.
RAID Description
RAID (Redundant Array of Independent Disks) is an array, or group, of multiple independent hard drives that provide high performance and fault tolerance. A RAID disk subsystem improves I/O (input/output) performance and reliability. The RAID array appears to the host computer as a single storage unit or as multiple logical units. I/O is expedited because several disks can be accessed simultaneously.
RAID Benefits
RAID systems improve data storage reliability and fault tolerance compared to single-drive storage systems. Data loss resulting from a hard drive failure can be prevented by reconstructing missing data from the remaining hard drives. RAID has gained popularity because it improves I/O performance and increases storage subsystem reliability.
RAID Functions
Logical drives, also known as virtual disks, are arrays or spanned arrays that are available to the operating system. The storage space in a logical drive is spread across all the physical drives in the array.
NOTE: The maximum logical drive size for all supported RAID levels (0, 1, 5, 10, and 50) is 2 TB. You can create multiple logical drives on the same physical disks.
Your SCSI hard drives must be organized into logical drives in an array and must be able to support the RAID level that you select. Below are some common RAID functions:
Creating hot spare drives.
Configuring physical arrays and logical drives.
Initializing one or more logical drives.
Accessing controllers, logical drives, and physical drives individually.
Rebuilding failed hard drives.
Verifying that the redundancy data in logical drives using RAID level 1, 5, 10, or 50 is correct.
Reconstructing logical drives after changing RAID levels or adding a hard drive to an array.
Selecting a host controller to work on.
Components and Features
RAID levels describe a system for ensuring the availability and redundancy of data stored on large disk subsystems. PERC 4/Di/Si and 4e/Di/Si support RAID levels 0, 1, 5, 10 (1+0), and 50 (5+0). See RAID Levels for detailed information about RAID levels.
Physical Array
A physical array is a group of physical disk drives. The physical disk drives are managed in partitions known as logical drives.
Logical Drive
A logical drive is a partition in a physical array of disks that is made up of contiguous data segments on the physical disks. A logical drive can consist of an entire physical array, more than one entire physical array, a part of an array, parts of more than one array, or a combination of any two of these conditions.
NOTE: The maximum logical drive size for all supported RAID levels (0, 1, 5, 10, and 50) is 2 TB. You can create multiple logical drives within the same physical array.
RAID Array
A RAID array is one or more logical drives controlled by the PERC.
Channel Redundant Logical Drives
When you create a logical drive, it is possible to use disks attached to different channels to implement channel redundancy, known as Channel Redundant Logical Drives. This configuration might be used for disks that reside in enclosures subject to thermal shutdown.
For more information, refer to the Dell OpenManage Array Manager or Dell OpenManage Storage Management user guides located at http://support.dell.com.
NOTE: Channel redundancy applies only to controllers that have more than one channel and that attach to an external disk enclosure.
NOTE: Make sure that the spans are in different backplanes, so that if one span fails, you do not lose the whole array.
Fault Tolerance
Fault tolerance is the capability of the subsystem to undergo a single drive failure per span without compromising data integrity and processing capability. The RAID controller provides this support through redundant arrays in RAID levels 1, 5, 10, and 50. The system can still work properly even with a single disk failure in an array, though performance can be degraded to some extent.
NOTE: RAID level 0 is not fault tolerant. If a drive in a RAID 0 array fails, the whole logical drive (all physical drives associated with the logical drive) will fail.
Fault tolerance is often associated with system availability because it allows the system to be available during the failures. However, this means it is also important for the system to be available during the repair of the problem. To make this possible, PERC 4/Di/Si and 4e/Di/Si support hot spare disks, and the auto-rebuild feature.
A hot spare is an unused physical disk that, in case of a disk failure in a redundant RAID array, can be used to rebuild the data and re-establish redundancy. After the hot spare is automatically moved into the RAID array, the data is automatically rebuilt on the hot spare drive. The RAID array continues to handle requests while the rebuild occurs.
Auto-rebuild allows a failed drive to be replaced and the data automatically rebuilt by "hot-swapping" the drive in the same drive bay. The RAID array continues to handle requests while the rebuild occurs.
Consistency Check
The Consistency Check operation verifies the correctness of the redundancy data in logical drives that use RAID levels 1, 5, 10, and 50. (RAID 0 does not provide data redundancy.) For example, in a system with parity, checking consistency means computing the parity from the data on the drives and comparing the result to the stored parity data.
NOTE: It is recommended that you perform a consistency check at least once a month.
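The parity comparison described above can be sketched as follows. This is a minimal illustration, assuming a simplified model in which each stripe is a pair of equal-length data blocks and one stored parity block held in memory; the names `xor_blocks` and `find_inconsistent_stripes` are hypothetical and do not correspond to any PERC utility or firmware interface.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def find_inconsistent_stripes(stripes):
    """Return the indexes of stripes whose stored parity does not match.

    Each stripe is a (data_blocks, parity_block) pair. This layout is an
    assumption made for the sketch, not the controller's on-disk format.
    """
    return [
        index
        for index, (data_blocks, parity) in enumerate(stripes)
        if xor_blocks(data_blocks) != parity
    ]
```

A stripe is reported only when the parity recomputed from its data blocks differs from the stored parity, which is the condition a consistency check looks for.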
Background Initialization
Background initialization is a consistency check that is forced when you create a logical drive; being forced on new logical drives is the only difference between the two operations. It is an automatic operation that starts 5 minutes after you create the drive.
Background initialization is a check for media errors on physical drives. It ensures that striped data segments are the same on all physical drives in an array. The background initialization rate is controlled by the rebuild rate set using the BIOS Configuration Utility. The default and recommended rate is 30%. Before you change the rebuild rate, you must stop the background initialization or the rate change will not affect the background initialization rate. After you stop background initialization and change the rebuild rate, the rate change takes effect when you restart background initialization.
Patrol Read
Patrol Read involves the review of your system for possible hard drive errors that could lead to drive failure, then action to correct errors. The goal is to protect data integrity by detecting physical drive failure before the failure can damage data. The corrective actions depend on the array configuration and type of errors.
Patrol Read starts only when the controller is idle for a defined period of time and no other background tasks are active, though it can continue to run during heavy I/O processes.
You can use the BIOS Configuration Utility to select the Patrol Read options, which you can use to set automatic or manual operation, or disable Patrol Read. Perform the following steps to select a Patrol Read option:
Select Objects>Adapter from the Management Menu.
The Adapter menu displays.
Select Patrol Read Options from the Adapter menu.
The following options display:
Patrol Read Mode
Patrol Read Status
Patrol Read Control
Select Patrol Read Mode to display the Patrol Read mode options:
Manual - In manual mode, you must initiate the Patrol Read.
Auto - In auto mode, the firmware initiates the Patrol Read on a scheduled basis.
Manual Halt - Use manual halt to stop the automatic operation, then switch to manual mode.
Disable - Use this option to disable Patrol Read.
If you use Manual mode, perform the following steps to initiate Patrol Read:
Select Patrol Read Control and press <Enter>.
Select Start and press <Enter>.
NOTE: Pause/Resume is not a valid operation when Patrol Read is set to Manual mode.
Select Patrol Read Status to display the number of iterations completed, the current state of the Patrol Read (active or stopped), and the schedule for the next execution of Patrol Read.
Disk Striping
Disk striping allows you to write data across multiple physical disks instead of just one physical disk. Disk striping involves partitioning each drive storage space into stripes that can vary in size from 2 KB to 128 KB. These stripes are interleaved in a repeated sequential manner. The combined storage space is composed of stripes from each drive. PERC 4/Di/Si and 4e/Di/Si support stripe sizes of 2 KB, 4 KB, 8 KB, 16 KB, 32 KB, 64 KB, and 128 KB. It is recommended that you keep stripe sizes the same across RAID arrays.
NOTE: Using a 2 KB or 4 KB stripe size is not recommended due to performance implications. Use 2 KB or 4 KB only when required by the applications used. The default stripe size is 64 KB. Do not install an operating system on a logical drive with less than a 16 KB stripe size.
For example, in a four-disk system using only disk striping (used in RAID level 0), segment 1 is written to disk 1, segment 2 is written to disk 2, and so on. Disk striping enhances performance because multiple drives are accessed simultaneously, but disk striping does not provide data redundancy.
Stripe width is the number of disks involved in an array where striping is implemented. For example, a four-disk array with disk striping has a stripe width of four.
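The round-robin placement in the four-disk example can be sketched as below. This is an illustration under stated assumptions, not the controller's actual mapping: the function name `locate_segment` is hypothetical, disks are numbered from 0 rather than from 1 as in the text, and the defaults use the 64 KB default stripe size and a stripe width of four.

```python
def locate_segment(byte_offset, stripe_size_kb=64, stripe_width=4):
    """Map a logical byte offset to (disk_index, stripe_row, offset_in_segment).

    Assumes plain RAID 0 style striping with no parity or mirroring.
    """
    stripe_bytes = stripe_size_kb * 1024
    segment = byte_offset // stripe_bytes      # which interleaved segment
    disk_index = segment % stripe_width        # segments rotate across the disks
    stripe_row = segment // stripe_width       # how far down each disk the segment sits
    return disk_index, stripe_row, byte_offset % stripe_bytes
```

With the defaults, the first 64 KB lands on disk 0, the next 64 KB on disk 1, and the fifth segment wraps back to disk 0 on the second row, which is why a large sequential transfer keeps all four drives busy.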
Stripe Size
The stripe size is the length of the interleaved data segments that the RAID controller writes across multiple drives. PERC 4/Di/Si and 4e/Di/Si support stripe sizes of 2 KB, 4 KB, 8 KB, 16 KB, 32 KB, 64 KB, and 128 KB.
NOTE: Using a 2 KB or 4 KB stripe size is not recommended due to performance implications. Use 2 KB or 4 KB only when required by the applications used. The default stripe size is 64 KB. Do not install an operating system on a logical drive with less than a 16 KB stripe size.
Disk Mirroring
With mirroring (used in RAID 1), data written to one disk is simultaneously written to another disk. If one disk fails, the contents of the other disk can be used to run the system and reconstruct the failed disk. The primary advantage of disk mirroring is that it provides 100% data redundancy. Because the contents of the disk are completely written to a second disk, it does not matter if one of the disks fails. Both disks contain the same data at all times. Either drive can act as the operational drive.
Disk mirroring provides 100% redundancy, but is expensive because each drive in the system must be duplicated. Figure 2-2 shows an example of disk mirroring.
Figure 2-2. Example of Disk Mirroring (RAID 1)
Parity
Parity generates a set of redundancy data from two or more parent data sets. The redundancy data can be used to reconstruct one of the parent data sets. Parity data does not fully duplicate the parent data sets. In RAID, this method is applied to entire drives or stripes across all disk drives in an array. The types of parity are shown in Table 2-1.
Table 2-1. Types of Parity
Parity Type
Description
Dedicated
The parity of the data on two or more disk drives is stored on an additional disk.
Distributed
The parity data is distributed across more than one drive in the system.
If a single disk drive fails, it can be rebuilt from the parity and the data on the remaining drives. RAID level 5 combines distributed parity with disk striping, as shown in Figure 2-3. Parity provides redundancy for one drive failure without duplicating the contents of entire disk drives, but parity generation can slow the write process.
Figure 2-3. Example of Distributed Parity (RAID 5)
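The way parity allows one lost data set to be reconstructed can be illustrated with exclusive-or (XOR), the operation named later in this document for RAID 5. The sketch below assumes equal-length blocks held in memory; `compute_parity` and `rebuild_missing` are hypothetical helper names, and the code says nothing about how the controller lays parity out on disk.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def compute_parity(blocks):
    """Parity block for a stripe: the XOR of all of its data blocks."""
    parity = bytes(len(blocks[0]))  # start from all zeros
    for block in blocks:
        parity = xor_bytes(parity, block)
    return parity


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and the parity."""
    return compute_parity(list(surviving_blocks) + [parity])
```

Because XOR is its own inverse, combining the parity with every surviving block cancels them out and leaves the missing block. The same property explains why only one failure per parity group can be recovered.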
Disk Spanning
Disk spanning allows multiple physical drives to function like one big drive. Spanning overcomes lack of disk space and simplifies storage management by combining existing resources or adding relatively inexpensive resources. For example, four 20 GB drives can be combined to appear to the operating system as a single 80 GB drive.
Spanning alone does not provide reliability or performance enhancements. Spanned logical drives must have the same stripe size and must be contiguous. In Figure 2-4, RAID 1 arrays are turned into a RAID 10 array.
NOTE: Make sure that the spans in a RAID 10 array are in different backplanes, so that if one span fails, you won't lose the whole array.
Figure 2-4. Example of Disk Spanning
NOTE: Spanning two contiguous RAID 0 logical drives does not produce a new RAID level or add fault tolerance. It does increase the size of the logical volume and improve performance by doubling the number of spindles.
Spanning for RAID 10 or RAID 50
Table 2-2 describes how to configure RAID 10 and RAID 50 by spanning. The PERC 4/Di/Si and 4e/Di/Si family supports spanning for RAID 1 and RAID 5 only. The logical drives must have the same stripe size and the maximum number of spans is eight. The full drive size is used when you span logical drives; you cannot specify a smaller drive size.
10
Configure RAID 10 by spanning two contiguous RAID 1 logical drives. The RAID 1 logical drives must have the same stripe size.
50
Configure RAID 50 by spanning two contiguous RAID 5 logical drives. The RAID 5 logical drives must have the same stripe size.
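The spanning rules stated in this section (RAID 1 or RAID 5 only, matching stripe sizes, at most eight spans) can be expressed as a small pre-check. This is only a sketch: `check_span_config` is a hypothetical helper, not a function of the BIOS Configuration Utility or of any Dell management tool, and the lower bound of two spans is an assumption taken from the table rows above.

```python
MAX_SPANS = 8  # maximum number of spans stated in this section


def check_span_config(base_level, span_stripe_sizes_kb):
    """Return a list of problems with a proposed spanned configuration.

    base_level is the RAID level of each logical drive being spanned;
    span_stripe_sizes_kb holds one stripe size (in KB) per span.
    An empty list means the rules quoted in the text are satisfied.
    """
    problems = []
    if base_level not in (1, 5):
        problems.append("spanning is supported for RAID 1 and RAID 5 only")
    if not 2 <= len(span_stripe_sizes_kb) <= MAX_SPANS:
        problems.append("number of spans must be between 2 and 8")
    if len(set(span_stripe_sizes_kb)) > 1:
        problems.append("all spans must use the same stripe size")
    return problems
```

For example, `check_span_config(1, [64, 64])` describes a two-span RAID 10 candidate with matching 64 KB stripes and returns an empty list.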
Hot Spares
A hot spare is an extra, unused disk drive that is part of the disk subsystem. It is usually in standby mode, ready for service if a drive fails. Hot spares permit you to replace failed drives without system shutdown or user intervention. PERC 4/Di/Si and 4e/Di/Si implement automatic and transparent rebuilds of failed drives using hot spare drives, providing a high degree of fault tolerance and zero downtime.
NOTE: When running RAID 0 and RAID 5 logical drives on the same set of physical drives (a sliced configuration), a rebuild to a hot spare will not occur after a drive failure until the RAID 0 logical drive is deleted.
The PERC 4/Di/Si and 4e/Di/Si RAID management software allows you to specify physical drives as hot spares. When a hot spare is needed, the RAID controller assigns the hot spare that has a capacity closest to and at least as great as that of the failed drive to take the place of the failed drive. The failed drive is removed from the logical drive and marked ready, awaiting removal, once the rebuild to a hot spare begins. See Table 4-12 in Assigning RAID Levels for detailed information about the minimum and maximum number of hard drives supported by each RAID level for each RAID controller. You can make hot spares of the physical drives that are not in a RAID logical drive.
NOTE: If a rebuild to a hot spare fails for any reason, the hot spare drive will be marked as "failed". If the source drive fails, both the source drive and the hot spare drive will be marked as "failed".
There are two types of hot spares:
Global Hot Spare
Dedicated Hot Spare
Global Hot Spare
A global hot spare drive can be used to replace any failed drive in a redundant array as long as its capacity is equal to or larger than the coerced capacity of the failed drive. A global hot spare defined on any channel should be available to replace a failed drive on both channels.
Dedicated Hot Spare
A dedicated hot spare can be used to replace a failed drive only in a selected array. One or more drives can be designated as members of a spare drive pool; the most suitable drive from the pool is selected for failover. A dedicated hot spare is used before one from the global hot spare pool.
Hot spare drives can be located on any RAID channel. Standby hot spares (not being used in a RAID array) are polled every 60 seconds at a minimum, and their status is made available in the array management software. PERC 4/Di/Si and 4e/Di/Si offer the ability to rebuild with a disk that is in a system, but not initially set to be a hot spare.
Observe the following parameters when using hot spares:
Hot spares are used only in arrays with redundancy, for example, RAID levels 1, 5, 10, and 50.
A hot spare connected to a specific RAID controller can be used to rebuild a drive that is connected to the same controller only.
You must assign the hot spare to one or more drives through the controller's BIOS or use array management software to place it in the hot spare pool.
A hot spare must have free space equal to or greater than the drive it would replace. For example, to replace an 18 GB drive, the hot spare must be 18 GB or larger.
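The selection rules given in this section (dedicated spares before global ones, and the smallest spare that is still at least as large as the failed drive) can be sketched as follows. The `Spare` record and `pick_hot_spare` function are hypothetical names introduced for illustration; the real firmware also considers details such as coerced capacity that this sketch leaves out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Spare:
    name: str
    capacity_gb: int
    dedicated_to: Optional[str] = None  # array name, or None for a global spare


def pick_hot_spare(spares, failed_capacity_gb, array):
    """Choose a spare for a failed drive in `array`, or None if nothing fits."""

    def closest_fit(candidates):
        # Keep spares that are large enough, then take the smallest of them.
        fits = [s for s in candidates if s.capacity_gb >= failed_capacity_gb]
        return min(fits, key=lambda s: s.capacity_gb, default=None)

    dedicated = closest_fit(s for s in spares if s.dedicated_to == array)
    if dedicated is not None:
        return dedicated  # a dedicated spare is used before a global one
    return closest_fit(s for s in spares if s.dedicated_to is None)
```

Note that a larger dedicated spare wins over a better-fitting global spare in this model, following the statement that a dedicated hot spare is used before one from the global pool.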
Disk Rebuilds
When a physical drive in a RAID array fails, you can rebuild the drive by recreating the data that was stored on the drive before it failed. The RAID controller uses hot spares to rebuild failed drives automatically and transparently, at user-defined rebuild rates. If a hot spare is available, the rebuild can start automatically when a drive fails. If a hot spare is not available, the failed drive must be replaced with a new drive so the data on the failed drive can be rebuilt. Rebuilding can be done only in arrays with data redundancy, which includes RAID 1, 5, 10, and 50.
The failed physical drive is removed from the logical drive and marked ready, awaiting removal, once the rebuild to a hot spare begins. If the system goes down during a rebuild, the RAID controller automatically restarts the rebuild after the system reboots.
NOTE: When the rebuild to a hot spare begins, the failed drive is often removed from the logical drive before management applications, such as Dell OpenManage Array Manager or Dell OpenManage Storage Management, detect the failed drive. When this occurs, the event logs show the drive rebuilding to the hot spare without showing the failed drive. The formerly failed drive will be marked as "ready" after a rebuild to a hot spare begins.
NOTE: If a rebuild to a hot spare fails for any reason, the hot spare drive will be marked as "failed". If the source drive fails, both the source drive and the hot spare drive will be marked as "failed".
An automatic drive rebuild will not start if you replace a drive during an online capacity expansion or RAID level migration. The rebuild must be started manually after the expansion or migration procedure is complete.
Rebuild Checkpoint
The Dell PERC firmware can resume a rebuild on a physical drive after an abrupt power loss or a server reboot in the middle of the rebuild operation. In any of the following cases, however, a rebuild will not resume:
A configuration mismatch is detected on the controller.
A reconstruction is also currently in progress.
The logical drive is now owned by the peer node.
Rebuild Rate
The rebuild rate is the percentage of the compute cycles dedicated to rebuilding failed drives. A rebuild rate of 100 percent means the system gives priority to rebuilding the failed drives.
The rebuild rate can be configured between 0 percent and 100 percent. At 0 percent, the rebuild is done only if the system is not doing anything else. At 100 percent, the rebuild has a higher priority than any other system activity. Using 0 or 100 percent is not recommended. The default rebuild rate is 30 percent.
Hot Swap
A hot swap is the manual replacement of a defective physical disk unit while the computer is still running. When a new drive has been installed, a rebuild will occur automatically if:
The newly inserted drive is the same size as or larger than the failed drive
It is placed in the same drive bay as the failed drive it is replacing
The RAID controller can be configured to detect the new disks and rebuild the contents of the disk drive automatically.
SCSI Physical Drive States
The SCSI physical drive states are described in Table 2-3.
Table 2-3. SCSI Physical Drive States
State
Description
Online
The physical drive is working normally and is a part of a configured logical drive.
Ready
The physical drive is functioning normally but is not part of a configured logical drive and is not designated as a hot spare.
Hot Spare
The physical drive is powered up and ready for use as a spare in case an online drive fails.
Fail
A fault has occurred in the physical drive, placing it out of service.
Rebuild
The physical drive is being rebuilt with data from a failed drive.
Logical Drive States
The logical drive states are described in Table 2-4.
Table 2-4. Logical Drive States
State
Description
Optimal
The logical drive operating condition is good. All configured physical drives are online.
Degraded
The logical drive operating condition is not optimal. One of the configured physical drives has failed or is offline.
Failed
The logical drive has failed.
Offline
The logical drive is not available to the RAID controller.
Enclosure Management
Enclosure management is the intelligent monitoring of the disk subsystem by software and/or hardware. The disk subsystem can be part of the host computer or can reside in an external disk enclosure. Enclosure management helps you stay informed of events in the disk subsystem, such as a drive or power supply failure. Enclosure management increases the fault tolerance of the disk subsystem.
RAID Levels
The RAID controller supports RAID levels 0, 1, 5, 10, and 50. In addition, it supports independent drives (configured as RAID 0). The supported RAID levels are summarized in the following section, and the sections after it describe each RAID level in detail.
Summary of RAID Levels
RAID 0 uses striping to provide high data throughput, especially for large files in an environment that does not require fault tolerance.
RAID 1 uses mirroring so that data written to one disk drive is simultaneously written to another disk drive. This is good for small databases or other applications that require small capacity, but complete data redundancy.
RAID 5 uses disk striping and parity data across all drives (distributed parity) to provide high data throughput, especially for small random access.
RAID 10, a combination of RAID 0 and RAID 1, consists of striped data across mirrored spans. It provides high data throughput and complete data redundancy, but uses a larger number of spans.
RAID 50, a combination of RAID 0 and RAID 5, uses distributed parity and disk striping and works best with data that requires high reliability, high request rates, high data transfers, and medium-to-large capacity.
NOTE: Running RAID 0 and RAID 5 logical arrays on the same set of physical disks (a sliced configuration) is not recommended. In the event of a disk failure, the RAID 0 logical drive will cause any rebuild attempt to fail.
Selecting a RAID Level
To ensure the best performance, you should select the optimal RAID level when you create a system drive. The optimal RAID level for your disk array depends on a number of factors:
The number of physical drives in the disk array
The capacity of the physical drives in the array
The need for data redundancy
The disk performance requirements
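When weighing drive count and capacity against redundancy, it can help to compute the usable capacity each level would give. The sketch below uses the standard capacity arithmetic for these RAID levels and the drive-count limits quoted in the overview tables of this document. The function name `usable_capacity_gb` is hypothetical, the equal-size drives and equal-size spans are simplifying assumptions, and the 2 TB logical drive limit is not enforced.

```python
def usable_capacity_gb(level, drive_gb, drives, spans=1):
    """Usable capacity in GB for `drives` equal-size drives at a RAID level.

    For RAID 10 and RAID 50, `drives` is the total across all spans.
    """
    if level == 0:
        data_drives = drives                 # striping only, no redundancy
    elif level == 1:
        if drives != 2:
            raise ValueError("RAID 1 uses exactly 2 drives")
        data_drives = 1                      # second drive holds the mirror
    elif level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        data_drives = drives - 1             # one drive's worth of parity
    elif level == 10:
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        data_drives = drives // 2            # every drive is mirrored
    elif level == 50:
        if spans < 2 or drives % spans or drives // spans < 3:
            raise ValueError("RAID 50 needs 2 or more equal spans of 3 or more drives")
        data_drives = drives - spans         # one drive's worth of parity per span
    else:
        raise ValueError("unsupported RAID level")
    return drive_gb * data_drives
```

For six 36 GB drives, this gives 216 GB as RAID 0, 180 GB as RAID 5, 144 GB as RAID 50 with two spans, and 108 GB as RAID 10, which makes the capacity cost of each redundancy scheme concrete.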
RAID 0
RAID 0 provides disk striping across all drives in the RAID array. RAID 0 does not provide any data redundancy, but does offer the best performance of any RAID level. RAID 0 breaks up data into smaller blocks and then writes a block to each drive in the array. The size of each block is determined by the stripe size parameter, set during the creation of the RAID set. RAID 0 offers high bandwidth.
NOTE: RAID level 0 is not fault tolerant. If a drive in a RAID 0 array fails, the whole logical drive (all physical drives associated with the logical drive) will fail.
By breaking up a large file into smaller blocks, the RAID controller can use several drives to read or write the file faster. RAID 0 involves no parity calculations to complicate the write operation. This makes RAID 0 ideal for applications that require high bandwidth but do not require fault tolerance. RAID 0 is also used to denote an "independent" or single drive.
Uses
Provides high data throughput, especially for large files. Suitable for any environment that does not require fault tolerance.
Strong Points
Provides increased data throughput for large files. No capacity loss penalty for parity.
Weak Points
Does not provide fault tolerance. All data is lost if any drive fails.
Drives
1 to 32
RAID 1
In RAID 1, the RAID controller duplicates all data from one drive to a second drive. RAID 1 provides complete data redundancy, but at the cost of doubling the required data storage capacity. Table 2-6 provides an overview of RAID 1.
Table 2-6. RAID 1 Overview
Uses
Use RAID 1 for small databases or any other environment that requires fault tolerance but small capacity.
Strong Points
Provides complete data redundancy. RAID 1 is ideal for any application that requires fault tolerance and minimal capacity.
Weak Points
Requires twice as many disk drives. Performance is impaired during drive rebuilds.
Drives
2
RAID 5
RAID 5 includes disk striping at the block level and parity. In RAID 5, the parity information is written to several drives. RAID 5 is best suited for networks that perform a lot of small input/output (I/O) transactions simultaneously.
RAID 5 addresses the bottleneck issue for random I/O operations. Because each drive contains both data and parity, numerous writes can take place concurrently. In addition, robust caching algorithms and hardware-based exclusive-or assist make RAID 5 performance exceptional in many different environments.
Uses
Provides high data throughput, especially for large files. Use RAID 5 for transaction processing applications because each drive can read and write independently. If a drive fails, the RAID controller uses the parity data to recreate all missing information. Use also for office automation and online customer service that requires fault tolerance. Use for any application that has high read request rates but low write request rates.
Strong Points
Provides data redundancy, high read rates, and good performance in most environments. Provides redundancy with lowest loss of capacity.
Weak Points
Not well suited to tasks requiring a lot of writes. Suffers more impact if no cache is used (clustering). Disk drive performance will be reduced if a drive is being rebuilt. Environments with few processes do not perform as well because the RAID overhead is not offset by the performance gains in handling simultaneous processes.
Drives
3 to 28
RAID 10
RAID 10 is a combination of RAID 0 and RAID 1. RAID 10 consists of stripes across mirrored drives. RAID 10 breaks up data into smaller blocks, then stripes the blocks of data to each RAID 1 RAID set. Each RAID 1 RAID set then duplicates its data to its other drive. The size of each block is determined by the stripe size parameter, which is set during the creation of the RAID set. Up to 8 spans can be supported by RAID 10.
Uses
Appropriate when used with data storage that needs the 100% redundancy of mirrored arrays and that also needs the enhanced I/O performance of RAID 0 (striped arrays). RAID 10 works well for medium-sized databases or any environment that requires a higher degree of fault tolerance and moderate to medium capacity.
Strong Points
Provides both high data transfer rates and complete data redundancy.
Weak Points
Requires twice as many drives as all other RAID levels except RAID 1.
Drives
2n, where n is greater than 1.
In Figure 2-5, logical drive 0 is created by distributing data across four arrays (arrays 0 through 3). Spanning is used because one logical drive is defined across more than one array. Logical drives defined across multiple RAID 1 level arrays are referred to as RAID level 10 (1+0). Data is striped across the arrays to increase performance by enabling access to multiple arrays simultaneously.
Using RAID level 10, rather than a simple RAID set, up to 8 spans can be supported, and up to 8 drive failures (one failure per span) can be tolerated, though less than total disk drive capacity is available. Though multiple drive failures can be tolerated, only one drive failure can be tolerated in each RAID 1 level array.
Figure 2-5. RAID 10 Level Logical Drive
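The one-failure-per-span rule can be checked mechanically. The sketch below assumes each span is given as a list of drive identifiers; the function name `survives` and the identifiers are hypothetical, and the model covers only the failure-counting rule stated above, not rebuild timing or hot spares.

```python
def survives(failed_drives, spans):
    """Return True if no span has lost more than one drive.

    spans is a list of spans, each a list of drive identifiers;
    failed_drives is any collection of the identifiers that have failed.
    """
    failed = set(failed_drives)
    return all(len(failed.intersection(span)) <= 1 for span in spans)
```

With spans `[["d0", "d1"], ["d2", "d3"]]`, losing `d0` and `d2` is survivable because each mirrored pair still has one good drive, while losing `d0` and `d1` is not because both copies in one span are gone.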
RAID 50
RAID 50 provides the features of both RAID 0 and RAID 5. RAID 50 includes both parity and disk striping across multiple arrays. RAID 50 is best implemented on two RAID 5 disk arrays with data striped across both disk arrays.
RAID 50 breaks up data into smaller blocks, then stripes the blocks of data to each RAID 5 disk set. RAID 5 breaks up data into smaller blocks, calculates parity by performing an exclusive-or on the blocks, then writes the blocks of data and parity to each drive in the array. The size of each block is determined by the stripe size parameter, which is set during the creation of the RAID set.
RAID level 50 can support up to 8 spans and tolerate up to 8 drive failures (one failure per span), though less than total disk drive capacity is available. Though multiple drive failures can be tolerated, only one drive failure can be tolerated in each RAID 5 level array.
Uses
Appropriate when used with data that requires high reliability, high request rates, high data transfer, and medium to large capacity.
Strong Points
Provides high data throughput, data redundancy and very good performance.
Weak Points
Requires 2 to 8 times as many parity drives as RAID 5.
Drives
6 to 28
Dell supports the use of two channels with a maximum of 14 physical drives per channel.
Figure 2-6 provides an example of a RAID 50 level logical drive.
Figure 2-6. RAID 50 Level Logical Drive
RAID Configuration Strategies
The most important factors in RAID array configuration are:
Logical drive availability (fault tolerance)
Logical drive performance
Logical drive capacity
You cannot configure a logical drive that optimizes all three factors, but it is easy to choose a logical drive configuration that maximizes one factor at the expense of another factor. For example, RAID 1 (mirroring) provides excellent fault tolerance, but requires a redundant drive. The following subsections describe how to use the RAID levels to maximize logical drive availability (fault tolerance), logical drive performance, and logical drive capacity.
Maximizing Fault Tolerance
Fault tolerance is achieved through the ability to perform automatic and transparent rebuilds using hot spare drives, and hot swaps. A hot spare drive is an unused online available drive that PERC 4/Di/Si and 4e/Di/Si instantly plug into the system when an active drive fails. After the hot spare is automatically moved into the RAID array, the failed drive is automatically rebuilt on the spare drive. The RAID array continues to handle requests while the rebuild occurs.
A hot swap is the manual substitution of a replacement unit in a disk subsystem for a defective one, where the substitution can be performed while the subsystem is running. Auto-Rebuild in the BIOS Configuration Utility allows a failed drive to be replaced and automatically rebuilt by "hot-swapping" the drive in the same drive bay. The RAID array continues to handle requests while the rebuild occurs, providing a high degree of fault tolerance and zero downtime. Table 2-10 describes the fault tolerance features of each RAID level.
Table 2-10. RAID Levels and Fault Tolerance
RAID Level
Fault Tolerance
0
Does not provide fault tolerance. All data lost if any drive fails. Disk striping writes data across multiple disk drives instead of just one disk drive. It involves partitioning each drive storage space into stripes that can vary in size. RAID 0 is ideal for applications that require high performance but do not require fault tolerance.
1
Provides complete data redundancy. If one disk drive fails, the contents of the other disk drive can be used to run the system and reconstruct the failed drive. The primary advantage of disk mirroring is that it provides 100% data redundancy. Since the contents of the disk drive are completely written to a second drive, no data is lost if one of the drives fails. Both drives contain the same data at all times. RAID 1 is ideal for any application that requires fault tolerance and minimal capacity.
5
Combines distributed parity with disk striping. Parity provides redundancy for one drive failure without duplicating the contents of entire disk drives. If a drive fails, the RAID controller uses the parity data to reconstruct all missing information. In RAID 5, this method is applied to entire drives or stripes across all disk drives in an array. Using distributed parity, RAID 5 offers fault tolerance with limited overhead.
10
Provides complete data redundancy using striping across spanned RAID 1 arrays. RAID 10 works well for any environment that requires the 100 percent redundancy offered by mirrored arrays. RAID 10 can sustain one drive failure in each mirrored array and still maintain data integrity.
50
Provides data redundancy using distributed parity across spanned RAID 5 arrays. RAID 50 includes both parity and disk striping across multiple drives. If a drive fails, the RAID controller uses the parity data to recreate all missing information. RAID 50 can sustain one drive failure per RAID 5 array and still maintain data integrity.
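The parity reconstruction described for RAID 5 and RAID 50 can be illustrated with a minimal Python sketch. It assumes parity is the byte-wise exclusive-or of the data blocks in one stripe. The block contents and the helper name xor_blocks are hypothetical and chosen only for illustration; the sketch does not represent the controller firmware.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks that belong to the same stripe.
data = [b"\x0f\xf0\xaa", b"\x33\x55\x01", b"\xc3\x3c\x10"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data)

# Simulate the loss of the second drive: XOR the surviving data blocks
# with the parity block to recover the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Because exclusive-or is its own inverse, any single missing block in a stripe can be recovered this way, which is why one parity block per stripe covers exactly one drive failure.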
Maximizing Performance
A RAID disk subsystem improves I/O performance. The RAID array appears to the host computer as a single storage unit or as multiple logical units. I/O is faster because drives can be accessed simultaneously. Table 2-11 describes the performance for each RAID level.
Table 2-11. RAID Levels and Performance
RAID Level
Performance
0
RAID 0 (striping) offers the best performance of any RAID level. RAID 0 breaks up data into smaller blocks, then writes a block to each drive in the array. Disk striping writes data across multiple disk drives instead of just one disk drive. It involves partitioning each drive storage space into stripes that can vary in size from 8 KB to 128 KB. These stripes are interleaved in a repeated sequential manner. Disk striping enhances performance because multiple drives are accessed simultaneously.
1
With RAID 1 (mirroring), each drive in the system must be duplicated, which requires more time and resources than striping. Performance is impaired during drive rebuilds.
5
RAID 5 provides high data throughput, especially for large files. Use this RAID level for any application that requires high read request rates, but low write request rates, such as transaction processing applications, because each drive can read and write independently. Since each drive contains both data and parity, numerous writes can take place concurrently. In addition, robust caching algorithms and hardware-based exclusive-or assist make RAID 5 performance exceptional in many different environments.
Parity generation can slow the write process, making write performance significantly lower for RAID 5 than for RAID 0 or RAID 1. Disk drive performance is reduced when a drive is being rebuilt. Clustering can also reduce drive performance. Environments with few processes do not perform as well because the RAID overhead is not offset by the performance gains in handling simultaneous processes.
10
RAID 10 works best for data storage that needs the enhanced I/O performance of RAID 0 (striped arrays), which provides high data transfer rates. Spanning increases the size of the logical volume and improves performance by increasing the number of spindles. The system performance improves as the number of spans increases. (The maximum number of spans is eight.) As the storage space in the spans is filled, the system stripes data over fewer and fewer spans and RAID performance degrades to that of a RAID 1 array.
50
RAID 50 works best when used with data that requires high reliability, high request rates, and high data transfer. It provides high data throughput, data redundancy, and very good performance. Spanning increases the size of the logical volume and improves performance by increasing the number of spindles. The system performance improves as the number of spans increases. (The maximum number of spans is eight.) As the storage space in the spans is filled, the system stripes data over fewer and fewer spans and RAID performance degrades to that of a RAID 5 array.
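The interleaving of stripes described for RAID 0 can be sketched as a simple address calculation. The Python example below assumes a plain round-robin layout in which consecutive stripes go to consecutive drives. The function name locate and the 64 KB stripe size are assumptions made for illustration (64 KB falls within the 8 KB to 128 KB range given in Table 2-11); the actual on-disk layout used by the controller may differ.

```python
def locate(offset, stripe_size, num_drives):
    """Map a logical byte offset to (drive index, byte offset on that drive).

    Assumes stripes are interleaved across the drives in round-robin order.
    """
    stripe_index = offset // stripe_size    # which stripe holds this byte
    drive = stripe_index % num_drives       # stripes rotate across the drives
    row = stripe_index // num_drives        # full rows already written
    return drive, row * stripe_size + offset % stripe_size


STRIPE = 64 * 1024  # hypothetical 64 KB stripe size

# The first four stripes of a four-drive array land on four different
# drives, so a large sequential request keeps all drives busy at once.
first_row = [locate(i * STRIPE, STRIPE, 4)[0] for i in range(4)]
```

In this sketch, first_row contains each drive index exactly once, which is the property that lets multiple drives be accessed simultaneously.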
Maximizing Storage Capacity
Storage capacity is an important factor when selecting a RAID level. There are several variables to consider. Mirrored data and parity data require more storage space than striping alone (RAID 0). Parity generation uses algorithms to create redundancy and requires less space than mirroring. Table 2-12 explains the effects of the RAID levels on storage capacity.
Table 2-12. RAID Levels and Capacity
RAID Level
Capacity
0
RAID 0 (disk striping) involves partitioning each drive storage space into stripes that can vary in size. The combined storage space is composed of stripes from each drive. RAID 0 provides maximum storage capacity for a given set of physical disks.
1
With RAID 1 (mirroring), data written to one disk drive is simultaneously written to another disk drive, which doubles the required data storage capacity. This is expensive because each drive in the system must be duplicated.
5
RAID 5 provides redundancy for one drive failure without duplicating the contents of entire disk drives. RAID 5 breaks up data into smaller blocks, calculates parity by performing an exclusive-or on the blocks, then writes the blocks of data and parity to each drive in the array. The size of each block is determined by the stripe size parameter, which is set during the creation of the RAID set.
10
RAID 10 requires twice as many drives as all other RAID levels except RAID 1. RAID 10 works well for medium-sized databases or any environment that requires a higher degree of fault tolerance and moderate to medium capacity. Disk spanning allows multiple disk drives to function like one big drive. Spanning overcomes lack of disk space and simplifies storage management by combining existing resources or adding relatively inexpensive resources.
50
RAID 50 requires two to four times as many parity drives as RAID 5. This RAID level works best when used with data that requires medium to large capacity.
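The capacity trade-offs in Table 2-12 can be estimated with a short calculation. The Python sketch below assumes that all drives in the array are the same size, that mirroring consumes half of the drives, and that parity consumes the equivalent of one drive per RAID 5 array. The function name usable_capacity_gb and the 73 GB drive size used in the usage note are hypothetical. The sketch does not enforce the per-controller drive limits or the maximum logical drive size.

```python
def usable_capacity_gb(level, drives, drive_gb, spans=2):
    """Estimate usable capacity in GB for an array of equally sized drives."""
    if level == 0:
        return drives * drive_gb              # striping only, no redundancy
    if level in (1, 10):
        if drives < 2 or drives % 2:
            raise ValueError("mirroring needs an even number of drives")
        return (drives // 2) * drive_gb       # half of the drives hold copies
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (drives - 1) * drive_gb        # one drive's worth of parity
    if level == 50:
        if spans < 2 or drives % spans or drives // spans < 3:
            raise ValueError("RAID 50 needs equal spans of three or more drives")
        return (drives - spans) * drive_gb    # one parity drive's worth per span
    raise ValueError("unsupported RAID level")
```

For example, six hypothetical 73 GB drives yield 438 GB as RAID 0, 365 GB as RAID 5, 292 GB as RAID 50 with two spans, and 219 GB as RAID 10.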
RAID Availability
RAID Availability Concept
Data availability without downtime is essential for many types of data processing and storage systems. Businesses want to avoid the financial costs and customer frustration associated with downed servers. RAID helps you maintain data availability and avoid downtime for the servers that provide that data. RAID offers several features, such as spare drives and rebuilds, that you can use to fix any hard drive problems, while keeping the server(s) running and data available. The following subsections describe these features.
Spare Drives
You can use spare drives to replace failed or defective drives in an array. A replacement drive must be at least as large as the drive it replaces. Spare drives include hot swaps, hot spares, and cold swaps.
A hot swap is the manual substitution of a replacement unit in a disk subsystem for a defective one, where the substitution can be performed while the subsystem is running (performing its normal functions). The backplane and enclosure must support hot swap in order for the functionality to work.
Hot spare drives are physical drives that power up along with the RAID drives and operate in a standby state. If a hard drive used in a RAID logical drive fails, a hot spare automatically takes its place and the data on the failed drive is rebuilt on the hot spare. Hot spares can be used for RAID levels 1, 5, 10, and 50.
NOTE: If a rebuild to a hot spare fails for any reason, the hot spare drive will be marked as "failed". If the
source drive fails, both the source drive and the hot spare drive will be marked as "failed".
A cold swap requires that you power down the system before replacing a defective hard drive in a disk subsystem.
Sector Reassignment
Sector reassignment is done automatically by either the drive or the RAID firmware whenever a media defect is encountered.
Rebuilding
If a hard drive fails in an array that is configured as a RAID 1, 5, 10, or 50 logical drive, you can recover the lost data by rebuilding the drive. If you have configured hot spares, the RAID controller automatically tries to use them to rebuild failed disks. Manual rebuild is necessary if no hot spares with enough capacity to rebuild the failed drives are available. You must insert a drive with enough storage into the subsystem before rebuilding the failed drive.
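The capacity requirement for a rebuild can be expressed as a simple check. The Python sketch below assumes a policy of choosing the smallest hot spare that is at least as large as the failed drive. That policy, the function name pick_hot_spare, and the drive sizes are assumptions made for illustration and do not describe how the RAID controller actually selects a spare.

```python
def pick_hot_spare(failed_gb, spare_sizes_gb):
    """Return the size of a suitable hot spare, or None if none qualifies.

    A spare qualifies only if it is at least as large as the failed drive.
    Choosing the smallest qualifying spare is an assumed policy.
    """
    candidates = [size for size in spare_sizes_gb if size >= failed_gb]
    return min(candidates) if candidates else None


# A hypothetical 73 GB drive fails while 36, 146, and 73 GB spares are present.
chosen = pick_hot_spare(73, [36, 146, 73])

# No spare is large enough, so a manual rebuild with a new drive is needed.
needs_manual_rebuild = pick_hot_spare(146, [36, 73]) is None
```

A result of None corresponds to the case in which no hot spare has enough capacity and you must insert a suitable drive before rebuilding.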
Configuration Planning
Factors to consider when planning a configuration are the number of hard disk drives the RAID controller can support, the purpose of the array, and the availability of spare drives.
Each type of data stored in the disk subsystem has a different frequency of read and write activity. If you know the data access requirements, you can more successfully determine a strategy for optimizing the disk subsystem capacity, availability, and performance.
Servers that support video on demand typically read the data often, but write data infrequently. Both the read and write operations tend to be long. Data stored on a general-purpose file server involves relatively short read and write operations with relatively small files.
Number of Hard Disk Drives
Your configuration planning depends in part on the number of hard disk drives that you want to use in a RAID array. The number of drives in an array determines the RAID levels that can be supported. See Table 4-12 in Assigning RAID Levels for detailed information about the minimum and maximum number of hard drives supported by each RAID level for each RAID controller.
Array Purpose
Important factors to consider when creating RAID arrays include availability, performance, and capacity. Define the major purpose of the disk array by answering questions such as the following. Each question is followed by the suggested RAID levels for that situation:
Will this disk array increase the system storage capacity for general-purpose file and print servers? Use RAID 5, 10, or 50.
Does this disk array support any software system that must be available 24 hours per day? Use RAID 1, 5, 10, or 50.
Will the information stored in this disk array contain large audio or video files that must be available on demand? Use RAID 0.
Will this disk array contain data from an imaging system? Use RAID 0 or 10.
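When an array serves more than one purpose, the suggestions above can be combined by keeping only the RAID levels that appear for every applicable question. The Python sketch below encodes the four suggestions as sets and intersects them. The dictionary keys and the function name candidate_levels are hypothetical labels introduced for this example, and intersecting the suggestions is one possible way to combine them rather than a documented procedure.

```python
# Suggested RAID levels for each question, keyed by a hypothetical label.
SUGGESTED = {
    "file_print_capacity": {5, 10, 50},
    "always_available": {1, 5, 10, 50},
    "on_demand_media": {0},
    "imaging": {0, 10},
}


def candidate_levels(*purposes):
    """Return the RAID levels suggested for every purpose given."""
    if not purposes:
        raise ValueError("name at least one purpose")
    common = set.intersection(*(SUGGESTED[p] for p in purposes))
    return sorted(common)


# An imaging system that must also be available 24 hours per day.
imaging_24h = candidate_levels("imaging", "always_available")
```

An empty result means that no single suggested RAID level satisfies all of the stated purposes, so the requirements need to be ranked in order of importance.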
Fill out Table 2-13 to help you plan the array configuration. Rank the requirements for your array, such as storage space and data redundancy, in order of importance, then review the suggested RAID levels. Refer to Table 4-12 for the minimum and maximum number of drives allowed per RAID level.
Table 2-13. Factors to Consider for Array Configuration