Go to Table of Contents for Storage Management Online Help
Dell OpenManage Storage Management User's Guide
Common Troubleshooting Procedures
Specific Problem Situations and Solutions
This section contains troubleshooting procedures for common situations as well as for specific problems.
This section describes commands and procedures that can be used in troubleshooting. Topics covered include:
Verify that the power-supply cord and adapter cables are attached correctly. If the system is having trouble with read and write operations to a particular array (if the system hangs, for example), then make sure that the SCSI cables attached to the array are secure. If the connection is secure but the problem persists, you may need to replace a cable. See also the "Isolate SCSI device problems" section.
Make sure that the system meets all system requirements. In particular, verify that the correct levels of firmware and drivers are installed on the system. For more information on drivers and firmware, see the "Drivers and Firmware" section.
Storage Management is tested with the supported controller firmware and drivers. To avoid possible conflicts or inconsistencies between the controller firmware and drivers, it is recommended that you only use the supported versions. The most current versions can be obtained from the Dell support site at http://support.dell.com.
It is also recommended to obtain and apply the latest Dell PowerEdge Server System BIOS on a periodic basis to benefit from the most recent improvements. Please refer to the Dell PowerEdge system documentation for more information.
If you receive a "timeout" alert related to a SCSI device or if you otherwise suspect that one of the SCSI devices is experiencing a hardware failure, then do the following to confirm the problem:
Use the Rescan controller task to update information for the controller and attached devices. This operation may take a few minutes if there are a number of devices attached to the controller. You will see a message "Getting hardware configuration. Please wait." while the rescan is occurring.
If this does not properly update the disk information, you may need to reboot your system.
You may need to replace a failed disk in the following situations:
If the failed disk is part of a redundant virtual disk, then the disk failure should not result in data loss. You should replace the failed disk immediately, however, as additional disk failures can cause data loss.
If the redundant virtual disk has a hot spare assigned to it, then the data from the failed disk is rebuilt onto the hot spare. After the rebuild, the former hot spare functions as a regular array disk and the virtual disk is left without a hot spare. In this case, you should replace the failed disk and make the replacement disk a hot spare.
Note: If the failed disk is attached to a PERC 2/SC, 2/DC, 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4e/DC, 4/Di, or CERC ATA100/4ch controller, you can attempt to recover data from the disk by using the procedure described in Using the Array Disk Online Command on Select Controllers before continuing with the following procedure.
Replacing the Disk:
A rebuild is automatically initiated because the virtual disk is redundant.
Assigning a Hot Spare:
If a hot spare was already assigned to the virtual disk, then data from the failed disk may already be rebuilt onto the hot spare. In this case, you need to assign a new hot spare. See Assign and Unassign Dedicated Hot Spare and Assign and Unassign Global Hot Spare for more information.
If the failed disk is part of a non-redundant virtual disk (such as RAID 0), then the failure of a single disk causes the entire virtual disk to fail. To proceed, determine when your last backup was made and whether any new data has been written to the disk since then.
If you have backed up recently and there is no new data on the drives that would be missed, you can restore from backup.
Note: If the failed disk is attached to a PERC 2/SC, 2/DC, 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4e/DC, 4/Di, or CERC ATA100/4ch controller, you can attempt to recover data from the disk by using the procedure described in Using the Array Disk Online Command on Select Controllers before continuing with the following procedure.
Do the following:
If you do not have a suitable backup available, and if the failed disk is part of a virtual disk on a PERC 2/SC, 2/DC, 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4e/DC, 4/Di, or CERC ATA100/4ch controller, then you can attempt to retrieve data by right-clicking the failed disk and selecting Online from the pop-up menu.
The Online command attempts to force the failed disk back into a Ready state. If you are able to force the disk into a Ready state, you may be able to recover individual files. How much data you can recover depends on the extent of disk damage. File recovery is only possible if a limited portion of the disk is damaged.
There is no guarantee you will be able to recover any data using this method. A forced Online does not fix a failed disk. You should not attempt to write new data to the virtual disk.
After retrieving any viable data from the disk, replace the failed disk as described previously in Replacing a Failed Disk that is Part of a Redundant Virtual Disk or Replacing a Failed Disk that is Part of a Non-redundant Virtual Disk.
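The decision flow described above for a failed disk can be summarized in a short sketch. The following Python is illustrative only and is not part of Storage Management: the function name failed_disk_actions and its parameters are hypothetical, and the returned strings simply paraphrase the guidance in this section.

```python
def failed_disk_actions(redundant, has_hot_spare=False,
                        recent_backup=False, supports_online=False):
    """Return the suggested actions, in order, for a failed array disk.

    Hypothetical helper: the names and the returned wording are
    assumptions made for illustration, paraphrasing this section.
    """
    if redundant:
        actions = ["Replace the failed disk immediately"]
        if has_hot_spare:
            # Data is rebuilt onto the hot spare, which then acts as a
            # regular array disk, so the virtual disk needs a new spare.
            actions.append("Assign the replacement disk as a hot spare")
        else:
            # A rebuild starts automatically on a redundant virtual disk.
            actions.append("Let the automatic rebuild complete")
        return actions

    # Non-redundant (for example RAID 0): the whole virtual disk fails.
    actions = []
    if not recent_backup and supports_online:
        # A forced Online is a last resort with no guarantee of success.
        actions.append("Try the Online command and copy off any readable files")
    actions.append("Replace the failed disk")
    if recent_backup:
        actions.append("Restore from backup")
    return actions


if __name__ == "__main__":
    for step in failed_disk_actions(redundant=False, supports_online=True):
        print(step)
```

The supports_online flag stands for the controllers listed in this section that offer the Online command. The sketch only orders the steps and does not check any hardware state.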
On a CERC SATA1.5/2s controller, a rebuild may not start automatically when you replace a failed array disk that is part of a RAID 1 virtual disk. In this circumstance, use the following procedure to replace the failed array disk and rebuild the redundant data.
If the drive that you mistakenly removed is part of a redundant virtual disk that also has a hot spare, then the virtual disk rebuilds automatically either immediately or when a write request is made. After the rebuild has completed, the virtual disk will no longer have a hot spare since data has been rebuilt onto the disk previously assigned as a hot spare. In this case, you should assign a new hot spare.
If the drive that you removed is part of a redundant virtual disk that does not have a hot spare, then replace the drive and do a rebuild.
See the following sections for information on rebuilding drives and assigning hot spares:
You can avoid removing the wrong drive by blinking the LED display on the drive that you intend to remove. See Blink and Unblink for information on blinking the LED display.
If you upgrade the Windows operating system on a server, you may find that Storage Management no longer functions after the upgrade. The installation process installs files and makes registry entries on the server that are specific to the operating system. For this reason, changing the operating system can disable Storage Management.
To avoid this problem, you should uninstall Storage Management before upgrading. If you have already upgraded without uninstalling Storage Management, however, you should uninstall Storage Management after the upgrade.
After you have uninstalled Storage Management and completed the upgrade, reinstall Storage Management using the Storage Management install media. You can download Storage Management from the Dell support site at http://support.dell.com.
This section contains additional troubleshooting problem areas. Topics include:
A rebuild will not work in the following situations:
You might be attempting a RAID configuration that is not supported by the controller. Check the following:
If there are no virtual disks configured at boot time on a PERC 2/SC, 2/DC, 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4e/DC, 4/Di, or CERC ATA100/4ch controller on Windows 2000, the Windows disk driver may not be loaded. The solution is to reboot after creating the first virtual disk or create the first virtual disk in the BIOS. Use the Ctrl-m key sequence when prompted during the reboot to invoke the BIOS utility.
An array disk may display an error status if it has been damaged, taken offline, or was a member of a virtual disk that has been deleted or initialized. The following actions may resolve the error condition:
When a system in a cluster attempts to rebuild a failed disk and the rebuild fails, another system takes over the rebuild. In this situation, you may notice that the rebuilt disk continues to be marked as failed on both systems even after the second system has rebuilt it successfully. To resolve this problem, perform a rescan on both systems after the rebuild completes successfully.
When you run a Prepare to Remove command on an array disk attached to a PERC 4/Di controller, you may find that the disk does not display in the Storage Management tree view even after a rescan or a reboot.
In this case, do the following to redisplay the disk in the Storage Management tree view:
In some situations, a rebuild may complete successfully while also reporting errors. This may occur when a portion of the disk containing redundant (parity) information is damaged. The rebuild process can restore data from the healthy portions of the disk but not from the damaged portion.
When a rebuild is able to restore all data except data from damaged portions of the disk, it will indicate successful completion while also generating alert 2163. The rebuild may also report sense key errors. In this situation, take the following actions to restore the maximum data possible:
The following alerts or events are generated when a portion of an array disk is damaged:
This damage is discovered when the controller performs an operation that requires scanning the disk. Examples of operations that may result in these alerts are as follows:
If you receive alerts 2146 through 2150 as the result of doing a rebuild or while the virtual disk is in a degraded state, then data cannot be recovered from the damaged disk without restoring from backup. If you receive alerts 2146 through 2150 under circumstances other than a rebuild, then data recovery may be possible. The following describes each of these situations.
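As a rough illustration of the distinction between these two situations, the sketch below maps an alert number and the state of the virtual disk to the matching guidance. It is not a Storage Management interface: the function name damaged_disk_guidance, its parameters, and the returned strings are assumptions made for this example.

```python
# Alerts 2146 through 2150 indicate a damaged portion of an array disk.
DAMAGED_DISK_ALERTS = range(2146, 2151)


def damaged_disk_guidance(alert_id, during_rebuild, degraded):
    """Return the guidance that applies, or None for unrelated alerts.

    Hypothetical helper; the wording paraphrases this section.
    """
    if alert_id not in DAMAGED_DISK_ALERTS:
        return None
    if during_rebuild or degraded:
        # Redundant data is unavailable, so only a backup can help.
        return "restore from backup"
    # Redundancy is still intact, so replace the disk before it is lost.
    return "replace disk immediately; recovery may be possible"


if __name__ == "__main__":
    print(damaged_disk_guidance(2148, during_rebuild=True, degraded=False))
```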
Do the following if you receive alerts 2146 through 2150 during a rebuild or while the virtual disk is in a degraded state:
If you receive alerts 2146 through 2150 while performing an operation other than a rebuild, you should replace the damaged disk immediately to avoid data loss.
Do the following:
If you install a PERC 2/SC or 2/DC controller after you have already installed Storage Management, you may experience problems with the system hanging or performance problems. Reinstall Storage Management to resolve these problems.
If the system is hanging, timing out, or experiencing other problems with read and write operations, then there may be a problem with the controller cables or a SCSI device. For more information, see the "Cables attached correctly" and "Isolate SCSI device problems" sections.
If you have implemented channel redundancy on a PERC 2/SC, 2/DC, 3/SC, 3/DCL, 3/DC, 3/QC, 4/SC, 4/DC, 4e/DC, or 4/Di controller, a failure of one channel causes I/O to stop on the other channels included in the channel-redundant configuration. For the resolution to this problem, see Channel Redundancy on PERC 3/DCL, 3/DC, 3/QC, 4/DC, 4e/DC, 4/Di, and 4e/Di Controllers.
When a menu option is inactive, the task cannot be performed on the object at this time. Certain tasks are only valid for certain types of objects or at certain times. For example, a Check Consistency task can only be performed on a redundant virtual disk. Similarly, if a disk is already offline, the Offline menu option is inactive.
There may be other reasons why a task cannot be run at a certain time. For example, there may already be a task running on the object that must complete before additional tasks can be run.
Storage Management displays the "The stripe depth is out of range" error message when you attempt to apply RAID 0 or RAID 5 to more array disks than the controller can support in a single virtual disk. For example, the PERC 4/SC and 4/DC controllers can support up to 32 array disks in a virtual disk when using RAID 0 or RAID 5. Attempting to create a RAID 0 or RAID 5 virtual disk using more than 32 array disks on these controllers causes this error message to be displayed.
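The limit described above amounts to a simple range check. The following sketch is an assumption-laden illustration rather than the check Storage Management actually performs: the table MAX_DISKS_PER_VIRTUAL_DISK and the function exceeds_disk_limit are hypothetical, and the table contains only the two controllers and the limit of 32 stated in this section.

```python
# Limits taken from this section; other controllers are not listed here.
MAX_DISKS_PER_VIRTUAL_DISK = {"PERC 4/SC": 32, "PERC 4/DC": 32}


def exceeds_disk_limit(controller, raid_level, disk_count):
    """Return True if the request would exceed the known disk limit.

    Hypothetical helper: returns False when the controller or RAID
    level is not covered by the table above.
    """
    if raid_level not in (0, 5):
        return False
    limit = MAX_DISKS_PER_VIRTUAL_DISK.get(controller)
    return limit is not None and disk_count > limit


if __name__ == "__main__":
    print(exceeds_disk_limit("PERC 4/SC", 5, 33))
```

A request for 32 disks is within the limit, while a request for 33 disks on either controller would correspond to the error message.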
Let autocheck run, but do not worry about the message. The reboot will complete after autocheck is finished. Depending on the size of your system, this may take about ten minutes.
Activating the Windows hibernation feature may cause Storage Management to display erroneous status information and error messages. This problem resolves itself when the Windows operating system recovers from hibernation.
Access can be denied here if you do not enter a user name and password that match an administrator account on the remote computer or if you mistype the login information. The remote system may also not be powered on or there may be network problems.
When connecting to a remote Windows Server 2003 system, you must log into the remote system using an account that has administrator privileges. By default, Windows Server 2003 does not allow anonymous (null) connections to access the SAM user accounts. Therefore, if you are attempting to connect using an account that has a blank or "null" password, the connection may fail.