Verify that the power-supply cord and adapter cables are attached correctly. If the system is having trouble with read and write operations to a particular virtual disk or non-RAID physical disk (if the system hangs, for example), then make sure that the cables attached to the corresponding enclosure or backplane are secure. If the connection is secure but the problem persists, you may need to replace a cable. See also "Isolate Hardware Problems."
On SAS controllers, you should verify that the cable configuration is valid. Refer to the SAS hardware documentation for valid cable configurations. If the cable configuration is invalid, you may receive alerts "2182" or "2356."
System Requirements
Make sure that the system meets all system requirements. In particular, verify that the correct levels of firmware and drivers are installed on the system. For more information on drivers and firmware, see "Drivers and Firmware."
Drivers and Firmware
Storage Management is tested with the supported controller firmware and drivers. In order to function properly, the controller must have the minimum required version of the firmware and drivers installed. The most current versions can be obtained from the Dell Support website at support.dell.com.
NOTE: You can verify which firmware and drivers are installed by selecting the Storage object in the tree view and clicking the Information/Configuration tab. You can also check the Alert Log for alerts relating to unsupported firmware and driver versions.
It is also recommended to obtain and apply the latest Dell PowerEdge Server System BIOS on a periodic basis to benefit from the most recent improvements. Please refer to the Dell PowerEdge system documentation for more information.
Isolate Hardware Problems
If you receive a "timeout" alert related to a hardware device or if you otherwise suspect that a device attached to the system is experiencing a failure, then do the following to confirm the problem:
Verify that the cables are correctly attached.
If the cables are correctly attached and you are still experiencing the problem, then disconnect the device cables and reboot the system. If the system reboots successfully, then one of the devices may be defective. Refer to the hardware device documentation for more information.
Replacing a Failed Disk
You may need to replace a failed disk in the following situations:
Replacing a Failed Disk that is Part of a Redundant Virtual Disk
If the failed disk is part of a redundant virtual disk, then the disk failure should not result in data loss. You should replace the failed disk immediately, however, as additional disk failures can cause data loss.
If the redundant virtual disk has a hot spare assigned to it, then the data from the failed disk is rebuilt onto the hot spare. After the rebuild, the former hot spare functions as a regular physical disk and the virtual disk is left without a hot spare. In this case, you should replace the failed disk and make the replacement disk a hot spare.
Replacing a Failed Physical Disk that is Part of a Nonredundant Virtual Disk
If the failed physical disk is part of a nonredundant virtual disk (such as RAID 0), then the failure of a single physical disk will cause the entire virtual disk to fail. To proceed, you need to determine when your last backup was made and whether any new data has been written to the virtual disk since that time.
If you have backed up recently and there is no new data on the disks that would be missed, you can restore from backup.
Do the following:
Delete the virtual disk which is currently in a failed state.
Remove the failed physical disk.
Insert a new physical disk.
Create a new virtual disk.
Restore from backup.
Using the Physical Disk Online Command on Select Controllers
If you do not have a suitable backup available, and if the failed disk is part of a virtual disk on a controller that supports the Online physical disk task, then you can attempt to retrieve data by selecting Online from the failed disk's drop-down task menu.
The Online command attempts to force the failed disk back into an Online state. If you are able to force the disk into an Online state, you may be able to recover individual files. How much data you can recover depends on the extent of disk damage. File recovery is only possible if a limited portion of the disk is damaged.
There is no guarantee you will be able to recover any data using this method. A forced Online does not fix a failed disk. You should not attempt to write new data to the virtual disk.
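The choice between restoring from backup and attempting a forced Online can be summarized as a small decision function. The following is only an illustrative sketch: the function name and its three boolean parameters are hypothetical and are not part of Storage Management. It simply returns the steps described above for each case.

```python
def nonredundant_recovery_plan(has_recent_backup, new_data_since_backup,
                               controller_supports_online):
    """Return the documented steps for a failed disk in a nonredundant virtual disk.

    All parameter names are hypothetical; they only mirror the questions
    asked in the text above.
    """
    # Case 1: a recent backup exists and no new data would be missed.
    if has_recent_backup and not new_data_since_backup:
        return [
            "Delete the virtual disk which is currently in a failed state.",
            "Remove the failed physical disk.",
            "Insert a new physical disk.",
            "Create a new virtual disk.",
            "Restore from backup.",
        ]
    # Case 2: no suitable backup, but the controller offers the Online task.
    if controller_supports_online:
        return [
            "Select Online from the failed disk's task menu (recovery is not guaranteed).",
            "Recover any readable files; do not write new data to the virtual disk.",
        ]
    # Case 3: the text above does not describe a procedure for this situation.
    return []
```

For example, `nonredundant_recovery_plan(False, True, True)` returns the two forced-Online steps, while an empty list signals that none of the procedures above applies.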
If the physical disk that you mistakenly removed is part of a redundant virtual disk that also has a hot spare, then the virtual disk rebuilds automatically either immediately or when a write request is made. After the rebuild has completed, the virtual disk will no longer have a hot spare since data has been rebuilt onto the disk previously assigned as a hot spare. In this case, you should assign a new hot spare.
If the physical disk that you removed is part of a redundant virtual disk that does not have a hot spare, then replace the physical disk and do a rebuild.
See the following sections for information on rebuilding physical disks and assigning hot spares:
You can avoid removing the wrong physical disk by blinking the LED display on the physical disk that you intend to remove. See "Blink and Unblink (Physical Disk)" for information on blinking the LED display.
Resolving Microsoft® Windows® Upgrade Problems
If you upgrade the Microsoft Windows operating system on a server, you may find that Storage Management no longer functions after the upgrade. The installation process installs files and makes registry entries on the server that are specific to the operating system. For this reason, changing the operating system can disable Storage Management.
To avoid this problem, you should uninstall Storage Management before upgrading. If you have already upgraded without uninstalling Storage Management, however, you should uninstall Storage Management after the upgrade.
After you have uninstalled Storage Management and completed the upgrade, reinstall Storage Management using the Storage Management install media. You can download Storage Management from the Dell Support website at support.dell.com.
Virtual Disk Troubleshooting
The following sections describe troubleshooting procedures for virtual disks.
The hot spare has been unassigned from the virtual disk. This could happen on some controllers if the hot spare was assigned to more than one virtual disk and has already been used to rebuild a failed physical disk for another virtual disk.
A physical disk has been removed, and the system has not yet attempted to write data to the removed disk. In this case, the system will not recognize the removal of a physical disk until it attempts a write operation to the disk. If the physical disk is part of a redundant virtual disk, then the system will rebuild the disk after attempting a write operation.
The virtual disk includes failed or corrupt physical disks. This situation may generate alert "2083." See alert "2083" for more information.
The rebuild rate setting is too low. If the rebuild rate setting is quite low and the system is processing a number of operations, then the rebuild may take an unusual amount of time to complete. See "Set Rebuild Rate" for more information.
The rebuild was cancelled. Another user can cancel a rebuild that you have initiated.
Cannot Create a Virtual Disk
You might be attempting a RAID configuration that is not supported by the controller. Check the following:
How many virtual disks already exist on the controller? Each controller supports a maximum number of virtual disks. See "Maximum Number of Virtual Disks per Controller" for more information.
Is there adequate available space on the disk? The physical disks that you have selected for creating the virtual disk must have an adequate amount of free space available.
The controller may be performing other tasks, such as rebuilding a physical disk, that must run to completion before the controller can create the new virtual disk.
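The three checks above can be expressed as a simple pre-check. This is a minimal sketch under stated assumptions: `ControllerState` and `creation_blockers` are hypothetical names, and the values they hold would have to be read manually from Storage Management and from the controller documentation. Nothing here queries a real controller.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ControllerState:
    """Hypothetical snapshot of a controller, filled in by hand."""
    existing_virtual_disks: int   # virtual disks already on the controller
    max_virtual_disks: int        # limit from the controller documentation
    free_space_gb: List[float]    # free space on each selected physical disk
    busy_with_task: bool          # True if a rebuild or similar task is running


def creation_blockers(state: ControllerState, required_gb_per_disk: float) -> List[str]:
    """Return the reasons, if any, why creating a virtual disk may fail."""
    blockers = []
    if state.existing_virtual_disks >= state.max_virtual_disks:
        blockers.append("Maximum number of virtual disks reached.")
    if any(free < required_gb_per_disk for free in state.free_space_gb):
        blockers.append("A selected physical disk lacks adequate free space.")
    if state.busy_with_task:
        blockers.append("Controller is busy with a task that must complete first.")
    return blockers
```

An empty result means none of the three listed conditions applies; the RAID configuration itself may still be unsupported by the controller.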
A Virtual Disk of Minimum Size is Not Visible to Windows Disk Management
If you create a virtual disk using the minimum allowable size in Storage Management, the virtual disk may not be visible to Windows Disk Management even after initialization. This occurs because Windows Disk Management is only able to recognize extremely small virtual disks if they are dynamic. It is generally advisable to create virtual disks of larger size when using Storage Management.
Virtual Disk Errors on Linux
On some versions of the Linux operating system, the virtual disk size is limited to 1TB. If you create a virtual disk that exceeds the 1TB limitation, your system may experience the following behavior:
I/O errors to the virtual disk or logical drive
Inaccessible virtual disk or logical drive
Virtual disk or logical drive size is smaller than expected
If you have created a virtual disk that exceeds the 1TB limitation, you should do the following:
Back up your data.
Delete the virtual disk.
Create one or more virtual disks that are smaller than 1TB.
Restore your data from backup.
Whether or not your Linux operating system limits virtual disk size to 1TB depends on the version of the operating system and any updates or modifications that you have implemented. See your operating system documentation for more information.
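When planning the replacement virtual disks, it can help to work out in advance how many are needed so that each one stays below the limit. The sketch below is illustrative only. It assumes whole-gigabyte sizes and treats 1TB as 1024 GB, which may not match how your operating system counts; `plan_virtual_disks` is a hypothetical helper, not a Storage Management command.

```python
import math

LIMIT_GB = 1024  # assumption: 1TB counted as 1024 GB


def plan_virtual_disks(total_gb, limit_gb=LIMIT_GB):
    """Split total_gb into the fewest near-equal sizes, each strictly below limit_gb."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    max_size = limit_gb - 1                    # largest whole-GB size under the limit
    count = math.ceil(total_gb / max_size)     # fewest disks that can hold the data
    base, extra = divmod(total_gb, count)      # spread the remainder over the first disks
    return [base + 1 if i < extra else base for i in range(count)]
```

For example, 2500 GB of data would be planned as three virtual disks of 834, 833, and 833 GB.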
Problems Associated With Using the Same Physical Disks for Both Redundant and Nonredundant Virtual Disks
When creating virtual disks, you should avoid using the same physical disks for both redundant and nonredundant virtual disks. This recommendation applies to all controllers. Using the same physical disks for both redundant and nonredundant virtual disks can result in unexpected behavior including data loss.
NOTE: SAS controllers do not allow you to create redundant and nonredundant virtual disks on the same set of physical disks.
Specific Problem Situations and Solutions
This section contains additional troubleshooting problem areas. Topics include:
Physical Disk is Offline or Displays an Error Status
A physical disk may display an error status if it has been damaged, taken offline, or was a member of a virtual disk that has been deleted or initialized. The following actions may resolve the error condition:
If a user has taken the disk offline, then return the disk to Online status by executing the Online disk task.
Rescan the controller. This action updates the status of storage objects attached to the controller. If the error status was caused by deleting or initializing a virtual disk, rescanning the controller should resolve this problem.
Investigate whether there are any cable, enclosure, or controller problems preventing the disk from communicating with the controller. If you find a problem and resolve it, you may need to rescan the controller to return the disk to Online or Ready status. If the disk does not return to Online or Ready status, reboot the system.
A Disk is Marked as Failed When Rebuilding in a Cluster Configuration
When a system in a cluster attempts to rebuild a failed disk but the rebuild fails, then another system takes over the rebuild. In this situation, you may notice that the rebuilt disk continues to be marked as failed on both systems even after the second system has rebuilt successfully. To resolve this problem, perform a rescan on both systems after the rebuild completes successfully.
Receive a "Bad Block" Alert with "Replacement," "Sense," or "Medium" Error
The following alerts or events are generated when a portion of a physical disk is damaged:
This damage is discovered when the controller performs an operation that requires scanning the disk. Examples of operations that may result in these alerts are as follows:
Consistency check
Rebuild
Virtual disk format
I/O
If you receive alerts 2146 through 2150 as the result of doing a rebuild or while the virtual disk is in a degraded state, then data cannot be recovered from the damaged disk without restoring from backup. If you receive alerts 2146 through 2150 under circumstances other than a rebuild, then data recovery may be possible. The following describes each of these situations.
Alerts 2146 through 2150 Received during a Rebuild or while a Virtual Disk is Degraded
Do the following if you receive alerts 2146 through 2150 during a rebuild or while the virtual disk is in a degraded state:
Replace the damaged physical disk.
Create a new virtual disk and allow the virtual disk to completely resynchronize. While the resynchronization is in progress, the status of the virtual disk will be Resynching.
Restore data to the virtual disk from backup.
Alerts 2146 through 2150 Received while Performing I/O, Consistency Check, Format, or Other Operation
If you receive alerts 2146 through 2150 while performing an operation other than a rebuild, you should replace the damaged disk immediately to avoid data loss.
Do the following:
Back up the degraded virtual disk to a fresh (unused) tape.
Replace the damaged disk.
Do a rebuild.
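The two situations above can be captured in a short lookup that returns the matching procedure. This is a sketch only: `recovery_steps` and its parameters are hypothetical names chosen for illustration, and the alert number must be read from the Alert Log by hand.

```python
BAD_BLOCK_ALERTS = range(2146, 2151)  # alerts 2146 through 2150, inclusive


def recovery_steps(alert_id, rebuilding_or_degraded):
    """Return the documented procedure for a bad-block alert.

    rebuilding_or_degraded is True if the alert arrived during a rebuild
    or while the virtual disk was in a degraded state.
    """
    if alert_id not in BAD_BLOCK_ALERTS:
        raise ValueError("not one of alerts 2146 through 2150")
    if rebuilding_or_degraded:
        # Data cannot be recovered from the damaged disk without a backup.
        return [
            "Replace the damaged physical disk.",
            "Create a new virtual disk and allow it to resynchronize completely.",
            "Restore data to the virtual disk from backup.",
        ]
    # Data recovery may still be possible; back up before replacing the disk.
    return [
        "Back up the degraded virtual disk to a fresh (unused) tape.",
        "Replace the damaged disk.",
        "Do a rebuild.",
    ]
```

Keeping the alert range in one constant makes it easy to see that 2150 is included and 2151 is not.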
Read and Write Operations Experience Problems
If the system is hanging, timing out, or experiencing other problems with read and write operations, then there may be a problem with the controller cables or a device. For more information, see "Cables Attached Correctly" and "Isolate Hardware Problems."
A Task Menu Option is Not Displayed
You may notice that the task menus do not always display the same task options. This is because Storage Management only displays those tasks that are valid at the time the menu is displayed. Some tasks are only valid for certain types of objects or at certain times. For example, a Check Consistency task can only be performed on a redundant virtual disk. Similarly, if a disk is already offline, the Offline task option is not displayed.
There may be other reasons why a task cannot be run at a certain time. For example, there may already be a task running on the object that must complete before additional tasks can be run.
A Corrupt Disk or Drive Message Suggests Running autocheck During a Reboot
Let autocheck run, but do not worry about the message. The reboot will complete after autocheck is finished. Depending on the size of your system, this may take about ten minutes.
Erroneous Status and Error Messages after a Windows Hibernation
Activating the Windows hibernation feature may cause Storage Management to display erroneous status information and error messages. This problem resolves itself when the Windows operating system recovers from hibernation.
Storage Management May Delay Before Updating Temperature Probe Status
In order to display the enclosure temperature and temperature probe status, Storage Management polls the enclosure firmware at regular intervals to obtain temperature and status information. On some enclosures, there is a short delay before the enclosure firmware reports the current temperature and temperature probe status. Because of this delay, Storage Management may require one or two minutes before displaying the correct temperature and temperature probe status.
You are Unable to Log into a Remote System
Access can be denied if you do not enter a user name and password that match an administrator account on the remote computer, or if you mistype the login information. The remote system may also not be powered on, or there may be network problems.
Cannot Connect to Remote Windows Server 2003 System
When connecting to a remote Windows Server 2003 system, you must log into the remote system using an account that has administrator privileges. By default, Windows Server 2003 does not allow anonymous (null) connections to access the SAM user accounts. Therefore, if you are attempting to connect using an account that has a blank or null password, the connection may fail.
Reconfiguring a Virtual Disk Displays Error in Mozilla Browser
When reconfiguring a virtual disk using the Mozilla browser, the following error message may display:
Although this page is encrypted, the information you
have entered is to be sent over an unencrypted
connection and could easily be read by a third party.
You can disable this error message by changing a Mozilla browser setting. To disable this error message:
Select Edit and then Preferences.
Click Privacy and Security.
Click SSL.
Uncheck the "Sending form data from an unencrypted page to an unencrypted page" option.
Physical Disks Display Under Connector Not Enclosure Tree Object
You can resolve this problem by restarting the Server Administrator service or by rebooting the system. For more information on restarting the Server Administrator service, see the Dell OpenManage Server Administrator User's Guide.