This chapter contains status message information, troubleshooting procedures, and common problems and solutions. It also has a separate section for troubleshooting the Dell PowerVault 660F and 224F storage systems.
If a disk or volume fails, it is important to repair the disk or volume as quickly as possible to avoid data loss. Because time is critical, Array Manager makes it easy for you to locate problems quickly. In the Status column of the list view, you can view the status of a disk or volume. The status also appears in the graphical view of each disk or volume. If the status is not Healthy for volumes or Online for disks, use this section to determine the problem and then fix it. Topics include:
One of the following disk status descriptions will always appear in the Status column of the disk in the right pane of the console window. If there is a problem with a disk, you can use this troubleshooting chart to diagnose and correct the problem.
Status
Meaning
Online
The disk is accessible and has no known problems. This is the normal disk status. No user action is required. Both dynamic disks and basic disks display the Online status.
Online (Errors)
This status indicates that the disk is in an error state or that I/O errors have been detected on a region of the disk. All the volumes on the disk will display Failed or Failed Redundancy status, and you may not be able to create new volumes on the disk. Only dynamic disks display this status.
Right-click on the failed disk and select Reactivate Disk to bring the disk to an Online status and bring all the volumes to a Healthy status.
Offline
The disk is not accessible. The disk may be corrupted or intermittently unavailable. An error icon appears on the offline disk. Only dynamic disks display the Offline status.
If the disk status is Offline and a separate corresponding icon titled Missing, Disk appears, the disk was recently available on the system but can no longer be located or identified. The Missing disk may be corrupted, powered down, or disconnected, or the disk may be a virtual disk that has been deleted.
Unreadable
The disk is not accessible. The disk may have experienced hardware failure, corruption, or I/O errors. The disk's copy of the system's disk configuration database may be corrupted. An error icon appears on the Unreadable disk. Both dynamic and basic disks display the Unreadable status.
Disks may display the Unreadable status while they are spinning up or when Array Manager is rescanning all the disks on the system. In some cases, an Unreadable disk has failed and is not recoverable. For dynamic disks, the Unreadable status usually results from corruption or I/O errors on part of the disk, rather than failure of the entire disk. You can rescan the disks (using the Rescan Disks command) or reboot the computer to see if the disk status changes.
Unrecognized
The disk has an original equipment manufacturer's (OEM) signature and Array Manager will not allow you to use this disk. For example, a disk from a UNIX system displays the Unrecognized status. Only Unknown disk types display the Unrecognized status.
Foreign Disk
The disk has been moved to your computer from another Microsoft® Windows NT® or Windows® 2000 computer and has not been set up for use. Only dynamic disks display this status. To add the disk so that it can be used, right-click on the disk and select Merge Foreign Disk. All existing volumes on the disk will be visible and accessible.
Because a volume can span more than one disk (e.g., a mirrored volume), it is important that you first verify your disk configurations and then move the entire disk set that the volume is on. If only part of the disk set is moved, some of the volumes will show the Failed Redundancy or Failed error condition.
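The disk status chart above can be condensed into a simple lookup for use in a runbook or monitoring script. The following Python sketch is illustrative only: the dictionary `DISK_STATUS_ACTIONS` and the function `recommended_disk_action` are hypothetical names, not part of Array Manager, and the action text is paraphrased from the chart.

```python
# Illustrative lookup of the disk statuses described in the chart above.
# All names are hypothetical; Array Manager does not expose this as an API.
DISK_STATUS_ACTIONS = {
    "Online": "No user action is required.",
    "Online (Errors)": "Right-click the disk and select Reactivate Disk.",
    "Offline": "Check whether the disk is corrupted, powered down, or disconnected.",
    "Unreadable": "Run Rescan Disks or reboot, then check the status again.",
    "Unrecognized": "The disk cannot be used with Array Manager.",
    "Foreign Disk": "Right-click the disk and select Merge Foreign Disk.",
}


def recommended_disk_action(status):
    """Return the documented action for a disk status string."""
    try:
        return DISK_STATUS_ACTIONS[status]
    except KeyError:
        raise ValueError("Unknown disk status: %r" % status) from None


print(recommended_disk_action("Foreign Disk"))
```

A status string that is not in the chart raises `ValueError` rather than returning a guess, so a typo in a script is caught early.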
One of the following volume status descriptions will always appear in the graphical view of the volume and in the Status column of the volume in list view. If there is a problem with a volume, you can use this troubleshooter to diagnose and correct the problem.
Status
Meaning
Healthy
The volume is accessible and has no known problems. This is the normal volume status. No user action is required. Both dynamic volumes and basic volumes display the Healthy status.
Healthy (At Risk)
The volume is currently accessible, but I/O errors have been detected on the underlying disk. If an I/O error is detected on any part of a disk, all volumes on the disk display the Healthy (At Risk) status. A warning icon appears on the volume. Only dynamic volumes display the Healthy (At Risk) status.
When the volume status is Healthy (At Risk), an underlying disk's status is usually Online (Errors). To return the underlying disk to the Online status, reactivate the disk (using the Reactivate Disk command). Once the disk is returned to Online status, the volume should return to the Healthy status.
Initializing
The volume is being initialized. Dynamic volumes display the Initializing status.
No user action is required. When initialization is complete, the volume's status becomes Healthy. Initialization should be completed very quickly.
Resynching
The volume's mirrors are being resynchronized so that both mirrors contain identical data. Both dynamic and basic mirrored volumes display the Resynching status.
No user action is required. When resynchronization is complete, the mirrored volume's status returns to Healthy. Resynchronization may take some time, depending on the size of the mirrored volume. Although you can access a mirrored volume while resynchronization is in progress, you should avoid making configuration changes (such as breaking a mirror) during resynchronization.
Regenerating
Data and parity are being regenerated for the RAID-5 volume. Both dynamic and basic RAID-5 volumes display the Regenerating status.
No user action is required. When regeneration is complete, the RAID-5 volume's status returns to Healthy. You can access a RAID-5 volume while data and parity regeneration is in progress.
Failed Redundancy
The data on the volume is no longer fault tolerant because one of the underlying disks is not online. A warning icon appears on the volume with Failed Redundancy. The Failed Redundancy status applies only to mirrored or RAID-5 volumes. Both dynamic and basic volumes display the Failed Redundancy status.
You can continue to access the volume using the remaining online disks, but if another disk that contains the volume fails, you will lose the volume and its data. To avoid such loss, you should attempt to repair the volume as soon as possible.
A Failed Redundancy status will also display if a disk was moved and the volume on it spanned more than the single disk. To correct the problem, you must move the entire disk set that contains all the appropriate volumes.
Failed Redundancy (At Risk)
The data on the volume is no longer fault tolerant, and I/O errors have been detected on the underlying disk. If an I/O error is detected on any part of a disk, all volumes on the disk display the (At Risk) status. A warning icon appears on the volume. Only dynamic mirrored or RAID-5 volumes display the Failed Redundancy (At Risk) status.
When the volume status is Failed Redundancy (At Risk), the underlying disk's status is usually Online (Errors). To return the underlying disk to the Online status, reactivate the disk (using the Reactivate Disk command). Once the disk is returned to the Online status, the volume status should change to Failed Redundancy.
Failed
The volume cannot be started automatically. An error icon appears on the failed volume. Both dynamic and basic volumes display the Failed status.
Formatting
The volume is being formatted using the specifications you chose for formatting.
No Media
No media has been inserted into the CD-ROM or removable drive. The volume status will become Online when you insert the appropriate media into the CD-ROM or removable drive. Only CD-ROM or removable disk types display the No Media status.
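As a rough summary of the volume status chart, the Python sketch below separates statuses that resolve on their own from those that call for user action. The names `VOLUME_STATUS` and `volumes_needing_attention` are hypothetical. An icon value of `None` means only that the chart does not mention an icon, and treating Formatting as requiring no action is an assumption, because the chart does not say either way.

```python
# status -> (icon named in the chart, whether user action is called for)
# Hypothetical summary of the chart above; not an Array Manager API.
VOLUME_STATUS = {
    "Healthy": (None, False),
    "Healthy (At Risk)": ("warning", True),
    "Initializing": (None, False),
    "Resynching": (None, False),
    "Regenerating": (None, False),
    "Failed Redundancy": ("warning", True),
    "Failed Redundancy (At Risk)": ("warning", True),
    "Failed": ("error", True),
    "Formatting": (None, False),  # assumption: the chart states no action
    "No Media": (None, True),     # insert the appropriate media
}


def volumes_needing_attention(volume_statuses):
    """Given {volume name: status}, return the sorted names needing action."""
    return sorted(
        name
        for name, status in volume_statuses.items()
        if VOLUME_STATUS[status][1]
    )


example = {"D:": "Healthy", "E:": "Failed Redundancy", "F:": "Resynching"}
print(volumes_needing_attention(example))
```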
These definitions appear in the Status line and indicate the condition of array disks.
Status line entry
Status indication
Unknown
May signify a problem or indicate a transitional state. Additionally, a new disk that had previously been formatted or initialized by another type of RAID controller may show this state.
Ready
Means the array disk is operational. For PERC 2/SC, 2/DC, 3/DCL, 3/DC, and 3/QC controllers, Ready status applies to operational array disks that are not part of a virtual disk.
For the PERC 2, PERC 2/Si, PERC 3/Si, and PERC 3/Di controllers, operational array disks display Ready status regardless of whether they are a part of a virtual disk or not.
Failed
Not operational. A disk needs repair, has been removed, or has another problem that prevents operation.
Online
Operational. Applies to array disks contained in a virtual disk on PERC 2/SC, 2/DC, 3/DCL, 3/DC, and 3/QC controllers.
Offline
The drive is not available to the RAID controller.
Degraded
Refers to a fault-tolerant array/virtual disk that has a failed disk.
Recovering
Refers to state of recovering from bad blocks on disks.
Removed
Indicates that array disk has been removed.
Resynching
This state definition appears during the following types of disk operations: Transform Type, Reconfiguration, and Check Consistency.
Rebuilding
Refers to part of a virtual disk being rebuilt. The Global status is used on multiple objects.
No Media
CD-ROM or removable disk has no media. The Global status is used on multiple objects.
Formatting
Refers to array disk in process of formatting.
Diagnostics
Indicates that diagnostics are running. The Global status is used on multiple objects.
Reconstructing
The configuration of a virtual disk has been changed. The individual array disks within the virtual disk are being modified to support the changes. The data on the virtual disk will be saved. You cannot cancel a virtual disk reconstruction.
Initializing
Applies only to virtual disks on PERC 2/SC, 2/DC, 3/DCL, 3/DC, and 3/QC controllers. This prepares the virtual disk for use by Array Manager by deleting the configuration information on this virtual disk. The data on the virtual disk will be lost.
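The Ready and Online entries above depend on the controller family, which is easy to misread. The Python sketch below restates that rule; the function name `operational_status` and the two set names are hypothetical, and the sketch covers only operational array disks on the controllers named in the table.

```python
# Controllers on which an operational array disk shows Online once it
# belongs to a virtual disk, and Ready otherwise.
ONLINE_WHEN_IN_VIRTUAL_DISK = {
    "PERC 2/SC", "PERC 2/DC", "PERC 3/DCL", "PERC 3/DC", "PERC 3/QC",
}
# Controllers on which an operational array disk always shows Ready.
ALWAYS_READY = {"PERC 2", "PERC 2/Si", "PERC 3/Si", "PERC 3/Di"}


def operational_status(controller, in_virtual_disk):
    """Return the status an operational array disk is expected to show."""
    if controller in ALWAYS_READY:
        return "Ready"
    if controller in ONLINE_WHEN_IN_VIRTUAL_DISK:
        return "Online" if in_virtual_disk else "Ready"
    raise ValueError("Controller not covered by the table: %r" % controller)


print(operational_status("PERC 3/DC", in_virtual_disk=True))
print(operational_status("PERC 3/Di", in_virtual_disk=True))
```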
Use Rescan to update disk information. This operation may take a few minutes if there are a number of devices attached to the system. You will see a message "Getting hardware configuration. Please wait." while the rescan is occurring.
If this does not properly update the disk information, you may need to reboot your system.
Reboot your machine to update the list of existing disks.
Right-click the disk marked Missing or Offline dynamic disk.
Use Rescan to change the disk status to Online (Errors).
Right-click the disk marked Missing or Offline dynamic disk. Select Reactivate Disk from the context menu. The disk should be marked Online after the disk is reactivated.
For any volumes that are not Healthy, right-click the volume and select Reactivate Volume from the context menu.
A RAID-5 volume's status can appear as Failed Redundancy while the disk's status is Offline. The disk's name may be Missing, and an error icon (X) appears on the missing or offline disk. In this case, do the following.
Rescan the disk to make sure the disk, controller, or cable problem is fixed.
Try to reactivate the disk by right-clicking on the disk and selecting Reactivate Disk.
If the volume remains as Failed Redundancy or Failed, right-click on the volume, then select Reactivate Volume. If all disks on this volume are Online, the volume should be brought back to a healthy state. See Reactivate a Dynamic Volume for more information on the consequences of reactivating a volume.
Reactivating a volume attempts to restart all volumes regardless of the volume's state. If data corruption exists, you can reactivate the volume and then run the chkdsk utility. However, in the case of a mirrored or RAID-5 volume, reactivating a volume with stale data can cause that data to be used when it is inaccurate.
Reactivating a volume should be done only if you understand that the volume's data, which might be corrupted, will be restored. For example, if one mirror in a mirrored volume fails and data is written to the remaining mirror, the data is now out of sync. Then, if the remaining mirror (the one with accurate data) fails and the first mirror is reactivated, the stale data becomes "real" data.
For this reason, it is important to act on data failures as soon as possible. You should use care when reactivating volumes.
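The stale-data scenario described above can be traced with a toy model. The Python sketch below is a simplified simulation and not how Array Manager or Windows implements mirroring: the `Mirror` class and the `write`, `read`, and `simulate_stale_reactivation` functions are hypothetical, and each mirror holds a single value in place of real disk contents.

```python
class Mirror:
    """One half of a mirrored volume (hypothetical toy model)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.data = None


def write(mirrors, value):
    """Write to every mirror that is currently online."""
    for mirror in mirrors:
        if mirror.online:
            mirror.data = value


def read(mirrors):
    """Read from the first online mirror."""
    for mirror in mirrors:
        if mirror.online:
            return mirror.data
    raise RuntimeError("volume failed: no mirror is online")


def simulate_stale_reactivation():
    first, second = Mirror("first"), Mirror("second")
    volume = [first, second]
    write(volume, "version 1")  # both mirrors are in sync
    first.online = False        # the first mirror fails
    write(volume, "version 2")  # only the second mirror is updated
    second.online = False       # the mirror with accurate data fails
    first.online = True         # the stale mirror is reactivated
    return read(volume)


print(simulate_stale_reactivation())  # prints "version 1"
```

After reactivation the volume serves "version 1" even though "version 2" was the last data written, which is why the order in which disks are brought back matters.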
If the disks are not online, use the Rescan and then the Reactivate Disk commands to return the disk to the Online status. If this succeeds, the volume automatically restarts and returns to the Healthy status. A mirrored volume repairs itself by resynchronizing the data in its mirrors. A RAID-5 volume repairs itself by regenerating its parity and data.
If the disk returns to the Online status but the volume does not return to the Healthy status, you can reactivate the volume manually (using the Reactivate Volume command).
If the volume is a mirrored or RAID-5 volume with stale data, bringing the underlying disk online will not automatically restart the volume. If the disks that contain non-stale data are disconnected, you should bring those disks online first (to allow the data to become synchronized). Otherwise, restart the mirrored or RAID-5 volume manually (using the Reactivate Disk command), and then run Chkdsk.exe. To run Chkdsk.exe, click Start, click Run, type chkdsk, and then click OK.
If the disk does not return to the Online status and the volume does not return to the Healthy status, there may be something wrong with the disk. You should replace the failed mirror or RAID-5 disk region. To replace the failed mirror in a mirrored volume, use the Remove Mirror command to remove the failed mirror, then use the Add Mirror command to create a new mirror on another disk. To replace the failed disk region in a RAID-5 volume, use the Repair RAID-5 Volume command.
Right-click on volume, then click Repair RAID-5 volume.
A message appears that indicates that the repair will be attempted if there is another dynamic disk with adequate unallocated space. Click Yes to confirm the repair.
The volume should be brought back to a healthy state.
You should be able to repair a RAID-5 volume if it is in a state of Failed Redundancy, and if there is unallocated space on another dynamic disk available. To avoid data loss, you should attempt to repair the volume as soon as possible.
Make sure that the underlying physical disk is turned on, plugged in, and attached to the computer. No other user action is possible for basic volumes unless the volumes are mirrored or RAID-5 volumes that were originally created in NT Disk Administrator. The repair of these volumes is covered in the next topic.
Use Microsoft Windows NT Disk Administrator to repair basic mirrored or RAID-5 volumes if you are running Windows NT 4.0. For Windows 2000, there is a command available from the context menu for repairing basic mirrored or RAID-5 volumes.
CAUTION! In Windows NT 4.0, Disk Administrator should never be
used while Array Manager is running, especially if there are
tasks running on the controller at the time. Data loss can
occur if both applications are running simultaneously.
Array Manager is tested with the controller firmware and drivers provided on the CD. To avoid possible conflicts or inconsistencies between the controller firmware and drivers, it is recommended to use these firmware and driver versions, or later. The most current versions can be obtained from Dell's web site at:
http://support.dell.com/us/en/filelib/
It is also recommended to obtain and apply the latest Dell PowerEdge Server System BIOS on a periodic basis to benefit from the most recent improvements. Please refer to the Dell PowerEdge System Documentation for more information.
Note: If you are using the Dell PowerVault 660F RAID controller and
the PowerVault 224F enclosure, see the next major section, Dell
PowerVault 660F and 224F Storage Systems Troubleshooting, for
additional issues specific to that controller and enclosure.
How many virtual disks exist? You can create a maximum of 8 virtual disks on one PERC, PERC 2/SC, or PERC 2/DC controller. You can create a maximum of 24 virtual disks on a PERC 2, PERC 2/Si, PERC 3/Si, or PERC 3/Di controller, and 40 virtual disks on a PERC 3/DCL, PERC 3/DC, or PERC 3/QC controller.
Is there adequate unallocated space on the disk? You must have adequate available disk space to create a virtual disk.
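The per-controller limits above can be kept in a small table when planning a configuration. The Python sketch below is illustrative only; `MAX_VIRTUAL_DISKS` and `can_add_virtual_disk` are hypothetical names, and the numbers are copied from the limits stated above.

```python
# Maximum number of virtual disks per controller, as stated above.
MAX_VIRTUAL_DISKS = {
    "PERC": 8, "PERC 2/SC": 8, "PERC 2/DC": 8,
    "PERC 2": 24, "PERC 2/Si": 24, "PERC 3/Si": 24, "PERC 3/Di": 24,
    "PERC 3/DCL": 40, "PERC 3/DC": 40, "PERC 3/QC": 40,
}


def can_add_virtual_disk(controller, existing_virtual_disks):
    """Return True if the controller is still below its virtual disk limit."""
    return existing_virtual_disks < MAX_VIRTUAL_DISKS[controller]


print(can_add_virtual_disk("PERC 2/DC", 8))   # limit of 8 already reached
print(can_add_virtual_disk("PERC 3/QC", 8))   # limit is 40
```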
Is there adequate unallocated space on three or more disks? You must have at least three disks to create a RAID-5 volume.
Are you using an NT Workstation or Windows 2000 Professional machine? You cannot create a RAID-5 volume on those machines. This restriction applies to software RAID only (that is, dynamic volumes). You need to use an NT Server or Windows 2000 Server machine. Another option is that you can implement a hardware RAID-5 virtual disk and then create a volume on the virtual disk.
Is there adequate unallocated space on two different disks? You must have two disks to create a mirrored volume. You also can create a mirror only on a simple or spanned volume.
Are you using an NT Workstation or Windows 2000 Professional machine? You cannot create a RAID-1 volume on those machines. This restriction applies to software RAID only (that is, dynamic volumes). You need to use an NT Server or Windows 2000 Server machine. Another option is that you can implement a hardware RAID-1 virtual disk and then create a volume on the virtual disk.
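The checks above for software RAID-5 and mirrored (RAID-1) volumes can be combined into one pre-flight test. The Python sketch below is a hedged illustration: `creation_blockers`, `MIN_DISKS`, and `SERVER_EDITIONS` are hypothetical names, the edition strings are labels chosen for this example, and the logic covers only the two conditions listed above (enough disks with unallocated space, and a server edition of the operating system).

```python
# Minimum number of disks with adequate unallocated space, per volume type.
MIN_DISKS = {"RAID-5": 3, "mirrored": 2}
# Editions on which software RAID-5 and mirrored volumes can be created.
SERVER_EDITIONS = {"NT Server", "Windows 2000 Server"}


def creation_blockers(volume_type, disks_with_space, os_edition):
    """Return a list of reasons the dynamic volume cannot be created."""
    blockers = []
    needed = MIN_DISKS[volume_type]
    if disks_with_space < needed:
        blockers.append(
            "need unallocated space on at least %d disks" % needed
        )
    if os_edition not in SERVER_EDITIONS:
        blockers.append(
            "software %s volumes require a server edition" % volume_type
        )
    return blockers


print(creation_blockers("RAID-5", 2, "Windows 2000 Professional"))
print(creation_blockers("mirrored", 2, "Windows 2000 Server"))
```

An empty list means neither of the two documented conditions blocks creation; it does not guarantee success, and a hardware RAID virtual disk remains the alternative on workstation editions.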
Microsoft Windows NT/2000 is not aware of the status of these disks. Most likely, the virtual disks that were associated with these have been deleted.
Check:
To remove these error status objects from the Disks node, the computer must be restarted to allow Windows NT/2000 to find the current information.
Situation:
If the type of disk shows No Signature, you need to write a signature to the disk. When creating a new virtual disk, the software must write a signature to the virtual disk that prepares it for use. This signature is not written automatically in case this disk has been merged from another operating system and the configuration information needs to be kept intact.
Check:
To write the configuration data to a disk, right-click on the disk under the Disks node and choose Write Signature.
Once you have repaired the disk, controller, or cable problem, you need to:
Rescan to see the disk within Array Manager. If Array Manager finds the disk, this should bring the disk Online. If Array Manager does not find the disk, a reboot may be required.
Reactivate Disk to bring all the volumes on the disk to the Healthy status.
The remote computer that you were connected to has been disconnected from your console. Most often, there is a problem with the network connection and the transmissions timed out. This can also occur if the remote machine was restarted or the service on the remote machine was stopped.
Check:
Make sure that the remote machine is turned on and is available to the network, and that the service is started. Reconnect to the resource.
The installation detects any drivers that you have installed for PowerEdge RAID controllers. If these drivers (and/or the card itself) are installed after the software is installed, support for the controller will need to be added.
Check:
Close the console. Open the Array Manager Service Manager and check the box next to the appropriate controller. This action will restart the service, and the disks should be available the next time you launch the console.
If this was a virtual disk, then check that the virtual disk still exists. If it no longer exists, use the Remove Disk command to remove the disk from the list of disks.
Repair any disk, controller, or cable problems and make sure that the physical disk is turned on, plugged in, and attached to the computer. From the View pull-down menu, select Rescan. The disk should change from Offline to Online, but the volumes remain Failed. (If the disk does not change to Online, you may need to reboot.) Right-click on the disk and select Reactivate Disk. The volume status changes to Healthy. (You can also select each volume one at a time and select Reactivate Volume.) It is recommended that you then run chkdsk.
If the disk status remains Offline and Missing and you determine that the disk has a problem that cannot be repaired, you can remove the disk from the system (using the Remove Disk command). However, before you can remove the disk, you must delete all volumes on the disk. You can save any mirrored volumes on the disk by removing the mirror that is on the Missing disk instead of the entire volume. Deleting a volume destroys the data in the volume, so you should remove a disk only if you are absolutely certain that the disk is permanently damaged and unusable.
Use the Reactivate Disk command to bring the disk back online. If the disk status remains Offline, check the cables and disk controller, and make sure that the physical disk is healthy. Correct any problems and try to reactivate the disk again. If the disk reactivation succeeds, any volumes on the disk should automatically return to the Healthy status.
The disk has been moved to your computer from another Microsoft Windows NT/2000 computer and has not been set up for use. Only dynamic disks display this status. To add the disk so that it can be used, right-click on the disk and select Merge Foreign Disk. All existing volumes on the disk will be visible and accessible.
Because a volume can span more than one disk (e.g., a mirrored volume), it is important that you first verify your disk configurations and then move the entire disk set that the volume is on. If only part of the disk set is moved, some of the volumes will show Failed Redundancy or Failed error condition.
The Help file uses a technology known as HTML Help, a Microsoft standard. Some software will attempt to update the core files with an older version of HTML Help and make Array Manager's Help file unusable. The required HTML Help update is located on the Array Manager CD-ROM in the Help Update folder. Double-click on HHUPD.EXE and follow the instructions.
Let autocheck run, but do not worry about the message. Autocheck will finish and the reboot will be complete. If you have a large system (more than 1 gigabyte), this may take about 10 minutes.
This occurs when you log in to the local computer originally as a local user, local administrator, or domain user and the remote computer is not in your domain or a trusted domain. The Windows security model does not allow you to have access under these circumstances. The workaround is to log in to your local computer with an account that has the same user name and password as an administrator account on the remote computer.
Access can be denied here if you do not type in a user name and password that match a local or domain administrator account on the remote computer or if you mistype the login information.
Another situation where you may get an error message is when you have just done a client-only installation of Array Manager and you bring up the Array Manager client and attempt to connect to a remote server that has Windows 2000 Disk Management.
Array Manager assumes that its client will connect first to a remote server running Array Manager before connecting to a system running Windows 2000 Disk Management.
Once you connect to a server with Array Manager, you will then be able to connect successfully to a remote system running Disk Management.
Windows 2000 Disk Management is the disk and volume management program that comes with Windows 2000. Because Array Manager and Disk Management are related programs, Array Manager is able to remotely manage the storage on a Windows 2000 computer with Disk Management.
If you are having problems connecting to a NetWare® server, use the ping and nslookup TCP/IP network diagnostic tools to determine whether the managed node system is accessible from the console and whether the system running the managed server has a legal DNS name. If the managed server does not have a DNS name, you can check the Hosts file on the client to see whether the server is listed. Otherwise, you will need to use the IP address.
When you want to connect to a NetWare server, Array Manager expects the server to be identified by one of three types of entries:
A DNS entry name
An IP address
A host name from a Hosts file listing
If you identify the name of the machine by a NetWare server's name that is not one of the three items above, the connection will fail. It is suggested that the name assigned to the NetWare server be the same name as its DNS or Hosts file entry.
Note that the DNS and Hosts file entries do not allow for a computer name that consists of all numbers. In addition, the DNS name does not allow a computer name that starts with a number. If the NetWare server has a numeric name or a name that starts with a number, you can use the IP address to identify that server. You can also put quotation marks around the computer's name for the entry in DNS or the Hosts file (such as "12345").
The Hosts file has to be on the client computer that has the Array Manager console.
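The naming rules above can be checked before attempting a connection. The Python sketch below uses the standard `ipaddress` module; the function `classify_server_name` and its return labels are hypothetical, and the sketch only inspects the string. It does not query DNS or read the Hosts file, so ping and nslookup are still needed to confirm that the server is reachable.

```python
import ipaddress


def classify_server_name(name):
    """Classify how a NetWare server identifier fits the rules above."""
    if not name:
        raise ValueError("server name is empty")
    try:
        ipaddress.ip_address(name)
        return "ip-address"         # always acceptable
    except ValueError:
        pass
    if name.isdigit():
        return "all-numeric"        # not allowed in DNS or the Hosts file
    if name[0].isdigit():
        return "starts-with-digit"  # not allowed as a DNS name
    return "name"                   # usable as a DNS or Hosts file entry


for candidate in ("10.0.0.5", "12345", "9server", "fileserver1"):
    print(candidate, "->", classify_server_name(candidate))
```

For the "all-numeric" and "starts-with-digit" cases, the text above suggests identifying the server by its IP address instead.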
This section presents possible problem situations with accompanying solutions for the Dell PowerVault 660F and 224F storage systems. The problem situations are organized as follows:
The situations in the first three topics are categorized by their event number. A brief discussion of event messages is included at the beginning of this section in the topic Event Monitoring and Logging. The fourth topic describes general problems not related to a specific event.
Event messages help identify significant incidents such as an array disk failure or an array disk addition. Event monitoring and logging starts when the Array Manager managed node starts up. If the managed node service (Disk Management Service) stops in Microsoft Windows NT or the Array Manager Service stops in NetWare, then event monitoring and logging stops. If array disks are S.M.A.R.T. (Self Monitoring Analysis and Reporting Technology) enabled, the RAID controllers check array disks for failure predictions, and if found, pass this information on to the Array Manager console. Array Manager immediately displays an alert icon on the array disk and also raises an alert under the Events tab and in the Windows NT event log. Windows NT has three event logs; the Dell OpenManage Array Manager uses the Application log.
Note: When a controller's I/O is paused, Array Manager does not
receive S.M.A.R.T. events.
Try rescanning the controller: from the Array Manager tree view, click to expand the Arrays storage object, right-click the PV660F Subsystem object, and then select Rescan from the context menu that comes up. This action will update the controller status within the GUI.
If the controller has been removed and reinserted, check to see that the controller is inserted correctly: the DB9 connector should be located at the top of the module. For details, see the Dell PowerVault 660F and 224F Storage Systems Service Manual. Also, check that all cable connections are correctly and firmly connected. Try to rebuild again: right-click the Array Disk storage object in the tree view, and then select Rebuild from the context menu that comes up.
If the controller and connections are correct and the problem continues, contact customer service.
If the controller failed, replace the controller according to the Dell PowerVault 660F and 224F Storage Systems Service Manual.
If the controller was removed, reinsert it according to the Dell PowerVault 660F and 224F Storage Systems Service Manual.
If the controller has been powered off, restore power.
Then access the Array Manager console, click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the controller status.
If Enclosure Management has been enabled, check to see whether one or more LS modules has failed. LS modules are cards installed in slots at the front of the enclosure. Each LS module has an SES processor that monitors environmental functions and a Loop Redundancy Circuit (LRC) function, which maintains the viability of the Fibre Channel loop. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for troubleshooting tips.
If the controller entered Conservative Cache mode because of an intentional user action, proceed as the user intended. When finished, right-click the controller and select either Enable Partner controller or Enable BBU to exit Conservative Cache mode.
If the BBU battery is low, recondition the battery. If the battery needs to be replaced, see the Dell PowerVault 660F and 224F Storage Systems Service Manual. Then access the Array Manager console, and in the tree view, click to expand the Arrays storage object, right-click the PV660F Subsystem storage object to bring up the context menu, and select Rescan. This action will update the controller.
If there is an Expand Capacity or Add Virtual Disk operation in progress, wait until this activity has finished. Then access the Array Manager console, and in the tree view, click to expand the Arrays storage object, right-click the PV660F Subsystem storage object to bring up the context menu, and select Rescan. This action will update the controller.
The controller is waiting for the Enable Partner command before completing the startup process. If the controller option Auto Restore is set, the Enable Partner command is not needed. If the Auto Restore option is not set, the partner controller will wait for the Enable Partner command.
Couldn't allocate chunk of memory.
SCSI communication failed.
Mismatch in the number of array disk channels present.
Mismatch in the number of host channels present.
Mismatch in the firmware version.
Mismatch in the firmware header type.
Memory read of partner controller failed.
Mismatch in cache memory size.
Received Disable Partner controller command.
Negotiation finished, but nexus not entered in time.
Write-back sync to partner controller, channels 0-5.
Mismatch in firmware build.
Device cables are crossed.
Partner controller removal detected while nexus active.
Partner controller missing at negotiation time.
BBU powerfailed before failover finished.
BBU powerfailed before relinquish finished.
Lock time-out.
Lock SCSI failed.
General ctoc (controller to controller) message failure.
Failed for some unknown reason.
Note: Nexus refers to the state in which both redundant controllers
are in communication. In this state, each controller can copy write-back
data to its partner controller and can determine whether the other
controller is operating.
Go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the partner controller. If the situation does not improve, try one of the following (Rescan as before when necessary after troubleshooting the partner controller):
The controller is shipped with Auto Restore set. To see whether the partner is enabled, right-click the controller. If Enable Partner is on the menu, then the partner controller is disabled: click Enable Partner to enable the partner controller. If Disable Partner is on the menu, then the partner controller is enabled; use the following troubleshooting solutions to fix the event.
Unless a firmware or memory mismatch is suspected, reboot the disabled partner controller. If a firmware or memory mismatch is the issue, see below.
If memory problems exist, replace the memory module if necessary. See the Dell PowerVault 660F and 224F Storage Systems Service Manual.
Check the firmware version of the good controller: in the Array Manager tree view, right-click the Controller and click the Properties command on the context menu that appears. Replace the failed controller; see the Dell PowerVault 660F and 224F Storage Systems Service Manual. Go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the controller status. Check the firmware version of the new controller; if there is a mismatch, update both controllers with the latest firmware. See Firmware version mismatch in this chapter for instructions.
Check the memory size of the good controller: right-click the Controller in the tree view and click the Properties command on the context menu that appears. Replace the failed controller with a controller of the same memory size; see the Dell PowerVault 660F and 224F Storage Systems Service Manual. Go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that comes up. This action will update the controller status.
Check that the partner controller is present. If it is not, reinstall the partner controller. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for slot module installation instructions.
If the user gave the Disable Partner command, continue per user's intent.
If none of these solutions apply, contact customer service.
Replace the controller according to the Dell PowerVault 660F and 224F Storage Systems Service Manual. Then go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that appears. This action will update the controller status.
Replace the controller according to the Dell PowerVault 660F and 224F Storage Systems Service Manual. Then go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that appears. This action will update the controller status.
Lost access to data on the Fibre Channel. The Fibre Channel cable may have been disconnected.
LS module has failed.
I/O module has failed.
Note: LS modules are cards installed in slots at the front of the
enclosure. Each LS module has an SES processor that monitors
environmental functions and a Loop Redundancy Circuit (LRC)
function, which maintains the viability of the Fibre Channel loop.
Check that the Fibre Channel cable is connected to the controller and the switch box. If it is not, reconnect it. If the Fibre Channel cable is connected, try replacing the cable. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for troubleshooting the LS and I/O modules.
When troubleshooting is complete, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that appears. This action will update the Fibre Channel status.
To locate a fan, right-click the bad fan and click Properties. The Enclosure ID field indicates the ID number of the enclosure where this fan is located.
See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide on how to troubleshoot the Advanced Cooling Module (ACM). See the Dell PowerVault 660F and 224F Storage Systems Service Manual for part replacement procedures.
After troubleshooting or replacing the ACM, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that appears. This action will update the fan status within the GUI.
See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide on how to troubleshoot the power supply. See the Dell PowerVault 660F and 224F Storage Systems Service Manual for part replacement procedures.
Note: LS modules are cards installed in slots at the front of the
enclosure. Each LS module has an SES processor that monitors
environmental functions and a Loop Redundancy Circuit (LRC)
function, which maintains the viability of the Fibre Channel loop.
After troubleshooting or replacing the power supply, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that appears. This action will update the power supply status within the GUI.
Check all fans to see whether they are functioning properly. If they are, check that the ambient temperature is within limits and adjust the room temperature if necessary. If the problem persists, power-cycle the system. If this does not solve the problem, replace the affected Advanced Cooling Module (ACM). See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide on how to troubleshoot the ACM. See the Dell PowerVault 660F and 224F Storage Systems Service Manual for part replacement procedures.
After fixing the temperature problem, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that appears. This action will update the temperature and/or the fan status within the GUI.
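The order of the temperature checks above can be summarized as a small decision function. The following Python sketch is illustrative only: the function name and its three flags are hypothetical, not part of Array Manager, and each flag stands for something the operator determines by inspection. The source does not say what to do when a fan is not working, so the first branch assumes the ACM troubleshooting procedure applies.

```python
def next_temperature_step(fans_ok: bool, ambient_within_limits: bool,
                          power_cycled: bool) -> str:
    """Return the next action for an over-temperature event (hypothetical helper)."""
    if not fans_ok:
        # Assumption: a faulty fan is handled through the ACM troubleshooting procedure.
        return "troubleshoot the ACM"
    if not ambient_within_limits:
        return "adjust the room temperature"
    if not power_cycled:
        return "power-cycle the system"
    # Fans look fine, room is within limits, and a power cycle did not help.
    return "replace the affected ACM"


print(next_temperature_step(fans_ok=True, ambient_within_limits=False, power_cycled=False))
```

Whichever step is taken, a Rescan on the PV660F Subsystem storage object is still needed afterward so that the GUI reflects the new status.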
The LS module connection may be broken or the management hardware may be bad. LS modules are cards installed in slots at the front of the enclosure. Each LS module has an SES processor that monitors environmental functions and a Loop Redundancy Circuit (LRC) function, which maintains the viability of the Fibre Channel loop.
Follow the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for troubleshooting an LS module. For part replacement, see the Dell PowerVault 660F and 224F Storage Systems Service Manual.
After resolving the hardware problem, go to the Dell OpenManage Array Manager interface and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan. This action will update the enclosure status within the GUI.
Make sure shelf ID switches on all PV660s and PV224s in the subsystem are set to different numbers. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide on how to set shelf IDs.
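If the shelf ID settings have been written down for each enclosure, a duplicate can be spotted mechanically. The Python sketch below assumes the IDs have been recorded by hand in a dictionary; the enclosure labels and the helper name are hypothetical, and nothing here reads the switches or queries Array Manager.

```python
from collections import Counter


def duplicate_shelf_ids(shelf_ids: dict) -> list:
    """Return the shelf ID values that are assigned to more than one enclosure."""
    counts = Counter(shelf_ids.values())
    return sorted(sid for sid, n in counts.items() if n > 1)


# Hypothetical subsystem: one PV660F and two PV224F enclosures.
recorded = {"PV660F": 0, "PV224F-a": 1, "PV224F-b": 1}
print(duplicate_shelf_ids(recorded))  # [1]
```

An empty result means every enclosure in the recorded list has a distinct shelf ID.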
If this message occurs without power failure, replace the BBU.
To replace the BBU, see the Dell PowerVault 660F and 224F Storage Systems Service Manual for part replacement procedures.
The BBU requires two reconditioning cycles prior to first time use. This reconditioning process will take several hours and cannot be interrupted. Refer to the Recondition command in the chapter Configuring the Dell PowerVault 660F RAID Controller for detailed instructions on performing a BBU recondition.
After troubleshooting or replacing the BBU, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that appears. This action will update the BBU status within the GUI.
The BBU requires two reconditioning cycles before it is used for the first time. This reconditioning process will take several hours and cannot be interrupted. Refer to the Recondition command in the chapter Configuring the Dell PowerVault 660F RAID Controller for detailed instructions on performing a BBU recondition.
Recondition or replace the BBU. See the Dell PowerVault 660F and 224F Storage Systems Service Manual for part replacement procedures.
After troubleshooting or replacing the BBU, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan from the context menu that appears. This action will update the BBU status.
Rescan the subsystem. Click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan to verify that the replaced drive has been recognized.
If the virtual disk is offline, try forcing it online with the Force Online command. Right-click the disk and select Force Online from the context menu that appears. See the Force Online command in the chapter Configuring the Dell PowerVault 660F RAID Controller for detailed instructions.
If you cannot force the virtual disk online, remove and replace the affected hard drive. See the topic Procedure for Replacing a Drive at the beginning of this section for instructions on replacing a drive.
If you still get a hard disk error after replacing the drive, contact customer service.
Replace and rebuild the drive. See the topic Procedure for Replacing a Drive at the beginning of this section for instructions on replacing and rebuilding a drive.
If the problem persists, replace the disk drive. It may be helpful to look at the other events that were generated in order to identify the malfunctioning drive.
A drive is usually manually taken offline to replace it. If the drive was physically removed from the enclosure, replace and rebuild the drive (using a drive at least as large as the other disk drives in the virtual disk).
See the topic Procedure for Replacing a Drive at the beginning of this section for instructions on replacing and rebuilding a drive.
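The size requirement for the replacement drive can be expressed as a one-line comparison. This is a minimal Python sketch, assuming drive capacities are known in megabytes; the function name and the sample capacities are hypothetical.

```python
def replacement_is_large_enough(replacement_mb: int, member_sizes_mb: list) -> bool:
    """True if the replacement is at least as large as every other drive in the virtual disk."""
    if not member_sizes_mb:
        raise ValueError("the virtual disk must have at least one remaining member drive")
    return replacement_mb >= max(member_sizes_mb)


# Hypothetical virtual disk whose remaining members are all 17,000 MB drives.
print(replacement_is_large_enough(17000, [17000, 17000]))  # True
print(replacement_is_large_enough(8000, [17000, 17000]))   # False
```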
Check all cables, making sure they are correctly and firmly connected and that none are crossed. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for cabling procedures.
Replace the affected disk. See the topic Procedure for Replacing a Drive at the beginning of this section for instructions on replacing and rebuilding a drive.
It is not possible to recover this physical drive. Replace the disk drive. See the topic Procedure for Replacing a Drive at the beginning of this section for instructions on replacing and rebuilding a drive.
If the replacement drive still does not work, contact customer service.
Check that the new array disk is seated properly. If not, remove and reinsert the disk.
If the problem persists, see the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide on how to troubleshoot the Fibre Channel hard disk drives.
When completed, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan. This action will update the array disk status within the GUI.
Remove and reinsert the physical drive. Then go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan to verify that the replacement drive has been recognized.
If the drive is still missing or not found, try replacing the drive. See the topic Procedure for Replacing a Drive at the beginning of this section for instructions on replacing and rebuilding a drive.
If the replacement drive does not work, contact customer service.
If this event appears for all existing drives, then a Loop ID problem may be present. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide on how to troubleshoot the I/O module.
If the drive has failed, replace the drive. See the topic Procedure for Replacing a Drive at the beginning of this section for instructions on replacing and rebuilding a drive.
If the replacement drive does not work, contact customer service.
Try performing a Consistency Check again. If the problem persists, replace the disk drive. It may be helpful to look at the other events that were generated in order to identify the malfunctioning drive. See the topic Procedure for Replacing a Drive at the beginning of this section for instructions on replacing and rebuilding a drive.
Try performing a consistency check again. If the problem persists, replace the disk drive(s). It may be helpful to look at the other events that were generated in order to identify the malfunctioning drive(s). See the topic Procedure for Replacing a Drive at the beginning of this section for instructions on replacing and rebuilding a drive.
Replace the affected disk. See the topic Procedure for Replacing a Drive at the beginning of this section for instructions on replacing and rebuilding a drive.
If you have a non-fault-tolerant virtual disk, a single array disk failure may have caused the virtual disk to go offline. If you have a fault-tolerant virtual disk, multiple array disk failures may have caused the virtual disk to go offline.
Verify through the LED lights that power is supplied to the enclosure.
Identify the location of the failed drive(s). If necessary, refer to the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide.
Replace the array disk(s) if necessary. See the topic Procedure for Replacing a Drive at the beginning of this section for instructions on replacing and rebuilding a drive.
It may not be possible to recover from this error. Contact customer service.
Replace the array disk. See the topic Procedure for Replacing a Drive at the beginning of this section for instructions on replacing and rebuilding a drive.
When completed, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan to verify that the virtual disk has been initialized and is recognized.
Replace the array disk. See the topic Procedure for Replacing a Drive at the beginning of this section for instructions on replacing and rebuilding a drive.
Replace the array disk. See the topic Procedure for Replacing a Drive at the beginning of this section for instructions on replacing and rebuilding a drive.
If the problem persists, replace the hard disk. See the topic Procedure for Replacing a Drive at the beginning of this section for instructions on replacing and rebuilding a drive.
Add drives to the system. See the Dell PowerVault 660F and 224F Storage Systems Installation and Troubleshooting Guide for instructions on adding new drives.
After adding the new drives, go to the Array Manager console and click to expand the Arrays storage object, right-click the PV660F Subsystem storage object, and select Rescan.
This action will update the drive status within the GUI. You are now ready to create a hot spare.
Only disk groups that contain a single virtual disk can be expanded. Disk groups that contain two or more virtual disks cannot be expanded.
The full capacity of the disk group must be used before additional disk groups can be created.
Redundant controllers must be in a failover condition. To do this, send a Disable Partner controller command.
There must be at least one available hot spare whose capacity is greater than or equal to the smallest array disk in the disk group.
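The hot spare requirement compares each spare against the smallest array disk in the disk group. A minimal Python sketch of that comparison follows, assuming capacities in megabytes; the function name and the sample values are hypothetical.

```python
def usable_hot_spares(spare_sizes_mb: list, group_disk_sizes_mb: list) -> list:
    """Return the spares whose capacity is >= the smallest array disk in the disk group."""
    smallest = min(group_disk_sizes_mb)
    return [size for size in spare_sizes_mb if size >= smallest]


# Hypothetical disk group with 9,000 MB and 18,000 MB disks and two available spares.
spares = usable_hot_spares([8000, 9000], [9000, 18000])
print(spares)        # [9000]
print(bool(spares))  # True: at least one spare satisfies the requirement
```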
All of the available space on the NT disk has already been used in the creation of one or more volumes for that disk. A volume cannot be created on an NT disk when there is no unused space available.
A volume that has been marked as a primary partition cannot be deleted. Primary partitions are protected because they contain a bootable operating system.
Both controllers must have the same version of firmware to operate in a redundant configuration. If a failed controller is replaced with a controller with a different version of firmware, the replacement controller will not be allowed to start and will be disabled by the existing controller.
Use the following procedure to download a common firmware image:
If the replacement controller has an older version of firmware, download the same firmware version as that on the existing controller. This may not be the latest version of firmware available.
If the replacement controller has more recent firmware, power off the subsystem, exchange controllers, and download the newer firmware to the existing controller.
Note: If an empty enclosure is available, the firmware can be
downloaded to the replacement controller without having to power off
the subsystem. However, this will not work if the replacement
controller has more recent firmware.
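The choice between the two download cases depends only on how the two firmware versions compare. The Python sketch below illustrates that decision under a stated assumption: firmware versions are written as dotted numbers such as "7.4.0". The real version format is not given in this chapter, and the function names and version strings are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version string into a tuple of integers (assumed format)."""
    return tuple(int(part) for part in text.split("."))


def firmware_action(existing: str, replacement: str) -> str:
    """Return the action to take when a replacement controller is installed."""
    old, new = parse_version(existing), parse_version(replacement)
    if new == old:
        return "no download needed"
    if new < old:
        return "download the existing controller's firmware version to the replacement controller"
    return ("power off the subsystem, exchange controllers, "
            "and download the newer firmware to the existing controller")


print(firmware_action(existing="7.4.0", replacement="7.1.0"))
```

Comparing tuples of integers rather than raw strings avoids ranking "7.10" below "7.9".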
If the Enclosure Management Advanced controller option is already enabled, perform a Rescan on the PV660F Subsystem storage object. If this isn't successful, perform a Reset on the controller object. See the Reset command in the Configuring the Dell PowerVault 660F RAID Controller chapter for details.
This table describes the events that are generated by the controller. The events are displayed in the Events tab of the Array Manager console or through Windows NT Event Viewer or Windows 2000 Event Viewer.
Event
Description
Severity
Cause
700
A hard disk has been placed online.
Information
Rebuild completed. Device was configured. Manual online was done.
702
Hard disk error found.
Warning
A bad sector was found on the physical media. Mechanical failure on the device. Host SCSI device detected illegal instruction. Target device generated unknown phase sequence.
703
Hard disk PFA condition found; this disk may fail soon.
Warning
Physical device predicted some future failure. External RAID logical device may have become critical.
704
An automatic rebuild has started.
Information
A physical device failed and spare was available. A physical device failed and no spare was available. A spare was added.
705
A rebuild has started.
Information
Client started the rebuild on user's request. User replaced the failed device and 'raidbld' started the rebuild.
706
Rebuild is over.
Information
Rebuild completed successfully.
707
Rebuild is cancelled.
Warning
User cancelled the rebuild. Higher priority rebuild started.
708
Rebuild stopped with error.
Warning
Because of some unknown error on the controller, rebuild failed.
709
Rebuild stopped with error. New device failed.
Warning
New physical device failed. New physical device may not be compatible with MDAC hardware/firmware.
710
Rebuild stopped because logical drive failed.
Error
At least one more physical device failed in the virtual disk. Bad data table overflow.
711
A hard disk has failed.
Serious
A physical device failed. A user action caused the physical device to fail.
712
A new hard disk has been found.
Information
A physical device has been powered on. A new physical device has been added. Controller was powered on. Controller was added. System has rebooted.
713
A hard disk has been removed.
Information
User removed an unconfigured physical device. An unconfigured physical device failed. A controller was removed. A controller powered off.
718
SCSI command timeout on hard device.
Warning
Physical device has been removed. Physical device failed. Command time out value is not correct.
721
Parity error found.
Warning
A physical device did not generate proper parity. The controller failed or did not check parity properly. Cable failed. Improper cable length. Another physical device interfered. Some outside environment affected the data on the cable (e.g., radio frequency signal). Terminator is not connected. Improper termination.
749
Physical device status changed to offline.
Warning
Not available.
750
Physical device status changed to Hot Spare.
Warning
Not available.
751
Physical device status changed to rebuild.
Warning
Not available.
752
Physical device ID did not match.
Warning
Not available.
753
Physical device failed to start.
Warning
Not available.
756
Physical drive missing on startup.
Serious
Physical drive missing.
758
Physical drive is switching from a channel to the other channel.
Warning
Physical drive removed or channel failed.
761
Device Loop ID Conflict (Soft Addressing) Detected.
Serious
Device Loop ID Conflict detected on disk channel resulting in Soft Addressing. Potential data corruption.
762
Consistency check is started.
Information
User started a consistency check. 'Raidbld' started consistency check.
763
Consistency check is finished.
Information
Consistency check completed successfully without detecting any errors.
764
Consistency check is cancelled.
Warning
User cancelled the consistency check.
765
Consistency check on logical drive error.
Error
Inconsistent data was found. Bad sectors were found. A physical device reliability problem.
766
Consistency check on logical drive failed.
Error
A logical device became critical. A logical device failed.
767
Consistency check failed due to physical device failure.
Serious
A physical device failed.
768
Logical drive has been made offline.
Serious
One/multiple physical device(s) failed.
769
Logical drive is critical.
Error
One physical device failed.
770
Logical drive has been placed online.
Information
Rebuild completed. User set the physical device online. New configuration was added.
778
Logical drive initialization started.
Information
User started the initialization.
779
Logical drive initialization done.
Information
Initialize operation completed successfully.
780
Logical drive initialization cancelled.
Warning
User cancelled the initialization.
781
Logical drive initialization failed.
Error
One/multiple physical device(s) failed. Controller has been removed. Controller has been powered off.
784
Expand Capacity Started.
Information
User started the Online RAID Expansion operation.
785
Expand Capacity Completed.
Information
Online RAID Expansion completed.
786
Expand Capacity stopped with error.
Error
Multiple physical devices failed.
787
Bad Blocks found.
Critical
Bad sector was found on a physical device during: consistency check/rebuild/RAID Expansion operation.
789
System drive type changed.
Information
A new configuration has been added. RAID migration completed. RAID Expansion completed on RAID-1.
791
System drive LUN mapping has been written to config.
Warning
Not available.
797
Fan failure.
Serious
Cable connection broken. Bad fan.
798
Fan has been restored.
Information
Faulty fan has been replaced. Cable is connected properly.
800
Storage cabinet fan is not present.
Information
Enclosure Management connection is broken. Management hardware is bad. Fan is not present.
801
Power supply failure.
Serious
Cable connection is broken. Bad power supply.
802
Power supply has been restored.
Information
Faulty power supply has been replaced.
804
Storage cabinet power supply is not present.
Information
Management connection is broken. Management hardware is bad. Power supply is not present.
806
Temperature is above 50 degrees Celsius.
Warning
Room temperature is high. Bad fan.
807
Normal temperature has been restored.
Information
Faulty fan has been replaced. Room temperature was reduced.
809
Storage cabinet temperature sensor is not present.
Information
Enclosure management connection is broken. Management hardware is bad. Sensor is not present.
818
Fan failure.
Serious
Cable connection broken. Bad fan.
819
Fan has been restored.
Information
Faulty fan has been replaced. Cable is connected properly.
820
Fan is not present.
Information
Enclosure Management connection is broken. Management hardware is bad. Fan is not present.
821
Power supply failure.
Serious
Cable connection is broken. Bad power supply.
822
Power supply has been restored.
Information
Faulty power supply has been replaced.
823
Power supply is not present.
Information
Management connection is broken. Management hardware is bad. Power supply is not present.
825
Temperature is above working limit.
Warning
Room temperature is high. Bad fan.
826
Normal temperature has been restored.
Information
Faulty fan has been replaced. Room temperature was reduced.
827
Temperature sensor is not present.
Information
Enclosure management connection is broken. Management hardware is bad. Sensor is not present.
828
Enclosure access critical.
Warning
Enclosure management connection is broken. Management hardware is bad.
829
Enclosure access has been restored.
Information
Not available.
831
Enclosure Soft Addressing Detected.
Serious
Enclosure has duplicate loop IDs (Soft Addressing). Potential data corruption.
832
Enclosure services ready.
Information
Not available.
836
Array management server software started successfully.
Information
The server system (or array management utility server) started.
838
Internal log structures getting full,PLEASE SHUTDOWN AND RESET THE SYSTEM IN THE NEAR FUTURE.
Warning
Too many configuration changes occurred since the last boot.
840
Controller has been reset.
Warning
Controller failed. Controller was removed from the system. Controller has been powered off.
843
BBU Present.
Information
A BBU unit was found on the controller.
844
BBU Power Low.
Warning
BBU does not have enough power to enable the write data cache.
845
BBU Power OK.
Information
BBU has enough power to enable the write data cache.
846
Controller is gone. System is disconnecting from this controller.
Critical
Controller was removed from the system. Controller has been powered off.
847
Controller powered on.
Information
Controller was set online.
848
Controller is online.
Information
New controller has been installed.
849
Controller is gone. System is disconnecting from this controller.
Critical
Controller is dead. Controller has been removed. Controller has been powered off.
850
Controller's partner is gone, controller is in failover mode now.
Warning
Controller was set offline.
851
BBU reconditioning is started.
Information
User started a BBU reconditioning.
852
BBU reconditioning is finished.
Information
BBU reconditioning completed successfully.
854
BBU reconditioning is canceled.
Information
User cancelled the BBU reconditioning.
855
Controller firmware mismatch.
Serious
Replacement controller with downlevel firmware installed.
857
WARM BOOT failed.
Serious
Memory error detected during warm boot scan. Possible data loss.
858
Controller entered Conservative Cache Mode.
Warning
Not available.
859
Controller entered Normal Cache Mode.
Warning
Not available.
860
Controller Device Start Complete.
Warning
Not available.
861
Soft ECC error Corrected.
Warning
Not available.
862
Hard ECC error Corrected.
Warning
Not available.
863
BBU Recondition Needed.
Serious
Not available.
864
Controller's Partner Has Been Removed.
Warning
Not available.
866
Updated partner's status.
Warning
Not available.
867
Relinquished partner.
Warning
Not available.
868
Inserted Partner.
Warning
Not available.
869
Dual Controllers Enabled.
Warning
Not available.
870
Killed Partner.
Warning
Not available.
871
Dual Controllers entered Nexus.
Warning
Not available.
872
Controller Boot ROM Image needs to be reloaded.
Serious
Wrong firmware image file downloaded. MAC address changed.
873
Controller is using default non-unique world-wide name.
Critical
MAC address lost or not set.
882
Automatic reboot count has changed.
Information
Controller has rebooted. Automatic reboot has rearmed itself or was reconfigured.
883
Channel Failed.
Warning
Cable disconnected.
884
Channel Online.
Warning
Cable reconnected.
885
Back End SCSI Bus Dead.
Serious
Lost access to data on SCSI bus.
886
Back End SCSI Bus Alive.
Information
Regained access to data on SCSI bus.
887
Back End Fibre Dead.
Serious
Lost access to data on Fibre Channel.
888
Back End Fibre Alive.
Information
Regained access to data on Fibre Channel.
889
Event Log Empty.
Warning
Tried to read past last entry.
890
Event Log Entries Lost.
Warning
Tried to read an entry that does not exist in the event log.
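When reviewing a long list of logged events, it can help to pull out only the entries with a given severity. The Python sketch below copies a handful of rows from the table above into a dictionary and filters a list of event numbers against it. The dictionary is deliberately partial, the function name is hypothetical, and the sample log is invented; nothing here reads the Array Manager event log itself.

```python
# Partial copy of the event table: event number -> (description, severity).
EVENTS = {
    700: ("A hard disk has been placed online.", "Information"),
    711: ("A hard disk has failed.", "Serious"),
    769: ("Logical drive is critical.", "Error"),
    806: ("Temperature is above 50 degrees Celsius.", "Warning"),
    849: ("Controller is gone. System is disconnecting from this controller.", "Critical"),
}


def filter_by_severity(event_ids: list, severities: set) -> list:
    """Return (event number, description) pairs whose severity is in `severities`.

    Event numbers that are missing from EVENTS are skipped.
    """
    return [
        (eid, EVENTS[eid][0])
        for eid in event_ids
        if eid in EVENTS and EVENTS[eid][1] in severities
    ]


# Invented sample log; 999 is not a known event and is ignored.
for eid, description in filter_by_severity([700, 711, 806, 849, 999], {"Serious", "Critical"}):
    print(eid, description)
```

Filtering by an explicit set of severity names avoids assuming any ranking among Error, Serious, and Critical, which the table does not state.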