Unsolicited event log messages: Dell OpenManage™ Data Supervisor, Event Monitor, and Integrator Installation and Operation Guide


Overview | Informational Codes | Soft Codes | Error Codes | Fatal Error Codes


Overview

The tables that follow list the meaning of the hexadecimal codes and messages that can appear in the unsolicited event log. These codes describe events on a CRU basis and are categorized as follows:

  • Informational codes - Require no action. They are useful for service providers in helping to establish history.
  • Soft codes (also known as threshold codes) - Are normal and require no action unless they occur frequently.
  • Error codes - Typically require action by you or a service provider.
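
The code tables in this guide suggest that a code's category can be read from its leading hexadecimal digit: 0x6xx for informational, 0x8xx for soft, 0x9xx for error, and 0xAxx for fatal error codes. The sketch below assumes that pattern holds for every code. The name `classify_code` is hypothetical and is not part of any Dell tool.

```python
# Hypothetical helper: map an event code to its category using the
# leading hex digit, as inferred from the tables in this guide.
CATEGORY_BY_PREFIX = {
    0x6: "informational",
    0x8: "soft",
    0x9: "error",
    0xA: "fatal",
}

def classify_code(code: int) -> str:
    """Return the category for a three-digit hex event code."""
    if not 0 <= code <= 0xFFF:
        return "unknown"
    prefix = code >> 8  # top hex digit of a three-digit code
    return CATEGORY_BY_PREFIX.get(prefix, "unknown")

print(classify_code(0x601))
print(classify_code(0xA06))
```

Codes outside the four observed ranges are reported as "unknown" rather than guessed.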

Informational Codes

Code Meaning
0x601 SP powerup. The SP has completed its powerup initialization sequence.
0x602 Specified disk module has been enabled and is ready for use. This message appears after you rebuild or register the LUN to which the module belongs.
0x603 Array started rebuilding the RAID 5, RAID 3, RAID 1, or RAID 1/0 LUN to which the disk module belongs, or the disk module is a hot spare.
0x604 Array has finished rebuilding a RAID 5, RAID 3, RAID 1, or RAID 1/0 LUN.
0x606 Unit shutdown for trespass. The SP has shut down the LUN containing this drive module at the request of the peer SP because of a trespass operation.
0x607 Unit shutdown for change bind. The SP has shut down the LUN containing this drive to change the unit's operating parameters. This message appears only if the SP is operating in Target Addressing Mode.
0x608 Specified drive module is powered up and ready for binding or registering the LUN to which it belongs.
0x609 Disk module is being formatted as required to operate as an SP disk.
0x610 Disk module could not be physically formatted, and thus cannot be used in the array. Make sure the disk module is a valid model. If the model is valid, then consult your service provider for recovery steps.
0x611 The specified PROM revision was loaded into the SP as the SP powered up. The SP SCSI ID is in the extended status of the field. Logged at each SP powerup.
0x613 CRU equalize started. The SP has started a rebuild/equalize operation.
0x614 CRU equalize completed. The SP has completed a rebuild/equalize operation.
0x615 CRU equalize aborted. The SP has aborted a rebuild/equalize operation.
66Bh Single bit error detected. The SP has detected a recoverable single bit error.
0x616 Licensed Internal Code (Flare) revision installed. The revision information is in the extended status. Logged once after a new revision of LIC is installed.
0x617 Disk module controller code installed. The disk module controller code shown in the extended status was installed on this drive. The revision information is the ASCII value of each digit; for example, 0123 displays as 0x30313233.
0x621 Array has begun the background checkpoint verification of the accuracy and completeness of the disk module parity check data. This message may appear after you replace an SP or transfer control of LUNs from one SP to another.
0x622 Array has completed background checkpoint verification of the accuracy and completeness of the parity check data in a RAID 5, RAID 3, RAID 1/0 or RAID 1 LUN.
0x623 Fan pack disable/door open. The SP in the array has detected that the fan module is open or has been disconnected. The array will shut down if the fan module is not operational within 2 minutes.
0x624 Fan pack enable/door close. The SP in the array has detected that the fan module is now closed or has been reconnected.
0x630 Array has detected that a fan module has been installed or replaced.
0x631 Array has detected that a VSC has been installed or replaced.
0x632 An ac box has been installed.
0x633 Array has detected an increase in a fan's speed, perhaps because the temperature rose or another fan failed.
0x634 Speed of a fan has returned to normal.
0x635 Logical sector data error. The array has detected a data inconsistency in a disk sector.
0x636 SPS or BBU was removed from the array.
0x637 SPS or BBU is recharging.
0x638 SPS or BBU has become ready.
0x639 Array cache has become ready.
0x640 Array has finished reconstructing the disk mirror.
0x641 Background rebuild operation has aborted before it was complete.
0x642 Background verify aborted. A background verify operation has been terminated due to an abnormal event, such as the failure of one of the disks in the LUN.
0x643 SP initializing. The SP just detected the presence of its peer SP after a period of time when it was not present.
0x644 SP inserted. The SP was informed by its peer SP that the peer SP completed its power initialization sequence.
0x645 A disk module was bound. The extended status is the unit number. This is logged once for each disk module at the completion of a bind operation.
0x646 A disk module was unbound. The extended status is the unit number. This is logged once for each disk module at the completion of an unbind operation.
0x647 Fuse bad. The extended status is the fuse number. This is logged at each powerup for each bad fuse detected by an SP.
0x648 Termpower low. The extended status is the SCSI bus number. This is logged at each powerup for each SCSI bus that the SP detects with a low Termpower level.
0x650 CRU signature error occurred.
0x654 Array started dumping the write cache to the vault disks.
0x657 Array finished dumping the write cache to the vault disks.
0x658 Array caching was enabled by the array or system operator.
0x659 Array caching was disabled by the array or system operator. The array disables write caching if the SPS or BBU is not fully charged or an SP, vault disk, or fan fails; see error 0x908.
0x660 Power removed. After ac power is turned off or an ac power failure occurs, the array dumps the write cache to the vault and turns off power to the SPS or BBU (it does this to minimize drain on the SPS or BBU).
0x661 SPS or BBU sniffing enabled. This array is now allowed to shut down the SPS or BBU in order to test the SPS or BBU.
0x662 SPS or BBU sniffing disabled. SPS or BBU shutdown testing is now disabled on this array.
0x663 SPS or BBU self-test started.
0x664 In preparation for its weekly SPS or BBU test, the array disabled write caching. This message is followed by a 0x663 message.
0x665 Configured for single SP. The array has just been configured for non-mirrored write caching.
0x666 Configured for dual SP. The array has just been configured for mirrored write caching.
0x667 Cache recovering. The SP, which is operating in non-mirrored write caching mode, is recovering the contents of the write cache following a reboot.
0x668 Cache recovered. The SP, which is operating in non-mirrored write caching mode, has successfully recovered the contents of the write cache following a reboot.
0x66A Soft Vault Load Failure. The vault load failed when no cache dirty pages existed. This situation occurs most often when you change both the RAID-3 memory size and the write cache size at the same time.
0x66B Front end fibre link up. The front-end FC interface started running.
0x66C General front end fibre link unsolicited. This message is intended for development personnel. If you receive it, contact your support organization.
0x66D Peer SP timed out. The Host SP has timed out waiting for the peer SP to reply to a request.
0x680 Invalid data sector read. The hardware found a bad checksum on a data sector.
0x681 Invalid parity sector read. The hardware found a bad checksum on a parity sector.
0x682 Invalid sector read. The hardware found a bad checksum on a sector. There is not enough information to tell whether the sector holds data or parity information.
0x683 Data sector reconstructed. The hardware reconstructed a data sector that had a checksum or write stamp error.
0x684 Parity sector reconstructed. The hardware reconstructed a parity sector that had a checksum or write stamp error.
0x685 Hard error. The hardware detected an error other than a parity or write stamp error.
0x686 Command complete. A command completed after a soft error was corrected.
0x687 Stripe reconstructed. Inconsistent write/time stamps in a RAID group were corrected.
0x688 Command dropped. An optional command was dropped.
0x689 Sector reconstructed. On a read from a RAID 1 mirrored pair, a corrupted sector was reconstructed.
0x68A Uncorrectable parity sector. Sector reconstructed. On a read from a RAID 1 mirrored pair, a corrupted sector was reconstructed.
0x68B Uncorrectable data sector. Sector reconstructed. On a read from a RAID 1 mirrored pair, a corrupted sector was reconstructed.
0x68C Hard read checksum error. Sector reconstructed. A hard checksum error was detected on a data transfer from the host.
0x68D Soft read checksum error. A soft checksum error was detected on a data transfer from the host.
0x68E Inconsistent stripe. Inconsistent write or time stamps were detected in a RAID group.
0x68F Inconsistent time stamps. Inconsistent time stamps were detected in a RAID group on a verify.
0x690 Drive failed. A drive was shut down.
0x691 Checksum error on device read. A checksum error was detected on a read from an individual disk. There was no data transfer involved.
0x692 Incoherent stripe. Data and parity were not consistent in a RAID group.
0x693 Uncorrectable stripe. Inconsistent write or time stamps could not be corrected in a RAID group.
0x694 Parity Invalidated. Parity has been invalidated in a RAID group.
0x695 Uncorrectable Sector. An uncorrectable sector was detected on a RAID 1 mirrored pair.
0x696 Mirror sector invalidated. A sector on a RAID 1 mirrored pair was invalidated on a rebuild.
0x6C0 Back end fibre loop failure. This SP's back end fibre loop is off line due to an NPORT primitive.
0x6C1 Back end fibre loop failure. This SP's back end fibre loop is off line due to a loop initialization primitive.
0x6C2 Back end fibre loop failure. The SP determined that the fibre loop is hung.
0x6C3 Back end fibre loop discovery ok. The SP determined that the fibre loop is operational following the discovery phase.
0x6C4 Back end fibre loop error. A loop node detected a failure condition.
0x6C5 Back end fibre loop error. The SP did not discover a node on its first try and will try again. A drive did not login as expected.
0x6C6 Back end fibre loop error. The SP did not discover a node on its second try. A drive did not login as expected.
0x6C7 Fibre Channel unknown event. This message indicates an undefined error condition.
0x6C9 Back end fibre loop error. An unknown event was detected on the fibre loop.
0x6CA Front end fibre loop bad. The Fibre Channel front end has gone off line due to the receipt of an unexpected loop event.
0x6CB Front end fibre initiator gone. The SP attempted to communicate with an initiator which has ceased responding.
0x6CC Front end fibre link down. The fibre front end is down.
0x6CD Front end fibre link up. The fibre front end is up.
0x6D0 Back end Fibre loop event. The SP has initiated a loop failover.
0x6D1 Back end fibre loop event. Loop failover administratively denied.
0x6D2 Back end fibre loop event. Peer SP is no longer using this SP's loop.
0x6D3 Back end fibre loop event. This SP is no longer using remote loop.
0x6D4 Back end fibre loop event. The peer SP has completed failover to this SP's loop.
0x6D5 Back end fibre loop event. This SP has completed failover to remote loop.
0x6E0 Front end fibre loop event. A front end hub port has been closed.
0x6E1 Front end fibre loop event. A front end hub port has been opened.
0x6E2 Front end fibre loop error. The SP's fibre loop failed to initialize.
0x6E3 Fibre Channel initiator gone. The SP tried to communicate with an initiator that is no longer responding.
0x6E4 Fibre Channel loop down. A loop down was detected. This is a normal occurrence followed by a loop up during loop initialization.
0x6E5 Fibre Channel loop up. The fibre loop is up and ready for communication.
0x6E6 Fibre Channel loop time-out. An internal time-out has occurred in the Fibre Channel interface. The SP will wait for the condition to clear and then resume operation.
0x6E7 Fibre Channel LIP timeout. The Fibre Channel interface timed out during loop initialization. The SP will continue taking action to bring the link completely up.
0x6E8 Fibre Channel link up timeout. The Fibre Channel interface timed out waiting for the link to come up. The SP will continue taking action to bring the link completely up.
0x6E9 Front end fibre loop event. Front end fibre loop error message SP threshold exceeded.
0x6EA Fibre loop initiated. The SP started the loop initialization protocol (LIP) procedure.
0x6EB Fibre chip reset. The SP reset its Fibre Channel interface chip.
0x6EC Overlapped command detected. Two commands with the same ID (OX_ID) were received. The SP logs the originator out.
0x6ED FE fibre inbound frames dropped. One or more inbound frames were dropped due to extraordinary internal conditions - typically excessive front end traffic directed to the array.
0x6EE FE fibre soft ALPA. Some event caused reinitialization of the loop and during the reinitialization the SP received a new soft ALPA that differed from the previously held soft ALPA. This does not occur in a hard addressing environment.
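
The 0x617 entry in the table above notes that the revision in the extended status is stored as the ASCII value of each digit, so 0123 appears as 0x30313233. A minimal sketch of that decoding follows, assuming the extended status is available as a 32-bit integer. The name `decode_revision` is hypothetical.

```python
def decode_revision(extended_status: int) -> str:
    """Decode a 32-bit extended status word holding four ASCII characters."""
    # Big-endian order keeps the most significant byte as the first digit.
    raw = extended_status.to_bytes(4, byteorder="big")
    return raw.decode("ascii")

print(decode_revision(0x30313233))  # prints 0123
```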


Soft Codes

NOTE: These codes are also known as thresholded codes.

Code Meaning
0x801 Soft SCSI error. This code indicates that an abnormal SCSI bus or disk drive event was detected. A retry of the operation cleared the condition.
0x802 Illegal SCSI bus interrupt. An inconsistent interrupt situation has been detected on the SP.
0x803 Recommend disk replacement. This disk CRU has sent status to the SP indicating that it believes it may be susceptible to a fault in the near future. We recommend replacing the drive.
0x804 Single Bit Error. The SP's tolerance level for single bit errors in the write cache has been exceeded.
0x805 Single bit error. The SP's tolerance level for single bit errors in the read cache has been exceeded.
0x820 Soft media error. This disk CRU has reported a media defect which was successfully cleared by the SP.
0x840 Disk sector invalidated. The LUN that owns this disk CRU encountered a condition which required the firmware to invalidate a sector on the unit. The firmware did this to ensure that incorrect data is not returned to the host in the future.
0x850 Enclosure state change. The SP detected that a DPE enclosure changed state.
0x851 Enclosure address error. The SP detected that a DPE enclosure has an invalid address. Enclosure chain shunted prior to failing enclosure.
0x852 Enclosure duplicate address error. The SP detected that two or more DPE enclosures have the same address. Enclosure chain shunted prior to failing enclosure.
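
Soft codes need attention only when they occur frequently, but this guide does not define how often that is. The sketch below counts occurrences of each soft code within a sliding time window. The limit of 5 events per hour and the (timestamp, code) record format are illustrative assumptions, not values taken from the array firmware.

```python
from collections import defaultdict, deque

def frequent_soft_codes(events, limit=5, window_seconds=3600):
    """Return soft codes (0x8xx) seen more than `limit` times in any window.

    `events` is an iterable of (timestamp_seconds, code) pairs sorted by time.
    """
    recent = defaultdict(deque)  # code -> timestamps still inside the window
    flagged = set()
    for timestamp, code in events:
        if (code >> 8) != 0x8:
            continue  # not a soft code
        times = recent[code]
        times.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while times and timestamp - times[0] > window_seconds:
            times.popleft()
        if len(times) > limit:
            flagged.add(code)
    return flagged

# Six 0x801 events one minute apart, plus a single 0x820 event.
log = [(t * 60, 0x801) for t in range(6)] + [(400, 0x820)]
print(sorted(hex(code) for code in frequent_soft_codes(log)))
```

A site would choose its own limit and window based on what its service provider considers normal.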


Error Codes

Code Meaning
0x901 Parity Invalidated. Parity has been invalidated in a RAID group.
0x903 Fan removed. The fan module shut down or was removed.
0x904 VSC removed. A VSC unit has been shut down or removed from the array.
0x905 Chassis over temperature. The array found internal temperature too high. It tries to correct an over temperature condition by increasing fan speed. Check for any obvious problems, such as obstruction of cooling vents or excessive room temperature.
0x906 Unit shutdown. A failure in a CRU (which may be a fan or disk module), has made further access to the LUN impossible. If this unit has redundant CRUs (for example, it is a RAID 5 LUN), a failure in two CRUs is needed to produce this error. The SP shut down the LUN and the server can no longer access it.

If this message appears along with a 0x905 or 0xA06 message (or both), replacing a defective fan module may restore access to the LUN. If the problem is with disk modules, do not replace the disk modules; instead call your service provider.
0x907 Fatal firmware error. A fatal firmware error has occurred; as a result, the program running in the SP has reset the SP. The SP was restarted and continued normally. Consult your service provider.
0x908 Fault - cache disabling. The array is disabling write caching because of a system fault. The problem might be one of the following:

  - SPS or BBU is not ready (to be ready, it must be present and fully charged);
  - One or more vault disks are missing or being rebuilt;
  - Fan fault occurred; or
  - SP failed.

To recover, either identify the problem and fix it, or wait for the array to fix it (for example, wait for the SPS or BBU to reach full charge or for the vault disks to be rebuilt). When the fault no longer exists, the array automatically re-enables array write caching.
0x909 Vault dump failure. A fault caused the array to try to dump the write cache to the vault. The write cache dump failed because two or more vault disks are missing or have failed.
Try replacing one or more disk modules in the vault. A power failure, or double SP failure, while the vault is failed and the caching is enabled makes any LUN that has pages in the cache inaccessible; for any such LUN, you need to replace the bad modules; unbind and rebind the unit; make the unit available to the operating system; and load the lost data onto the unit from backup. At system powerup, error 0x90A occurs for the inaccessible LUNs.
0x90A Cannot assign - cache dirty. This message follows one of messages 0x921 through 0x924; that message explains the cause. The unit is inaccessible.

Look for two faulty modules or scrambled vault modules. If the error persists, you may need to unbind the unit to which the dirty pages are destined; then rebind the failed unit, make it available to the operating system, and reload data onto the unit from backup.
0x90B Cache initialization failed. The array cannot define the write cache because the existing write cache contains modified unwritten (dirty) pages.

This error can occur if you try to change cache parameters while the write cache is active; if so, disable the write cache; wait for the write cache to be disabled; and retry. If the problem is not corrected, check for one or more failed LUNs and if you find one, fix it. If that is not the problem, you may need to unbind the LUNs to which the unwritten pages belong; the IDs of the LUNs are part of the accompanying 0x90A error message.
0x90C Image larger than memory. The write cache was dumped to the vault, but cannot be restored to SP memory because an SP has too little memory to accept the cache image. This can happen if an SP fails and you replace it with an SP that has less memory than the one you removed.
To recover, remove the SP that has the inadequate amount of memory, insert the correct amount of memory on it, and reinsert it.
0x90D SPS or BBU removed. SPS or BBU failed or was removed. The write cache is dumped to the vault, then disabled and flushed to disk. The write cache cannot be enabled until the problem is corrected either by replacing the SPS or BBU, if it failed, or by reinstalling it. When the fault is fixed, the write cache is re-enabled automatically.
0x90E SPS or BBU disabled, says ready. SPS or BBU test was unable to turn off the SPS or BBU. The SPS or BBU is probably faulty. The write cache is dumped to the vault, then disabled, and flushed to disk. The write cache cannot be enabled until the problem is fixed. Replace the SPS or BBU. When the fault is fixed, the write cache is re-enabled automatically.
0x90F Cache recovered with errors. A non-mirrored write cache recovery failed to recover the write cache pages for some, but not all, cached LUNs. It does not apply to an array with two SPs. Contact your service provider.
0x910 Cache recovery failed. A non-mirrored write cache failed to recover information for all cached LUNs. It does not apply to an array with two SPs. Contact your service provider.
0x920 Hard media error. The disk module has reported a media defect that the SP could not clear. You should replace the disk module.
0x921 Vault load failed. The SP encountered errors while trying to load the write cache image from disk. This message may indicate multiple disk failures. Probably any LUN with write-cached pages will be inaccessible and must be unbound. To identify such a LUN, look at the SP unsolicited event log for a message that identifies a disk module in the physical unit. Usually the log message specifies the first disk module in the unit.
0x922 Vault load inconsistent. The SP found inconsistencies in the write cache image on disk. This may indicate a failure or abort of the cache dump. Probably any LUN with write-cached pages will be inaccessible and must be unbound. To identify such a LUN, look at the SP unsolicited event log for a message that identifies a disk module in the physical unit. Usually the log message specifies the first disk module in the unit.
0x923 Vault load failed - bitmap ok. The SP successfully read the control portion of the write cache image on disk, but found the data portion to be incomplete. This means that a failure or abort occurred during the write cache dump. Probably any LUN with write-cached pages will be inaccessible and must be unbound. To identify such a LUN, look at the SP unsolicited event log for a message that identifies a disk module in the physical unit. Usually the log message specifies the first disk module in the unit.
0x924 Vault disks scrambled. The SP found the vault disks containing the cache image to be in a different order than when the cache image was dumped to disk. This means the disks were swapped at power down. You must restore disks to their original order before the SP can load the cache image.
0x925 Single board cache; need PROM update. The SP, which is operating in non-mirrored write caching mode, has too low a PROM revision. Write caching cannot be enabled. Update the PROM code.
0x926 R3 cannot assign, no memory. The SP does not have enough memory available for the RAID 3 LUN. This error occurs when the array is powered up after SP memory has been removed, or when ownership of the unit is transferred to a peer SP that does not have enough memory.
0x927 Can't Assign. The revision of LIC you are running does not support "old" (pre-Revision 9.X) RAID 3 LUNs. To use the new RAID 3, you must use the current revision of LIC to unbind the LUN and rebind it as a RAID 3 LUN with RAID 3 memory.

If the old RAID 3 LUN has data you want, use an older revision of LIC to access the LUN and back up its data (to tape, for example). Then, using the newer revision of LIC, unbind and rebind the LUN as explained above. Finally, load the backed-up data onto the newly bound RAID 3 LUN.
0x928 R3 cannot initialize, no memory. The SP does not have enough memory available for the RAID 3 LUN. This error occurs when the array is powered up after SP memory has been removed, or when ownership of the unit is transferred to a peer SP that does not have enough memory.
0x929 Front end fibre link down. The Fibre Channel front end failed or is inoperable.
0x937 Command failed. A command failed for the reason explained in the extended status word.
0x938 Only RAID 3 LUNs or hot spares can be assigned in an array optimized for RAID 3 bandwidth.
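
The 0x90A entry in the table above says the message follows one of messages 0x921 through 0x924, and that the earlier message explains the cause. A minimal sketch of that lookup follows, assuming the log is available as a time-ordered list of integer codes. That format and the name `cause_of_cache_dirty` are hypothetical; the real log layout is not described in this guide.

```python
# Vault-related codes that can precede 0x90A, as listed in the table.
VAULT_CAUSES = {
    0x921: "Vault load failed",
    0x922: "Vault load inconsistent",
    0x923: "Vault load failed - bitmap ok",
    0x924: "Vault disks scrambled",
}

def cause_of_cache_dirty(codes):
    """For each 0x90A in `codes`, return the most recent preceding vault code.

    Returns None for a 0x90A that has no earlier 0x921-0x924 message.
    """
    causes = []
    last_vault_code = None
    for code in codes:
        if code in VAULT_CAUSES:
            last_vault_code = code
        elif code == 0x90A:
            causes.append(last_vault_code)
    return causes

for cause in cause_of_cache_dirty([0x601, 0x924, 0x90A]):
    print(VAULT_CAUSES.get(cause, "no preceding vault message"))
```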


Fatal Error Codes

Code Meaning
0xA02 Failed SCSI bus. An internal SCSI bus has failed. The CRU number displayed corresponds to the bus number (A0 means bus A, B0 means bus B, and so on). The failure resulted from a bad cable or cable connection, bad terminator, bad SCSI chip on an SP, or a bad device. All disk modules on that internal bus are now inaccessible by the SP. A RAID 5, RAID 3, RAID 1, or RAID 1/0 LUN, or software mirror can continue if the other disk modules are on other internal buses. It is unlikely that the other SP (if any) will be able to use the bus. Consult your service provider.
0xA05 NOVRAM uninitialized. The nonvolatile memory on the SP is not initialized. The SP has reinitialized this memory to its default state. Reboot the SP or power cycle the array to make the SP functional.
0xA06 Chassis shutdown. Second VSC failure or fan module inoperative for more than 2 minutes. The SP is powering down all modules in the chassis. Someone must correct the problem - perhaps by inserting a new fan module - before powering up again.
0xA07 Drive failure. The specified disk module has been powered down by the SP, has failed, or has been removed from the chassis.
0xA08 Database synchronization error. The SP cannot determine the correct virtual configuration of all LUNs in the array. Some LUNs may be unusable. Contact your service provider.
0xA09 Drive too small. For a redundant LUN, a replacement disk module was inserted, but it has a smaller capacity than the other disk modules in the LUN. The rebuild operation cannot begin until someone removes the replacement disk module and inserts a module of the correct size.
0xA11 Peer SP removed. The other SP in this chassis has failed. You can force the working SP to take over the failed SP's LUNs via the secondary route.
0xA12 Cache memory hard error. The SP, which is operating in non-mirrored write caching mode, has detected a nonrecoverable memory fault in the cache memory area.
0xA13 CRU type unsupported. The disk module is a type that is not supported by the current SP software. You cannot bind the disk module into a LUN or use it for host I/O. Replace disk module with a type of disk module that is supported.
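
The 0xA02 entry in the table above states that the displayed CRU number corresponds to the bus (A0 means bus A, B0 means bus B, and so on). The sketch below extracts the bus letter, assuming the CRU number is reported as a string that starts with that letter. The name `bus_for_cru` is hypothetical.

```python
def bus_for_cru(cru: str) -> str:
    """Return the internal bus letter for a CRU identifier such as 'A0'."""
    if len(cru) < 2 or not cru[0].isalpha():
        raise ValueError(f"unexpected CRU identifier: {cru!r}")
    return cru[0].upper()

print(bus_for_cru("A0"))
print(bus_for_cru("B0"))
```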


© 2012 Dell