Troubleshooting: Dell PowerEdge Expandable RAID Controller 4/SC, 4/DC, and 4e/DC User's Guide

Troubleshooting

Dell™ PowerEdge™ Expandable RAID Controller 4/SC, 4/DC, and 4e/DC User's Guide

  Logical Drive Degraded

  System CMOS Boot Order

  General Problems

  Hard Disk Drive Related Issues

  Drive Failures and Rebuilds

  SMART Error

  SCSI Cable and Connector Problems

  Audible Warnings

  BIOS Error Messages

  Battery Messages

  Light-emitting Diode (LED) Description


To get help with problems with your RAID controller, you can contact your Dell™ Service Representative or access the Dell Support web site at support.dell.com.


Logical Drive Degraded

A logical drive is in a degraded condition when one hard drive in its span has failed or is offline. For example, a RAID 10 logical drive consisting of two spans of two drives each can sustain a drive failure in each span and remain a degraded logical drive. The RAID controller has the fault tolerance to undergo a single failure in each span without compromising data integrity or processing capability.

The RAID controller provides this support through redundant arrays in RAID levels 1, 5, 10 and 50. The system can still work properly even with a single disk failure in an array, though performance can be degraded to some extent.

To recover from a degraded logical drive, rebuild the failed drive in each array. Upon successful completion of the rebuild process, the logical drive state changes from degraded to optimal. For the rebuild procedure, see Rebuilding Failed Hard Drives in RAID Configuration and Management.
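
The state rules described above can be sketched as a small function. This is only an illustration under stated assumptions: the function name logical_drive_state and its list-of-counts input are hypothetical and do not correspond to any controller interface, and the "offline" result for a span with no redundancy left is taken from the Audible Warnings table later in this section.

```python
# Redundant RAID levels named in this section.
REDUNDANT_LEVELS = {1, 5, 10, 50}

def logical_drive_state(raid_level, failed_per_span):
    """Classify a logical drive from the number of failed drives in each span.

    failed_per_span is a hypothetical input: one integer per span.
    """
    worst = max(failed_per_span)
    if worst == 0:
        return "optimal"
    if raid_level in REDUNDANT_LEVELS and worst == 1:
        return "degraded"  # at most one failed drive in every span
    return "offline"       # at least one span has lost its redundancy

# RAID 10 with two spans and one failed drive in each span
print(logical_drive_state(10, [1, 1]))  # degraded
```

A RAID 0 logical drive has no redundancy, so in this sketch any failure moves it straight to "offline".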


System CMOS Boot Order

If you intend to boot to the controller, ensure it is set appropriately in the system's CMOS boot order. Refer to the system documentation for your individual system.

NOTE: Only the first eight logical drives can be used as bootable devices.

General Problems

Table 7-1 describes general problems you might encounter, along with suggested solutions.

Table 7-1. General Problems 

Problem

Suggested Solution

The device displays in Device Manager but has a yellow bang (exclamation point).

Reinstall the driver. See the driver installation procedures in Driver Installation.

Windows driver does not appear in Device Manager.

Power off the system and reset the card.

"No Hard Drives Found" message appears during a CD-ROM installation of Windows 2000 or Windows 2003 because of the following causes:

  1. The drive is not native in the operating system.
  2. The logical drives are not configured properly.
  3. The controller BIOS is disabled.

The corresponding solutions to the three causes of the message are:

  1. Press <F6> to install the RAID Device Driver during installation.
  2. Enter the BIOS Configuration Utility to configure the logical drives. See RAID Configuration and Management for procedures to configure the logical drives.
  3. Enter the BIOS Configuration Utility to enable the BIOS. See RAID Configuration and Management for procedures to configure the logical drives.

The BIOS Configuration Utility does not detect a replaced physical drive in a RAID 1 array and offer the option to start a rebuild.

After the drive is replaced, the utility shows all drives online and all logical drives reporting optimal state. It does not allow rebuild because no failed drives are found.

This occurs if you replace the drive with a drive that contains data. If the new drive is blank, this problem does not occur.

Perform the following steps to solve this problem:

  • Access the BIOS Configuration Utility and select Objects—> Physical Drive to display the list of physical drives.
  • Use the arrow key to select the newly inserted drive, then press <Enter>.

The menu for that drive displays.

  • Select Force Offline and press <Enter>.

This changes the physical drive from Online to Failed.

  • Select Rebuild and press <Enter>.

After rebuilding is complete, the problem is resolved and the operating system will boot.

The system takes a long time to boot during a RAID Level Migration or Check Consistency operation.

This is normal behavior during a RAID level migration or consistency check.

PERC 4 controllers can appear as hotpluggable controllers in the Safely Remove Hardware Menu.

In PCI Hotplug-capable systems, PERC 4 controllers can appear as hotpluggable controllers in the Safely Remove Hardware Menu. However, Dell PERC controllers do not support this feature, and hot add/remove operations should not be attempted.


Hard Disk Drive Related Issues

Table 7-2 describes hard drive related problems you might encounter, along with suggested solutions.

Table 7-2. Hard Disk Drive Issues 

Problem

Suggested Solution

The system does not boot from the RAID controller.

If the system does not boot from the controller, check the boot order in the BIOS.

One of the hard drives in the array fails often.

This could result from one of two problems:

  • If the same drive fails:
    • Format the drive.
    • Check the enclosure or backplane for damage.
    • Check the SCSI cables.
    • Replace the hard drive.
  • If drives in the same slot keep failing:
    • Check the enclosure or backplane for damage.
    • Check the SCSI cables.
    • Replace the cable or backplane, if necessary.

Critical Array Status Error is reported during boot-up.

One or more of your logical drives is degraded. To recover from a degraded logical drive, rebuild the failed drive in each array. Upon successful completion of the rebuild process, the logical drive state changes from degraded to optimal. See Logical Drive Degraded in this section for more information. See Rebuilding Failed Hard Drives in RAID Configuration and Management for information about rebuilding failed drives.

FDISK reports much lower drive capacity in the logical drive.

Some versions of FDISK (such as DOS 6.2) do not support large disk drives. Use a version that supports large disk sizes or use a disk utility in your operating system to partition your disk.

Cannot rebuild a fault tolerant array.

This could result from any of the following:

  • The replacement disk is too small or bad. Replace the failed disk with a good drive.
  • The enclosure or backplane could be damaged. Check the enclosure or backplane.
  • The SCSI cables could be bad. Check the SCSI cables.

Fatal errors or data corruption are reported when accessing arrays.

Contact Dell Technical Support.


Drive Failures and Rebuilds

Table 7-3 describes issues related to drive failures and rebuilds.

Table 7-3. Drive Failure and Rebuild Issues

Issue

Suggested Solution

Rebuilding a hard disk drive after a single drive failure

If you have configured hot spares, the RAID controller automatically tries to use them to rebuild failed disks. Manual rebuild is necessary if no hot spares with enough capacity to rebuild the failed drives are available. You must insert a drive with enough storage into the subsystem before rebuilding the failed drive. You can use the BIOS Configuration Utility or Dell OpenManage® Array Manager to perform a manual rebuild of an individual drive.

Refer to Rebuilding Failed Hard Drives in RAID Configuration and Management for procedures for rebuilding a single hard disk drive.

Rebuilding hard disk drives after a multi-drive failure

Multiple drive errors in a single array typically indicate a failure in cabling or connection and could involve the loss of data. It is possible to recover the logical drive from a multiple drive failure. Perform the following steps to recover the logical drive:

  1. Shut down the system, check cable connections, and reset hard drives.

Be sure to follow safety precautions to prevent electrostatic discharge.

  2. If the system logs are available, try to identify the order in which the drives failed in the multiple drive failure scenario.
  3. Force the first drive online, then the second (if applicable), and continue until you reach the last disk.
  4. Perform a rebuild on the last disk.

You can use the BIOS Configuration Utility or Dell OpenManage® Array Manager to perform a manual rebuild of multiple drives.

See Rebuilding Failed Hard Drives in RAID Configuration and Management for procedures for rebuilding a single hard disk drive.

A drive is taking longer than expected to rebuild.

An array may take longer to rebuild when under high stress; for example, when there is one rebuild I/O operation for every five host I/O operations.

A node in a clustering environment fails during a rebuild.

In a clustering environment, if a node fails during a rebuild, the rebuild is restarted by another node. The rebuild on the second node starts at zero percent.
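
The multi-drive recovery procedure in Table 7-3 depends on the order in which the drives failed. The sketch below shows that ordering logic only; it does not talk to the controller. The function name recovery_plan, the (drive, failure time) pairs, and the drive labels are hypothetical, and the sketch assumes that "first drive" in the procedure means the first drive to fail.

```python
def recovery_plan(failures):
    """Return the ordered actions for a multiple drive failure.

    failures: list of (drive_id, failure_time) pairs read from the system
    logs. Both the pair layout and the drive_id labels are hypothetical.
    """
    ordered = sorted(failures, key=lambda item: item[1])  # earliest failure first
    # Every drive except the last one to fail is forced online, in failure order.
    plan = [("force online", drive) for drive, _ in ordered[:-1]]
    if ordered:
        plan.append(("rebuild", ordered[-1][0]))  # the last drive to fail is rebuilt
    return plan

for action, drive in recovery_plan([("ch1:id2", 30), ("ch1:id0", 10), ("ch1:id1", 20)]):
    print(action, drive)
```

The actions themselves are still carried out in the BIOS Configuration Utility or Dell OpenManage Array Manager, as described in the table.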


SMART Error

Table 7-4 describes issues related to the Self-Monitoring Analysis and Reporting Technology (SMART). SMART monitors the internal performance of all motors, heads, and hard drive electronics and detects predictable hard drive failures.

Table 7-4. SMART Error

Problem

Suggested Solution

A SMART error is detected in a fault-tolerant RAID array

Perform the following steps:

  1. Force the hard disk drive offline.
  2. Replace it with a new drive.
  3. Perform a rebuild.

See Rebuilding Failed Hard Drives in RAID Configuration and Management for rebuild procedures.

A SMART error is detected in a non-fault-tolerant RAID array

Perform the following steps:

  1. Back up your data.
  2. Delete the logical drive.

See Deleting Logical Drives in RAID Configuration and Management for the procedure for deleting a logical drive.

  3. Replace the affected hard disk drive with a new drive.
  4. Recreate the logical drive.

See Simple Array Setup or Advanced Array Setup in RAID Configuration and Management for procedures for creating logical drives.

  5. Restore the backup.
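
The two procedures in Table 7-4 differ only in whether the array is fault tolerant. As a minimal sketch, the choice can be written as a lookup on the RAID level. The function name smart_error_steps is hypothetical, and the set of fault-tolerant levels is assumed to be the redundant levels listed under Logical Drive Degraded (1, 5, 10, and 50).

```python
# Assumption: the fault-tolerant levels are the redundant levels named earlier.
FAULT_TOLERANT_LEVELS = {1, 5, 10, 50}

def smart_error_steps(raid_level):
    """Return the Table 7-4 steps that apply to an array of the given RAID level."""
    if raid_level in FAULT_TOLERANT_LEVELS:
        return [
            "Force the hard disk drive offline.",
            "Replace it with a new drive.",
            "Perform a rebuild.",
        ]
    return [
        "Back up your data.",
        "Delete the logical drive.",
        "Replace the affected hard disk drive with a new drive.",
        "Recreate the logical drive.",
        "Restore the backup.",
    ]

for number, step in enumerate(smart_error_steps(0), start=1):
    print(number, step)
```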

SCSI Cable and Connector Problems

If you are having problems with your SCSI cables or connectors, first check all the cable connections thoroughly. Check for any bent pins or crimps in the cable resulting from poor cable management or improper use of cable management arms. If you still have a problem, contact your Dell representative for information.


Audible Warnings

The RAID controller has a speaker that generates warnings to indicate events and errors. Table 7-5 describes the warnings.

NOTE: The audible warnings will not function if the alarm has been set to silent or disabled.

Table 7-5. Audible Warnings 

Tone Pattern                         Meaning                                                 Examples
Three seconds on and one second off  A logical drive is offline.                             One or more drives in a RAID 0 configuration failed. Two or more drives in a RAID 1 or 5 configuration failed.
One second on and one second off     A logical drive is running in degraded mode.            One drive in a RAID 5 configuration failed.
One second on and three seconds off  An automatically initiated rebuild has been completed.  A hard drive in a RAID 1 or 5 configuration failed and was rebuilt.
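
Each tone pattern in Table 7-5 is identified by how long the speaker is on and how long it is off. A minimal sketch of that lookup follows; the names TONE_PATTERNS and tone_meaning are hypothetical, and the timings are assumed to be measured in whole seconds by the listener.

```python
# (seconds on, seconds off) -> meaning, as listed in Table 7-5
TONE_PATTERNS = {
    (3, 1): "A logical drive is offline.",
    (1, 1): "A logical drive is running in degraded mode.",
    (1, 3): "An automatically initiated rebuild has been completed.",
}

def tone_meaning(seconds_on, seconds_off):
    """Return the meaning of a tone pattern, or a fallback for unlisted patterns."""
    return TONE_PATTERNS.get((seconds_on, seconds_off), "Unknown tone pattern")

print(tone_meaning(1, 1))  # A logical drive is running in degraded mode.
```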


BIOS Error Messages

In PERC RAID controllers, the BIOS (option ROM) provides INT 13h functionality (disk I/O) for the logical drives connected to the controller, so that you can boot from or access the drives without the need of a driver. Table 7-6 describes the error messages and warnings that display for the BIOS.

Table 7-6. BIOS Errors and Warnings 

Message

Meaning

BIOS Disabled. No Logical Drives Handled by BIOS

This warning displays after you disable the option ROM in the configuration utility so that the BIOS will not hook Int13h and thus will not provide any I/O functionality to the logical drives.

Press <Ctrl><M> to Enable BIOS

When the BIOS is disabled, you are given the option to enable it by entering the configuration utility. You can change the setting to enabled in the configuration utility.

Configuration of NVRAM and drives mismatch

Run View/Add Configuration option of Configuration Utility

Press a key to enter Configuration Utility

If your boot-time BIOS options are set to Auto mode for BIOS configuration autoselection, the BIOS detects a mismatch of configuration data on the NVRAM and disks and this warning displays. You have to enter the configuration utility to resolve the mismatch before continuing.

Perform the following steps to resolve the mismatch:

  1. Press <Ctrl><M> to enter the BIOS Configuration Utility.
  2. Select Configure—> View/Add Configuration from the Management Menu.

The options Disk or NVRAM display.

  3. Select either Disk to use the configuration data on the hard disk or NVRAM to use the configuration on the NVRAM.

NOTE: This message will display if any changes are made to the logical disk configuration in a clustered environment while one node is not in an up state. Accept the configuration from the disk.

Adapter at Baseport xxxx is not responding

where xxxx is the baseport of the adapter

If the adapter does not respond for any reason but is detected by the BIOS, it displays this warning and continues.

Shut down the system and try to reset the card. If this message still occurs, contact Dell Technical Support.

Insufficient Memory to Run BIOS. Press a Key to Continue

The BIOS needs some memory at POST to run properly. The BIOS allocates this memory either using PMM or another method. If the BIOS still cannot allocate the memory, it stops execution, displays this warning, then continues. This warning is very rare.

Insufficient Memory on the Adapter for the Current Configuration

If there is insufficient memory installed on the adapter, this warning displays and the system continues with another adapter. You should check to make sure the memory is properly installed and sufficient.

Shut down the system and try to reset the card. If this message still occurs, contact Dell Technical Support.

Memory/Battery problems were detected. The adapter has recovered, but cached data was lost. Press any key to continue.

This message occurs under the following conditions:

  • the adapter detects that the data in the controller cache has not yet been written to the disk subsystem
  • the boot block detects an ECC error while performing its cache checking routine during initialization
  • the controller then discards the cache rather than sending it to the disk subsystem because the data integrity cannot be guaranteed

To resolve this problem, allow the battery to charge fully. If the problem persists, the battery or adapter DIMM might be faulty. In that case, contact Dell Technical Support.

x Logical Drive(s) Failed

where x is the number of logical drives failed.

When the BIOS detects logical drives in the failed state, it displays this warning. You should check to determine why the logical drives failed and correct the problem. No action is taken by the BIOS.

x Logical Drives Degraded

where x is the number of logical drives degraded.

When the BIOS detects logical drives in a degraded state, it displays this warning. You should try to make the logical drives optimal. No action is taken by the BIOS.

Following SCSI ID's are not responding

Channel- ch1: id1, id2, .........

Channel- ch2: id1, id2, .........

.

where chx is channel number and id1 is first id that failed, id2 is second and so on.

When the BIOS determines that previously configured physical drives are not connected to the adapter, the BIOS displays this warning. You can connect the devices or take some other corrective action. The system continues to boot.

Adapter(s) Swap detected for Cluster/Non-Cluster mismatch

This warning displays when the BIOS detects a cluster/non-cluster mismatch in a cluster environment.

Warning: Battery voltage low

When the battery voltage is low, the BIOS displays this warning. You should check the battery. See Battery Messages for information about battery problems.

Warning: Battery temperature high

When the battery temperature is high, the BIOS displays this warning. Your system is too hot. Check the air temperature and remove any obstructions to airflow. See Battery Messages for more information.

Warning: Battery life low

Your RAID battery has a maximum number of charge and discharge cycles. When the BIOS displays this warning, the battery has reached the maximum number of cycles. Replace the battery.

Following SCSI ID's have same data

Channel- ch1: id1, id2, .........

Channel- ch2: id1, id2, .........

.

where chx is channel number and id1 is first id that has same data, id2 is second and so on.

This message displays when you perform drive roaming and the SCSI IDs have the same data.

Error:Following SCSI Disk not found and No Empty Slot Available for mapping it

No mapping done by firmware

Channel- ch1: id1, id2, .........

Channel- ch2: id1, id2, .........

.

where chx is channel number and id1 is first id that was not found, id2 is second and so on.

This message displays when you perform drive roaming and no empty slot is available for the drive(s).


Battery Messages

The PERC 4e/DC and 4/DC RAID controllers offer a battery backup unit to protect the integrity of the cached data by providing backup power if there is a complete AC power failure or a brief power outage.

Messages can display to inform you of a situation with the battery that may require attention. There are three kinds of messages: messages that do not warrant battery replacement, messages that might warrant battery replacement, and messages that do warrant battery replacement. Table 7-7 describes the messages that can display in the BIOS Configuration Utility.

Table 7-7. Battery Messages 

Message

Suggested Solution

Battery Messages That Do Not Warrant Battery Replacement

BIOS Configuration Utility:

Warning: Battery temperature high

The battery temperature is high - (Minimal exposure with 60 C Limit)

The battery is behaving properly. Do not replace the battery.

The possible reasons for the high temperature are:

  • The battery was recently in fast-charge state during heavy system usage
  • The server environment temperature is too high
  • There are possible airflow obstructions

Do NOT replace the battery

Battery Messages That May Warrant Battery Replacement

BIOS Configuration Utility:

Warning: Battery voltage low

When the battery voltage is low, the BIOS displays this warning message about low battery voltage at POST. This is typical for a battery with low charge. The options are:

  1. Do NOT replace the battery when (that is, the message is expected when):
    • The battery or card is new or just installed.
    • The server was recently shut down with data in the cache memory.
  2. REPLACE the battery when:
    • The message displays after the card has been plugged in for more than 48 hours, and
    • The temperature status in Ctrl-M is GOOD, meaning it does not report as Out-of-Range,

OR:

    • The battery voltage low message appears, and
    • The message displays after the card has been plugged in for more than 48 hours, and
    • The card is more than 2 years old.

The following screen shot displays how to check battery information in the BIOS Configuration Utility.

Battery Messages That Do Warrant Battery Replacement

BIOS Configuration Utility:

Warning: Battery life low

Your RAID battery has a maximum number of charge and discharge cycles. When the BIOS displays this warning, the battery has reached the maximum number of cycles. Replace the battery.
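
The replacement rules for the "Warning: Battery voltage low" message in Table 7-7 combine three observations. The sketch below encodes them as a single check; it is an illustration only, and the function name should_replace_battery and its parameters are hypothetical. It assumes the warning is currently being displayed and that you have read the temperature status in the BIOS Configuration Utility yourself.

```python
def should_replace_battery(hours_plugged_in, temperature_good, card_age_years):
    """Apply the low-voltage replacement rules; call only when the warning is shown.

    Replace when the card has been plugged in for more than 48 hours and either
    the temperature status is GOOD or the card is more than 2 years old.
    """
    plugged_in_over_48h = hours_plugged_in > 48
    return plugged_in_over_48h and (temperature_good or card_age_years > 2)

print(should_replace_battery(12, True, 3))   # False: card was only just installed
print(should_replace_battery(72, True, 1))   # True
```

A False result here corresponds to the cases where the message is expected, such as a new card or a battery with low charge.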


Light-emitting Diode (LED) Description

When you start the system, the boot block and firmware perform a number of steps that load the operating system and allow the system to function properly. The boot block contains the operating system loader and other basic information needed during startup.

As the system boots, eight LEDs on the controllers indicate the status of the boot block and firmware initialization and whether the system performed the steps correctly. If there is an error during startup, you can use the LED display to identify it. The eight LEDs for PERC 4/SC are D12 to D19. The LEDs for PERC 4/DC and 4e/DC are D17 to D24 and are located on the back of the controller.

Table 7-8 displays the LEDs, execution states, and LED patterns for the boot block for each PERC 4 controller. Table 7-9 displays the LEDs, execution states, and LED patterns during firmware initialization. The execution state describes the status of the steps involved in the boot block or firmware initialization. The LEDs display in hexadecimal format and in an ON/OFF format so you can determine the corresponding execution state.

For example, the 0x01 LED pattern displays as OFF OFF OFF OFF OFF OFF OFF ON. For PERC 4/SC, LEDs D12 through D18 are off, and D19 is on for that execution state. After you identify the LED pattern, check the execution state for more information.
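
The relationship between a hexadecimal code and its ON/OFF pattern is a plain 8-bit binary expansion, with the most significant bit on the leftmost LED. The following sketch reproduces the 0x01 example above; the names LED_NAMES and led_pattern are hypothetical, and the LED names are the boot block names given earlier in this section (D12 to D19 for PERC 4/SC, D17 to D24 for PERC 4/DC and 4e/DC).

```python
# Boot block LED names, leftmost first, as given in this section.
LED_NAMES = {
    "4/SC": [f"D{n}" for n in range(12, 20)],   # D12 .. D19
    "4/DC": [f"D{n}" for n in range(17, 25)],   # D17 .. D24
    "4e/DC": [f"D{n}" for n in range(17, 25)],  # D17 .. D24
}

def led_pattern(code, controller="4/SC"):
    """Map an 8-bit code to {LED name: "ON"/"OFF"}, most significant bit first."""
    bits = format(code, "08b")  # for example 0x01 -> "00000001"
    return {
        name: ("ON" if bit == "1" else "OFF")
        for name, bit in zip(LED_NAMES[controller], bits)
    }

print(led_pattern(0x01))
```

For 0x01 on a PERC 4/SC, every LED from D12 through D18 maps to OFF and D19 maps to ON, which matches the example in the text.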

Table 7-8. Boot Block States 

LED                LED Pattern for PERC 4/SC, 4/DC, and 4e/DC
      4/SC:        D12  D13  D14  D15  D16  D17  D18  D19
      4/DC,4e/DC:  D17  D18  D19  D20  D21  D22  D23  D24  Execution State
0x01               OFF  OFF  OFF  OFF  OFF  OFF  OFF  ON   Setup 8-bit Bus for access to Flash and 8-bit devices successful
0x03               OFF  OFF  OFF  OFF  OFF  OFF  ON   ON   Serial port initialization successful
0x04               OFF  OFF  OFF  OFF  OFF  ON   OFF  OFF  Spd (cache memory) read successful
0x05               OFF  OFF  OFF  OFF  OFF  ON   OFF  ON   SDRAM refresh initialization sequence successful
0x07               OFF  OFF  OFF  OFF  OFF  ON   ON   ON   Start ECC initialization and memory scrub
0x08               OFF  OFF  OFF  OFF  ON   OFF  OFF  OFF  End ECC initialization and memory scrub
0x10               OFF  OFF  OFF  ON   OFF  OFF  OFF  OFF  SDRAM is present and properly configured. About to program ATU.
0x11               OFF  OFF  OFF  ON   OFF  OFF  OFF  ON   CRC check on the firmware image successful. Continue to load firmware.
0x12               OFF  OFF  OFF  ON   OFF  OFF  ON   OFF  Initialization of SCSI chips successful.
0x13               OFF  OFF  OFF  ON   OFF  OFF  ON   ON   BIOS protocols ports initialized. About to load firmware.
0x17               OFF  OFF  OFF  ON   OFF  ON   ON   ON   Firmware is either corrupt or BIOS disabled. Firmware was not loaded.
0x19               OFF  OFF  OFF  ON   ON   OFF  OFF  ON   Error ATU ID programmed.
0x55               OFF  ON   OFF  ON   OFF  ON   OFF  ON   System Halt: Battery Backup Failure
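
Table 7-8 can also be read in reverse: note which LEDs are lit, convert the pattern to its hexadecimal code, and look up the execution state. The sketch below does this for the boot block states. The names BOOT_BLOCK_STATES and boot_block_state are hypothetical, and the input is assumed to be the eight LED readings in left-to-right order.

```python
# Boot block codes and execution states, copied from Table 7-8.
BOOT_BLOCK_STATES = {
    0x01: "Setup 8-bit Bus for access to Flash and 8-bit devices successful",
    0x03: "Serial port initialization successful",
    0x04: "Spd (cache memory) read successful",
    0x05: "SDRAM refresh initialization sequence successful",
    0x07: "Start ECC initialization and memory scrub",
    0x08: "End ECC initialization and memory scrub",
    0x10: "SDRAM is present and properly configured. About to program ATU.",
    0x11: "CRC check on the firmware image successful. Continue to load firmware.",
    0x12: "Initialization of SCSI chips successful.",
    0x13: "BIOS protocols ports initialized. About to load firmware.",
    0x17: "Firmware is either corrupt or BIOS disabled. Firmware was not loaded.",
    0x19: "Error ATU ID programmed.",
    0x55: "System Halt: Battery Backup Failure",
}

def boot_block_state(observed):
    """observed: eight "ON"/"OFF" strings, leftmost LED first."""
    code = 0
    for led in observed:
        code = (code << 1) | (1 if led.upper() == "ON" else 0)
    return f"0x{code:02X}", BOOT_BLOCK_STATES.get(code, "Unknown state")

print(boot_block_state("OFF ON OFF ON OFF ON OFF ON".split()))
# ('0x55', 'System Halt: Battery Backup Failure')
```

Patterns that are not listed in the table return "Unknown state" in this sketch rather than a guess.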

Table 7-9. Firmware Initialization States 

LED                LED Pattern for PERC 4/SC, 4/DC, and 4e/DC
      4/SC:        D12  D13  D14  D15  D16  D17  D18  D19
      4/DC,4e/DC:  D8   D9   D10  D11  D12  D13  D14  D15  Execution State
0x1                OFF  OFF  OFF  OFF  OFF  OFF  OFF  ON   Begin Hardware Initialization
0x3                OFF  OFF  OFF  OFF  OFF  OFF  ON   ON   Begin Initialize ATU
0x7                OFF  OFF  OFF  OFF  OFF  ON   ON   ON   Begin Initialize Debug Console
0xF                OFF  OFF  OFF  OFF  ON   ON   ON   ON   Set if Serial Loopback Test is successful

System or Enclosure LEDs

LEDs display on the front of the system or enclosure to indicate the condition of the physical disks and slots in the system. Table 7-10 displays the condition, status indicator pattern, and whether the drive is removable.

Table 7-10. System or Enclosure LEDs and Condition 

Condition                                   Drive Removable?  Status Indicator Pattern
Slot empty, ready for insertion or removal  Yes               Off
Drive online, prepare for operation         No                Steady green
Drive identify                              No                Flashes green four times per second
Prepare for removal                         Yes               Flashes green twice per second at equal intervals
Drive rebuild                               No                Flashes green twice per second at unequal intervals
Drive fails                                 Yes               Flashes amber four times per second
Predicted failure                           No                Flashes green, then amber, then off, repeating this sequence every two seconds
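
Table 7-10 can be treated as a lookup from the indicator pattern you observe to the drive condition and whether the drive can be removed. The sketch below is one way to hold that table in code; the names ENCLOSURE_LEDS and is_removable are hypothetical, and the pattern strings must match the table text exactly.

```python
# status indicator pattern -> (condition, drive removable?), from Table 7-10
ENCLOSURE_LEDS = {
    "Off": ("Slot empty, ready for insertion or removal", True),
    "Steady green": ("Drive online, prepare for operation", False),
    "Flashes green four times per second": ("Drive identify", False),
    "Flashes green twice per second at equal intervals": ("Prepare for removal", True),
    "Flashes green twice per second at unequal intervals": ("Drive rebuild", False),
    "Flashes amber four times per second": ("Drive fails", True),
    "Flashes green, then amber, then off, repeating this sequence every two seconds":
        ("Predicted failure", False),
}

def is_removable(pattern):
    """Return True if Table 7-10 lists the drive as removable for this pattern."""
    condition, removable = ENCLOSURE_LEDS[pattern]
    return removable

print(is_removable("Flashes amber four times per second"))  # True
```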

