Reliability, availability, and serviceability

This topic provides an overview of the server reliability, availability, and serviceability (RAS) features.

Three important computer design features are reliability, availability, and serviceability (RAS). The RAS features help to ensure the integrity of the data that is stored in the server, the availability of the server when you need it, and the ease with which you can diagnose and correct problems.

Your server has the following RAS features:

3-year parts and 3-year labor limited warranty
24-hour support center
Automatic error retry and recovery
Automatic restart on nonmaskable interrupt (NMI)
Automatic restart after a power failure
Backup basic input/output system switching under the control of the integrated management module (IMM)
Built-in monitoring for fan, power, temperature, voltage, and power-supply redundancy
Cable-presence detection on most connectors
Chipkill memory protection
Corrected machine check interrupt (CMCI)
Double-device data correction (DDDC) for x4 DRAM technology DIMMs (available on 16 GB DIMMs only). Ensures that data is available on a single x4 DRAM DIMM after a hard failure of up to two DRAM DIMMs. One x4 DRAM DIMM in each rank is reserved as a spare device.
Diagnostic support for ServeRAID and Ethernet adapters
DRAM single device data correction (SDDC)
Dynamic memory migration
Enhanced DRAM single device data correction (SDDC+1)
Enhanced DRAM double device data correction (DDDC+1)
Error codes and messages
Error correcting code (ECC) L3 cache and system memory
Failed DIMM identification
Full Array Memory Mirroring (FAMM) redundancy
Hot-swap cooling fans with speed-sensing capability
Hot-swap hard disk drives
Hot-swap and redundant power supplies
Integrated baseboard management controller (BMC) subsystem
Integrated management module (IMM)
LCD system information display panel
Light path LEDs for DIMMs, microprocessors, PCIe adapters, hard disk drives, solid state drives, power supplies, fans, PCIe modules, and I/O modules
Memory address parity protection
Memory demand and patrol scrubbing
Memory error correcting code and parity test
Memory downsizing (non-mirrored memory). When the server restarts after the memory controller detects a non-mirrored uncorrectable error from which it cannot recover operationally, the IMM logs the uncorrectable error and informs POST. POST logically maps out the memory with the uncorrectable error, and the server restarts with the remaining installed memory.
Memory mirroring and memory rank sparing support
Memory thermal throttling
Menu-driven setup, system configuration, and redundant array of independent disks (RAID) configuration programs
Microprocessor built-in self-test (BIST), internal error signal monitoring, internal thermal trip signal monitoring, configuration checking, and microprocessor and voltage regulator module failure identification through light path diagnostics
Nonmaskable interrupt (NMI) button
Operating system memory on-lining (capacity change)
Parity checking on the PCIe buses
PCIe hot-add and remove support
PCIe hot-plug (microprocessor 2 and 3 only)
Power management: compliance with Advanced Configuration and Power Interface (ACPI)
Power-on self-test (POST)
Predictive Failure Analysis (PFA) alerts on memory, SAS/SATA hard disk drives or solid state drives, and fans
Redundant Ethernet capabilities with failover support
Redundant hot-swap power supplies and redundant hot-swap fans
Redundant network interface card (NIC) support
Remind button to temporarily turn off the system-error LED
Remote system problem-determination support
ROM-based diagnostics and upgrade of flash ROM-based code and diagnostics
ROM checksums
Serial Presence Detect (SPD) on memory; VPD on the system board, power supply, hard disk drive or solid state drive backplanes, microprocessor and memory expansion tray, and Ethernet adapters
Single-DIMM isolation of excessive correctable error or multi-bit error by the Unified Extensible Firmware Interface (UEFI)
SMI clock failover
SMI lane failover
SMI packet retry
Solid-state drives
Standby voltage for systems-management features and monitoring
Startup (boot) from LAN through remote initial program load (RIPL) or Dynamic Host Configuration Protocol/Bootstrap Protocol (DHCP/BOOTP)
System auto-configuring from the configuration menu
System-error logging (UEFI/POST and IMM)
Systems-management monitoring through the Inter-Integrated Circuit (I2C) protocol bus
Temperature and fan monitoring
Uncorrectable error (UE) detection
Upgradeable POST, Unified Extensible Firmware Interface (UEFI), diagnostics, IMM firmware, and read-only memory (ROM) resident code, locally or over the LAN
Vital product data (VPD) on the microprocessor and memory expansion modules, PCIe expansion modules, base I/O module, storage and I/O module, power supplies, and SAS/SATA (hot-swap hard disk drive or solid state drive) backplanes
Wake on LAN capability
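
The memory downsizing behavior described in the list above can be pictured with a short sketch. It is an illustration only, not the server firmware: the Dimm class, the post_map_out function, and the log format are hypothetical names invented for this example, and the sketch assumes that the IMM log simply records the slot numbers of DIMMs that reported an uncorrectable error.

```python
from dataclasses import dataclass


@dataclass
class Dimm:
    slot: int             # physical slot number (hypothetical numbering)
    size_gb: int          # capacity in GB
    enabled: bool = True  # False once the DIMM is logically mapped out


def post_map_out(dimms, ue_log):
    """Disable every DIMM whose slot appears in the uncorrectable-error log.

    Returns the total capacity (GB) that remains enabled afterward.
    """
    failed_slots = set(ue_log)
    for dimm in dimms:
        if dimm.slot in failed_slots:
            dimm.enabled = False  # mapped out logically, not physically removed
    return sum(d.size_gb for d in dimms if d.enabled)


# Four 16 GB DIMMs; the (hypothetical) IMM log reports an error in slot 3.
installed = [Dimm(slot=s, size_gb=16) for s in range(1, 5)]
remaining = post_map_out(installed, ue_log=[3])
print(remaining)  # 48
```

In this sketch the server would continue with 48 GB instead of 64 GB, which mirrors the idea of restarting with the remaining installed memory.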
