Reliability, availability, and serviceability

This topic provides an overview of the server reliability, availability, and serviceability (RAS) features.

Three important computer design features are reliability, availability, and serviceability (RAS). The RAS features help to ensure the integrity of the data that is stored in the server, the availability of the server when you need it, and the ease with which you can diagnose and correct problems.

Your server has the following RAS features:
  • 3-year parts and 3-year labor limited warranty
  • 24-hour support center
  • Automatic error retry and recovery
  • Automatic restart on nonmaskable interrupt (NMI)
  • Automatic restart after a power failure
  • Backup basic input/output system switching under the control of the integrated management module (IMM)
  • Built-in monitoring for fan, power, temperature, voltage, and power-supply redundancy
  • Cable-presence detection on most connectors
  • Chipkill memory protection
  • Corrected machine check interrupt (CMCI)
  • Double-device data correction (DDDC) for x4 DRAM technology DIMMs (available on 16 GB DIMMs only). Ensures that data is available on a single x4 DRAM DIMM after a hard failure of up to two DRAM DIMMs. One x4 DRAM DIMM in each rank is reserved as a spare device.
  • Diagnostic support for ServeRAID and Ethernet adapters
  • DRAM single device data correction (SDDC)
  • Dynamic memory migration
  • Enhanced DRAM single device data correction (SDDC+1)
  • Enhanced DRAM double device data correction (DDDC+1)
  • Error codes and messages
  • Error correcting code (ECC) L3 cache and system memory
  • Failed DIMM identification
  • Full Array Memory Mirroring (FAMM) redundancy
  • Hot-swap cooling fans with speed-sensing capability
  • Hot-swap hard disk drives
  • Hot-swap and redundant power supplies
  • Integrated baseboard management controller (BMC) subsystem
  • Integrated management module (IMM)
  • LCD system information display panel
  • Light path LEDs for DIMMs, microprocessors, PCIe adapters, hard disk drives, solid state drives, power supplies, fans, PCIe modules, and I/O modules
  • Memory address parity protection
  • Memory demand and patrol scrubbing
  • Memory error correcting code and parity test
  • Memory downsizing (non-mirrored memory). When the server restarts after the memory controller detects a non-mirrored uncorrectable error from which it cannot recover operationally, the IMM logs the uncorrectable error and informs POST. POST logically maps out the memory with the uncorrectable error, and the server restarts with the remaining installed memory.
  • Memory mirroring and memory rank sparing support
  • Memory thermal throttling
  • Menu-driven setup, system configuration, and redundant array of independent disks (RAID) configuration programs
  • Microprocessor built-in self-test (BIST), internal error signal monitoring, internal thermal trip signal monitoring, configuration checking, and microprocessor and voltage regulator module failure identification through light path diagnostics
  • Nonmaskable interrupt (NMI) button
  • Operating system memory on-lining (capacity change)
  • Parity checking on the PCIe buses
  • PCIe hot-add and remove support
  • PCIe hot-plug (microprocessor 2 and 3 only)
  • Power management: compliance with Advanced Configuration and Power Interface (ACPI)
  • Power-on self-test (POST)
  • Predictive Failure Analysis (PFA) alerts on memory, SAS/SATA hard disk drives or solid state drives, and fans
  • Redundant Ethernet capabilities with failover support
  • Redundant hot-swap power supplies and redundant hot-swap fans
  • Redundant network interface card (NIC) support
  • Remind button to temporarily turn off the system-error LED
  • Remote system problem-determination support
  • ROM-based diagnostics and upgrade of flash ROM-based code and diagnostics
  • ROM checksums
  • Serial presence detect (SPD) on memory; vital product data (VPD) on the system board, power supply, hard disk drive or solid state drive backplanes, microprocessor and memory expansion tray, and Ethernet adapters
  • Single-DIMM isolation of excessive correctable errors or multi-bit errors by the Unified Extensible Firmware Interface (UEFI)
  • SMI clock failover
  • SMI lane failover
  • SMI packet retry
  • Solid-state drives
  • Standby voltage for systems-management features and monitoring
  • Startup (boot) from LAN through remote initial program load (RIPL) or Dynamic Host Configuration Protocol/Bootstrap Protocol (DHCP/BOOTP)
  • System auto-configuring from the configuration menu
  • System-error logging (UEFI/POST and IMM)
  • Systems-management monitoring through the Inter-Integrated Circuit (I2C) protocol bus
  • Temperature and fan monitoring
  • Uncorrectable error (UE) detection
  • Upgradeable POST, Unified Extensible Firmware Interface (UEFI), diagnostics, IMM firmware, and read-only memory (ROM) resident code, locally or over the LAN
  • Vital product data (VPD) on the microprocessor and memory expansion modules, PCIe expansion modules, base I/O module, storage and I/O module, power supplies, and SAS/SATA (hot-swap hard disk drive or solid state drive) backplanes
  • Wake on LAN capability
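
The memory downsizing feature in the list above follows a short sequence: the uncorrectable error is logged by the IMM, POST is informed, and POST maps out the affected memory before the server comes up with what remains. The following Python sketch models that sequence only as an illustration. The class names, the per-DIMM granularity, and the 16 GB sizes are assumptions made for the example, not the actual firmware interfaces or behavior.

```python
from dataclasses import dataclass, field


@dataclass
class Dimm:
    slot: int                 # hypothetical slot number
    size_gb: int              # capacity in GB (assumed value)
    mapped_out: bool = False  # True once POST has logically removed it


@dataclass
class Server:
    dimms: list[Dimm]
    # Stands in for the IMM event log (illustrative only).
    imm_log: list[str] = field(default_factory=list)
    # Slots the IMM has flagged for POST to handle on the next restart.
    pending_ue_slots: set[int] = field(default_factory=set)

    def report_uncorrectable_error(self, slot: int) -> None:
        """IMM side: log the uncorrectable error and flag it for POST."""
        self.imm_log.append(f"Uncorrectable memory error on DIMM {slot}")
        self.pending_ue_slots.add(slot)

    def post(self) -> int:
        """POST side: map out flagged memory, return usable memory in GB."""
        for dimm in self.dimms:
            if dimm.slot in self.pending_ue_slots:
                dimm.mapped_out = True
        self.pending_ue_slots.clear()
        return sum(d.size_gb for d in self.dimms if not d.mapped_out)


# Four hypothetical 16 GB DIMMs; DIMM 3 reports an uncorrectable error.
server = Server(dimms=[Dimm(slot=n, size_gb=16) for n in range(1, 5)])
server.report_uncorrectable_error(slot=3)
usable_gb = server.post()
print(usable_gb)  # 48: the server restarts with the remaining memory
```

In this model the mapped-out memory stays installed but is excluded from the usable total, which mirrors the "logically maps out" wording in the feature description.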