Intel® SRMK2 Internet Server Technical Product Specification 70
3) If the error is an SBE, the CNB30LE automatically corrects the data before it is returned to
memory. The CNB30LE memory controller scrubs the memory location where the error
occurred to correct the SBE, and the BIOS records the SBE to the SEL via an SMI. If
the error is an MBE, the condition is considered fatal: after the error is logged, an NMI is
generated to tell the OS to handle the fatal error.
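The difference between a correctable SBE and a fatal MBE can be illustrated with a small single-error-correct, double-error-detect (SECDED) code. The sketch below uses an extended Hamming code over 4 data bits purely for illustration; it is an assumption chosen for brevity and does not describe the actual ECC code or word width used by the CNB30LE. The function names encode and decode are hypothetical.

```python
def encode(nibble):
    """Encode a 4-bit value as an 8-bit extended Hamming (SECDED) codeword.

    bits[0] is the overall parity bit; bits[1..7] are the Hamming(7,4)
    positions, with check bits at positions 1, 2 and 4.
    """
    bits = [0] * 8
    bits[3], bits[5], bits[6], bits[7] = [(nibble >> i) & 1 for i in range(4)]
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    bits[0] = sum(bits[1:]) & 1          # make the total number of 1s even
    return bits


def decode(bits):
    """Return (status, nibble); status is "ok", "sbe" or "mbe"."""
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos              # XOR of the positions holding a 1
    parity_bad = sum(bits) & 1           # 1 if overall parity is violated
    fixed = list(bits)
    if parity_bad:
        # Odd number of flipped bits: treat as a single-bit error and
        # correct it. Syndrome 0 means the overall parity bit itself flipped.
        fixed[syndrome] ^= 1
        status = "sbe"
    elif syndrome:
        # Parity holds but the syndrome is non-zero: two bits flipped.
        # The error is detected but cannot be corrected.
        return "mbe", None
    else:
        status = "ok"
    nibble = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return status, nibble
```

Flipping any one bit of a codeword yields status "sbe" together with the original data, while flipping any two bits yields "mbe" with no data. Three or more flipped bits are outside what a SECDED code guarantees and may be miscorrected.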
7.3.3.3 ECC Memory Initialization
The system BIOS handles ECC memory initialization. All memory locations, including System
Management RAM and the shadow memory region, are unconditionally initialized (set to 0) during
POST. Error detection is disabled while ECC memory is being initialized to prevent false alarms
caused by uninitialized memory bytes. If hard errors are detected during the memory test, the memory
partition containing the errors is resized to eliminate the failing locations.
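The ordering described above (disable detection, zero every location, re-enable detection, then shrink the partition around hard errors) can be sketched as a toy model. Everything here is hypothetical: the Partition class, the false_alarms counter, and in particular the resize policy, which is assumed to truncate the partition just below the first failing address because the specification does not say how the resize is performed.

```python
class Partition:
    """Toy model of one memory partition; not real chipset behaviour."""

    def __init__(self, size, hard_faults=()):
        self.size = size
        self.cells = [None] * size          # None models uninitialized contents
        self.hard_faults = set(hard_faults)
        self.detection_enabled = True
        self.false_alarms = 0

    def write(self, addr, value):
        # In this model, touching an uninitialized cell while detection is
        # enabled is reported as an error, because its check bits are garbage.
        if self.detection_enabled and self.cells[addr] is None:
            self.false_alarms += 1
        self.cells[addr] = value


def post_initialize(partition):
    """Zero the partition with detection off, then drop failing locations."""
    partition.detection_enabled = False     # avoid false alarms during init
    for addr in range(partition.size):
        partition.write(addr, 0)            # unconditional initialization to 0
    partition.detection_enabled = True

    # Stand-in for the POST memory test: addresses listed in hard_faults fail.
    failing = sorted(a for a in partition.hard_faults if a < partition.size)
    if failing:
        # ASSUMPTION: resize by truncating below the first failing address.
        partition.size = failing[0]
        partition.cells = partition.cells[:partition.size]
    return partition
```

With a 16-location partition and hard faults at addresses 10 and 12, this model leaves 10 usable locations, all zero, and records no false alarms.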
7.3.3.4 ECC and SMI Support
During normal operation, any single-bit errors (SBEs) that are detected are handled by the
SMI support code. The SMI code logs the SBE to the system event log (SEL). Scrubbing is always
enabled automatically when ECC memory is detected. The memory controller scrubs the row
containing the failing location by reading the corrected data and writing it back
automatically. If a read from shadow memory results in an SBE, the BIOS must enable writes to
that area, scrub the location, and then disable writes again. Scrubbing helps to prevent a
single-bit correctable error from developing into a multiple-bit error later. Scrubbing an entire
row is a time-consuming operation and might affect the correct functioning of certain operating
systems. If MBEs are detected, the BIOS SMI handler logs an event to the SEL and then generates
an NMI to the OS.
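The handling described in this section can be summarized as a short simulation. This is a minimal sketch, not BIOS code: the Platform class, its fields, and the function names are hypothetical, and the relative order of logging and scrubbing for an SBE is an assumption, since the text does not fix it.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """Hypothetical stand-in for the state the SMI handler touches."""
    sel: list = field(default_factory=list)      # system event log entries
    trace: list = field(default_factory=list)    # ordered actions, for inspection
    shadow_writable: bool = False
    nmi_raised: bool = False


def scrub(platform, addr, in_shadow):
    """Write corrected data back; shadow memory must be unlocked first."""
    if in_shadow:
        platform.shadow_writable = True
        platform.trace.append("enable shadow writes")
    platform.trace.append(f"scrub {addr:#x}")
    if in_shadow:
        platform.shadow_writable = False
        platform.trace.append("disable shadow writes")


def handle_memory_error(platform, kind, addr, in_shadow=False):
    """Model of the SMI handler's response to an ECC error."""
    if kind == "SBE":
        platform.sel.append(("SBE", addr))       # logged, no NMI
        scrub(platform, addr, in_shadow)         # ASSUMPTION: log, then scrub
    elif kind == "MBE":
        platform.sel.append(("MBE", addr))       # log first ...
        platform.nmi_raised = True               # ... then signal the OS
    else:
        raise ValueError(f"unknown error kind: {kind}")
```

An SBE in shadow memory produces the trace "enable shadow writes", "scrub", "disable shadow writes" and leaves nmi_raised unset; an MBE adds a log entry and sets nmi_raised.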
7.3.3.5 Logging System Events
The BIOS can log critical and informational events to nonvolatile memory. This area is managed
by the BIOS and can be accessed by an OS NVRAM driver. A critical event is one that might
result in the system being shut down to prevent catastrophic side effects from propagating to other
parts of the system. Multi-bit and parity errors in the memory subsystem are considered critical
errors, as are most errors that traditionally generate a Non-Maskable Interrupt (NMI). In the
SRMK2, these errors are first routed to a System Management Interrupt (SMI). They
include I/O channel check, software-generated NMI, and PCI SERR and PERR events.
During POST, the BIOS initializes System Management RAM (SMRAM) with error handling and
logging code. The processor has a private area of SMRAM dedicated to it for SMI processing.
The DRAM controller and OSB4 are programmed to generate an SMI for PCI SERR and PERR,
software-generated NMI, I/O channel check, ISA watchdog time-out, and NMIs generated by
the PAC. The PAC generates an SERR if parity/ECC errors are observed in the memory
subsystem. The PAC generates an interrupt if a single-bit correctable error is observed in the
memory subsystem. The OSB4 can be programmed to generate an SMI on this interrupt. When
these errors are detected, the SMI routines log the error or event in a manner that is transparent
to the OS and then cause an NMI to be generated for certain events, so that the OS can respond
appropriately. The BIOS also logs an event for another type of memory error, the single-bit
error (SBE). For this error, the BIOS does not generate an NMI to the OS.
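The log-then-forward policy can be sketched as a table-driven dispatcher. The event names, the Severity enum, and handle_smi are hypothetical. The critical classifications follow the text above; treating the SBE as informational, and forwarding an NMI for exactly the critical events, are assumptions, because the specification says only that an NMI is generated "for certain events" and never for an SBE. The ISA watchdog time-out is omitted because its classification is not stated.

```python
from enum import Enum


class Severity(Enum):
    CRITICAL = "critical"
    INFORMATIONAL = "informational"


# Event sources named in the text, keyed by hypothetical identifiers.
EVENT_SEVERITY = {
    "MEMORY_MBE": Severity.CRITICAL,
    "MEMORY_PARITY": Severity.CRITICAL,
    "IO_CHANNEL_CHECK": Severity.CRITICAL,
    "SOFTWARE_NMI": Severity.CRITICAL,
    "PCI_SERR": Severity.CRITICAL,
    "PCI_PERR": Severity.CRITICAL,
    "MEMORY_SBE": Severity.INFORMATIONAL,   # ASSUMPTION: logged as informational
}


def handle_smi(event, nvram_log):
    """Log the event, then report whether an NMI should be sent to the OS.

    The log entry is always written first, so the record exists even if
    the OS halts in response to the NMI.
    """
    severity = EVENT_SEVERITY[event]
    nvram_log.append((event, severity.value))
    # ASSUMPTION: only critical events are forwarded to the OS as an NMI.
    return severity is Severity.CRITICAL
```

Called with "PCI_SERR", the function appends a critical entry and returns True; called with "MEMORY_SBE", it appends an informational entry and returns False.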