I have a server that kernel panics every few days.
mcelog tells me:
Hardware event. This is not a software error.
MCE 0
CPU 6 BANK 8
MISC 0
TIME 1317928482 Thu Oct 6 15:14:42 2011
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
Processor context corrupt
MCA: MEMORY CONTROLLER AC_CHANNEL0_ERR
Transaction: Address/Command error
Memory address parity error
Memory corrected error count (CORE_ERR_CNT): 21763
Memory transaction Tracker ID (RTId): 0
Memory DIMM ID of error: 0
Memory channel ID of error: 0
Memory ECC syndrome: 0
STATUS ea1540c0008000b0 MCGSTATUS 0
MCGCAP 1c09 APICID 20 SOCKETID 1
CPUID Vendor Intel Family 6 Model 44
I’m going to try a BIOS update. After that, I’m not sure what to try next. Disabling the 2nd CPU will probably keep me up and running for now.
If this really is a CPU error, the CPU is probably broken somehow.
You could try an Intel microcode update first.
This looks like a motherboard memory controller error, so I’d be looking to change the motherboard. Searching for MEMORY CONTROLLER AC_CHANNEL0_ERR gets you this and various other similar references.
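If you want to sanity-check what mcelog printed, the STATUS word can be unpacked by hand. The following is only a rough sketch: the bit positions (flags in bits 63 to 57, corrected error count in bits 52 to 38, MCA error code in the low 16 bits, memory controller codes of the form 0000 0000 1MMM CCCC) are my assumption based on the architectural machine-check layout, and `decode_status` is a made-up helper name, not part of mcelog.

```python
# Sketch: unpack an IA32_MCi_STATUS value as printed by mcelog.
# All bit positions below are assumptions taken from the architectural
# machine-check layout; check them against the manual for your CPU.

FLAGS = {
    63: "VAL (register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MCi_MISC register valid)",
    58: "ADDRV (MCi_ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

# MMM field of a memory controller error code (0000 0000 1MMM CCCC).
MEM_REQUESTS = {
    0: "generic/undefined",
    1: "memory read",
    2: "memory write",
    3: "address/command",
    4: "memory scrubbing",
}


def decode_status(status):
    """Return the decoded fields of a 64-bit MCi_STATUS value as a dict."""
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    mca_code = status & 0xFFFF
    result = {
        "flags": flags,
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "mca_code": mca_code,
        "memory_controller": (mca_code & 0xFF80) == 0x0080,
    }
    if result["memory_controller"]:
        mmm = (mca_code >> 4) & 0x7
        result["request"] = MEM_REQUESTS.get(mmm, "reserved")
        result["channel"] = mca_code & 0xF  # 0xF would mean "not specified"
    return result


if __name__ == "__main__":
    decoded = decode_status(0xEA1540C0008000B0)  # STATUS from the log above
    for key, value in decoded.items():
        print(key, "=", value)
```

For the STATUS value in the question this gives a corrected error count of 21763 and an address/command error on channel 0, which agrees with the CORE_ERR_CNT and AC_CHANNEL0_ERR lines mcelog reported, so the decode itself looks consistent.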