At a network site that we migrated from VMware ESXi 5.1 to version 5.5 a few weeks ago, one server crashed daily with a BSOD (bugcheck). We couldn’t pin down the cause directly, but it appears to be a memory leak in the VMware vShield Endpoint TDI Manager driver, which VMware Tools installs as part of its VMCI driver on ESXi 5.5. The problem is still present after updating to ESXi 5.5 Update 1. The driver only gets installed when you install VMware Tools with the ‘Complete’ setup type instead of ‘Typical’. A quick fix is to uninstall the vShield Endpoint TDI Manager component of the VMCI driver, which you can do from the Control Panel in Windows Server.
Click ‘Uninstall a program’.
Select VMware Tools and click Change.
Choose Modify and click Next.
Locate the VMCI Driver and select ‘Entire feature will be unavailable’ for the vShield Drivers. Click Next, confirm the changes and watch the magic happen. Once the changes are applied, reboot the guest OS.
On an affected server, Event Viewer shows a bugcheck similar to this:
The computer has rebooted from a bugcheck. The bugcheck was: 0x0000000a (0x0000000000000350, 0x0000000000000002, 0x0000000000000001, 0xfffff80001676f03). A dump was saved in: C:\Windows\MEMORY.DMP.
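The event text packs the bugcheck code and its four parameters into a single line. As an illustration, the Python sketch below (the names `parse_bugcheck` and `describe_0xa` are hypothetical, made up for this example) pulls them out with a regular expression and decodes them using the parameter meanings WinDbg lists for bugcheck 0xA in the analysis below: Arg1 is the memory referenced, Arg2 the IRQL, bit 0 of Arg3 marks a write, and Arg4 is the address of the faulting instruction. The regex assumes the message wording shown above; other Windows versions may phrase it differently.

```python
import re
from dataclasses import dataclass

# Assumed message format, based on the single System event shown above.
BUGCHECK_RE = re.compile(r"The bugcheck was: (0x[0-9a-fA-F]+) \(([^)]*)\)")


@dataclass
class Bugcheck:
    code: int
    args: tuple  # the four bugcheck parameters as integers


def parse_bugcheck(message: str) -> Bugcheck:
    """Extract the bugcheck code and its four parameters from the event text."""
    match = BUGCHECK_RE.search(message)
    if match is None:
        raise ValueError("no bugcheck found in message")
    code = int(match.group(1), 16)
    args = tuple(int(a.strip(), 16) for a in match.group(2).split(","))
    return Bugcheck(code, args)


def describe_0xa(bc: Bugcheck) -> str:
    """Summarise an IRQL_NOT_LESS_OR_EQUAL (0xA) bugcheck in one line."""
    if bc.code != 0xA:
        raise ValueError("not a 0xA bugcheck")
    address, irql, flags, ip = bc.args
    access = "write" if flags & 0x1 else "read"  # bit 0: 0 = read, 1 = write
    return f"{access} of 0x{address:x} at IRQL {irql} by instruction at 0x{ip:x}"


if __name__ == "__main__":
    msg = ("The computer has rebooted from a bugcheck. The bugcheck was: 0x0000000a "
           "(0x0000000000000350, 0x0000000000000002, 0x0000000000000001, "
           "0xfffff80001676f03).")
    print(describe_0xa(parse_bugcheck(msg)))
    # write of 0x350 at IRQL 2 by instruction at 0xfffff80001676f03
```

Read this way, the event already says a lot: something wrote to the very low address 0x350 at DISPATCH_LEVEL (IRQL 2), which usually means a driver dereferenced a near-NULL pointer.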
If you analyze the dump with WinDbg, you get a report similar to this:
1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: 0000000000000350, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000001, bitfield :
	bit 0 : value 0 = read operation, 1 = write operation
	bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff80001676f03, address which referenced memory

Debugging Details:
------------------

WRITE_ADDRESS: 0000000000000350

CURRENT_IRQL: 2

FAULTING_IP:
nt!KeAcquireSpinLockAtDpcLevel+43
fffff800`01676f03 f0480fba2900 lock bts qword ptr [rcx],0

DEFAULT_BUCKET_ID: WIN7_DRIVER_FAULT

BUGCHECK_STR: 0xA

PROCESS_NAME: System

ANALYSIS_VERSION: 6.3.9600.17029 (debuggers(dbg).140219-1702) amd64fre

TRAP_FRAME: fffff88001ff58e0 -- (.trap 0xfffff88001ff58e0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000350
rdx=fffffa800c387cf0 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80001676f03 rsp=fffff88001ff5a70 rbp=0000000000000001
 r8=fffffa8007a15980  r9=0000000000000000 r10=fffff880009cdb80
r11=fffff88001ff5c08 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl nz na pe nc
nt!KeAcquireSpinLockAtDpcLevel+0x43:
fffff800`01676f03 f0480fba2900 lock bts qword ptr [rcx],0 ds:00000000`00000350=????????????????
Resetting default scope

LAST_CONTROL_TRANSFER: from fffff80001681169 to fffff80001681bc0

STACK_TEXT:
fffff880`01ff5798 fffff800`01681169 : 00000000`0000000a 00000000`00000350 00000000`00000002 00000000`00000001 : nt!KeBugCheckEx
fffff880`01ff57a0 fffff800`0167fde0 : 00000000`00000000 00000000`00000000 00000000`00000000 fffff880`009cd180 : nt!KiBugCheckDispatch+0x69
fffff880`01ff58e0 fffff800`01676f03 : 00000000`c0000239 fffff880`02f13bdd fffffa80`0c4d9cc0 00000000`00000070 : nt!KiPageFault+0x260
fffff880`01ff5a70 fffff880`02f78a07 : fffffa80`07a15980 00000000`00000001 00000000`00000000 fffffa80`07a15980 : nt!KeAcquireSpinLockAtDpcLevel+0x43
fffff880`01ff5ac0 fffff800`016855d1 : fffffa80`0c387e53 00000000`00000005 00000000`00000000 fffffa80`0c387cf0 : netbt!AcceptCompletionRoutine+0x47
fffff880`01ff5b20 fffff880`02f4060c : fffffa80`0c5f55e0 fffffa80`0c3c3800 fffffa80`0c387cf0 00000000`00000000 : nt!IopfCompleteRequest+0x341
fffff880`01ff5c10 fffff800`019781d3 : fffffa80`07179ab0 fffff800`018272d8 fffffa80`06d10040 fffffa80`06d10040 : vnetflt+0x160c
fffff880`01ff5c80 fffff800`0168b261 : fffff800`01827200 fffff800`01978101 fffffa80`06d10000 fffff800`018272d8 : nt!IopProcessWorkItem+0x23
fffff880`01ff5cb0 fffff800`0191e2ea : e00df00f`001f0116 fffffa80`06d10040 00000000`00000080 fffffa80`06d099e0 : nt!ExpWorkerThread+0x111
fffff880`01ff5d40 fffff800`016728e6 : fffff880`01edf180 fffffa80`06d10040 fffff880`01ee9fc0 000b7419`000a1901 : nt!PspSystemThreadStartup+0x5a
fffff880`01ff5d80 00000000`00000000 : fffff880`01ff6000 fffff880`01ff0000 fffff880`01ff59e0 00000000`00000000 : nt!KxStartSystemThread+0x16

STACK_COMMAND: kb

FOLLOWUP_IP:
netbt!AcceptCompletionRoutine+47
fffff880`02f78a07 488d4b60 lea rcx,[rbx+60h]

SYMBOL_STACK_INDEX: 4

SYMBOL_NAME: netbt!AcceptCompletionRoutine+47

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: netbt

IMAGE_NAME: netbt.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 4ce79386

FAILURE_BUCKET_ID: X64_0xA_netbt!AcceptCompletionRoutine+47

BUCKET_ID: X64_0xA_netbt!AcceptCompletionRoutine+47

ANALYSIS_SOURCE: KM

FAILURE_ID_HASH_STRING: km:x64_0xa_netbt!acceptcompletionroutine+47

FAILURE_ID_HASH: {3a74f055-ea53-e758-90a1-3e612f992e18}

Followup: MachineOwner
---------

1: kd> lmvm netbt
start end module name
fffff880`02f4f000 fffff880`02f94000 netbt (pdb symbols) C:\ProgramData\dbg\sym\netbt.pdb\3D581F5A08614A7CB02D71638469228D2\netbt.pdb
Loaded symbol image file: netbt.sys
Image path: \SystemRoot\System32\DRIVERS\netbt.sys
Image name: netbt.sys
Timestamp: Sat Nov 20 10:23:18 2010 (4CE79386)
CheckSum: 00041134
ImageSize: 00045000
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4
1: kd> .trap 0xfffff88001ff58e0
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000350
rdx=fffffa800c387cf0 rsi=0000000000000000 rdi=0000000000000000
rip=fffff80001676f03 rsp=fffff88001ff5a70 rbp=0000000000000001
 r8=fffffa8007a15980  r9=0000000000000000 r10=fffff880009cdb80
r11=fffff88001ff5c08 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl nz na pe nc
nt!KeAcquireSpinLockAtDpcLevel+0x43:
fffff800`01676f03 f0480fba2900 lock bts qword ptr [rcx],0 ds:00000000`00000350=????????????????
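Although !analyze assigns the crash to netbt.sys, the stack also shows who handed netbt the request: vnetflt+0x160c called nt!IopfCompleteRequest, which in turn ran netbt!AcceptCompletionRoutine. vnetflt is the TDI filter named in the AFD warning further down, and it is the only frame without symbols. A rough way to spot such frames in a long STACK_TEXT is sketched below in Python; the helper names are hypothetical, and "no symbols" is only a heuristic for a third-party driver worth a closer look (for example with `lmvm vnetflt`), not proof of a bug.

```python
import re

# A STACK_TEXT frame ends in " : <call site>", where the call site is either
# "module!symbol+offset" (symbols resolved) or "module+offset" (no symbols).
CALL_SITE_RE = re.compile(r":\s*([A-Za-z0-9_]+)(!\S+|\+0x[0-9a-fA-F]+)\s*$")


def frames(stack_text):
    """Yield (module, has_symbols) for every frame line in a STACK_TEXT block."""
    for line in stack_text.splitlines():
        match = CALL_SITE_RE.search(line)
        if match:
            yield match.group(1), match.group(2).startswith("!")


def unsymbolized_modules(stack_text):
    """Return modules with at least one frame lacking symbols, in stack order."""
    result = []
    for module, has_symbols in frames(stack_text):
        if not has_symbols and module not in result:
            result.append(module)
    return result


if __name__ == "__main__":
    sample = "\n".join([
        "fffff880`01ff5b20 fffff880`02f4060c : fffffa80`0c5f55e0 fffffa80`0c3c3800 "
        "fffffa80`0c387cf0 00000000`00000000 : nt!IopfCompleteRequest+0x341",
        "fffff880`01ff5c10 fffff800`019781d3 : fffffa80`07179ab0 fffff800`018272d8 "
        "fffffa80`06d10040 fffffa80`06d10040 : vnetflt+0x160c",
    ])
    print(unsymbolized_modules(sample))  # ['vnetflt']
```

Paste only the STACK_TEXT lines into it; other parts of the !analyze output use different layouts that the regex does not try to handle.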
Sources I’ve used: Windows Bugcheck Analysis, MCSEboard.de and VMware Community.
UPDATE: You can also check whether your server logs this warning in the System event log:
Log Name: System
Source: AFD
Date: DD-MM-YYYY HH:MM:SS
Event ID: 16001
Task Category: None
Level: Warning
Keywords: Classic
User: N/A
Computer: hostname.domain.local
Description:
A TDI filter (\Driver\vnetflt) was detected. This filter has not been certified by Microsoft and may cause system instability.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="AFD" />
    <EventID Qualifiers="32768">16001</EventID>
    <Level>3</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="YYYY-MM-DDTHH:MM:SS.000000000Z" />
    <EventRecordID>41300</EventRecordID>
    <Channel>System</Channel>
    <Computer>hostname.domain.local</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\Afd</Data>
    <Data>\Driver\vnetflt</Data>
    <Binary>000000000200300000000000813E0080000000000000000000000000000000000000000000000000</Binary>
  </EventData>
</Event>
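If you need to check several servers, the XML that Event Viewer shows on the Details tab (XML View) can be tested programmatically. The Python sketch below is one way to do it under that assumption: it uses the standard library’s xml.etree.ElementTree with the event namespace from the sample above to test for AFD event 16001 naming \Driver\vnetflt. The function name is hypothetical, and how you collect the XML from each server is left open.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML, as in the sample above.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def is_tdi_filter_warning(event_xml: str, driver: str = r"\Driver\vnetflt") -> bool:
    """True if the event is AFD warning 16001 and names the given TDI filter."""
    root = ET.fromstring(event_xml)
    provider = root.find("e:System/e:Provider", NS)
    event_id = root.find("e:System/e:EventID", NS)
    if provider is None or event_id is None:
        return False
    if provider.get("Name") != "AFD" or (event_id.text or "").strip() != "16001":
        return False
    # The filter's driver object name is one of the <Data> values.
    data = [(d.text or "").strip() for d in root.findall("e:EventData/e:Data", NS)]
    return driver in data


if __name__ == "__main__":
    sample = r"""<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System><Provider Name="AFD" /><EventID Qualifiers="32768">16001</EventID></System>
      <EventData><Data>\Device\Afd</Data><Data>\Driver\vnetflt</Data></EventData>
    </Event>"""
    print(is_tdi_filter_warning(sample))  # True
```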
UPDATE 2: Thanks to Peter Chang’s comment: there is now an official update available; see VMware KB2077302 for more information.