Dr.watson MiniDump Analysis Using WinDBG

5
Dr. Watson(Postmortem Debugger) Mini Dump Analysis using WinDBG Debugging Steps using WinDBG 1) Open the .dmp file using WinDBG 2) Make sure your symbol server path is set correctly(File->Symbol File Path), below is what I have set it to srv*C:\Temp\Symbols*http://msdl.microsoft.com/download/symbols; 3) You should save the workspace to save your symbol path information unless you are using _NT_SYMBOL_PATH otherwise you will loose the information next time you start WinDBG. Remember to save it without opening a dump file to use it for the next debugging session 4) Run !analyze -v 5) 0:182> !analyze -v FAULTING_IP: kernel32!RaiseException+53 77e4bee7 5e pop esi EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff) ExceptionAddress: 77e4bee7 (kernel32!RaiseException+0x00000053) ExceptionCode: e0434f4d (CLR exception) ExceptionFlags: 00000001 NumberParameters: 1 Parameter[0]: 8007000e DEFAULT_BUCKET_ID: CLR_EXCEPTION ERROR_CODE: (NTSTATUS) 0xe0434f4d - <Unable to get error code text> MANAGED_STACK: !dumpstack -EE OS Thread Id: 0xdd0 (182) TEB information is not available so a stack size of 0xFFFF is assumed Current frame: ChildEBP RetAddr Caller,Callee ADDITIONAL_DEBUG_TEXT: Followup set via attribute from Frame ffffffff on thread ffffffff FAULTING_THREAD: 00000dd0 PRIMARY_PROBLEM_CLASS: CLR_EXCEPTION BUGCHECK_STR: APPLICATION_FAULT_CLR_EXCEPTION LAST_CONTROL_TRANSFER: from 79f071ac to 77e4bee7 STACK_TEXT: 15d9f418 79f071ac e0434f4d 00000001 00000001 kernel32!RaiseException+0x53 15d9f478 79f9293a 75c9cb08 00000000 00000000 mscorwks!RaiseTheExceptionInternalOnly+0x2a8 15d9f4b0 7a00e74a 15d9f508 7a3c441c 9ae216b2 mscorwks!UnwindAndContinueRethrowHelperAfterCatch+0x70 15d9f558 793a858c 0874cd34 00000000 793af628 mscorwks!JIT_NewFast+0x105 15d9f578 79e7c74b 00000000 15d9f5e0 15d9f608 mscorlib_ni+0x2e858c 15d9f588 79e7c6cc 00000000 00000000 15d9f714 mscorwks!CallDescrWorker+0x33 15d9f608 79f00eca 00000000 00000000 15d9f714 mscorwks!CallDescrWorkerWithHandler+0xa3 15d9f628 79f00e75 00000000 00000000 15d9f714 mscorwks!DispatchCallBody+0x1e 15d9f68c 79f00f03 00000000 00000000 15d9f714 mscorwks!DispatchCallDebuggerWrapper+0x3d 15d9f6c0 79f02978 00000000 00000000 15d9f714 mscorwks!DispatchCallNoEH+0x51

description

min

Transcript of Dr.watson MiniDump Analysis Using WinDBG

Page 1: Dr.watson MiniDump Analysis Using WinDBG

Dr. Watson(Postmortem Debugger) Mini Dump Analysis using WinDBG

Debugging Steps using WinDBG

1) Open the .dmp file using WinDBG2) Make sure your symbol server path is set correctly(File->Symbol File Path), below is

what I have set it tosrv*C:\Temp\Symbols*http://msdl.microsoft.com/download/symbols;

3) You should save the workspace to save your symbol path information unless you are using _NT_SYMBOL_PATH otherwise you will loose the information next time you start WinDBG. Remember to save it without opening a dump file to use it for the next debugging session

4) Run !analyze -v

5) 0:182> !analyze -v

FAULTING_IP: kernel32!RaiseException+5377e4bee7 5e pop esi

EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)ExceptionAddress: 77e4bee7 (kernel32!RaiseException+0x00000053) ExceptionCode: e0434f4d (CLR exception) ExceptionFlags: 00000001NumberParameters: 1 Parameter[0]: 8007000e

DEFAULT_BUCKET_ID: CLR_EXCEPTION

ERROR_CODE: (NTSTATUS) 0xe0434f4d - <Unable to get error code text>

MANAGED_STACK: !dumpstack -EEOS Thread Id: 0xdd0 (182)TEB information is not available so a stack size of 0xFFFF is assumedCurrent frame: ChildEBP RetAddr Caller,Callee

ADDITIONAL_DEBUG_TEXT: Followup set via attribute from Frame ffffffff on thread ffffffff

FAULTING_THREAD: 00000dd0

PRIMARY_PROBLEM_CLASS: CLR_EXCEPTION

BUGCHECK_STR: APPLICATION_FAULT_CLR_EXCEPTION

LAST_CONTROL_TRANSFER: from 79f071ac to 77e4bee7

STACK_TEXT: 15d9f418 79f071ac e0434f4d 00000001 00000001 kernel32!RaiseException+0x5315d9f478 79f9293a 75c9cb08 00000000 00000000 mscorwks!RaiseTheExceptionInternalOnly+0x2a815d9f4b0 7a00e74a 15d9f508 7a3c441c 9ae216b2 mscorwks!UnwindAndContinueRethrowHelperAfterCatch+0x7015d9f558 793a858c 0874cd34 00000000 793af628 mscorwks!JIT_NewFast+0x10515d9f578 79e7c74b 00000000 15d9f5e0 15d9f608 mscorlib_ni+0x2e858c15d9f588 79e7c6cc 00000000 00000000 15d9f714 mscorwks!CallDescrWorker+0x3315d9f608 79f00eca 00000000 00000000 15d9f714 mscorwks!CallDescrWorkerWithHandler+0xa315d9f628 79f00e75 00000000 00000000 15d9f714 mscorwks!DispatchCallBody+0x1e15d9f68c 79f00f03 00000000 00000000 15d9f714 mscorwks!DispatchCallDebuggerWrapper+0x3d15d9f6c0 79f02978 00000000 00000000 15d9f714 mscorwks!DispatchCallNoEH+0x51

Page 2: Dr.watson MiniDump Analysis Using WinDBG

15d9f720 79ef3207 15d9f870 7a3b57d8 00000000 mscorwks!AddTimerCallback_Worker+0x7915d9f734 79ef31a3 15d9f810 15d9f7bc 79f91478 mscorwks!Thread::DoADCallBack+0x32a15d9f7c8 79ef30c3 15d9f810 9ae21bee 7a3b57d8 mscorwks!Thread::ShouldChangeAbortToUnload+0xe315d9f804 79ef4826 15d9f810 00000001 00000000 mscorwks!Thread::ShouldChangeAbortToUnload+0x30a15d9f82c 79ef48cf 00000001 79f02928 15d9f870 mscorwks!Thread::ShouldChangeAbortToUnload+0x33e15d9f844 79f02901 00000001 79f02928 15d9f870 mscorwks!ManagedThreadBase::ThreadPool+0x1315d9f88c 79f0288c 106bcc38 00000001 00000001 mscorwks!AddTimerCallbackEx+0x8315d9f8a0 79f02850 106bcc38 00000001 9ae21b36 mscorwks!AddTimerCallback+0x1015d9f8dc 79f024cf 2a023e70 9ae21ac2 79f02030 mscorwks!ThreadpoolMgr::AsyncTimerCallbackCompletion+0x6415d9f928 79f0202a 00000000 00000000 77e619d1 mscorwks!UnManagedPerAppDomainTPCount::DispatchWorkItem+0xb215d9f93c 79f021a0 9ae21a7e 00000000 79f0207d mscorwks!ThreadpoolMgr::ExecuteWorkRequest+0xaf15d9f994 79f95a2e 00000000 7c81c5f0 15d9fa48 mscorwks!ThreadpoolMgr::WorkerThreadStart+0x22315d9ffb8 77e64829 106f61d8 00000000 00000000 mscorwks!Thread::intermediateThreadProc+0x4915d9ffec 00000000 79f959e8 106f61d8 00000000 kernel32!BaseThreadStart+0x34

6) RaiseTheExceptionInternalOnly method signature is RaiseTheExceptionInternalOnly(OBJECTREF, BOOL, BOOL)7) 2nd parameter evaluates to whether the exception is re-thrown and the 3rd parameter tells you if this is a stack overflow exception but in this case both of them are false8) ExceptionCode: e0434f4d (CLR exception) which applies to any unhandled CLR exception e.g DivideByZero9) mini dump reports 1204 threads, as shown below0:182> ~ 0 Id: 968.1128 Suspend: 1 Teb: 7ffdf000 Unfrozen 1 Id: 968.c34 Suspend: 1 Teb: 7ffde000 Unfrozen 2 Id: 968.8e0 Suspend: 1 Teb: 7ffdd000 Unfrozen 3 Id: 968.a90 Suspend: 1 Teb: 7ffdc000 Unfrozen...................................................................................... 1200 Id: 968.24b0 Suspend: 1 Teb: 7fbd6000 Unfrozen 1201 Id: 968.1bc4 Suspend: 1 Teb: 7fae4000 Unfrozen 1202 Id: 968.25ac Suspend: 1 Teb: 7faba000 Unfrozen 1203 Id: 968.1b7c Suspend: 1 Teb: 7fab8000 Unfrozen

More than 200 threads have the following stack trace10)

0:182> ~*k1201 Id: 968.1bc4 Suspend: 1 Teb: 7fae4000 UnfrozenChildEBP RetAddr 7b22ebec 7c827d0b ntdll!KiFastSystemCallRet7b22ebf0 77e61d1e ntdll!NtWaitForSingleObject+0xc7b22ec60 79e789c6 kernel32!WaitForSingleObjectEx+0xac7b22eca4 79e7898f mscorwks!PEImage::LoadImage+0x1af7b22ecf4 79e78944 mscorwks!CLREvent::WaitEx+0x1177b22ed08 79fbc82a mscorwks!CLREvent::Wait+0x177b22ed1c 79fbc6da mscorwks!SVR::GCHeap::WaitUntilGCComplete+0x347b22ed58 79fc51bf mscorwks!Thread::RareDisablePreemptiveGC+0x1a07b22ed8c 79fc529b mscorwks!Thread::UserSleep+0xb17b22ee2c 0deb1401 mscorwks!ThreadNative::Sleep+0xceWARNING: Frame IP not in any known module. Following frames may be wrong.7b22ef18 793b0d1f 0xdeb14017b22ef20 793740ab mscorlib_ni+0x2f0d1f7b22ef30 79e7c74b mscorlib_ni+0x2b40ab7b22ef40 79e7c6cc mscorwks!CallDescrWorker+0x337b22efc0 79e7c8e1 mscorwks!CallDescrWorkerWithHandler+0xa37b22f0fc 79e7c783 mscorwks!MethodDesc::CallDescr+0x19c7b22f118 79e7c90d mscorwks!MethodDesc::CallTargetWorker+0x1f7b22f12c 79eb300f mscorwks!MethodDescCallSite::Call_RetArgSlot+0x18

Page 3: Dr.watson MiniDump Analysis Using WinDBG

7b22f2f8 79eb2f31 mscorwks!ExecuteCodeWithGuaranteedCleanupHelper+0x9b7b22f3a8 79373ff7 mscorwks!ReflectionInvocation::ExecuteCodeWithGuaranteedCleanup+0xf9

11)There are 2 GC threads (2 and 3) since this process is running on Windows Server 2003 with 2 CPUs so one GC thread per CPU.

0:182> ~2 kbChildEBP RetAddr Args to Child 00b7fd88 7c827d0b 77e61d1e 000000e8 00000000 ntdll!KiFastSystemCallRet00b7fd8c 77e61d1e 000000e8 00000000 00000000 ntdll!NtWaitForSingleObject+0xc00b7fdfc 79e789c6 000000e8 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac00b7fe40 79e7898f 000000e8 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x1af00b7fe90 79e78944 ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x11700b7fea4 79efa6c3 ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x1700b7fee8 79efb5b7 001568f8 00000006 00000000 mscorwks!SVR::t_join::join+0xae00b7ff28 79efa9ba 00000002 00000000 001568f8 mscorwks!SVR::gc_heap::mark_phase+0x16f00b7ff54 79efaf60 ffffffff 00000000 001568f8 mscorwks!SVR::gc_heap::gc1+0x4600b7ff74 79efa72f 00000002 00000000 001568f8 mscorwks!SVR::gc_heap::garbage_collect+0x24600b7ff98 79fc8583 79fc8559 77e64590 79fc857c mscorwks!SVR::gc_heap::gc_thread_function+0x6a00b7ffb8 77e64829 001568f8 00000000 00000000 mscorwks!SVR::gc_heap::gc_thread_stub+0x9200b7ffec 00000000 79fc8538 001568f8 00000000 kernel32!BaseThreadStart+0x34

0:182> ~3 kbChildEBP RetAddr Args to Child 00c7fc9c 79efd1d0 55b4de68 00000001 001568f8 mscorwks!SVR::gc_heap::mark_object_simple1+0xdf00c7fce4 79efba46 0108ffcc 00000001 09063c94 mscorwks!SVR::gc_heap::mark_object_simple+0x25400c7fd10 79fbb49d 09063c94 0108ffcc 00000000 mscorwks!SVR::GCHeap::Promote+0x11500c7fd28 79fc3a38 00c7fdd0 09063c94 00000000 mscorwks!GcEnumObject+0x2d00c7fd48 79fc39e2 00153050 79fbb475 00c7fdd0 mscorwks!Module::EnumStaticGCRefs+0x5900c7fd74 79fc3ada 79fbb475 00c7fdd0 00174ea8 mscorwks!DomainAssembly::EnumStaticGCRefs+0x3800c7fd98 79fc3a9b 79fbb475 00c7fdd0 2a03d8b0 mscorwks!AppDomain::EnumStaticGCRefs+0x2900c7fdb4 79fc399b 79fbb475 00c7fdd0 79efba5a mscorwks!SystemDomain::EnumAllStaticGCRefs+0x4e00c7fddc 79efb559 00000000 00000002 00000002 mscorwks!CNameSpace::GcScanRoots+0x1a200c7fe28 79efa9ba 00000002 00000000 00157d30 mscorwks!SVR::gc_heap::mark_phase+0x11100c7fe54 79efaf60 ffffffff 00000000 00157d30 mscorwks!SVR::gc_heap::gc1+0x4600c7fe74 79efa72f 00000002 00000000 00157d30 mscorwks!SVR::gc_heap::garbage_collect+0x24600c7fe98 79fc8583 00000000 7c8276bf 00c7fef8 mscorwks!SVR::gc_heap::gc_thread_function+0x6a00c7ffb8 77e64829 00157d30 00000000 00000000 mscorwks!SVR::gc_heap::gc_thread_stub+0x9200c7ffec 00000000 79fc8538 00157d30 00000000 kernel32!BaseThreadStart+0x34

12) Mark_Phase signature is mark_phase (int gen_number, BOOL mark_only_pparameters passed are 2, 0, that means GC was triggered for generation 2

13)Thread # 345 has triggered garbage collection for generation 2 with a request to allocate memory from GC Heap(see highleted in red)

0:182> ~345 kb 100ChildEBP RetAddr Args to Child 2484e0bc 7c827d0b 77e61d1e 00000100 00000000 ntdll!KiFastSystemCallRet2484e0c0 77e61d1e 00000100 00000000 00000000 ntdll!NtWaitForSingleObject+0xc2484e130 79e789c6 00000100 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac2484e174 79e7898f 00000100 ffffffff 00000000 mscorwks!PEImage::LoadImage+0x1af2484e1c4 79e78944 ffffffff 00000000 00000000 mscorwks!CLREvent::WaitEx+0x1172484e1d8 79efafc5 ffffffff 00000000 00000000 mscorwks!CLREvent::Wait+0x172484e1fc 79efad3d ffffffff 71ca0000 00157dd4 mscorwks!SVR::gc_heap::wait_for_gc_done+0x622484e224 79fd1cf3 00000002 00000000 00157d30 mscorwks!SVR::GCHeap::GarbageCollectGeneration+0x1b52484e2b4 79ef98cf 14fa94e0 0000000c 00000000 mscorwks!SVR::gc_heap::try_allocate_more_space+0x4642484e2d0 79ef9b9c 14fa94e0 0000000c 00000000 mscorwks!SVR::gc_heap::allocate_more_space+0x2e2484e2f0 79e7510e 14fa94e0 0000000c 00000000 mscorwks!SVR::GCHeap::Alloc+0x542484e30c 79e85806 0000000c 00000000 00000000 mscorwks!Alloc+0x602484e34c 79e858cb 79102290 abbf001a 000007d8 mscorwks!FastAllocateObject+0x38

Page 4: Dr.watson MiniDump Analysis Using WinDBG

2484e3f0 793ab271 01061034 2484e484 00000000 mscorwks!JIT_NewFast+0x9e2484e444 793aaef0 f1dcc000 ffffffbc 00000000 mscorlib_ni+0x2eb2712484e618 793b0d1f 3a5a2074 793740ab 00000000 mscorlib_ni+0x2eaef02484e620 793740ab 00000000 14fa94a0 2484e640 mscorlib_ni+0x2f0d1f2484e630 79e7c74b 00000000 00000000 2484e6c0 mscorlib_ni+0x2b40ab2484e640 79e7c6cc 2484e710 00000000 2484e6e0 mscorwks!CallDescrWorker+0x332484e6c0 79e7c8e1 2484e710 00000000 2484e6e0 mscorwks!CallDescrWorkerWithHandler+0xa32484e7fc 79e7c783 796c01c0 2484e92c 2484e84c mscorwks!MethodDesc::CallDescr+0x19c2484e818 79e7c90d 796c01c0 2484e92c 2484e84c mscorwks!MethodDesc::CallTargetWorker+0x1f2484e82c 79eb300f 2484e84c abbf0a12 14fa94a0 mscorwks!MethodDescCallSite::Call_RetArgSlot+0x182484e9f8 79eb2f31 2484ea8c abbf0942 09060054 mscorwks!ExecuteCodeWithGuaranteedCleanupHelper+0x9b2484eaa8 79373ff7 2484ea4c 408301c8 0109dd28 mscorwks!ReflectionInvocation::ExecuteCodeWithGuaranteedCleanup+0xf92484eac0 79373ede 4a34402c 2484eaf0 2484eb1c mscorlib_ni+0x2b3ff700000000 00000000 00000000 00000000 00000000 mscorlib_ni+0x2b3ede

Analysis

1. Application has thrown exception at the top of stack2. In CLR 2.0, unhandled exceptions at the top of the stack on any thread will terminate

the application3. Exception has occurred after GC was triggered4. Mark Phase was triggered for Generation 2 that means it was under severe memory

pressure5. since there are more than 1200 threads that itself amounts to 1.2 GB of virtual memory

for just the user mode stack and each thread has user mode plus kernel mode stack although not much but kernel stack will also add up to another 3 virtual pages(12KB) 1200*12KB of physical memory, kernel stack is physical memory resident. There will be around 100 MB or so in MEM_IMAGE itself

6. Since this exception has occurred while starting a worker thread from managed thread pool and at the same GC was also triggered. So that means there was not enough memory

7. There is no thread leak, these many threads are by design(one thread per request, I don't know why)

8. Some of the interesting pointsThere are more than 200 threads with preemptive GC disabled what that means is these threads can't be suspended and GC Threads will have to wait for these threads to return to preemptive GC mode in order to claim the memory. Managed threads should not be in a state with preemptive GC disabled because this is very rare as interpreted from the call stack Thread::RareDisablePreemptiveGC

Below is the description of Preemptive GC from http://blogs.msdn.com/yunjin/archive/2005/08/30/457756.aspx

Preemptive GC: also very important. In Rotor, this is m_fPreemptiveGCDisabled field of C++ Thread class. It indicates what GC mode the thread is in: "enabled" in the table means the thread is in preemptive mode where GC could preempt this thread at any time; "disabled" means the thread is in cooperative mode where GC has to wait the thread to give up its current work (the work is related to GC objects so it can't allow GC to move the objects around). When the thread is executing managed code (the current IP is in managed code), it is always in cooperative mode; when the thread is in Execution Engine (unmanaged code), EE code could choose to stay in either mode and could switch mode at any time; when a thread are outside of CLR (e.g, calling into native code using interop), it is always in preemptive mode.

CLR rotor source code has the following comments before calling Thread::RareDisablePreemptiveGC

We must do the following in this order, because otherwise we would be constructing the exception for the abort

Page 5: Dr.watson MiniDump Analysis Using WinDBG

without synchronizing with the GC. Also, we have no CLR SEH set up, despite the fact that we may throw a ThreadAbortException.

Most likely it crashed because of not having enough memory to start a worker thread since it crashed with the call stack pointing to WokerThreadStart

The real challenge is how will you find out the exception detail for sure?How will you make sure that this is indeed a out of memory exception?

Since this is a mini dump with no information on CLR Data structures. Effective managed debugging requires a mini dump with full virtual memory because managed heap is created with virtualalloc but Dr. Watson as a postmortem debugger with default options creates a minidump which includes all thread contexts and 10 instructions