DTrace can be used to find the root cause of a BAD TRAP Solaris 10 system crash. Most BAD TRAP panics are caused by one kernel module clobbering kernel memory containing a pointer. When this now bad pointer gets used later, the result is a BAD TRAP kernel panic. One possible way to find the root cause of this type of system crash is to turn on kernel memory auditing through the kmem_flags kernel tunable. Common memory access bugs that can be detected with these flags are:
Writing past the end of an allocated buffer
Using uninitialized memory
Using a buffer after it has been freed
You are also able to get a history of thread stack traces that operated on the buffer that had one of these problems. In a previous blog I intentionally clobbered an important kernel pointer: rootdir. The next user of this pointer which happened to be my bash shell exec'ing a program caused the crash. Using SCAT 5.0 on the crash dump files with the panic command clearly pointed to the reason for the crash, but not the root cause of who/why was this memory clobbered. The kmem_flags would not have helped here because they only work with dynamically allocated kernel objects. The following DTrace script (D script) allows you to monitor a variable for changes, in this case the rootdir kernel variable:
The technique used here is to find the first function in the kernel that changes this variable. At the point the variable is changed display details of what was running and on what CPU. This is comparable to using watchpoints in a debugger except you don't stop the OS when the event fires with DTrace. The back quote character in DTrace is used to reference symbols in the kernel. If the memory was imbedded in a dynamically allocated object like a vnode, then the original value would have to be dynamically assigned in the D script after a return of some appropriate kernel function (which you would figure out after looking at the source code.) The intent is to run this script in background wating for the next crash. This script should cause less than 0.5% overhead to your CPU utilization.
mdb(1) has the ability of displaying DTrace trace data from a crash dump. This example assumes you are in the crash dump directory and are looking at crash files unix.2 and vmcore.2:
Given a dtrace_state_t structure that represents a DTrace consumer, prints dtrace(1M)-like output for in-kernel DTrace data. (The dtrace_state_t structures for all DTrace consumers may be obtained by running the ::dtrace_state dcmd.) When data is present on multiple CPUs, data are presented in CPU order, with records within each CPU ordered oldest to youngest. Options: