Sun Solutions by Forsythe
David Rubio
Senior Consultant

Using DTrace to root cause system crashes

Mon, 08/11/2008 - 17:41 by David Rubio

DTrace can be used to find the root cause of a BAD TRAP Solaris 10 system crash. Most BAD TRAP panics are caused by one kernel module clobbering kernel memory containing a pointer. When this now bad pointer gets used later, the result is a BAD TRAP kernel panic. One possible way to find the root cause of this type of system crash is to turn on kernel memory auditing through the kmem_flags kernel tunable. Common memory access bugs that can be detected with these flags are:

Writing past the end of an allocated buffer

Using uninitialized memory

Using a buffer after it has been freed

You are also able to get a history of thread stack traces that operated on the buffer that had one of these problems. In a previous blog I intentionally clobbered an important kernel pointer: rootdir. The next user of this pointer which happened to be my bash shell exec'ing a program caused the crash. Using SCAT 5.0 on the crash dump files with the panic command clearly pointed to the reason for the crash, but not the root cause of who/why was this memory clobbered. The kmem_flags would not have helped here because they only work with dynamically allocated kernel objects. The following DTrace script (D script) allows you to monitor a variable for changes, in this case the rootdir kernel variable:

laptop# cat /watch.d
#!/usr/sbin/dtrace -s

#pragma D option bufpolicy=ring

BEGIN
{
value = `rootdir;
}

fbt:::return
/`rootdir != value/
{
printf(" (%d) %Y %s/%d/%d\n", cpu, walltimestamp, execname, pid, tid);
printf("User Stack:\n");
ustack();
printf("Kernel Stack:\n");
stack();
value = `rootdir;
}

The technique used here is to find the first function in the kernel that changes this variable. At the point the variable is changed display details of what was running and on what CPU. This is comparable to using watchpoints in a debugger except you don't stop the OS when the event fires with DTrace. The back quote character in DTrace is used to reference symbols in the kernel. If the memory was imbedded in a dynamically allocated object like a vnode, then the original value would have to be dynamically assigned in the D script after a return of some appropriate kernel function (which you would figure out after looking at the source code.) The intent is to run this script in background wating for the next crash. This script should cause less than 0.5% overhead to your CPU utilization.

mdb(1) has the ability of displaying DTrace trace data from a crash dump. This example assumes you are in the crash dump directory and are looking at crash files unix.2 and vmcore.2:

laptop# mdb 2
Loading modules: [ unix krtld genunix specfs dtrace cpu.generic uppc pcplusmp ufs ip hook neti sctp arp usba uhci s1394 fctl nca lofs audiosup zfs random cpc crypto fcip ptm sppp nfs ipc ]
> ::help dtrace

NAME
dtrace - print dtrace(1M)-like output

SYNOPSIS
addr ::dtrace [-c cpu]

DESCRIPTION

Given a dtrace_state_t structure that represents a DTrace consumer, prints
dtrace(1M)-like output for in-kernel DTrace data. (The dtrace_state_t
structures for all DTrace consumers may be obtained by running the
::dtrace_state dcmd.) When data is present on multiple CPUs, data are
presented in CPU order, with records within each CPU ordered oldest to
youngest. Options:

-c cpu Only provide output for specified CPU.

ATTRIBUTES

Target: kvm
Module: dtrace
Interface Stability: Unstable

> ::dtrace_state
ADDR MINOR PROC NAME FILE
d68d5700 2 d44629e0 dtrace d6b6acb8
> d68d5700::dtrace
CPU ID FUNCTION:NAME
0 16068 uiomove:return (0) 2008 Jul 30 11:22:16 mdb/993/1
User Stack:

0xd26473a5
0xd2751752
0xd2751924
0x80710e2
0x80846d1
0x805fe4d
0x8060071
0x806078c
0x80607fc
0x805e2cf
0x805e4a1
0x805def1
0x808b799
0x805da38
0x8073bc2
0x805adbe
Kernel Stack:

mm`mmio+0xbc
mm`mmrw+0x15e
mm`mmwrite+0x13
genunix`cdev_write+0x22
specfs`spec_write+0x379
genunix`fop_write+0x2a
genunix`pwrite64+0x203
unix`sys_sysenter+0x101

> $q

laptop#

The DTrace trace data captured from the crash files shows that it was mdb using the mm driver's write call that modified this memory.