This writeup was originally an article written for the Sun Catalyst
newsletter (January 96). I've edited out some of the introduction
so that we can jump right into the technical details.
Due to the age of the original article, some of the technical data
limits mentioned are no longer applicable. And I've certainly seen
crash dumps larger than 200mb since 1996!     :-)
...Let's focus on one example of system panics and hangs. Using a crash dump from a real customer, I will show you how I was able to debug and resolve a situation where the customer's server was occasionally hanging...
The system history can make a big difference in how I approach a crash dump. If the system only crashes or hangs at 3am on Tuesdays, I'm going to want to look at the cron jobs. If the system has a recent history of random crashes, hangs, watchdog resets, and sudden drops down to the OK prompt, I'm going to be very suspicious about the reliability of the hardware.
Unfortunately, too many customers spend hundreds of thousands of dollars on a large computer system, but won't invest the additional $1.95 and few minutes a day to maintain a system log note pad kept near the console. So, all too often, getting a good idea of the history of a system is not always easily achieved.
To analyze a system hang, one first needs to get a snapshot of the hung system. To do this, we convert the hang into a panic by forcing the system to an abrupt halt (L1-A) and then running the boot prom command: sync. Hopefully, depending on the reason for the hang, the system will cooperate, allow you to reach the boot prom, and the panic() routine will dump the image of memory to the primary swap device.
Not all hangs can be converted into system panics. When a system is suffering from hard, uninterruptable hangs, you are stuck with cycling power, and losing the contents of memory and thus the evidence as to why the system was hung.
Always make sure that savecore is pointing to a directory which has enough space to hold a few crash dumps without causing trouble. For example, do not use /var (which is the default) if /var is on your root filesystem. Have savecore put the crashes somewhere where you've got room to spare.
Also, when preparing for a panic, look at your primary swap device. When a system is panic'ing, it gets back to basics. Fancy disk mirroring and other disk management techniques can sometimes baffle the panic() routine. Also, panic() doesn't know anything about how multiple swap partitions can be used to make up one large swap area. Remember, keep the primary swap partition simple and straight forward!
Make sure your primary swap partition is large enough to hold an image of your system's memory. If you have a massive server with 2gb of memory, you can probably safely limit the primary swap partition to 300mb. To date, I've never seen a crash dump larger than that.
This server was hanging occasionally and also panic'ing at times. The customer turned on savecore and soon captured a crash image of one of the panics. I found that the panic was caused by a known kernel bug, one which was addressed in a recent version of the jumbo kernel patch. The patch was installed soon after. However, the hangs continued.
After forcing one of the hangs into a panic via L1-A and sync, the customer forwarded the files to me for analysis. The vmcore file from this crash was quite large, over 200mb in size. The unix file was around 1 mb, quite normal.
Hiya... strings vmcore.1 | more
Generic_101945-33 <--- Kernel patch
zero
rfsproccnt_v2
misc
With SMT7.2 Version 2.3.1 04189
Network Peripherals SBus FDDI
nfs_server
misc
unix
rpc_server
....
.... (stuff edited out)
....
nf0 at sbi3: SBus3 slot 1 0x7ff0 SBus level 4 sparc ipl 7
nf0 is /io-unit@f,e3200000/sbi@0,0/nf@1,7ff0
dump on /dev/dsk/c1t2d0s1 size 1024452K
panic[cpu0]/thread=0xe3901ec0: zero <--- Panic message
syncing file systems... 4830 4830 4830 4830 4830 done
49479 static and sysmap kernel pages
546 dynamic kernel data pages
620 kernel-pageable pages
7 segkmap kernel pages
....
... (output continues)
This core dump actually contains 604,179 lines of strings output. Fortunately, the kernel message buffer is up near the front of crash dumps.
As we see here, the jumbo kernel patch for Solaris 2.4 is installed. At the time of the crash, this was a fairly recent release. However, it is interesting to note that rev -33 never was released formally, so the customer was trying a "T" or temporary patch. Revision -33 was not found to be a problem during its testing, so I have no complaint with the customer running this revision for the moment.
Also in this output, we see the panic message which appeared on the console. A "panic: zero" is indicative of someone forcing a hang to become a panic. So, I now know that this is a hung system, which means I will not be doing a stack traceback. Instead, I'm going to have to look at ALL of the threads of the system to see who's doing what. This is not a difficult task, however, considering the massive size of the crash dump, this will be a long, tedious one.
Hiya... adb -k unix.1 vmcore.1
physmem 23d48
time/Y
time:
time: 1995 Oct 30 00:56:50
*time-(*lbolt%0t100)=Y
1995 Oct 25 07:16:58
Time and lbolt are two kernel symbols used to maintain the current
time, time, and the number of hundredths of a second since boot time,
lbolt. Using a simple equation within adb, we can get the time of
the crash, and thus the uptime of the system. In this case, this server
had been up less than 5 days.
Okay, it's time to get a stack trace of all of the threads. Since this is guaranteed to generate a lot of output (1937 lines in this case), we will tell adb to redirect output for the moment into a file called fred. Note that we are still in the adb session started above.
$> fred $< threadlist $> time/Y time: time: 1995 Oct 30 00:56:50The $>fred command sends adb output to file fred. The next command says to run the threadlist macro which generates the output we require. (For Solaris 1 systems, we would use the traceall macro instead.) Typing ahead, $> says to stop sending output to fred. Then, just so I know adb is done, I've asked to see time again.
Here are some snippets from the threadlist output.
thread_id e3922ec0
?(?) + effff4a8
cv_wait(0xe01c5b7c,0xe01c5b7c,0x2705bef,0x80000000,0xe01c8ac4,0x0) + 44
callout_thread(0x0,0x0,0xe01d6950,0xe01d6950,0xe01c5b90,0x0) + 1c
thread_id e3b08ec0
?(?) + effff4a8
poll_obp_mbox(0xfc,0xfc,0x0,0x404010e6,0x5b0c0,0x0) + bc
level14_handler(0xe3b08d54) + 4
L14_front_end(?) + 68
spl5(0x404000e0)
disp_getwork(0xf7660d20,0x404000e0,0x0,0xffffffff,0xe01c13f0,0x0) + c
idle(0xf7660d20,0x0,0xf7660d24,0xf7660e3c,0xe,0x0) + cc
thread_id f76eda40
?(?) + effff4a8
cv_wait_sig(0x0,0xe01ccf00,0xf76fa9a0,0xf76faa0c,0xf775caa8,0x0)
strwaitbuf(0xf4c8,0x1,0xffffffff,0xfffffffc,0xf7854198,0x0) + 50
strmakemsg(0x0,0x1,0x0,0x1,0xe3d2ed74,0x58) + 1bc
strputmsg(0x0,0xf76ae500,0xe3d2ee70,0x0,0xf76ae54c,0x3) + 294
msgio(0xe3d2efe0,0xe3d2ef50,0x2,0xe3d2eeef,0xe3d2eee8,0x3) + 34c
putmsg(0xe3d2efe0,0xe3d2ef50,0x7ff,0x3,0x0,0x29ea8) + 4c
syscall_ap(0x3) + 6c
thread_id f8ae7480
?(?) + effff4a8
cv_wait_sig_swap(0x0,0xf8c83f24,0xf8db4000,0xf8c83ed8,0xf8ac4048,0x0)
fifo_read(0xf7443264,0xe3eb3e48,0xf8c83ed8,0xf8c7fd28,0x0,0x0) + 12c
rw(0xe3eb3fe0,0xe3eb3f50,0x1,0x0,0x1,0x204) + 2f8
syscall_ap(0x0) + 6c
These are pretty common threads. Nothing odd going on here.
The 2nd thread shows an idle CPU waking up to acknowledge that the system was panic'ing. In fact, on this system, 19 out of 20 CPUs were idle. One was actually processing the "panic: zero". The fact that the system was completely idle is a clue to the problem we will find.
Tucked away in the output, we find some not so normal looking threads. As a matter of fact, there were quite a few threads which could be added to this list.
thread_id fc35ec60
?(?) + effff4a8
cv_wait(0xf7204010,0xf7204010,0xe1af63ff,0x0,0xf89b5564,0x0)
rmalloc_wait(0xf7204000,0x1,0xf7204018,0x1,0xf7204010,0x0) + 3c
kmem_getpages(0x1,0x0,0xe01ccc00,0x1,0x1,0x0) + 78
kmem_page_alloc(0x1,0x0,0xe1af63ff,0x1000,0x0,0xe01d7044) + 154
kmem_slab_create(0xf720d8a0,0x0,0x0,0x100,0x80,0xe01b92fc) + 5c
kmem_cache_alloc(?) + 64
kmem_alloc(0xf720d8a0,0x0,0xe1af63ff,0xfa73ccea,0x0,0xf720d8c8)
vol_enqueue(0x3,0xe3fa1e28,0x0,0x1,0x0,0x0) + 8
volclose(?) + 1a0
device_close(0xfa73cca8,0x3,0x2,0xf7f05558,0xf89b54f4,0xfcd1ea4c)
spec_close(0xf89b5b54,0x3,0xf89b5564,0xf89b556c,0xf89b5564,0x0) + d8
closef(0xf9d45540,0xe3fa1f50,0x16c0000,0xe3fa1f50,0x0,0xf89b5b54) + 68
syscall_ap(0x7) + 6c
thread_id f935e140
?(?) + effff4a8
cv_wait(0xf7204010,0xf7204010,0x400000e0,0x0,0x400000e0,0x0)
rmalloc_wait(0xf7204000,0x1,0xf7204018,0x1,0xf7204010,0x0) + 3c
ppmapin(0xe0930a7c,0x3,0xffffffff,0xe01c5578,0xe0930a7c,0xf6828000) + c
pagezero(0xe0930a7c,0xc00,0x400,0xf6827000,0xe0930a7c,0xe3e3bb24) + 38
ufs_getpage_miss(0xf8187a68,0x9000,0xe0930a7c,0xf73d7fa0,0x0,0x0) + 190
ufs_getpage(0xf8187a68,0xf6827000,0x9000,0x1000,0x1000,0xe3e3bc58) + 588
segmap_fault(0xf729b000,0xf73d7fa0,0xf6827000,0x1000,0x0,0x2) + e4
segmap_getmapflt(0xf73d7fa0,0x1,0x1800,0xf7224fc0,0xf6826000,0x1bff) + 308
wrip(?) + 434
rwip(0xf8187a60,0x8000,0x1800,0x0,0x400,0x1)
ufs_write(0xf8187a68,0xe3e3be48,0x0,0xf8febf68,0x8058,0xf8187ac8) + 1a8
rw(0xe3e3bfe0,0xe3e3bf50,0x2,0x0,0x0,0x400) + 2f8
syscall_ap(0x4) + 6c
Threads shouldn't have to wait for kernel memory, but that is what we
see happening here.
Hiya... crash -d vmcore.1 -n unix.1
dumpfile = vmcore.1, namelist = unix.1, outfile = stdout
> kmastat
buf buf buf memory #allocations
cache name size avail total in use succeed fail
---------- ----- ----- ----- -------- ------- ----
....
kmem_alloc_240 240 15 16 4096 7667 0
kmem_alloc_256 256 421 765 208896 721890 0
kmem_alloc_320 320 2 261312 89194496 328393 0
kmem_alloc_384 384 1265 6430 2633728 127825 0
....
inode_cache 320 41 20340 6942720 30188 0
fnode_cache 132 20 28 4096 41553 0
pipe_cache 252 38 105 28672 206902 0
---------- ----- ----- ----- -------- ------- ----
permanent - - - 155648 1358 43
oversize - - - 4788224 114667 10
---------- ----- ----- ----- -------- ------- ----
Total - - - 143306752 1688297597 31308688
>
On this system, the kernel map is huge! The hard limit on the kernel
map is around 140mb for this older release of the Solaris. We've hit
the limit. But why? The kernel isn't supposed to eat up memory!
The kmem_alloc_320 pool usage is much too heavy. Digging through SunSolve, we find that there were a couple of kernel memory leak problems, but the of one most interesting was a bug which talks about how the volume manager used to eat up kmem_alloc_320. This was fixed in the volume manager jumbo patch, which the customer now has installed. No system hangs of this type have been seen on this server since installing the patch provided.