Sun Solutions by Forsythe
Jarod Jenson
Chief Technology Architect


Thu, 07/10/2008 - 03:22 by Jarod Jenson


You know, I always hoped I wouldn't be one of those bloggers who drops a few posts and then disappears into the ether for months on end. It looks like I failed miserably in my resolution.

The good news is that this is a clear indicator that our Solaris 10 Practice at Forsythe is growing. The bad news is that it means I have been too busy to keep up with the blog.

Being busy is not all bad, though. Occasionally, it means that something new and interesting happens. A couple of weeks ago I was in Rhode Island with a Forsythe Account Executive, Jim, finishing up with a customer at lunchtime. About that time, we got a call from a customer in New Hampshire who was having a critical performance issue and needed someone there ASAP. Well, silly me thought I couldn't make it that day, since fighting traffic around Boston would take 2.5 hours or so if I got lucky. Jim had other plans. He is a private pilot and flies a Beechcraft Bonanza to get around the NE corridor. Less than 30 minutes later, we were in NH and just minutes from the customer. Pretty cool. The flying was fun (though we did have to outrun a thunderstorm on the way back), but the best part was that we doubled the throughput of this particular Oracle instance. That always makes the trip worthwhile.

Although not directly related, this leads me to think about one of the unfortunate trends I have been seeing lately. There has been a marked increase in the number of third-party add-ons that include kernel drivers, and those drivers are repeatedly implicated as major contributors to performance problems. Much of the software makes its way onto the system through the need for compliance/auditing (think SOX) or a vendor promise of (oddly enough) increased performance. When you see a non-core driver with lock names like xx_global_lock, and that lock is an order of magnitude hotter than all other locks combined, it is really disheartening. It is like towing an 8,000 lb trailer with an SSC Aero. You certainly won't get the performance you would expect from a car like that. Same situation here. Solaris is blamed for being slow, yet it is the unwilling recipient of a shiny new hitch and trailer.

Certainly compliance is a constraint we must deal with, but a bit of due diligence on the developers' part would go a long way. It seems quite obvious that they are testing on small systems (4 or fewer cores) where these issues aren't as apparent. Try the same bits on a 32-, 64-, or 128-way M-series system and things are quite different.

lockstat(1M) is a good friend to have here. Even if you don't have non-standard drivers on your system, it would be wise to both check 'smtx' from mpstat(1M) and get a few good lockstat(1M) runs during load in a QA/staging environment. Almost all kernel scalability issues will leave some form of tracks in lockstat(1M) output.
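
As a rough illustration of the 'smtx' check, here is a minimal Python sketch that scans captured mpstat(1M) output and reports CPUs whose smtx column (spins on mutexes) is at or above a cutoff. The function name hot_smtx_cpus, the threshold of 500, and the sample numbers are all made up for illustration, not taken from a real system or any tuning guidance. It assumes the usual Solaris mpstat header row with CPU and smtx columns. Keep in mind that the first mpstat report is an average since boot, so you would normally feed this the later interval samples.

```python
def hot_smtx_cpus(mpstat_text, threshold=500):
    """Return {cpu_id: highest smtx seen} for CPUs at or above threshold.

    threshold is an arbitrary illustrative cutoff, not a recommended value.
    """
    hot = {}
    cpu_idx = None
    smtx_idx = None
    for line in mpstat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":
            # Header row: find the columns by name rather than by position.
            cpu_idx = fields.index("CPU")
            smtx_idx = fields.index("smtx")
            continue
        if smtx_idx is None or len(fields) <= smtx_idx:
            continue  # no header seen yet, or a short/unexpected line
        try:
            cpu = int(fields[cpu_idx])
            smtx = int(fields[smtx_idx])
        except ValueError:
            continue  # skip anything that is not a numeric data row
        if smtx >= threshold:
            hot[cpu] = max(smtx, hot.get(cpu, 0))
    return hot


# Invented sample in the shape of Solaris mpstat output (not real data).
SAMPLE = """\
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   12   0  310   420  105  890   22   41   35    0  2100   18  12   0  70
  1    9   0  295   130   12  910   25   39 1840    0  2250   20  45   0  35
"""

if __name__ == "__main__":
    print(hot_smtx_cpus(SAMPLE))
```

A CPU that shows up here is only a hint that mutex contention is worth a look. The lockstat(1M) runs are what tell you which lock is actually hot.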

I do plan to update this space more frequently. If you happen to run into me, be sure to remind me of my promise.