Wednesday, April 22, 2009

Time to go embedded

It looks like we will need to optimize HelenOS for embedded systems because it is getting too big for some machines. One particular machine that has been causing us problems is the Simics Serengeti system, a close relative of this half-ton UltraSPARC III-based baby. What puts Serengeti into the embedded category is that its OpenFirmware is, for some reason, unable to allocate enough contiguous space to accommodate the HelenOS boot image and/or the ramdisk. One therefore needs to be somewhat picky when configuring HelenOS for Serengeti, which results in a reduced set of features that can be used on this system.

I am planning to change the build and configuration system so that it supports optimizing for size. This ranges from using -Os instead of -O3 and stripping the binaries to cherry-picking which binaries will be part of the ramdisk and deploying the hopefully-soon-to-be-completed dynamic linker.

Wednesday, April 1, 2009

Source of interesting Solaris bugs

In one of my previous blog posts, I talked about how I accidentally discovered two new VM bugs in Solaris 10 while regression testing my own fix for an unrelated problem. A couple of weeks ago, I found myself in a similar situation. This time, I was regression testing a Solaris 10 backport of the fix for one of those two VM bugs (I had to fix it in Solaris Nevada first), and one of the tests resulted in kernel heap corruption. I was lucky, because the crash occurred on a debug kernel, which runs with the kmem_flags variable set to 0xf. Among other things, this setting enables buffer transaction logging and detection of writes past the end of a buffer. With this aid and the crash dump, I was able to root-cause the bug. Exactly as in the previous case, the culprit was an uninitialized variable. If you are interested, have a look at the following CR:
6823941 sendvec_chunk() uses uninitialized variables
What is striking is that none of these uninitialized variables was discovered at compile time or lint time. Well, it's not that striking once we notice what options are used to build Solaris. Among others, the Solaris compiler wrapper utility turns on -Wno-uninitialized, which explains why GCC does not complain.
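
To illustrate the kind of bug that -Wno-uninitialized hides, here is a minimal sketch. It is not the sendvec_chunk() code; the function names and flag values are made up for this example. The first function leaves its local variable unset on one path, the second one initializes it up front.

```c
/*
 * Hypothetical example, not taken from the Solaris sources.
 * pick_flags_buggy() leaves 'flags' unset when mode is neither 1 nor 2,
 * so the caller gets whatever happened to be on the stack.
 */
int pick_flags_buggy(int mode)
{
    int flags;              /* BUG: no initial value */

    if (mode == 1)
        flags = 0x10;
    else if (mode == 2)
        flags = 0x20;

    return flags;           /* uninitialized for any other mode */
}

/*
 * Same logic with the variable initialized at its declaration,
 * so every path returns a defined value.
 */
int pick_flags_fixed(int mode)
{
    int flags = 0;          /* defined default for unknown modes */

    if (mode == 1)
        flags = 0x10;
    else if (mode == 2)
        flags = 0x20;

    return flags;
}
```

Compiling this with something like gcc -c -O2 -Wall should typically make GCC report that flags may be used uninitialized in pick_flags_buggy(). Adding -Wno-uninitialized is expected to silence that report, which is roughly the situation described above.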

The moral of the story is threefold:
  1. do regression testing of debug binaries too, or they become a source of fun for someone else
  2. do not write functions that span more than 14 standard terminals in height
  3. compiler warnings can be useful