Friday, October 31, 2008

Two interesting Solaris VM bugs

I ran into two interesting virtual memory bugs about a month ago while testing some bugfixes for our customers. The problem occurred on some older 32-bit machines running debug builds of Solaris 10. While running one of our test suites, the system would hit an assert() and panic like this:
assertion failed: ht->ht_lock_cnt == 0 || ht->ht_valid_cnt > 0, file: ../../i86pc/vm/htable.c, line: 994
Searching our bug database quickly revealed the possible culprit:
6607917 assertion failed: ht->ht_lock_cnt == 0 || ht->ht_valid_cnt > 0
This bug had already been fixed in Nevada, but not in Solaris 10. Moreover, it can only happen with debug kernels, so there is no need to worry about it in our production Solaris 10 release. Right?

Wrong. Besides the fact that a test suite running with user privileges was panicking the test machines on which a handful of my colleagues and I did our pre-integration testing, I discovered that this was certainly not 6607917. That bug requires two threads to be calling htable_release() on the same htable_t structure at the same time, while the crashdump I received from the PIT machine showed only one thread doing so.

You can see the analysis of the crashdump in the description of the new bug I have filed for it:
6747539 hati_pte_map() should undo HTABLE_LOCK_INC(ht) on LPAGE_ERROR
In this case, the kernel detects a collision between a new large page mapping and the current contents of a page table. While the conflict itself is handled, the kernel forgets to unlock the respective htable_t structure when it bails out of hati_pte_map(). A debug kernel is then bound to panic, because the very next call after the failed hati_pte_map() is htable_release(), the function where the assertion is checked: with the table still empty, ht_lock_cnt is left nonzero while ht_valid_cnt is zero, which is exactly the state the assertion rejects.
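The lock count leak can be illustrated with a small user-space model. This is a hedged sketch, not Solaris code: model_htable_t, model_pte_map() and model_htable_release() are hypothetical stand-ins that keep only two counters analogous to ht_lock_cnt and ht_valid_cnt, and the model assumes the mapping is being loaded locked, so the lock count is always taken. The release function checks the same condition as the failing assertion, and the error path drops the lock count again, as the title of 6747539 suggests.

```c
#include <assert.h>

/* Hypothetical model of an htable_t: only the two counters from the assertion. */
typedef struct {
	unsigned lock_cnt;	/* analogous to ht_lock_cnt */
	unsigned valid_cnt;	/* analogous to ht_valid_cnt */
} model_htable_t;

/* Checks the same invariant as the assertion hit in htable_release(). */
static void
model_htable_release(model_htable_t *ht)
{
	assert(ht->lock_cnt == 0 || ht->valid_cnt > 0);
}

/*
 * Models hati_pte_map() for a locked mapping.  "collision" stands in for
 * x86pte_set() returning LPAGE_ERROR.  The error path undoes the lock
 * count increment, which is the step the buggy code was missing.
 */
static int
model_pte_map(model_htable_t *ht, int collision)
{
	ht->lock_cnt++;			/* like HTABLE_LOCK_INC(ht) */

	if (collision) {
		ht->lock_cnt--;		/* undo the increment on LPAGE_ERROR */
		return -1;
	}

	ht->valid_cnt++;		/* a new valid entry is now in the table */
	return 0;
}
```

Deleting the `ht->lock_cnt--` line reproduces the state seen in the crashdump: after a collision on an empty table, lock_cnt is 1 while valid_cnt is 0, and the model's release assertion fails just as the debug kernel's does.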

But wait, there's more to it. In my crashdump, I noticed that the htable_t in question was empty: there were no entries in the corresponding page table. So how could there have been a conflict between a new large page mapping and an old base-size mapping? Impossible.
```c
	/*
	 * Set the new pte, retrieving the old one at the same time.
	 */
	old_pte = x86pte_set(ht, entry, pte, pte_ptr);

	/*
	 * did we get a large page / page table collision?
	 */
	if (old_pte == LPAGE_ERROR) {
		rv = -1;
		goto done;
	}
```
The snippet above, from hati_pte_map(), shows how the collision is detected: via the return value of x86pte_set(). So maybe something is wrong with that function, and we should not have detected a collision in the first place. And indeed, a regression had been introduced into Solaris 10:
6747627 x86pte_set() uses uninitialized variable "prev" when PAE is not enabled
So the first bug was actually exposed by the second one: a piece of uninitialized memory confused x86pte_set(), which in turn reported an LPAGE_ERROR. This made hati_pte_map() bail out needlessly, and because it did not undo HTABLE_LOCK_INC(ht) on that path, the kernel panicked in htable_release().
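To illustrate how an uninitialized previous entry can turn into a spurious collision, here is another hedged user-space model. It is not the actual x86pte_set() code: model_table_t, model_pte_set(), the MODEL_* constants and the exact collision rule (a valid entry without the page size bit is assumed to point to a page table) are hypothetical. Only the bit positions follow the x86 present (bit 0) and page size (bit 7) flags. In this version the previous entry is read on both the PAE and the non-PAE path, so an empty slot never reports a collision.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for the model; NOT the Solaris definitions. */
#define MODEL_PT_VALID		0x001u		/* x86 present bit */
#define MODEL_PT_PAGESIZE	0x080u		/* x86 page size (PS) bit */
#define MODEL_NPTES		4
#define MODEL_LPAGE_ERROR	((uint64_t)-1)

typedef struct {
	int		pae;			/* 1: 64-bit entries, 0: 32-bit */
	uint64_t	pte64[MODEL_NPTES];	/* used when pae != 0 */
	uint32_t	pte32[MODEL_NPTES];	/* used when pae == 0 */
} model_table_t;

/*
 * Install a large page entry and return the previous entry, or
 * MODEL_LPAGE_ERROR if the slot already points to a page table.
 * "prev" is read from the table on BOTH paths; the bug pattern is a
 * path that skips this read and then tests prev anyway.
 */
static uint64_t
model_pte_set(model_table_t *t, int entry, uint64_t new_pte)
{
	uint64_t prev;

	assert(entry >= 0 && entry < MODEL_NPTES);

	if (t->pae)
		prev = t->pte64[entry];
	else
		prev = t->pte32[entry];	/* without this, prev is garbage */

	/* A valid entry without the page size bit means a page table is here. */
	if ((prev & MODEL_PT_VALID) && !(prev & MODEL_PT_PAGESIZE))
		return MODEL_LPAGE_ERROR;

	if (t->pae)
		t->pte64[entry] = new_pte;
	else
		t->pte32[entry] = (uint32_t)new_pte;
	return prev;
}
```

If the non-PAE branch did not assign prev, the collision test would run on whatever happened to be on the stack (undefined behavior in C), and a garbage value with bit 0 set and bit 7 clear would report an error for a perfectly empty table, consistent with the empty htable_t found in the crashdump.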

I wanted to post about these two new bugs earlier, but I also wanted to use them during the CDA course I delivered last week in the Prague Sun office, so please excuse the delay.