Friday, August 21, 2009

Who is holding this RW-lock?

In Solaris, the kernel Readers-Writer locks are instances of the rwlock_impl_t type. The type is defined like this:
        typedef struct rwlock_impl {
                uintptr_t rw_wwwh; /* waiters, write wanted, hold count */
        } rwlock_impl_t;
As you can see, it is a single 32-bit or 64-bit word, depending on the platform, and Solaris uses every bit of it. The lowest three bits indicate whether there are threads waiting for the lock, whether any of them wants it for writing, and whether the lock is held by a writer. The remaining 29 or 61 bits hold either the address of the single writer holder or the number of readers currently in the critical section. 29 or 61 bits are enough to hold a thread address because thread structures are guaranteed to be aligned on at least an 8-byte boundary, so the three missing low-order bits are always zero.
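To make the layout concrete, here is a minimal user-space sketch of how such a word could be packed and decoded. It is not the kernel's code: the macro values, the order of the three flag bits, and the names rw_decoded_t, rw_decode(), rw_encode_writer() and rw_encode_readers() are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit assignments, for illustration only. */
#define RW_HAS_WAITERS  ((uintptr_t)1)  /* some thread is waiting */
#define RW_WRITE_WANTED ((uintptr_t)2)  /* a waiter wants the lock for writing */
#define RW_WRITE_LOCKED ((uintptr_t)4)  /* the lock is held by a writer */
#define RW_FLAG_BITS    3
#define RW_FLAG_MASK    ((uintptr_t)7)

/* Hypothetical decoded view of the lock word. */
typedef struct rw_decoded {
    bool has_waiters;
    bool write_wanted;
    bool write_locked;
    uintptr_t owner;   /* writer thread address; 0 unless write-locked */
    uintptr_t readers; /* reader hold count; 0 if write-locked */
} rw_decoded_t;

/* Pack a writer's thread address together with the waiter flags. */
static uintptr_t rw_encode_writer(uintptr_t thread_addr, uintptr_t flags)
{
    /* 8-byte alignment is what frees the low three bits for flags. */
    assert((thread_addr & RW_FLAG_MASK) == 0);
    return thread_addr | RW_WRITE_LOCKED |
        (flags & (RW_HAS_WAITERS | RW_WRITE_WANTED));
}

/* Pack a reader hold count together with the waiter flags. */
static uintptr_t rw_encode_readers(uintptr_t count, uintptr_t flags)
{
    return (count << RW_FLAG_BITS) |
        (flags & (RW_HAS_WAITERS | RW_WRITE_WANTED));
}

/* Split a lock word into its flags and either an owner or a count. */
static rw_decoded_t rw_decode(uintptr_t wwwh)
{
    rw_decoded_t d;

    d.has_waiters = (wwwh & RW_HAS_WAITERS) != 0;
    d.write_wanted = (wwwh & RW_WRITE_WANTED) != 0;
    d.write_locked = (wwwh & RW_WRITE_LOCKED) != 0;
    if (d.write_locked) {
        /* Upper bits are the owner's address with the flags masked off. */
        d.owner = wwwh & ~RW_FLAG_MASK;
        d.readers = 0;
    } else {
        /* Upper bits are a plain count; no thread address is stored. */
        d.owner = 0;
        d.readers = wwwh >> RW_FLAG_BITS;
    }
    return d;
}
```

In this sketch, decoding a read-locked word can only ever produce a count, because that is all the upper bits contain in that state.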

Now, while it is straightforward to determine the address of the thread that owns the lock as a writer, there is no list of the threads that own it for reading. What is worse, the current implementation does not even provide a means of determining the address of a sole reader when the lock is held by one thread for reading and the next thread wants it for writing (so that all subsequent threads, writers and readers alike, are blocked until the sole reader exits the critical section). Needless to say, exactly this kind of information can be essential when analyzing a crash dump of a hung machine.

Some time ago, I was involved in analyzing a hang of over 200 threads, the majority of which were blocked on several RW-locks that were all held for reading:
6842265 lots of threads transitively waiting for scsi_flag_nointr_cv in scsi_transport()
In my analysis, I managed to identify all the reader holders of those locks using a technique that is unfortunately not universal, but that can be repeated in similar cases. In my case, the data structure was a device tree in which each device node has an RW-lock. Besides a suitable data structure (e.g. a tree), the key to success was knowing the functions that got hung: I knew that some of them always attempt to read-lock a parent node and one of its children. Thus, when I found a thread waiting for e.g. /pci@5c,600000 at a certain offset in dv_find(), I knew that it must be the thread that had locked the root node for reading. To see how I identified the remaining reader holders, I encourage you to read the above-mentioned CR.