Sunday, July 26, 2009

Synchronous vs. Asynchronous, Part 1

Asynchronous Communication Using Synchronous IPC Primitives is Stefan Götz's master's thesis, which I have recently been reading. In it, the author builds asynchronous communication on top of L4's synchronous model. I think this thesis is rather unique because it is the only L4 thesis (that I know of) which attributes some merit to asynchronous IPC. Other relevant L4 papers seem to dismiss asynchronous communication with the typical microkernel-done-right dogmas.

Reading this thesis, especially the introductory chapter, was grist to my anti-dogmatic microkernel mill because it clearly reveals a major drawback of purely synchronous IPC: an increased number of address space switches. The rationale behind this claim is that with synchronous IPC, two address space switches must happen per IPC request. The first happens on send, during a switch from the sender to the recipient, and the second takes place on reply, during a switch from the recipient back to the original sender. This happens per request due to the inevitable synchronous rendezvous of both communicating parties. If the communication protocol is a little more complicated, the sender will likely make several consecutive IPC sub-requests, resulting in twice that number of address space switches.

Now, why are there not going to be as many address space switches with asynchronous IPC? Simply because the sender and the recipient do not have to wait for each other during each IPC. The message is stored in the recipient's mailbox without the need to switch to the recipient. Of course, in the case of a single IPC request, the number of address space switches will eventually be the same, as the recipient needs to be switched to in order to read the message and the sender needs to be rescheduled in order to read the reply. However, with more complex protocols, the sender can push all sub-requests to the recipient's mailbox and the recipient can drain the mailbox without any additional address space switches. Asynchronous IPC will start to show diminishing returns as the dependency among consecutive sub-requests increases and the sender needs to wait for the results of previous requests before making new ones (at which point the communication becomes de facto synchronous).
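To make the counting above concrete, here is a minimal back-of-the-envelope sketch in Python. It is not taken from the thesis; the function names and the idea of grouping sub-requests into "batches" of mutually independent requests are my own assumptions for illustration. It also assumes the sender and the recipient live in different address spaces and that nothing else gets scheduled in between.

```python
def sync_switches(num_requests):
    """Synchronous IPC: every request costs two address space switches,
    sender -> recipient on send and recipient -> sender on reply."""
    return 2 * num_requests


def async_switches(batch_sizes):
    """Asynchronous IPC (idealized): the requests within one batch are
    independent, so the sender queues all of them without switching.
    Each non-empty batch then costs two switches: one so the recipient
    can drain its mailbox, one so the sender can read the replies."""
    return 2 * sum(1 for size in batch_sizes if size > 0)


if __name__ == "__main__":
    n = 8
    print("synchronous, 8 requests:       ", sync_switches(n))
    print("asynchronous, 8 independent:   ", async_switches([n]))
    print("asynchronous, 8 fully chained: ", async_switches([1] * n))
```

With eight independent sub-requests the model gives 16 switches for synchronous IPC and 2 for asynchronous IPC. When every sub-request depends on the previous reply, each batch has size one and the asynchronous count climbs back to 16, which is the de facto synchronous case.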

The increased number of address space switches is a problem predominantly on processors that do not have tagged TLBs, such as all IA-32 and AMD64 architecture processors. The TLB, which stands for Translation Look-aside Buffer, is a fast cache of recently used virtual-to-physical memory mappings. Without TLBs, the computer industry would probably have taken a different direction because the use of virtual memory would be so expensive (just imagine at least N physical memory accesses per page table walk incurred by a single instruction fetch on architectures with N-level page tables) that no one would want to use it (I think I have this observation from Andrew Tanenbaum's Modern Operating Systems, but my memory is weak so I might be wrong). A non-tagged TLB can contain mappings for only one address space at a time, which has the unpleasant implication that it must be flushed on every address space switch. A flushed TLB is gradually repopulated with the new address space's working set, which takes some time. Until the working set is completely restored, the TLB is not fully utilized and the system pays the cost of all the page table walks.
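The effect of flushing can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions of my own: the TLB is modeled as a set that is large enough to hold both working sets (no capacity evictions), the access trace is a made-up ping-pong between two address spaces, and all names are hypothetical. On real hardware each miss counted here would cost a page table walk of up to N memory accesses with N-level page tables.

```python
def ping_pong_trace(rounds, pages_per_space):
    """Build a trace of (address_space_id, virtual_page) accesses in which
    two address spaces, "A" and "B", alternate and each touches its whole
    working set every time it runs."""
    trace = []
    for _ in range(rounds):
        for asid in ("A", "B"):
            for page in range(pages_per_space):
                trace.append((asid, page))
    return trace


def count_tlb_misses(trace, tagged):
    """Count TLB misses for the trace. A non-tagged TLB is flushed on every
    address space switch; a tagged TLB keeps entries of all address spaces,
    distinguished by their address space id."""
    tlb = set()
    misses = 0
    current = None
    for asid, page in trace:
        if asid != current:
            if not tagged:
                tlb.clear()  # non-tagged: the switch invalidates everything
            current = asid
        if (asid, page) not in tlb:
            misses += 1      # would trigger a page table walk
            tlb.add((asid, page))
    return misses


if __name__ == "__main__":
    trace = ping_pong_trace(rounds=10, pages_per_space=4)
    print("non-tagged TLB misses:", count_tlb_misses(trace, tagged=False))
    print("tagged TLB misses:    ", count_tlb_misses(trace, tagged=True))
```

In this idealized trace the non-tagged TLB misses on all 80 accesses, because every run starts with an empty TLB, while the tagged TLB misses only on the 8 first-touch accesses.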

Taken together, the preceding paragraphs speak against massive and zealous use of synchronous IPC. The L4 developers came up with several architecture-specific tricks to work around this problem. For example, in Improved Address-Space Switching on Pentium by Transparently Multiplexing User Address Spaces, Jochen Liedtke describes a way to switch between "small" address spaces without flushing the TLB on IA-32. No need to hold back the surprise any longer: for address spaces below a certain size, they use segmentation. On AMD64, though, segmentation has been practically removed from the processor, so this trick cannot be used.

All in all, it looks like asynchronous communication does indeed have some advantages over synchronous communication. Taken ad absurdum, it is possible to devise IPC protocols in which, under synchronous IPC, the number of address space switches grows linearly with the number of requests made.