Monday, June 22, 2009

Christmas time in June

If you have followed the HelenOS development mailing list lately, you might have come to the conclusion that it must be Christmas again.

It all started with me committing support for non-root file system mounts. As the name suggests, this feature allows people to mount more than one file system at a time and to create a file system hierarchy as we know it from other operating systems. With this feature, it was finally possible to boot off a FAT file system ramdisk and mount an empty TMPFS file system under /tmp. Even better is booting off a TMPFS file system ramdisk and mounting a FAT (or any other) file system from a permanent block device.

The design is such that the mount point data is not stored centrally in e.g. the VFS task, but is distributed across the endpoint file systems. Each node in the endpoint file system server is a potential mount point. When something gets mounted under it, the mount point data containing an open connection to the mounted file system server is recorded in it. I was quite surprised and delighted when I found out that homogeneous mounts (e.g. mounting FAT on FAT) do not need any special treatment other than a little bit of locking discipline in the kernel when cloning the VFS connection to the mountee for the mounter. The figure below depicts how VFS and the endpoint file systems are interconnected (displaying only connections along which file system requests can be sent).
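
To make the distributed mount point idea more concrete, here is a toy model in Python. It is only a sketch and not HelenOS code: the names Node, FsServer, mount and lookup are made up for illustration, and a plain object reference stands in for the open connection to the mounted file system server.

```python
class Node:
    """A file system node; any node can serve as a mount point."""

    def __init__(self, fs):
        self.fs = fs          # the endpoint file system server owning this node
        self.children = {}    # name -> Node
        self.mounted = None   # stand-in for an open connection to the mounted server

    def mkdir(self, name):
        return self.children.setdefault(name, Node(self.fs))


class FsServer:
    """A toy endpoint file system server (think TMPFS or FAT)."""

    def __init__(self, name):
        self.name = name
        self.root = Node(self)


def mount(mount_point, mountee):
    """Record the mount in the mount point node itself, not in a central table."""
    if mount_point.mounted is not None:
        raise ValueError("mount point is busy")
    mount_point.mounted = mountee


def lookup(root_fs, path):
    """Walk the path, hopping to the mounted server whenever a mount point is crossed."""
    node = root_fs.root
    for component in filter(None, path.split("/")):
        node = node.children[component]
        if node.mounted is not None:
            node = node.mounted.root
    return node


tmpfs = FsServer("tmpfs")
fat = FsServer("fat")
fat.root.mkdir("docs")
mount(tmpfs.root.mkdir("data"), fat)
print(lookup(tmpfs, "/data/docs").fs.name)  # prints "fat"
```

Note that nothing in the model cares whether the mounter and the mountee are of the same type, which mirrors the observation about homogeneous mounts (the locking needed in the real system is left out).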

Non-root mounts are essential not only for mounting file systems on real block devices, but also for seemingly unrelated features such as console inheritance between e.g. the bdsh shell and a newly spawned task. This is where Martin got involved. He added a new file system called DEVFS, which roughly speaking associates file system nodes with IPC services such as the console. Each virtual console has a corresponding node under /dev (for instance /dev/vc0). When spawning a new task, the shell application passes it its stdin, stdout and stderr. When the application writes to its standard output, the write() request first goes to DEVFS, which in turn sends it to the console server.
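
The forwarding path described above can be sketched in a few lines of Python. This is an illustration only: ConsoleServer, DevFs and Task are hypothetical names, ordinary method calls stand in for IPC, and only standard output is modelled.

```python
class ConsoleServer:
    """Stand-in for the console service; keeps what each virtual console received."""

    def __init__(self):
        self.output = {}

    def write(self, console_id, data):
        self.output.setdefault(console_id, bytearray()).extend(data)
        return len(data)


class DevFs:
    """Toy DEVFS: maps node names under /dev to (service, handle) pairs."""

    def __init__(self):
        self.nodes = {}

    def register(self, name, service, handle):
        self.nodes[name] = (service, handle)

    def write(self, name, data):
        service, handle = self.nodes[name]
        return service.write(handle, data)  # forward the request to the service


class Task:
    """A task that remembers which /dev node backs its standard output."""

    def __init__(self, devfs, stdout_node):
        self.devfs = devfs
        self.stdout_node = stdout_node

    def spawn(self):
        # The child inherits the parent's standard output node.
        return Task(self.devfs, self.stdout_node)

    def write_stdout(self, data):
        return self.devfs.write(self.stdout_node, data)


console = ConsoleServer()
devfs = DevFs()
devfs.register("vc0", console, 0)
shell = Task(devfs, "vc0")
child = shell.spawn()
child.write_stdout(b"hello\n")
print(bytes(console.output[0]))  # prints b'hello\n'
```
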

Jiri spotted the opportunity and created three block device drivers: gxe_bd, ata_bd and file_bd. The first one is to be used with the arm32 and mips32 ports on the GXemul simulator. It controls the simple GXemul disk device. The second one is a simple driver for ATA disks and can be used in the Qemu simulator with the ia32 and amd64 ports. Note that Jiri now warns against using it on real systems with real disks. The last block device driver, file_bd, is a loopback block device. It creates a new block device backed by a file. If the file contains an image of a file system, it can be mounted as though it were located on a real block device.
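
For readers who have not met a loopback block device before, the idea behind file_bd fits into a short Python sketch. The class below is hypothetical and not the interface of the real driver; the 512-byte block size is an assumption made for the example.

```python
import os
import tempfile


class FileBlockDevice:
    """A block device whose blocks live in a regular file."""

    def __init__(self, path, block_size=512):
        self.block_size = block_size
        self._file = open(path, "r+b")
        self.num_blocks = os.path.getsize(path) // block_size

    def _check(self, first, count):
        if first < 0 or count < 0 or first + count > self.num_blocks:
            raise ValueError("block range out of bounds")

    def read_blocks(self, first, count):
        self._check(first, count)
        self._file.seek(first * self.block_size)
        return self._file.read(count * self.block_size)

    def write_blocks(self, first, data):
        if len(data) % self.block_size != 0:
            raise ValueError("data must be a whole number of blocks")
        self._check(first, len(data) // self.block_size)
        self._file.seek(first * self.block_size)
        self._file.write(data)
        self._file.flush()

    def close(self):
        self._file.close()


# Demo: a 4-block image file used as a block device.
with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "disk.img")
    with open(image, "wb") as f:
        f.write(bytes(4 * 512))
    dev = FileBlockDevice(image)
    dev.write_blocks(2, b"\xaa" * 512)
    print(dev.num_blocks, dev.read_blocks(2, 1)[:2].hex())  # prints "4 aaaa"
    dev.close()
```

A file system server only ever sees the read_blocks() and write_blocks() side, so it cannot tell the image file from a real disk.
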

In parallel with the trunk changes, Lukas Mejdrech committed his IP bits to the networking branch. These changes allow HelenOS running in the Qemu simulator to receive data from a web browser running on localhost. True, it still cannot answer the HTTP request (for that there would have to be more of the TCP module and some simple HTTP server), but it nicely demonstrates the progress.

To give this post some real meat and to show that there are problems too, consider the following diagram:

It illustrates the sequence of requests made when trying to mount a file system image stored in a regular file located on an already mounted TMPFS file system. What is notable is that there is a cycle in the diagram, because FILE_BD is both a block device driver and also a client of the VFS server. For some time, there has been exactly one open connection between VFS and each endpoint file system, and one connection fibril to handle it. So what do we make of it? It's a clear deadlock situation: the sole fibril will be waiting for answers to requests made along the cycle while VFS will wait for the same fibril to process a new read() request.

A similar situation occurred when one task blocked in read() waiting for a character to be read. This put the sole DEVFS fibril to sleep and delayed subsequent requests to write() a character on the remaining virtual consoles.

It took us some serious thinking before we realized that we were hitting exactly this problem. For now, I have fixed it by allowing VFS to open multiple connections to the endpoint file systems.
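
The deadlock and the effect of the fix can be replayed in a small single-threaded Python model. Several things here are assumptions: the order of the cycle (VFS to TMPFS, TMPFS to FAT, FAT to FILE_BD, and FILE_BD back through VFS to TMPFS) is my reading of the description, the names Server, WouldDeadlock and mount_image are invented, and instead of really hanging, the model raises an exception at the point where the real system would block forever.

```python
class WouldDeadlock(Exception):
    """Raised where the real system would block forever."""


class Server:
    """A server with a fixed number of connection fibrils (one per connection)."""

    def __init__(self, name, fibrils):
        self.name = name
        self.free = fibrils

    def call(self, handler):
        # If every fibril is busy with a request that, in this synchronous
        # chain, is itself waiting for us, nobody would ever serve this one.
        if self.free == 0:
            raise WouldDeadlock(self.name + ": no free connection fibril")
        self.free -= 1
        try:
            return handler()
        finally:
            self.free += 1


def mount_image(tmpfs_fibrils):
    """Mount a FAT image stored in a file on TMPFS (assumed request order)."""
    tmpfs = Server("TMPFS", tmpfs_fibrils)

    def tmpfs_read():          # read() of the image file
        return b"first block of the image"

    def file_bd_read_block():  # FILE_BD reads the backing file through VFS
        return tmpfs.call(tmpfs_read)

    def fat_mounted():         # the mountee reads from its block device
        return file_bd_read_block()

    def tmpfs_mount():         # the mounter hands over to the mountee
        return fat_mounted()

    return tmpfs.call(tmpfs_mount)


try:
    mount_image(1)
except WouldDeadlock as e:
    print("deadlock:", e)      # prints "deadlock: TMPFS: no free connection fibril"
print(mount_image(2))          # prints b'first block of the image'
```

With a single connection the nested read() finds the only fibril busy with the mount request; with a second connection the read() is served and the mount completes.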

So to sum this up, the upcoming HelenOS release will be full of interesting (and hopefully well-debugged) new features.

Tuesday, June 2, 2009

OpenSolaris 2009.06 is out!

The new release of OpenSolaris came out yesterday and, if I count right, it contains two fixes of mine:
6747539 hati_pte_map() should undo HTABLE_LOCK_INC(ht) on LPAGE_ERROR
6743475 Some MCA banks are skipped too fearlessly
I have already blogged about the first fix in one of my previous posts. The second fix slightly improves OpenSolaris MCA (Machine Check Architecture) code by adhering more closely to the Intel and AMD specifications and, by virtue of that, supporting more error detectors.