After some discussion with Jiri, I thought it would be useful to know how big the HelenOS kernel actually is. I did several measurements and obtained interesting and surprising results. For each architecture, I built the system using the default configuration, and again using a configuration with all options disabled. Both of these builds used the -O3 optimization level. Then I changed -O3 to -Os and rebuilt the minimal configuration. On some platforms, -Os could not be used right away due to missing parts of the softint library, so I used -O2 instead. The HelenOS build system does not strip the kernel.raw binary, so I did that manually and recorded the final sizes. These are my results:
What is immediately clear from the chart is that something is wrong with amd64: it yields a kernel about three times as big as the other kernels. We'll clearly need to investigate this.
All 32-bit kernels and all 64-bit kernels except the amd64 kernel are comparable in size. The ia32 kernel is the smallest one, although it would be interesting to know how small the arm32 kernel would be with the -Os optimization. When the minimal configuration is optimized for size and stripped, the ia32 binary is only 116 KiB. When I alter the kernel Makefile, I can shave off an additional 4 KiB by not using frame pointers (-fomit-frame-pointer). Even though it is not supported right now, a further 27 KiB can be shaken off by not including the .bss section in the kernel image. This gives us a quite respectable kernel size of just 84 KiB! This 84 KiB kernel would be completely headless (almost no drivers compiled in) and trimmed down (e.g. without SMP support), but it would do its job well. I believe further improvements are still possible, but the continued effort will most likely start to show diminishing returns soon.
Thursday, March 19, 2009
Friday, March 13, 2009
For about a month, Martin, Jiri and I have been torturing the HelenOS sources in a distributed attempt to improve the subsystems responsible for processing user input and output. Things are still not perfect, but before these changes, this part of HelenOS was a complete disaster. Here are a few examples of how badly designed it was:
- There was no layering between the hardware drivers and the code which interpreted the characters received through them - everything happened inside the driver itself. With this setup it was not possible to support different hardware configurations in a clean and generic way. For instance, the ns16550 driver assumed a Sun keyboard attached to the ns16550 serial port. This, of course, complicated the use of the driver for plain serial communication between two computers interconnected with a serial cable.
- Neither the kernel nor the userspace drivers supported more than one instance of each character device (e.g. i8042, ns16550, z8530).
- Because only one instance was supported, each driver inferred its role as stdin or stdout without asking.
- The way interrupts were delivered to userspace device drivers required a little-brother kernel driver for the same device, which would accept the interrupt and forward it to the userspace server.
- Both the kernel and userspace drivers were rather platform-specific, hard-wired either to memory-mapped accesses to the device's registers or to separate I/O instructions.
- There was also some duplication of code, both at the physical device level (e.g. a duplicate ns16550 driver) and at the character interpretation level (e.g. several occurrences of code which processed serial line input).
The reworked userspace kbd server, for example, now supports multiple keyboard layouts:
- US QWERTY
- US Dvorak
Encouraged and inspired by Jiri's success, I wanted to fix the layering and all the other above-mentioned problems in the kernel too. I started by converting the kernel drivers to the PIO (Programmed I/O) interface. The PIO functions abstract away the implementation of the machine's I/O space and allow drivers to be written in a generic way, regardless of whether the device lives in a separate I/O space or is memory-mapped. Once I had PIO on all platforms, I started converting all the character device drivers to it. That was probably the easiest part. In parallel, I began to slowly move away from the one-instance-per-driver model and to free the kernel drivers from the duty of notifying their userspace counterparts about interrupts via IPC. That was probably the hardest part, as it required:
- extending the interrupt top-half pseudocode to support independent userspace drivers,
- rewriting the way interrupts are dispatched in the kernel, and
- fixing the userspace drivers to use the new pseudocode.
Rewriting interrupt dispatching looked like a real teaser to me because the kernel interrupt structures are never deallocated, while the userspace interrupt structures need to be allocated and deallocated dynamically on demand. It was also interesting from the synchronization point of view, as I didn't want to deallocate a userspace interrupt structure while its interrupt was being processed. In the end, I came up with a solution with separate hash tables for userspace and kernel interrupts. My change broke some things, such as klog and kconsole notifications, as well as switching between kernel and userspace drivers. The latter used to work thanks to the grab and release methods in each kernel driver. But since the kernel driver became a distinct entity and these functions were removed, the driver toggling had to be solved in another way. This is where Martin got involved and implemented a clever fix:
- if the silent variable is true, search the userspace hash table first and the kernel hash table second
- if the silent variable is false, search the kernel hash table first and the userspace hash table second

This gives a userspace driver a chance to process the interrupt if the kernel console is inactive, but if there is no userspace driver to claim the interrupt, the kernel can still react to it, and vice versa.
Having freed the kernel drivers from the burden of interrupt notifications for their userspace brothers, I could finally proceed to fix the other problems and layer the kernel input subsystem into several components. At the lowest level, there are serial controller drivers, much like the port drivers in the userspace kbd server. Each driver connects either to a keyboard input module or to a serial line input module. Contrary to the userspace server, the kernel input modules convert the raw data from the serial controller drivers to a stream of ASCII characters and feed it to the connected component, which is most likely the kernel console.
After my change, Martin noticed that the data structure which had been used to pass characters between the various components - chardev_t - was bidirectional in nature, even though it was mostly used in one direction only. After some 30 commits, the whole kernel, all drivers and all input modules were converted to use indev_t for input devices and outdev_t for output devices instead.
On the userspace side, the kbd server is still a monolithic piece of software with limited hardware support and flexibility configured at compile time. We are considering changing this towards a more modular scheme in which there would be one running port driver task per device instance. That would allow us to move the configuration from compile time to runtime.