In a news item on root.cz, someone mistook a newly introduced GCC 4.3 optimization, which no longer emits an extra CLD instruction in front of each string instruction, for a GCC bug. The headline read:
GCC 4.3 contains a bug
After reading the news, I figured out that it is not GCC that contains a bug, but the Linux kernel. I am not going to talk about this bug in this post (read more about it here). What I will talk about, however, is closely related.
Learning more about the Linux bug made me realize how dangerous the seemingly innocent DF flag can be to the kernel. Without preventive measures, a userspace task can enter the kernel with DF set, and the reversed direction of string operations could then cause severe kernel memory corruption. I also realized that we probably don't take those preventive measures (i.e. clearing DF upon entering the kernel) in HelenOS/ia32 and HelenOS/amd64. A quick check of the respective sources confirmed my worries.
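To illustrate why the reversed direction is so harmful, here is a minimal sketch in portable C that models REP MOVSB on a flat byte array. It is not HelenOS or Linux code; the function name model_rep_movsb and its parameters are made up for this illustration, and memory is represented by array indices rather than real addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of REP MOVSB (not real kernel code).
 * Copies `count` bytes from index `src` to index `dst` inside `mem`.
 * With df == 0 both indices move upwards after every byte,
 * with df != 0 they move downwards, as the CPU does when DF is set.
 */
static void model_rep_movsb(uint8_t *mem, size_t mem_size,
                            size_t dst, size_t src, size_t count, int df)
{
    while (count-- > 0) {
        /* Keep the model inside its toy memory. */
        assert(dst < mem_size && src < mem_size);
        mem[dst] = mem[src];
        if (df) {
            dst--;
            src--;
        } else {
            dst++;
            src++;
        }
    }
}
```

With DF clear, copying 4 bytes to index 4 fills indices 4 to 7, which is what the compiled code expects. With DF set, the very same call writes indices 4, 3, 2 and 1, so three bytes below the destination buffer get overwritten with data from below the source buffer.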
Later at home, I did basically two things. First, I modified the HelenOS libc to always execute STD before doing a syscall and verified that the problem was real. Second, I implemented the fix for this HelenOS bug.
On ia32, I inserted one CLD into the interrupt handler macro and one CLD into the syscall handler macro and that was it. On amd64, I used the CLD instruction for the interrupt handler macro as well, but wanted to make use of amd64's SFMASK MSR for the syscall path instead of the boring CLD. The SFMASK register contains a mask that is applied to the RFLAGS register during the SYSCALL instruction: every bit set in the mask is cleared in RFLAGS. In other words, one can program it to clear arbitrary RFLAGS bits. When I was done with amd64, I wanted to test the fix in QEMU. To my surprise, the fixed HelenOS/amd64 didn't function well in QEMU. I repeated the same test with Simics, where everything worked fine. Hm, a bug in QEMU? Looks like it (and today I found out what was wrong).
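The effect of SFMASK described above can be sketched in a few lines of C. This is only a model of the flag arithmetic, not the actual HelenOS code; the macro names and the function rflags_after_syscall are chosen for this illustration, while the bit positions (TF is bit 8, IF is bit 9, DF is bit 10) are the architectural ones.

```c
#include <stdint.h>

/* Architectural bit positions in RFLAGS. */
#define RFLAGS_TF (UINT64_C(1) << 8)   /* trap flag */
#define RFLAGS_IF (UINT64_C(1) << 9)   /* interrupt enable flag */
#define RFLAGS_DF (UINT64_C(1) << 10)  /* direction flag */

/*
 * Hypothetical helper modelling what SYSCALL does to RFLAGS:
 * every bit that is set in SFMASK is cleared in RFLAGS.
 */
static uint64_t rflags_after_syscall(uint64_t rflags, uint64_t sfmask)
{
    return rflags & ~sfmask;
}
```

If userspace enters with DF set and the mask only covers IF and TF, the result still has DF set. Adding RFLAGS_DF to the mask makes the kernel start with DF clear, without any explicit CLD on the syscall path.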
To top the DF story off, I have to confess that for some (rather short) time I believed that there was a similar bug in the amd64 version of Solaris. Contrary to the ia32 version, there is no CLD at the beginning of the respective interrupt handler (i.e. cmntrap) and the syscall handler. In the case of the syscall handler, I noticed that the SFMASK MSR is only programmed to clear IF and TF, but not DF. Eager to find a bug, I even started to build a crafted Solaris libc that would demonstrate the bug as I did with HelenOS. But then I finally got it: Solaris brings RFLAGS into a known state by pushing a pre-defined RFLAGS value onto the stack and then executing the POPF instruction. I was so focused on the CLD which wasn't there and the insufficient SFMASK that I didn't see what was there all the time.