A work posting again because everybody always asks what I do for a living.
I work in x64 engineering at Sun Microsystems. I ensure that vendor software from VMware works well on the various AMD and Intel x64 systems we ship. The process works something like this.
* Product Team specifies hardware and software requirements.
* Hardware Team designs the hardware.
* Software Team (that’s me) engages OS vendors (e.g. VMware, Red Hat, Novell, Microsoft and Sun) and says, “In six months we’ll release the SuperConstellationMegaPlus using the unreleased 64-core Unobtanium HyperQuickConnect CPU with support for 256 sockets and maximum 32TB RAM and we already know your OS breaks with that high CPU count and memory size; InterHub MCP100 bridge on each socket so don’t forget to fix your multiroot PCI Express support; SAS 2 with zoning and up to 1024 discrete solid state disks; and the usual peripheral support and high-speed Infiniband, 10Gbps networking, etc. with hot plug required on everything, including the processors and RAM.”
* 3 months later, OS vendors toss their pre-release builds to me.
* I toss the builds to our Software QA.
* Software QA finds bugs. It’s my job to work with the OS vendor to find the cause of each bug and get it fixed.
Here’s an example:
Early on, we discovered that VMware’s ESXi ‘thin’ hypervisor would not install on the SunFire x4140, x4240 and x4440 servers. These machines, codenamed “Dorado Tucana” (DTa for short), are essentially identical and share the same motherboard.
Previously, VMware always booted a modified Red Hat distribution to install ESX. The ESXi install process differs from ESX “Classic” in that it uses itself as the installer. When you boot the ESXi install CD, you are booting ESXi.
I initially thought there was a bug in the HBA storage adapter because the install program always locked up at “Loading aacraid…”, which is the software to control the Adaptec storage controller we use in our test machines. Debug by process of elimination: I removed the Adaptec controller.
So now the machine hangs somewhere else. Hmmm….
But now I’m able to see messages like “Keyboard controller buffer overflow….” And our nifty hardware debug tool shows me that the program is stuck in a very small loop that looks something like this:
while (inb(0x64) & 0x01) { somefunction(); }
I/O port 0x64 is the old legacy 8042 keyboard controller, except DTa does not have an 8042 or even a SuperIO chip! When I was reviewing the DTa hardware design way back in 2007, I even made a notation to our product team that this was our first platform without a legacy keyboard controller of any kind and we may encounter some OS bugs.
All modern PCs emulate the old 8042 keyboard controller first used in the IBM PC AT in 1984, because MS-DOS, the BIOS setup program, and the various option ROM setup programs all depend on the existence of a PC/AT keyboard even though your PC no longer even has a keyboard connector. The system BIOS can find your USB keyboard and make it pretend that it’s an old legacy PS/2 keyboard for this old legacy software.
When your modern OS (such as Windows, Linux or ESXi) boots, it pokes the BIOS and USB controller and tells them to stop pretending to be an 8042 and start acting like a real USB controller — this is called USB BIOS handoff. Almost every PC made, however, still has something that acts like the 8042 at I/O locations 0x64 and 0x60 somewhere on the motherboard — when the pretending stops, the real 8042 I/O is still there. When the OS reads the keyboard status register at 0x64, though, the 8042 isn’t connected to a keyboard, so it always reports the keyboard buffer is empty with a value of “0”.
As I mentioned previously, DTa does not have an 8042 of any kind. As soon as the OS takes over the USB operations and tells BIOS and the USB controller to stop pretending to be an 8042, there’s no longer anything at I/O locations 0x64 and 0x60. And when the CPU reads an invalid I/O location, the returned value is always “-1.” This means every bit of what the keyboard driver thinks is a status register is set. The keyboard driver thinks the keyboard buffer is full, reads the keyboard data register at 0x60 (which also returns -1 or 0xff), and tests the keyboard status again, which will be 0xff again. Rinse and repeat until done, except, of course, it never is done because inb(0x64) always returns -1.
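To make the hang concrete, here is a minimal sketch in C of the drain loop described above. It is not ESXi's code. Port reads are simulated through a function pointer so it runs as an ordinary user program, and every name in it (`drain_naive`, `idle_8042_read`, `absent_8042_read`, the `max_reads` safety cap) is made up for illustration. The cap exists only so the demonstration terminates; the loop in the post had no such limit.

```c
#include <assert.h>
#include <stdint.h>

#define KBD_DATA_PORT   0x60  /* keyboard data register */
#define KBD_STATUS_PORT 0x64  /* keyboard status register */
#define KBD_STAT_OBF    0x01  /* bit 0: a byte is waiting at 0x60 */

/* Stand-in for the inb instruction (hypothetical, for simulation only). */
typedef uint8_t (*port_read_fn)(uint16_t port);

/* A real 8042 with no keyboard attached: status reads 0, buffer empty. */
uint8_t idle_8042_read(uint16_t port)
{
    (void)port;
    return 0x00;
}

/* No device answers at all: every read comes back with all bits set. */
uint8_t absent_8042_read(uint16_t port)
{
    (void)port;
    return 0xFF;
}

/*
 * The naive drain loop: keep reading the data port while the status
 * register says a byte is waiting. Returns the number of bytes drained,
 * or -1 if max_reads was reached without the buffer ever going empty.
 */
long drain_naive(port_read_fn inb, long max_reads)
{
    long drained = 0;

    assert(max_reads > 0);
    while (inb(KBD_STATUS_PORT) & KBD_STAT_OBF) {
        (void)inb(KBD_DATA_PORT);   /* discard the "pending" byte */
        if (++drained >= max_reads)
            return -1;              /* would spin forever without the cap */
    }
    return drained;
}
```

Calling `drain_naive(idle_8042_read, 1000)` returns 0 right away, while `drain_naive(absent_8042_read, 1000)` uses up the whole cap and returns -1, which is the simulated version of the install hanging.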
I proved this by dissecting the guts of ESXi and removing the OHCI and UHCI USB drivers. Those drivers are what trigger the handoff, so without them the BIOS and USB controller stay in legacy keyboard mode. With those software bits removed, the problem went away. I reported this to VMware so they could make the necessary changes.
There are a couple of ways to fix this problem. Linux counts the number of “-1” values it reads, and if that number is unreasonable, it decides there’s no 8042. The engineers at VMware got a little more clever: they look at the ACPI DSDT (the Differentiated System Description Table), a data structure provided by the BIOS that lists the component hardware. If an 8042 is not listed in this table, ESXi knows not to load the keyboard controller device driver.
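The counting approach can be sketched roughly as follows. This is an illustration of the idea only, not the actual Linux or ESXi source. The thresholds `MAX_ALL_ONES` and `MAX_DRAIN_READS`, the function `kbd_controller_present`, and the simulated port readers are all assumptions invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define KBD_DATA_PORT   0x60
#define KBD_STATUS_PORT 0x64
#define KBD_STAT_OBF    0x01  /* bit 0: a byte is waiting at 0x60 */
#define MAX_ALL_ONES    16    /* hypothetical: this many 0xFF reads is "unreasonable" */
#define MAX_DRAIN_READS 64    /* hypothetical: overall bound on the drain loop */

/* Stand-in for the inb instruction (hypothetical, for simulation only). */
typedef uint8_t (*port_read_fn)(uint16_t port);

/* Nothing decodes the port: every read returns all ones. */
uint8_t absent_8042_read(uint16_t port)
{
    (void)port;
    return 0xFF;
}

/* A working 8042 that still has three stale bytes queued up. */
int stale_bytes = 3;
uint8_t busy_8042_read(uint16_t port)
{
    if (port == KBD_STATUS_PORT)
        return stale_bytes > 0 ? KBD_STAT_OBF : 0x00;
    if (stale_bytes > 0)
        stale_bytes--;
    return 0x1C;                    /* arbitrary data byte */
}

/*
 * Drain the keyboard buffer, but give up if the status register keeps
 * reading 0xFF or the buffer never empties. Returns true if a controller
 * appears to be answering, false if we conclude there is no 8042.
 */
bool kbd_controller_present(port_read_fn inb)
{
    int all_ones = 0;

    for (int reads = 0; reads < MAX_DRAIN_READS; reads++) {
        uint8_t status = inb(KBD_STATUS_PORT);

        if (!(status & KBD_STAT_OBF))
            return true;            /* buffer empty: something real answered */
        if (status == 0xFF && ++all_ones >= MAX_ALL_ONES)
            return false;           /* too many all-ones reads: no controller */
        (void)inb(KBD_DATA_PORT);   /* discard the pending byte */
    }
    return false;                   /* never drained within the bound */
}
```

With these stand-ins, `kbd_controller_present(absent_8042_read)` gives up and returns false, while `kbd_controller_present(busy_8042_read)` drains the three stale bytes and returns true. The DSDT approach avoids touching the ports at all, which is why it never risks the hang in the first place.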
For those waiting to install ESXi on the x4140, x4240, and x4440 (and many people have asked): This fixed version of ESXi is not yet released, though it should be available Real Soon Now and we’re already certifying that new version of ESXi for those servers.