It would also explain how the real fix works.
If you take a BSDI box before and after the patch and compare the MMU
tables via /dev/mem etc. you'll find there is a pair of funny pages
where the interrupt descriptor table has moved.
Odder still, the low part of it doesn't have a pte. What seems to be done is
to put the low descriptors into an invalid page and take a page fault when
the CPU tries to handle the fault from the lock cmpxchg8b.
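
As a rough sketch of that layout (not the BSDI or Linux source), the arithmetic looks something like the C below. The names idt_base_for and vector_in_invalid_page are made up for illustration, and the split at 7 descriptors (vectors 0 to 6, so that the invalid opcode vector 6 is covered) is an assumption; the observation above only says "the low part".

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE   4096u
#define DESC_SIZE   8u   /* an i386 IDT descriptor is 8 bytes */
#define LOW_VECTORS 7u   /* assumption: vectors 0..6 sit in the invalid page */

/*
 * Hypothetical helper: given the address of the mapped page, return an IDT
 * base such that the first LOW_VECTORS descriptors fall at the tail of the
 * page below it (the one left without a valid pte).
 */
static uintptr_t idt_base_for(uintptr_t mapped_page)
{
    assert(mapped_page % PAGE_SIZE == 0);   /* must be a page boundary */
    return mapped_page - LOW_VECTORS * DESC_SIZE;
}

/* Nonzero if the descriptor for 'vector' lies below the page boundary. */
static int vector_in_invalid_page(uintptr_t idt_base, uintptr_t mapped_page,
                                  unsigned vector)
{
    return idt_base + (uintptr_t)vector * DESC_SIZE < mapped_page;
}
```

With a page boundary at 0x5000 this puts the table base at 0x4fc8, so the descriptor for vector 6 ends exactly at the boundary and vector 7 is the first one in the mapped page.
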
The Linux code is based on this observation and does the same trick. The page
fault handler then checks the fault, sees a kernel mode fault on
the descriptor block[1] and works out what the real fault was. It then calls
the relevant kernel function instead of doing normal page fault processing.
We could probably just remap the page at that point, but it's faster to call
the functions by hand than to map and remap the page (causing TLB flushes).
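
To make the check concrete, here is a minimal user-space sketch of the decision the page fault handler makes. It is not the 2.1.63 code: redirect_idt_fault and record_trap are hypothetical names, record_trap stands in for calling the real trap function, and the 7 descriptor split is an assumption. The one hardware fact relied on is that bit 2 of the i386 page fault error code is set when the fault came from user mode.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_SIZE   8u    /* an i386 IDT descriptor is 8 bytes */
#define LOW_VECTORS 7u    /* assumption: vectors 0..6 sit in the invalid page */
#define ERR_USER    0x4u  /* page fault error code bit 2: fault from user mode */

static int last_vector = -1;   /* records which stand-in handler ran */

/* Stand-in for calling the real kernel trap function for 'vector'. */
static void record_trap(int vector)
{
    last_vector = vector;
}

/*
 * Returns the vector that was dispatched by hand, or -1 if the fault is
 * an ordinary one and normal page fault processing should continue.
 */
static int redirect_idt_fault(uintptr_t fault_addr, unsigned error_code,
                              uintptr_t idt_base)
{
    uintptr_t vector;

    if (error_code & ERR_USER)      /* footnote [1]: kernel mode faults only */
        return -1;
    if (fault_addr < idt_base)      /* below the descriptor block */
        return -1;

    vector = (fault_addr - idt_base) / DESC_SIZE;
    if (vector >= LOW_VECTORS)      /* past the descriptors in the invalid page */
        return -1;

    record_trap((int)vector);       /* call the handler, leave the page alone */
    return (int)vector;
}
```

A kernel mode fault on the descriptor for vector 6 (invalid opcode) gets dispatched by hand, while the same address faulting from user mode falls through to the normal path.
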
Hopefully that info and the 2.1.63 Linux patch are enough to get the fix into
other free OSes too. And if anyone can find a way to break the Linux 2.1.63
fix we'd all love to know. Hopefully a complete official Intel workaround
will appear shortly and we can switch to that.
Alan
[1] This is important - otherwise we might take a fault for a user process
at the same address by chance and do a trap instead.