To make this work, I suggest:
* Create the partially mapped IDT as per the BSDI patch. Use it.
* Create a second copy of the IDT that is fully mapped. For each
vector in this second copy, install a routine which first reloads the
IDT pointer to point to the partially mapped IDT and then uses the
normal routine.
* When we get a page fault, check to see if the fault was in the
first (partially mapped) IDT, and if so turn off interrupts (with a
CLI), load the pointer to the second (fully mapped) IDT, gratuitously
fetch the IDT descriptor for exception 6 to make sure it's in the
cache, and return to user mode (doing an implicit STI during the IRET,
since it restores the user's flags).
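
The three steps can be sketched as a toy state machine in C. This is only a model to make the control flow concrete, not kernel code: no real LIDT, CLI or IRET is executed, and every name in it (cpu_model, full_idt_stub, page_fault, and so on) is hypothetical. It assumes the page-fault step switches from the partially mapped IDT to the fully mapped one, which is what the description of the window further on implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: the fields stand in for CPU state, nothing here
 * touches real hardware. All names are hypothetical. */
enum idt_id { IDT_PARTIAL, IDT_FULL };

struct cpu_model {
    enum idt_id idtr;        /* which IDT the IDT pointer selects */
    bool interrupts_enabled; /* stands in for EFLAGS.IF */
    bool vec6_cached;        /* descriptor for exception 6 recently read */
};

/* Step 1: run with the partially mapped IDT by default. */
static void cpu_init(struct cpu_model *cpu)
{
    assert(cpu != NULL);
    cpu->idtr = IDT_PARTIAL;
    cpu->interrupts_enabled = true;
    cpu->vec6_cached = false;
}

/* Step 2: every vector of the fully mapped IDT points at a stub like
 * this one. It puts the partially mapped IDT back, then runs the
 * normal handler. Other code has now run, so the model assumes the
 * descriptor may no longer be cached. */
static void full_idt_stub(struct cpu_model *cpu,
                          void (*normal_handler)(struct cpu_model *))
{
    cpu->idtr = IDT_PARTIAL;
    cpu->vec6_cached = false;
    if (normal_handler != NULL)
        normal_handler(cpu);
}

/* Step 3: page fault handler. Returns true if the fault was caused by
 * the unmapped part of the IDT and was handled by switching IDTs;
 * false means it is an ordinary page fault for the usual code path. */
static bool page_fault(struct cpu_model *cpu, bool fault_in_idt_page)
{
    if (!fault_in_idt_page || cpu->idtr != IDT_PARTIAL)
        return false;
    cpu->interrupts_enabled = false; /* CLI */
    cpu->idtr = IDT_FULL;            /* load pointer to fully mapped IDT */
    cpu->vec6_cached = true;         /* gratuitous read of descriptor 6 */
    cpu->interrupts_enabled = true;  /* IRET restores the user's flags */
    return true;
}
```

In this model the exception 6 that the reexecuted instruction finally raises also goes through full_idt_stub, so the partially mapped IDT is back in place by the time the normal handler runs.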
The theory is that we reexecute the faulting instruction with a
normal-looking IDT, making sure that the descriptor is in the L1
cache, so we don't get the hang. The only way it would get rotated
out of the cache before the instruction is reexecuted would be if an
interrupt or exception occurs (i.e. some other code is caused to
execute) between when we reload the IDT pointer to the fully mapped
IDT and when the instruction is reexecuted. To prevent this, we
arrange for any such interrupt or exception to cause the partially
mapped IDT to be loaded again, and thus when the interrupt or
exception completes, the instruction would cause another page fault.
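
To illustrate the argument, here is a small C model of that window. It is a sketch under simplifying assumptions, not a measurement: the function name is made up, and it assumes that on each retry at most one interrupt lands between the IDT switch and the reexecution. It counts how many page faults are taken before the instruction finally runs against the fully mapped IDT.

```c
#include <assert.h>

enum idt_id { IDT_PARTIAL, IDT_FULL };

/* Hypothetical helper: returns the number of page faults taken before
 * the offending instruction is reexecuted with the fully mapped IDT
 * loaded. pending_interrupts is how many interrupts arrive inside the
 * window, one per retry. */
static int faults_until_reexecuted(int pending_interrupts)
{
    enum idt_id idtr = IDT_PARTIAL;
    int faults = 0;

    assert(pending_interrupts >= 0);

    for (;;) {
        if (idtr == IDT_PARTIAL) {
            /* The instruction hits the unmapped part of the IDT:
             * page fault, and the handler loads the fully mapped IDT. */
            faults++;
            idtr = IDT_FULL;

            if (pending_interrupts > 0) {
                /* An interrupt lands before the reexecution. Its stub
                 * in the fully mapped IDT reloads the partial one. */
                pending_interrupts--;
                idtr = IDT_PARTIAL;
            }
            continue;
        }
        /* Reexecuted with the fully mapped IDT and a freshly read
         * descriptor, which is the case we were trying to reach. */
        return faults;
    }
}
```

With no interrupt in the window the instruction faults once and is then reexecuted; each interrupt that does land in the window costs one extra page fault rather than a hang, so faults_until_reexecuted(3) comes out as 4.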
This has a bit more performance impact on debuggers (because trace and
breakpoint traps are handled through this mechanism, with an
additional ~100 cycles on a 486), but it shouldn't have any of the
caveats I previously mentioned.
[I'd implement this right now and try it, but I *really* have to go to
sleep now. Recovering from a cold. *sigh*]