Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Embedded Systems

The Pentium F00F Bug


The Pentium F00F Bug

Robert is an independent consultant on the x86 architecture. He can be reached at [email protected].


Last fall, a message warning users of a bug in the Pentium/Pentium MMX was anonymously posted on the comp.os.linux.advocacy Internet newsgroup. According to the message, the bug could completely lock up the computer from any operating mode in any operating system. At first glance, the ordinary news reader might say "so what." After all, many users are accustomed to Windows 3.1/95 regularly locking up. But the placement of the bug on comp.os.linux.advocacy was calculated and intentional. The readers on that newsgroup knew exactly what the bug meant -- an ordinary user or saboteur could unleash a program to bring down network servers that form the backbone of our modern networking community. Internet service providers (ISPs), web hosts, government agencies, and computer departments at universities were petrified of the potential damage that this bug could do.

Enter the F00F Bug

When any x86 processor from the 80186 and beyond encounters an invalid instruction, the processor is supposed to generate an invalid opcode exception. In Intel vernacular, the undefined opcode exception is known as a "#UD." This handler usually signals an error condition and terminates the errant program. When this mechanism works, the errant program can't harm the computer system. Should this mechanism fail, however, the errant program can bring down the entire computer. If the computer is a network server or ISP, then the errant program can bring down the entire network.

That's what can, in fact, happen when the Pentium encounters the "F00F" bug, which maps to a LOCK CMPXCHG8B EAX instruction. CMPXCHG8B compares 64-bit memory contents with the contents in EDX and EAX. One of the operands must be memory, and the other (implied) operand is EDX:EAX. It is possible to construct an instruction encoding that doesn't map to a memory operand. Since the nonmemory form of this instruction is invalid, a compiler or assembler will not generate this code. Instead, assembly-language programmers must construct it by hand.

Such an illegal encoding should generate the requisite #UD. As you'd expect, a CMPXCHG8B EAX instruction generates a #UD. However, when this illegal encoding is prepended with a LOCK prefix, the processor fails to work correctly.

Using the LOCK prefix on this form of CMPXCHG8B is illegal in and of itself. LOCK prefixes are only allowed on memory-based read-modify-write instructions. Hence a LOCK prefix on the register-based CMPXCHG8B EAX instruction should also generate an invalid opcode exception.

Instead, the Pentium locks up and freezes the entire computer when it encounters this instruction. This bug is especially nasty, because any user can construct a program with this instruction, and upload it to a network computer, or incorporate it within an ActiveX applet. Once the program is run on the network, the network server crashes. The only possible recovery comes by hitting the big red switch. Suppose you download an ActiveX applet that contains this code. As soon as the code executes, your computer freezes up.

Within one week of the discovery of the bug, Intel announced a software workaround that can be incorporated into virtually any operating system (except real-mode operating systems, like DOS).

How the Bug Works

When the processor encounters the instruction F0 0F C7 C8 (or anything from F0 0F C7 C8..CF), the F00F bug occurs. The processor recognizes that an invalid opcode has occurred and tries to dispatch the #UD handler. Because of the LOCK prefix, the processor is confused. When the processor issues the bus reads to get the #UD handler vector address, the processor erroneously asserts the LOCK# signal. The LOCK# signal can only be asserted for read-modify-write instructions that modify memory. When the bus is locked, a locked memory read must be followed by a locked memory write, lest unpredictable results may occur. But in this case, the LOCK# signal remains asserted for the two consecutive memory reads required to retrieve the #UD vector address. The processor never issues any intervening locked write, and then hangs itself. This behavior is shown in the logic analyzer trace in Example 1. As you can see, the Pentium tries to retrieve the #UD vector with two locked reads. After that, all processor activity stops.

The Various Workarounds

There are various workarounds to this bug -- not all of them are good (a few are outright kludgey, in fact). One workaround Intel has proposed actually takes advantage of the bug's behavior to do the right thing. Another Intel workaround is ingenious, though a kludge. The first two alternate workarounds, to be presented shortly, are given for academic purposes only. Even though the workarounds have demonstrated their ability to obscure the bug behavior, they are not entirely reliable.

Intel's First Workaround

Part I, IDT Alignment:

A)Align the Interrupt Descriptor Table (IDT) such that it spans a 4-KB page boundary by placing the first entry starting 56 bytes from the end of the first 4-KB page. This places the first seven entries (0-6) on the first 4-KB page, and the remaining entries on the second page.

B)The page containing the first seven entries of the IDT must not have a mapping in the OS page tables. This will cause any of exceptions 0-6 to generate a page not present fault. A page fault prevents the bus lock condition and gives the OS complete control to process these exceptions as appropriate. Note that exception 6 is the invalid opcode exception, so with this scheme an OS has complete control of any program executing an invalid CMPXCHG8B instruction.

Part II, Page Fault Handler Modifications:

A)Recognize accesses to the first page of the IDT by testing the fault address in CR2. Page not present faults on other addresses can be processed normally.

B) For page not present faults on the first page of the IDT, the OS must recognize and dispatch the exception which caused the page not present fault. Before proceeding, test the fault address in CR2 to determine if it is in the address range corresponding to exceptions 0-6.

C)Calculate which exception caused the page not present fault from the fault address in CR2.

D)Depending on the operating system, certain privilege level checks may be required, along with adjustments to the interrupt stack.

E)Jump to the normal handler for the appropriate exception.

This is an ingenious solution to a horrible problem. Unfortunately, the solution is as bad as the problem. When the processor receives any of the first seven exceptions (Divide by Zero through Invalid Opcode), the processor generates a page fault instead of the appropriate exception. The page-fault handler gets mucked up with all kinds of code to check privilege levels and whether the fault was caused by another exception. If I had my druthers, I'd stay as far away from this solution as possible.

Intel's Second Workaround

Part I, IDT Page Access:

A)Mark the page containing the first seven entries (0-6) of the IDT as read only by setting bit 1 of the page table entry to zero. Also set CR0.WP (bit 16) to 1. Now when the invalid opcode exception occurs on the locked CMPXCHG8B instruction, the processor will trigger a page fault since it does not have write access to the page containing entry 6 of the IDT. This page fault prevents the bus lock condition and gives the OS complete control to process the invalid operand exception as appropriate. Note that exception 6 is the invalid opcode exception, so with this scheme an OS has complete control of any program executing an invalid CMPXCHG8B instruction.

B)Optional: If updates to entries 7-255 of the IDT occur during the course of normal operation, page faults should be avoided on writes to these IDT entries. These page faults can be avoided by aligning the IDT across a 4-KB page boundary such that the first seven entries (0-6) of the IDT are on the first read only page and the remaining entries are on a read/writeable page.

Part II, Page Fault Handler Modifications:

A)Modify the page fault handler to calculate which exception caused the page fault using the fault address in CR2. If the error code on the stack indicates the exception occurred from ring 0 and if the address corresponds to the invalid opcode exception, then pop the error code off the stack and jump to the invalid opcode exception handler. Otherwise continue with the normal page fault handler.

This workaround is really quite clever, in that it takes advantage of the bug as a means to provide a fix to the problem. When any of the first six exceptions occur, they are handled as they normally would. Divide by Zero through BOUND exceptions vector to their normal exception handlers without any intervening code in the page-fault handler. However, when the F00F bug occurs, the page-fault handler is invoked instead of the #UD handler. Why? CR0.WP=1 instructs the microprocessor to generate a page fault when an attempt is made by the supervisor to modify a memory page. The processor doesn't actually attempt to modify the Interrupt Descriptor Page (IDT page holding the #UD vector address) when the F00F opcode is encountered. But the bug actually makes the processor think it's modifying the IDT page with the #UD vector. The locked memory cycle somehow convinces the internal state of the Pentium to think that a write cycle is going to occur. Since the transition to the #UD handler is considered a supervisor task, the processor thinks it's going to write to this page. Thus when CR0.WP=1, a page fault occurs.

Even though this is a clever fix, there are two things I don't like about it:

  • The fix requires a kludge to the page-fault handler code. The page-fault handler must contain code to detect this condition, and vector to the #UD handler.
  • Setting CR0.WP=1 isn't a viable solution for everybody. Some operating systems may not be able to use this workaround.

If I were forced to choose between two of Intel's "blessed" solutions, I'd choose this one. However, because Intel set a precedent in documenting a solution that actually takes advantage of the bug behavior, this could give rise to much more elegant solutions that also take advantage of the bug behavior.

In fact, there are several alternative solutions to the problem, some quite simple.

Alternate Solution #1

Part I, IDT Alignment:

A)Align the Interrupt Descriptor Table (IDT) such that it spans a 4-KB page boundary by placing the first 56 bytes (IDT entries 0-6) from the end of the first 4-KB page. This places the first seven entries (0-6) on the first 4-KB page, and the remaining entries on the second page.

Part II, Page Table Modification:

A)Mark the first 4-KB page of the IDT as noncacheable by setting the appropriate Page Table Entry to Page Cache Disable (PTE.PCD=1).

All exceptions vector to their appropriate interrupt handler. The page fault handler doesn't need to be mucked up with any extra code. All of the exception handling code may remain unmodified. When the F00F bug occurs, the processor issues the two consecutive locked reads. However, the processor doesn't lock up because the page is noncacheable. Example 2 shows the logic analyzer trace of the microprocessor recovering from the F00F bug.

Alternate Solution #2

Part I, IDT Alignment:

A)Align the Interrupt Descriptor Table (IDT) such that it spans a 4-KB page boundary by placing the first 56 bytes (IDT entries 0-6) from the end of the first 4-KB page. This places the first seven entries (0-6) on the first 4-KB page, and the remaining entries on the second page.

Part II, Page Table Modification:

A)Mark the first 4-KB page of the IDT as noncacheable by setting the appropriate Page Table Entry to Page Write-Through (PTE.PWT=1). This is by far the best solution.

This solution maintains all of the benefits of having the page cacheable. However, because the page is considered write-through, the processor is tricked into recovering from the bus LOCK up condition. Example 3 shows the results of encountering the F00F bug when the page is cacheable, but marked as write-through.

Alternate Solution #3 (For DOS Users)

A)Turn off your cache. Enter your BIOS setup utility, and disable the processor cache. This prevents the F00F bug from locking up your computer.

This isn't really a viable solution for most people. Turning off the microprocessor cache can have a dramatically negative performance impact on your computer. Example 4 shows the results of the F00F bug with cache disabled.

Conclusion

The ultimate solution to the F00F bug problem is obtained when disabling the cache -- indicating that some interaction exists between the cache and the bug. Instead of taking advantage of the cache interaction, Intel's second solution takes advantage of interaction between the bug and page-fault mechanism. Now that Intel has set the precedent of using the bug behavior as a workaround, nobody should be concerned by the two more elegant solutions provided herein. My second alternate solution is by far the best. The exception handlers don't need to be mucked up with extra code, and the processor performance isn't impacted in the slightest. Note, however, that neither of these two alternate workarounds have been thoroughly tested in production code.

I've written two programs so that you can examine this bug and the various workarounds: F00FBUG.EXE, which demonstrates all five workarounds for the bug, and F00FBUG2.EXE, which demonstrates the most elegant workaround -- Alternate Solution #2. The source code and executables for both programs are available electronically from both DDJ and at ftp://ftp.x86.org/dloads/F00FBUG.ZIP.

DDJ


Copyright © 1998, Dr. Dobb's Journal


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.