You don't have to write for the least-common denominator
Rick was formerly the director of technology at IntraCorp, a company specializing in entertainment software. Some of his recent titles include GrandMaster chess, BridgeMaster, and Trump Castle 3. Rick was on the staff at COMPUTE magazine and still writes a column and various features. He recently released Graphics Guru, a graphics library that most certainly uses processor-specific code. You can reach him through the DDJ offices; or on CompuServe at 74676,457.
While many applications really require the power of 80386 PCs, we can't assume (or demand) that users of our software have 386-class machines. Consequently, our programs have to support everything from 8088s to Pentiums. This means writing for the least-common denominator--and hoping Pentium users won't notice the difference.
However, it is possible to write processor-specific code, taking advantage of each processor's strengths using the techniques I present here. The hard part is finding out which processor, or even which version of a processor, you're running on; the easy part is knowing what to do. For example, my assembly code usually has vector tables for each processor, and once I've identified the CPU, I copy the appropriate vectors into a global table. From then on, it's completely transparent--you simply make calls based on the vector table.
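To make the vector-table idea concrete, here is a minimal C sketch of the same arrangement. It is only an illustration: the names `cpu_vectors`, `fill_fn`, `install_vectors`, and the two fill routines are hypothetical stand-ins for whatever processor-specific routines a program really has. Only the CPU codes 0 through 4 are taken from Listing One.

```c
#include <stddef.h>
#include <string.h>

/* CPU codes as returned by Processor() in Listing One. */
enum { CPU_8088, CPU_8086, CPU_80286, CPU_80386, CPU_80486, CPU_COUNT };

/* Hypothetical operation: fill n bytes at dst with value. */
typedef void (*fill_fn)(unsigned char *dst, unsigned char value, size_t n);

/* One slot per operation that has processor-specific versions. */
typedef struct {
    fill_fn fill;
} cpu_vectors;

/* Stand-in for a routine that is safe on every processor. */
static void fill_bytewise(unsigned char *dst, unsigned char value, size_t n)
{
    while (n--)
        *dst++ = value;
}

/* Stand-in for a routine tuned for the 386 and later. */
static void fill_wide(unsigned char *dst, unsigned char value, size_t n)
{
    memset(dst, value, n);
}

/* One table per processor, indexed by the detected CPU code. */
static const cpu_vectors vector_tables[CPU_COUNT] = {
    { fill_bytewise },  /* 8088  */
    { fill_bytewise },  /* 8086  */
    { fill_bytewise },  /* 80286 */
    { fill_wide },      /* 80386 */
    { fill_wide }       /* 80486 */
};

/* The global table that the rest of the program calls through. */
cpu_vectors g_vectors;

/* Copy the vectors for the detected CPU into the global table.
   Out-of-range codes are clamped to the nearest known processor. */
void install_vectors(int cpu)
{
    if (cpu < 0)
        cpu = 0;
    if (cpu >= CPU_COUNT)
        cpu = CPU_COUNT - 1;
    g_vectors = vector_tables[cpu];
}
```

After a single call such as `install_vectors(Processor())` at startup, callers would simply use `g_vectors.fill(buffer, 0, size)` and never test the processor type again.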
There are a few steps to finding out what CPU is under the hood. To get started, I check bits 12-15 of the flags register. If they're always set, regardless of how you alter them, you've got an 8088 or an 8086. Then, I use a self-modifying code technique to find out if the prefetch queue is 4 or 6 bytes. If it's 4, the chip is an 8088; 6 means it's an 8086.
To distinguish between a 286 and 386/486s, you need to check for 386/486 flags that are undefined for the 286. If it's a 386 or better, the next step is to check the AC bit in the flags. If it can be set, you've got a 486; if it remains unset, it's a 386.
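The sequence of checks just described amounts to a small decision tree. The following C sketch shows only that logic, assuming the four hardware probes have already been run and their outcomes recorded. The `cpu_probes` structure and `classify_cpu` function are hypothetical names, not part of the listings.

```c
#include <stdbool.h>

/* Outcomes of the four hardware probes (hypothetical structure). */
typedef struct {
    bool high_flag_bits_stuck;  /* bits 12-15 read back set after being cleared */
    bool queue_is_six_bytes;    /* self-modifying probe found a 6-byte queue    */
    bool nt_iopl_settable;      /* NT and IOPL bits stay set once written       */
    bool ac_settable;           /* AC bit stays set once written                */
} cpu_probes;

/* Returns 0 == 8088, 1 == 8086, 2 == 80286, 3 == 80386, 4 == 80486,
   the same codes that Listing One documents. */
int classify_cpu(const cpu_probes *p)
{
    if (p->high_flag_bits_stuck)
        return p->queue_is_six_bytes ? 1 : 0;
    if (!p->nt_iopl_settable)
        return 2;
    return p->ac_settable ? 4 : 3;
}
```

The ordering matters: the later probes use instructions or flag bits that the earlier processors don't have, so each test runs only after the previous one has ruled those processors out.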
Using Bits 12-15 of the Flags
Determining whether a processor is an 8088 or 8086, rather than a 286 or later, simply requires checking bits 12-15 of the flags register. Although you can't move the flags directly to a word register, you can push the flags and pop them into a register. To move the value back into the flags, reverse the operation.
To test bits 12-15, clear them by ANDing the working register with 0fffh. The value must be moved into the flags, then back into the working register. If bits 12-15 are all set again, the processor is an 8088 or an 8086. The ; Is It an 8088? segment of Listing One (page 126) illustrates this.
Differentiating between 8088s and 8086s is trickier. The easiest way I've found to do it is to modify code that's five bytes ahead of IP. Since the prefetch queue of an 8088 is four bytes and the prefetch queue of an 8086 is six bytes, modifying an instruction five bytes ahead of IP won't have any effect on an 8086 the first time around, because the original byte is already in the queue. On an 8088, that byte hasn't been fetched yet, so the modified instruction is the one that executes. See the ; Is It an 8086? segment in Listing One.
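The mask arithmetic can be tried out in C without touching the real flags register. The sketch below is a model, not a detection routine: the two `readback_` functions are assumptions that imitate how the flags word reads back on an 8088/8086 (bits 12-15 always set) and on a 286 in real mode (bits 12-15 always clear), and `flags_test_8088` is a hypothetical name for the same steps the assembly performs.

```c
#include <stdint.h>

/* Assumed model: an 8088/8086 always reads bits 12-15 back as 1s. */
uint16_t readback_8086(uint16_t written)
{
    return (uint16_t)(written | 0xF000u);
}

/* Assumed model: a 286 in real mode always reads bits 12-15 back as 0s. */
uint16_t readback_286_real(uint16_t written)
{
    return (uint16_t)(written & 0x0FFFu);
}

/* Same steps as the assembly test: clear bits 12-15, write the value to
   the flags, read it back, and see whether the bits came back set.
   Returns 0 for 8088/8086 and 2 for 286 and above. */
int flags_test_8088(uint16_t flags, uint16_t (*readback)(uint16_t))
{
    uint16_t cleared = (uint16_t)(flags & 0x0FFFu);
    uint16_t seen = (uint16_t)(readback(cleared) & 0xF000u);
    return seen == 0xF000u ? 0 : 2;
}
```

Passing `readback_8086` gives 0 and passing `readback_286_real` gives 2, whatever the starting flags value is.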
Differentiating between the 286, 386, and 486
The easiest way to determine whether or not the CPU is an 80286 is to set the NT (nested task) and IOPL (I/O privilege level) bits in the flags register, then check to see if they're still set. These bits are undefined for 286s and will hold 0s no matter what you do. Since you can't set the flag bits directly, you have to use ax and the stack to indirectly set the bits. The ; Is It an 80286? part of Listing One shows this.
Actually, it turns out there's one case where IsItA286 returns a bogus value, indicating a 286 when in fact you're running on a 386. This is due to an obscure bug in Windows 3.1, as Bob Moote of Phar Lap Software has graciously pointed out. In a DOS box under Windows 3.1 enhanced mode, the pushf and popf instructions are virtualized if the DOS box is running at IOPL 0. According to Bob, when you execute the virtualized pushf, Windows doesn't remember that you tried to set those bits, and it returns the real hardware bits, which are all 0s. Consequently, you think you're on a 286 when it's really a 386 (or later). Keep in mind that this only happens under Windows 3.1 once out of every several thousand runs because Windows doesn't use IOPL 0 that often. Phar Lap gets around this problem in its 386|DOS-Extender (once they know they're on at least a 286) by doing an additional test before trying the flag-setting test. They use the SMSW instruction to check if the PE (protection enable) bit is set; if so, it's in V86 mode and must be a 386 or later. If the PE bit indicates real mode, then they do the flag-setting test to see if it's a 286.
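The ordering of the two tests is the whole point of the workaround, so it may help to see it spelled out. This C sketch covers only the decision, assuming the machine status word from SMSW and the flags value read back after writing 7000h are supplied by the caller. The function name `classify_286_or_386` is hypothetical, and this is not Phar Lap's actual code.

```c
#include <stdint.h>

#define MSW_PE         0x0001u  /* PE is bit 0 of the machine status word */
#define FLAGS_NT_IOPL  0x7000u  /* NT and IOPL bits, as in Listing One    */

/* msw:            value stored by SMSW
   flags_readback: flags read back after attempting to set 7000h
   Returns 2 for a 286 and 3 for a 386 or later. */
int classify_286_or_386(uint16_t msw, uint16_t flags_readback)
{
    /* A DOS program that finds PE set is running in V86 mode,
       which exists only on the 386 and later. */
    if (msw & MSW_PE)
        return 3;

    /* Real mode: the flag-setting test can be trusted. */
    return (flags_readback & FLAGS_NT_IOPL) ? 3 : 2;
}
```

In the Windows 3.1 case described above, the flags read back as all 0s but PE is set, so the function still reports a 386 or later.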
If you know you've got a 386 or higher, you have to look for a 486. The easiest way I've found to do this is to set the 486 AC (alignment check) bit in the flags and see if it remains set. If it does, it's a 486. The code segment in Listing One labeled ; Is It an 80386 or 80486? shows how to do this.
For a Pentium-detection scheme, see the accompanying text box entitled "Pentium Detection." For a cache-detection technique, see the text box "486 Cache Detection."
Conclusion
Now you have no more excuses. Write the best code for the processor you've got, and give users their money's worth. Supporting only 386s and higher is a solution, but not one nearly as professional as supporting multiple processors.
486 Cache Detection
Steve Heller
Steve is the author of Large Problems, Small Machines (Academic Press, 1992) and president of Chrysalis Software, P.O. Box 0335, Baldwin, NY 11510. He can be contacted on CompuServe at 71101,1702.
While Intel's 486 has an internal cache of 8 Kbytes, 486 clones from other manufacturers have internal caches of varying sizes. Cyrix's 486SLC, for instance, has a 1-Kbyte cache, while IBM's 486SLC has one that's 16 Kbytes in size. Furthermore, most 486-based PCs also have additional external caches. The program presented here (see Listing Three, page 127) detects whether internal or external (or both) caches are active so that you can turn them on or off, if necessary.
This approach takes advantage of the fact that jumping to an instruction that isn't aligned on a doubleword boundary is time consuming when there's no caching, as the processor has to access additional memory words to extract the next instruction to be executed. The program has two loops: one that executes an indirect jump to a properly aligned target, and a second that executes the same jump to a misaligned target. If the 486/66 DX2 internal cache is active, the two loops take almost identical amounts of time. If only the external cache is active, the misaligned loop takes about 50 percent longer to execute than the aligned one. If neither is active, the misaligned loop takes about twice as long to execute as the aligned one. Similar results occur on a 486/33 and on a Cyrix 486SLC/50. Even a 386/40 DX with an external cache displays a noticeable effect, although of much smaller magnitude. Keep in mind that if the internal cache is active, this program cannot determine the status of the external cache, as it is completely masked by the internal caching.
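Turning the two loop times into a verdict is a matter of comparing their ratio with the figures quoted above. The C sketch below is one way to do that. The cutoffs of 1.25 and 1.75 are assumptions chosen as midpoints between the reported ratios of roughly 1.0, 1.5, and 2.0, and the names `cache_state` and `classify_cache` are hypothetical.

```c
/* Possible verdicts (hypothetical names). */
typedef enum {
    CACHE_UNKNOWN,          /* aligned time too small to compare       */
    CACHE_INTERNAL_ACTIVE,  /* loops take about the same time          */
    CACHE_EXTERNAL_ONLY,    /* misaligned loop about 50 percent slower */
    CACHE_NONE              /* misaligned loop about twice as slow     */
} cache_state;

/* Classify from the two loop times, in clock ticks. The ratio
   misaligned/aligned is compared with 1.25 and 1.75 (assumed cutoffs)
   using integer arithmetic only. */
cache_state classify_cache(long aligned_ticks, long misaligned_ticks)
{
    if (aligned_ticks <= 0 || misaligned_ticks < 0)
        return CACHE_UNKNOWN;
    if (4 * misaligned_ticks < 5 * aligned_ticks)   /* ratio < 1.25 */
        return CACHE_INTERNAL_ACTIVE;
    if (4 * misaligned_ticks < 7 * aligned_ticks)   /* ratio < 1.75 */
        return CACHE_EXTERNAL_ONLY;
    return CACHE_NONE;
}
```

On machines where the effect is much smaller, such as the 386/40 DX mentioned above, these cutoffs would need to be tuned against measured times.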
The program was compiled using Borland C++ 3.1 and should be straightforward to port to other compilers as long as they allow inline assembly language. However, any changes must not disturb the alignment of the first loop and the misalignment of the second.
Pentium Detection
Robert Moote
Robert is a cofounder of Phar Lap Software and can be contacted at 60 Aberdeen Ave., Cambridge, MA 02138.
To determine whether or not the target system is Pentium based, first ascertain that you are at least on an i486 chip, as Rick describes. Then call the detect routine in Listing Two (page 127), which returns 4 if the processor is an i486, 5 or greater if the processor is a Pentium or later chip.
In the Pentium chip, Intel finally designed in a software-detection method by adding the CPUID instruction for obtaining information about the chip family, stepping, and unique features, and by adding an ID bit to the EFLAGS register to ascertain whether the processor supports the CPUID instruction. The code first attempts to toggle the ID bit in the EFLAGS register; the Pentium processor is the first in the 80x86 family to allow this bit to be set to both 1 and 0.
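The comparison at the heart of the toggle test is easy to state in C. This sketch assumes the EFLAGS values before and after the attempted toggle have already been captured. `EF_ID` uses the same value as the assembly listing, while the function name `id_bit_toggled` is hypothetical.

```c
#include <stdint.h>

#define EF_ID 0x00200000UL  /* ID bit in EFLAGS (200000h) */

/* before: EFLAGS as first read
   after:  EFLAGS read back after writing (before XOR EF_ID)
   Returns 1 if the ID bit changed, meaning CPUID is available,
   and 0 if the bit stayed put. */
int id_bit_toggled(uint32_t before, uint32_t after)
{
    return (before & EF_ID) != (after & EF_ID);
}
```

Comparing only the masked bit keeps the test independent of whatever the other flags happen to hold at the time.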
The CPUID instruction takes an input value in the EAX register. A 0 in EAX returns the highest input value recognized by CPUID in EAX (this is 1 for the Pentium chip), and a vendor identification string in EBX, EDX, and ECX, in that order (GenuineIntel for Intel's chip). Passing a 1 in EAX returns the chip information in EAX-EDX; this is the form of CPUID used in the detect code. On the Pentium, the chip family, model, and stepping are returned in EAX, and some bit flags identifying processor features are returned in EDX. The detect routine returns the chip family code, taken from bits 8-11 of EAX. The value 5 is used to identify the Pentium chip family; if Intel takes the logical step of incrementing this ID value for each subsequent chip in the 80x86 processor line, this is the last 80x86 detect routine you'll have to write.
Of course, the code presented here requires an assembler that supports Pentium opcodes. Note that the .586 directive is how you turn on Pentium support in the Phar Lap assembler; other assemblers may use different directives, since the chip isn't called a 586. You can also code the detect routine with any assembler just by using the DB directive to hardcode the opcode for CPUID (0Fh, 0A2h).
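Decoding what CPUID hands back takes only shifts and masks. The C sketch below assumes the register values have already been captured by assembly code. The family field in bits 8-11 is as described above, the model (bits 4-7) and stepping (bits 0-3) positions follow Intel's documented layout, and the names `cpu_signature`, `decode_signature`, and `vendor_string` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Fields of EAX after CPUID with EAX = 1 (hypothetical structure). */
typedef struct {
    unsigned family;    /* bits 8-11; 5 means Pentium */
    unsigned model;     /* bits 4-7                   */
    unsigned stepping;  /* bits 0-3                   */
} cpu_signature;

cpu_signature decode_signature(uint32_t eax)
{
    cpu_signature s;
    s.stepping = (unsigned)(eax & 0xFu);
    s.model    = (unsigned)((eax >> 4) & 0xFu);
    s.family   = (unsigned)((eax >> 8) & 0xFu);
    return s;
}

/* Build the 12-character vendor string returned by CPUID with EAX = 0.
   The characters come from EBX, then EDX, then ECX, low byte first.
   out must have room for 13 bytes, including the terminator. */
void vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx, char out[13])
{
    uint32_t regs[3];
    int r, b;

    regs[0] = ebx;
    regs[1] = edx;
    regs[2] = ecx;
    for (r = 0; r < 3; r++)
        for (b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFFu);
    out[12] = '\0';
}
```

Extracting the bytes with shifts rather than copying memory keeps the result the same regardless of the byte order of the machine the C code is compiled for.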
_PROCESSOR DETECTION SCHEMES_ by Richard C. Leinecker

[LISTING ONE]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Detect the Processor Type -- by Richard C. Leinecker                  ;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

_PTEXT          SEGMENT PARA PUBLIC 'CODE'
                ASSUME  CS:_PTEXT, DS:_PTEXT

                public  _Processor

; This routine returns the processor type as an integer value.
; C Prototype
;       int Processor( void );
; Returns: 0 == 8088, 1 == 8086, 2 == 80286, 3 == 80386, 4 == 80486
; Code assumes es, ax, bx, cx, and dx can be altered. If their contents
; must be preserved, then you'll have to push and pop them.

_Processor      proc    far
                push    bp              ; Preserve bp
                mov     bp,sp           ; bp = sp
                push    ds              ; Preserve ds
                push    di              ; and di
                mov     ax,cs           ; Point ds
                mov     ds,ax           ; to _PTEXT
                call    IsItAn8088      ; Returns 0 or 2
                cmp     al,2            ; 2 = 286 or better
                jge     AtLeast286      ; Go to 286 and above code
                call    IsItAn8086      ; Returns 0 or 1
                jmp     short ExitProcessor ; Jump to exit code
AtLeast286:
                call    IsItA286        ; See if it's a 286
                cmp     al,3            ; 3 and above for 386 and up
                jl      short ExitProcessor ; Jump to exit code
AtLeast386:
                call    IsItA386        ; See if it's a 386
ExitProcessor:
                pop     di              ; Restore di,
                pop     ds              ; ds,
                pop     bp              ; and bp
                ret                     ; Return to caller
_Processor      endp

; Is It an 8088?
; Returns ax==0 for 8088/8086, ax==2 for 80286 and above

IsItAn8088      proc
                pushf                   ; Preserve the flags
                pushf                   ; Push the flags so
                pop     bx              ; we can pop them into bx
                and     bx,00fffh       ; Mask off bits 12-15
                push    bx              ; and put it back on the stack
                popf                    ; Pop value back into the flags
                pushf                   ; Push it back. 8088/8086 will
                                        ; have bits 12-15 set
                pop     bx              ; Get value into bx
                and     bx,0f000h       ; Mask all but bits 12-15
                sub     ax,ax           ; Set ax to 8088 value
                cmp     bx,0f000h       ; See if the bits are still set
                je      Not286          ; Jump if they are (8088/8086)
                mov     al,2            ; Set for 286 and above
Not286:
                popf                    ; Restore the flags
                ret                     ; Return to caller
IsItAn8088      endp

; Is It an 8086?
; Returns ax==0 for 8088, ax==1 for 8086
; Code takes advantage of the 8088's 4-byte prefetch queue and the 8086's
; 6-byte prefetch queue. By self-modifying the code at a location exactly 5
; bytes away from IP, and determining if the modification took effect,
; you can differentiate between 8088s and 8086s.

IsItAn8086      proc
                mov     ax,cs           ; es == code segment
                mov     es,ax
                std                     ; Cause stosb to count backwards
                mov     dx,0            ; dx is flag and we'll start at 0
                mov     di,OFFSET EndLabel ; di==offset of code tail to modify
                mov     al,090h         ; al==nop instruction opcode
                mov     cx,3            ; Set for 3 repetitions
                REP     stosb           ; Store the bytes
                cld                     ; Clear the direction flag
                nop                     ; Three nops in a row
                nop                     ; provide dummy instructions
                nop
                inc     dx              ; Increment flag (only with 8086;
                                        ; an 8088 fetches the nop stored here)
                nop                     ; dummy instruction--6 bytes
EndLabel:
                nop
                mov     ax,dx           ; Store the flag in ax
                ret                     ; Back to caller
IsItAn8086      endp

; Is It an 80286?
; Determines whether processor is a 286 or higher. Going into subroutine ax = 2
; If the processor is a 386 or higher, ax will be 3 before returning. The
; method is to set ax to 7000h which represents the 386/486 NT and IOPL bits
; This value is pushed onto the stack and popped into the flags (with popf).
; The flags are then pushed back onto the stack (with pushf). Only a 386 or 486
; will keep the 7000h bits set. If it's a 286, those bits aren't defined and
; when the flags are pushed onto stack these bits will be 0. Now, when ax is
; popped these bits can be checked. If they're set, we have a 386 or 486.

IsItA286        proc
                pushf                   ; Preserve the flags
                mov     ax,7000h        ; Set the NT and IOPL flag
                                        ; bits only available for
                                        ; 386 processors and above
                push    ax              ; push ax so we can pop 7000h
                                        ; into the flag register
                popf                    ; pop 7000h off of the stack
                pushf                   ; push the flags back on
                pop     ax              ; get the pushed flags
                                        ; into ax
                and     ah,70h          ; see if the NT and IOPL
                                        ; flags are still set
                mov     ax,2            ; set ax to the 286 value
                                        ; (mov leaves the zero flag alone)
                jz      YesItIsA286     ; If NT and IOPL not set
                                        ; it's a 286
                inc     ax              ; ax now is 3 to indicate
                                        ; 386 or higher
YesItIsA286:
                popf                    ; Restore the flags
                ret                     ; Return to caller
IsItA286        endp

; Is It an 80386 or 80486?
; Determines whether processor is a 386 or higher. Going into subroutine ax=3
; If the processor is a 486, ax will be 4 before leaving. The method is to set
; the AC bit of the flags via EAX and the stack. If it stays set, it's a 486.

IsItA386        proc
                mov     di,ax           ; Store the processor value
                mov     bx,sp           ; Save sp
                and     sp,0fffch       ; Prevent alignment fault
                .386
                pushfd                  ; Preserve the flags
                pushfd                  ; Push so we can get flags
                pop     eax             ; Get flags into eax
                or      eax,40000h      ; Set the AC bit
                push    eax             ; Push back on the stack
                popfd                   ; Get the value into flags
                pushfd                  ; Put the value back on stack
                pop     eax             ; Get value into eax
                test    eax,40000h      ; See if the bit is set
                jz      YesItIsA386     ; If not we have a 386
                inc     di              ; Increment to indicate 486
YesItIsA386:
                popfd                   ; Restore the flags
                .8086
                mov     sp,bx           ; Restore sp
                mov     ax,di           ; Put processor value into ax
                ret                     ; Back to caller
IsItA386        endp

_PTEXT          ENDS
                END

[LISTING TWO]
                .586

; Pentium detect routine. Call only after verifying processor is an i486 or
; later. Returns 4 if on i486, 5 if Pentium, 6 or greater for future
; Intel processors.

EF_ID           equ     200000h         ; ID bit in EFLAGS register

Pentium         proc    near

; Check for Pentium or later by attempting to toggle the ID bit in EFLAGS reg;
; if we can't, it's an i486.
                pushfd                  ; get current flags
                pop     eax             ;
                mov     ebx,eax         ;
                xor     eax,EF_ID       ; attempt to toggle ID bit
                push    eax             ;
                popfd                   ;
                pushfd                  ; get new EFLAGS
                pop     eax             ;
                push    ebx             ; restore original flags
                popfd                   ;
                and     eax,EF_ID       ; if we couldn't toggle ID, it's an i486
                and     ebx,EF_ID       ;
                cmp     eax,ebx         ;
                je      short Is486     ;

; It's a Pentium or later. Use CPUID to get processor family.
                mov     eax,1           ; get processor info form of CPUID
                cpuid                   ;
                shr     ax,8            ; get Family field; 5 means Pentium.
                and     ax,0Fh          ;
                ret
Is486:
                mov     ax,4            ; return 4 for i486
                ret
Pentium         endp

[LISTING THREE]
#pragma inline
#include <stdio.h>
#include <time.h>

int main(void)
{
    long start1;
    long end1;
    long start2;
    long end2;

    /* First loop: indirect jump to an aligned target. */
    start1 = clock();
    asm P386
    asm mov eax,10000000
    asm lea ebx,loop1
    asm loop1:
    asm dec eax
    asm jz loop1e
    asm jmp ebx
loop1e:
    /* Second loop: the extra nop pushes the jump target off alignment. */
    start2 = end1 = clock();
    asm mov eax,10000000
    asm lea ebx,loop2
    asm nop
    asm loop2:
    asm dec eax
    asm jz loop2e
    asm jmp ebx
loop2e:
    end2 = clock();
    printf("aligned loop time = %d, misaligned loop time = %d\n",
        (int)(end1-start1), (int)(end2-start2));
    return 0;
}
Copyright © 1993, Dr. Dobb's Journal