Brian is a graphics researcher specializing in the field of real-time 3-D computer graphics. He currently works for a 3-D graphics hardware firm and is the author of Building a 3D Games Engine in C++ (John Wiley & Sons, 1995). He can be contacted at [email protected]. Kendall is the lead developer at SciTech Software, which has developed the Universal VESA VBE, MGL graphics library, and recently, WinDirect. He can be reached at [email protected].
High-performance graphics applications under DOS typically require direct access to the graphics card's frame buffer. On the VGA, this memory is accessed in the memory region from A000:0000 to A000:FFFF. Since this allows only 64 Kbytes to be accessed at a time on a VGA, accessing all of the memory on the VGA card requires a banking scheme. With a banking scheme, a window into a part of the frame buffer is addressed at A000:0000, but the piece of frame buffer this window points to can slide around. While this allows all of the memory on the video card to be accessed, it requires that the frame buffer be dealt with in 64-Kbyte chunks--a programming hassle at best, a serious performance degradation at worst. To combat this problem, the Video Electronics Standards Association (VESA) has implemented, as part of its VESA BIOS Extension (VBE) 2.0, a method by which a pointer to a linear frame buffer can be obtained by an application running on a VBE 2.0-compliant graphics system.
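To make the sliding-window idea concrete, here is a minimal sketch of how a pixel position maps onto a bank number and an offset inside the window. It assumes an 8-bit packed-pixel mode and a window that moves in whole 64-Kbyte steps (real hardware may report a finer granularity); the names banked_addr and BankedAddr are hypothetical and do not come from LFBPROF.

```c
#include <stdint.h>

#define BANK_SIZE 0x10000UL   /* assumed: 64-Kbyte window at A000:0000 */

/* Hypothetical helper type: where a pixel lives in a banked frame buffer. */
typedef struct {
    uint32_t bank;     /* which 64-Kbyte chunk the window must point at */
    uint32_t offset;   /* byte offset of the pixel inside the window    */
} BankedAddr;

/* Translate (x, y) in an 8-bit packed-pixel mode into bank + offset. */
BankedAddr banked_addr(uint32_t x, uint32_t y, uint32_t bytes_per_line)
{
    uint32_t linear = y * bytes_per_line + x;   /* offset in a flat buffer */
    BankedAddr a;
    a.bank   = (uint32_t)(linear / BANK_SIZE);
    a.offset = (uint32_t)(linear % BANK_SIZE);
    return a;
}
```

With 640 bytes per scan line, the pixel at x=256 on line 102 is the first byte of bank 1, so a 640x480 screen already spans five banks.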
Compatibility and the VESA BIOS
Manipulating a video card's banks requires in-depth programming knowledge of its video chipset. Supporting a multitude of video cards, however, can be arduous. To circumvent this, VESA designed and implemented a standard BIOS interface that supports bank switching and other functions for a wide range of video cards. This allows you to program for the VESA BIOS without supporting a specific video card's idiosyncratic bank-switching mechanisms. If a video card is VESA compatible, you can be reasonably sure that your code will run.
The Banking Performance Penalty
While the original VESA BIOS specification provided a hardware-independent method to access all of a video card's display RAM, there was a significant performance penalty because banking is inherently slow. For starters, bank switching is expensive, since the VESA BIOS interface is accessed via software interrupt 0x10. Not only is the interrupt slow, but there is a potential context switch from protected mode to real mode and back, compounding the expense. This, coupled with the significant bookkeeping overhead that bank switching imposes (bank boundaries must be watched for at all times during operations such as line drawing, rectangle clears, and so on), makes banked frame buffers an inefficient way to reach a video card's RAM.
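The bookkeeping cost is easiest to see in code. The sketch below simulates a banked card in ordinary memory so that it runs anywhere: vram, set_bank(), and window() are stand-ins invented for this illustration (on real hardware the bank switch would be a VBE call and the window would sit at A000:0000). Even a simple fill has to be cut into pieces wherever it crosses a bank boundary.

```c
#include <string.h>

#define BANK_SIZE 0x10000UL            /* assumed 64-Kbyte window         */
#define VRAM_SIZE (4 * BANK_SIZE)      /* simulated 256-Kbyte video card  */

static unsigned char vram[VRAM_SIZE];  /* stand-in for the card's memory  */
static unsigned long cur_bank;         /* bank currently mapped           */
static unsigned long bank_switches;    /* number of switches performed    */

/* Stand-in for the VBE bank-switch call (an INT 10h on real hardware). */
static void set_bank(unsigned long bank)
{
    cur_bank = bank;
    bank_switches++;
}

/* Stand-in for the 64-Kbyte window the CPU can actually address. */
static unsigned char *window(void)
{
    return vram + cur_bank * BANK_SIZE;
}

/* Fill `length` bytes starting at frame-buffer offset `start`,
   splitting the fill at every bank boundary it crosses. */
void banked_fill(unsigned long start, unsigned long length,
                 unsigned char value)
{
    while (length > 0) {
        unsigned long bank   = start / BANK_SIZE;
        unsigned long offset = start % BANK_SIZE;
        unsigned long chunk  = BANK_SIZE - offset;  /* room left in bank */

        if (chunk > length)
            chunk = length;
        set_bank(bank);
        memset(window() + offset, value, chunk);
        start  += chunk;
        length -= chunk;
    }
}
```

A 32-byte fill that straddles the end of bank 0 costs two bank switches here; with a linear frame buffer the same operation is a single memset().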
Linear Frame Buffer versus Banking
To address the problems of banked frame-buffer access, the VESA committee has designed and ratified the VBE 2.0 specification. This major overhaul of the VBE interface introduces two performance-enhancing capabilities. The first is protected-mode bank switching, which removes the need for expensive context switching when banking. The second capability removes the need for banking altogether by handling the frame buffer as a single chunk of contiguous memory, assuming that the underlying hardware is capable of supporting such access. This is important, since VBE 2.0 doesn't guarantee the existence of linear frame-buffer access--it only provides an interface to the linear frame buffer if it exists. Linear frame-buffer support possesses many of the same advantages the 32-bit flat model has over the 16-bit segmented memory model of the Intel processor--simpler addressing, no segment/bank swapping, and access to a larger address space.
As a fortunate side effect, VBE 2.0's linear frame-buffer access usually provides significantly improved performance on PCI-bus-based video systems. Specifically, when dealing with the VGA frame buffer at A000:0000 on PCI systems, PCI burst mode is usually not available. However, when working with a video card's linear frame buffer, burst mode is available, and performance can double (or more) during mass-data transfers.
Accessing the Linear Frame Buffer
Acquiring a pointer to a graphics system's linear frame buffer is a simple but lengthy process, requiring care since a misstep at any point renders the frame buffer invalid. The steps involved are:
- Getting VESA Super-VGA information, such as VBE revision number.
- Determining if the desired linear video mode is available.
- Creating a 48-bit far pointer to access the linear frame buffer.
Getting VBE Super-VGA Information
The function VBE_detect() first executes VESA function 0 (Get SuperVGA Information), which fills in a VGA info block. The VBE is accessed via the standard video interrupt 10h; however, AH is set to 4Fh so that the VBE knows to intercept the call, and AL is set to the VBE function number. After the call has been performed, the routine returns the VESA version as a BCD value. The video-mode list returned in the VGA info block must be copied into another buffer, because the VGA info table will be clobbered by any calls that use this area of memory (for example, a call to get information on a specific video mode).
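The register conventions described above can be sketched in a few lines of portable C. The helper names (vbe_ax, vbe_call_ok, vbe_major, vbe_minor) are hypothetical; the 004Fh success value is the one the listing checks after each call, and the version word is decoded as two packed-BCD bytes.

```c
/* AH = 4Fh tells the BIOS this is a VBE call; AL is the function number. */
unsigned vbe_ax(unsigned function)
{
    return 0x4F00u | (function & 0xFFu);
}

/* The listing treats a call as successful when AX returns as 004Fh. */
int vbe_call_ok(unsigned ax_after_call)
{
    return (ax_after_call & 0xFFFFu) == 0x004Fu;
}

/* Decode one packed-BCD byte (two decimal digits). */
static unsigned bcd_byte(unsigned b)
{
    return ((b >> 4) & 0xFu) * 10u + (b & 0xFu);
}

/* The version word holds the major number in the high byte
   and the minor number in the low byte, e.g. 0x0200 is 2.0. */
unsigned vbe_major(unsigned version)
{
    return bcd_byte((version >> 8) & 0xFFu);
}

unsigned vbe_minor(unsigned version)
{
    return bcd_byte(version & 0xFFu);
}
```

This is why LFBPROF can test for VBE 2.0 with a plain comparison against 0x200: the BCD word orders the same way the version numbers do.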
Finding a Video Mode
In VBE 2.0, VESA has dropped the policy of introducing hardcoded video modes. Instead, you can query the hardware directly for a video mode with a certain set of attributes. The AvailableModes() function searches the video-mode list for those video modes that fit a specific application's criteria.
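The search boils down to a predicate applied to each mode's information block. The following is a sketch only: ModeDesc is a hypothetical subset of the mode-information structure, and the constant values (attribute bit 0 for "supported", bit 7 for "linear frame buffer available", memory model 4 for packed pixel) are assumptions based on our reading of the VBE 2.0 layout, since LFBPROF.H is not reproduced here.

```c
#define MODE_SUPPORTED 0x0001u   /* assumed: attribute bit 0            */
#define MODE_LINEAR    0x0080u   /* assumed: attribute bit 7            */
#define MEM_PACKED     4         /* assumed: packed-pixel memory model  */

/* Hypothetical subset of the VBE mode-information block. */
typedef struct {
    unsigned short ModeAttributes;
    unsigned short XResolution;
    unsigned short YResolution;
    unsigned char  NumberOfPlanes;
    unsigned char  BitsPerPixel;
    unsigned char  MemoryModel;
} ModeDesc;

/* Return 1 if the mode is a usable 8-bit linear frame-buffer mode
   of the requested resolution, 0 otherwise. */
int mode_matches(const ModeDesc *m, unsigned x, unsigned y)
{
    if ((m->ModeAttributes & MODE_SUPPORTED) == 0)
        return 0;
    if ((m->ModeAttributes & MODE_LINEAR) == 0)
        return 0;
    if (m->MemoryModel != MEM_PACKED || m->BitsPerPixel != 8 ||
        m->NumberOfPlanes != 1)
        return 0;
    return m->XResolution == x && m->YResolution == y;
}
```

Matching on attributes rather than on a mode number is what keeps the program working when a BIOS assigns its own, arbitrary mode numbers.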
Get a Pointer to the Frame Buffer
The GetPtrToLFB() function is responsible for returning a 48-bit far pointer to the linear frame buffer. The process is somewhat lengthy but easy to understand. DPMI service 0 is first used to allocate a selector, as implemented by DPMI_allocSelector(). Note that this function immediately sets the selector's access rights to 32-bit page granular. Next, the newly allocated selector must have its base address set to the address of the frame buffer.
This gets a little sticky, because the physical address of the frame buffer (as given in the VBE_modeInfo structure) is not the same as the linear address the selector expects as a base address. It's therefore necessary to use DPMI service 0x800 to map the frame buffer's physical address into the processor's currently running linear address space. The function DPMI_mapPhysicalToLinear() performs this chore.
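DPMI calls of this kind pass 32-bit values as pairs of 16-bit registers (BX:CX for the address going in and the linear address coming back). A small sketch of that packing, using hypothetical helper names split32 and join32, looks like this:

```c
#include <stdint.h>

/* A 32-bit value split across two 16-bit registers, such as BX:CX. */
typedef struct {
    uint16_t hi;   /* goes in the first register of the pair  */
    uint16_t lo;   /* goes in the second register of the pair */
} WordPair;

/* Split a 32-bit address for loading into a register pair. */
WordPair split32(uint32_t value)
{
    WordPair p;
    p.hi = (uint16_t)(value >> 16);
    p.lo = (uint16_t)(value & 0xFFFFu);
    return p;
}

/* Rebuild a 32-bit address from a register pair returned by a call. */
uint32_t join32(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | lo;
}
```

For a frame buffer reported at physical address E0000000h (an example value, not one taken from any particular card), BX would receive E000h and CX would receive 0000h.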
With the linear address in hand, the selector's base address is set using DPMI service 7, as DPMI_setSelectorBase() demonstrates. The only thing left now is setting the selector's limit, which is done using DPMI service 8 (DPMI_setSelectorLimit()). LFBPROF sets the limit to 4 Mbytes, since this is the most memory any modern VESA-compatible video card will likely have. (Because we made the selector 32-bit page granular, the region must span a whole number of 4K pages, and the limit passed is the size of that region minus 1.)
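The limit arithmetic is easy to get wrong by one, so here is a minimal sketch of it. The helper name selector_limit and the constant LFB_PAGE_SIZE are hypothetical; the function rounds a region size up to whole 4K pages and returns that size minus 1, which is the form the listing passes (4096 * 1024 - 1 for a 4-Mbyte region).

```c
#include <stdint.h>

#define LFB_PAGE_SIZE 4096UL   /* granularity of a page-granular selector */

/* Round `size_bytes` up to whole 4K pages and return size - 1.
   `size_bytes` must be nonzero. */
uint32_t selector_limit(uint32_t size_bytes)
{
    uint32_t pages = (uint32_t)((size_bytes + LFB_PAGE_SIZE - 1)
                                / LFB_PAGE_SIZE);
    return (uint32_t)(pages * LFB_PAGE_SIZE - 1);
}
```

A correct page-granular limit always has its low 12 bits set, which makes a handy sanity check: 4 Mbytes gives 003FFFFFh.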
At this point, a selector has been secured that maps directly into the video card's frame buffer. This selector forms the basis of a far pointer. The macro MK_FP() in dos.h (which is Watcom C++ specific) creates a 48-bit far pointer out of the selector. This pointer allows linear access to the frame buffer.
Accessing the Frame Buffer
The linear frame buffer can now be accessed either directly in assembly or using Watcom's far-memory routines (for example, _fmemcpy() and _fmemset()). The sample program, LFBPROF ("Linear Frame Buffer Profiler"), uses Watcom inline assembly to gain access to the frame buffer, since we must guarantee that data is packed in 32-bit dwords as it goes across the bus. The functions LfbMemcpy() and LfbMemset() in LFBPROF.H are implemented as Watcom inline assembly and are guaranteed 32-bit memory copying and setting routines.
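For readers without Watcom, the intent of a dword-at-a-time copy can be approximated in portable C. This is a sketch, not the LfbMemcpy() implementation: copy_dwords is a hypothetical name, both pointers are assumed to be 4-byte aligned, and a C compiler gives no hard guarantee about the bus transactions it generates, which is exactly why LFBPROF drops to inline assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy `bytes` bytes using one 32-bit store per dword, then finish
   any 1-3 byte tail with byte stores. Both pointers are assumed to
   be 4-byte aligned. The volatile destination discourages the
   compiler from merging or splitting the stores. */
void copy_dwords(void *dst, const void *src, size_t bytes)
{
    volatile uint32_t *d = (volatile uint32_t *)dst;
    const uint32_t *s = (const uint32_t *)src;
    size_t dwords = bytes / 4;
    size_t tail = bytes % 4;
    volatile unsigned char *db;
    const unsigned char *sb;

    while (dwords--)
        *d++ = *s++;            /* one 32-bit store per iteration */

    db = (volatile unsigned char *)d;
    sb = (const unsigned char *)s;
    while (tail--)
        *db++ = *sb++;          /* leftover bytes, if any */
}
```

Frame sizes in the modes LFBPROF tests are multiples of four bytes, so in practice the tail loop never runs.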
LFBPROF
LFBPROF implements everything we've discussed to this point, including banked frame-buffer access. LFBPROF is a video-card benchmarking program that tests the ability of the underlying hardware to handle system-to-video copies and frame-buffer clears in both banked and linear frame-buffer modes. (For more information on benchmarking frame buffers, see the accompanying text box entitled, "Frame Buffer Performance Metrics.") Listings One and Two contain the complete source to LFBPROF.C and LFBPROF.H, respectively. The compiler used was Watcom C/C++ 10.0a using the 32-bit flat model, with DOS4GW.EXE as the DOS extender. Because VBE 2.0 is so new, few hardware manufacturers have implemented the spec in their ROMs, so SciTech Software's Universal VESA BIOS Extensions TSR (UniVBE) must be used as the VESA BIOS interface provider. This package is shareware and available at most DOS ftp sites, including ftp.scitechsoft.com, and on CompuServe GO IBMPRO.
LFBPROF takes two arguments on the command line, which are the resolution of the desired video mode; for example, to test 640x480, the command line is LFBPROF 640 480.
If no arguments are given, a list of available video modes is printed. Note that only 8-bit linear frame-buffer modes are tested, although it would be relatively straightforward to add support for 15-bit and higher modes to LFBPROF. Listing Three is a sample makefile that can be used to compile LFBPROF.
LFBPROF's main() is subdivided into three basic parts: initialization, benchmarking, and shutdown. Initialization is responsible for determining VBE 2.0 compliance, checking on the availability of the desired video mode, and initializing the graphics mode (which includes securing a selector to the linear frame buffer).
The benchmark tests a video card's frame-buffer clearing and copying speed using LFBPROF's LfbMemset() and LfbMemcpy() routines in both linear and banked modes. Each test runs for ten seconds so that any granularity in the system timer can be factored out.
Shutdown restores VGA mode 3 (80 column text mode) and finally computes and prints the results of the tests in both Mbytes/sec and frames/sec.
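The arithmetic behind those two figures can be sketched as follows. The constant 0.054925 seconds per timer tick and the 1,048,576-byte megabyte are the values the listing uses; the function names ops_per_sec and mbytes_per_sec are hypothetical.

```c
#define SECONDS_PER_TICK 0.054925   /* one tick of the 18.2-Hz timer */

/* Operations (clears or blits) per second over `ticks` timer ticks.
   Returns 0 if no time has elapsed. */
double ops_per_sec(long ops, long ticks)
{
    if (ticks <= 0)
        return 0.0;
    return ops / (ticks * SECONDS_PER_TICK);
}

/* Throughput in Mbytes/sec when each operation moves `bytes_per_op`. */
double mbytes_per_sec(double ops_per_second, long bytes_per_op)
{
    return (ops_per_second * bytes_per_op) / 1048576.0;
}
```

For example, 1000 full-screen operations in 182 ticks (just under ten seconds) works out to roughly 100 frames/sec; at 64,000 bytes per frame that is a little over 6 Mbytes/sec.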
Frame-Buffer Performance Metrics
The ability to quantify frame-buffer performance is important because such performance is often directly proportional to the performance of a game or other type of graphics application. However, measuring frame-buffer performance accurately (and in a form where the results can be interpreted meaningfully) is tricky and often controversial.
Let's take as an example a very naive VGA frame-buffer performance test. Such a test would consist of repeatedly blitting out a 320x200, 8-bit, off-screen buffer to the VGA mode 0x13 real-mode address space from A000:0000 to A000:FFFF. Time would be measured using the clock() standard-library function, with enough test-loop iterations to factor out the fairly coarse granularity typical of PC system timers (18.2 ticks/sec). The inner loop of such a test would be straightforward, as Example 1(a) shows. This would seem to give a very good indication of a system's capability to update VRAM, but this isn't necessarily true. A benchmark attempts to determine not only the video card's peak speed, but also its likely real-world performance. Often, the two are not even close.
Sequential Transfer versus Random Access
Our sample test has several significant flaws when it comes to real-world performance measurement. The first, and most obvious, is that it only tests the speed of sequential writes to the video card from system RAM. This is fine for many applications, but for programs that do random writes to video, this type of measurement can be misleading. For example, video chips with interleaved memory access (S3 805i, Tseng Labs ET4000/w32i) typically have extremely fast sequential transfer rates because the interleaved RAM is optimized for just this type of action. Random writes on these boards, however, are not nearly as fast as their sequential write times might imply.
Additionally, sequential transfers usually trigger PCI burst-mode operation on PCI bus systems, which significantly increases transfer speed, but only during blits. This can easily lead to misinterpretation of transfer rates as measured by a naive benchmarking program, since burst transfer rates are significantly higher than random-access rates.
Also, the standard-library memcpy() function used in our naive benchmark isn't guaranteed to decompose the system-to-video copy into 32-bit read/writes. If memcpy() were implemented using MOVSB, for example, an 8-bit video card would likely have the same performance as a 32-bit video card (assuming techniques such as byte-merging were disabled in system-chipset setup).
Another deficiency is that this benchmark only tests VGA mode 0x13 performance. This does not necessarily scale accurately to other video modes, such as tweaked, planar, VGA 8-bit modes ("ModeX") or high-resolution VESA modes. There are many reasons for this. For starters, different chipsets behave differently depending on the video mode; some chipsets have poor planar performance but reasonable packed-mode performance. Another reason is that some video cards handle different video modes with different video chips. The Diamond Viper is one example: video mode 0x13 is handled by a secondary VGA processor, such as an OAK or Weitek 5x86 standard VGA chip, whereas VESA mode 640x400x8 (used by Microsoft Flight Simulator 5) uses the Weitek P9000 accelerator natively. As a result, mode 0x13 on the Viper is extremely lackluster, yet the high-resolution VESA mode is almost breathtakingly fast. Also, the sheer size difference of the different video modes (mode 0x13 requires only 64,000 bytes, whereas a mode such as 1024x768x32-bit can consume over three million bytes) can greatly affect cache coherency.
Note that DRAM and VRAM boards do not perform the same in all video modes. In very-low-resolution modes like VGA mode 0x13, little time is required by the CRT controller (CRTC) to refresh the display, leaving nearly all the DRAM bandwidth available for CPU access. However, in the very-high-resolution modes like 1280x1024x256, much more of the DRAM bandwidth is required by the CRTC; hence, frame-buffer performance will drop significantly. With boards based on dual-ported VRAM, the CPU and CRTC can both gain access to the memory at the same time; therefore, the performance generally does not degrade as the resolution increases (making VRAM boards popular for high-end CAD applications that need high-resolution video modes).
Cache Coherency
External (L2) cache coherency is extremely important in characterizing frame-buffer performance. Modern computer systems can have caches anywhere from nonexistent to 512K in size. This cache proves very important in real-world tests, but in an application-specific (and thus unpredictable) manner. In general, if the effect of a factor on an application's performance is unpredictable, then an attempt should be made to remove it from the benchmark. A good benchmarking program should try to minimize the effects of the L2 cache as much as possible. The naive benchmark presented earlier does not do this, and as a result will find significantly better performance on a system with a 256K cache than on one without any cache, solely because the source buffer will reside in the L2 cache for the duration of the program. To defeat cache coherency, two techniques can be used. The first is to simply step through multiple buffers. In Example 1(b), cycling through multiple buffers instead of using a single 64K buffer pretty well destroys cache coherency. This makes for lower benchmark figures, but they reflect real-world performance more accurately.
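How many buffers are enough? A simple rule, and the one Listing One follows with a 512K cache assumption, is to allocate enough source images to cover the largest expected cache and then add two more. The helper names below (buffers_needed, next_buffer) are hypothetical and only sketch that rule.

```c
/* Number of source buffers needed so that, by the time a buffer is
   reused, everything cached from it has been evicted. Mirrors the
   listing's (cache size / image size) + 2 rule. */
unsigned long buffers_needed(unsigned long cache_bytes,
                             unsigned long image_bytes)
{
    return cache_bytes / image_bytes + 2;
}

/* Index of the source buffer to use on a given iteration. */
unsigned long next_buffer(unsigned long iteration, unsigned long count)
{
    return iteration % count;
}
```

For a 64,000-byte mode 0x13 image and a 512K cache this gives ten buffers; for a 640x480x8 image (307,200 bytes) it gives three.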
The second method of defeating cache coherency is to simply use a video mode larger than the system's external cache. A video mode such as 640x480x16bpp consumes 600K, well beyond the size of a typical L2 cache.
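The sizes quoted here are straightforward to check. The sketch below uses hypothetical helpers (frame_bytes, exceeds_cache) and ignores any scan-line padding a real mode might report, so treat the result as a lower bound on the true frame size.

```c
/* Bytes in one frame, rounding the pixel depth up to whole bytes
   (so 15-bit modes count as 2 bytes per pixel). Ignores padding. */
unsigned long frame_bytes(unsigned long xres, unsigned long yres,
                          unsigned bits_per_pixel)
{
    return xres * yres * ((bits_per_pixel + 7) / 8);
}

/* Nonzero if a frame cannot fit entirely in a cache of this size. */
int exceeds_cache(unsigned long frame, unsigned long cache_bytes)
{
    return frame > cache_bytes;
}
```

640x480 at 16 bits per pixel comes to 614,400 bytes, or exactly 600K, which is larger than even a 512K cache; mode 0x13 at 64,000 bytes fits comfortably inside a 256K one.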
When measuring frame-buffer performance, most local buses will operate at a different speed, depending on the processor installed. For example, a PCI-bus Pentium/66 will generally blit faster than a PCI-bus Pentium/90 because the former clocks the PCI bus at 33 MHz (clock halved) and the latter, at 30 MHz (clock thirded). This can't be accounted for by the benchmark, but it should be noted when analyzing performance characteristics of different video cards measured in different systems.
Conclusion
With so many factors, it would seem nearly impossible to devise a single, ideal benchmark. However, a comprehensive benchmark isn't necessarily better than an accurate, informative one. The most important criterion for any measurement tool is that it specify clearly what is being measured, how it is being measured, and what applications will find the benchmark's data relevant. Accurate, comprehensive data is useless unless it is easy to understand and translates into relevant, meaningful results. LFBPROF does not claim to provide comprehensive data on a video card's performance; instead, it simply states the performance of a video card when using VBE 2.0's linear frame-buffer feature for clearing and system-to-video copying operations. Such performance is very important for games, but often absolutely irrelevant for other applications such as CAD or GUIs.
--B.H. and K.B.
Example 1: (a) Inner loop of test; (b) cycling through multiple buffers.
(a)
for ( i = 0; i < NUM_ITERATIONS; i++ )
{
    memcpy( video, source_buffer, SRC_BUF_SIZE );
}

(b)
for ( i = 0; i < NUM_ITERATIONS; i++ )
{
    memcpy( video, source_buffer[i%NUM_SRC_BUFS], SRC_BUF_SIZE );
}
Listing One
/**************************************************************************** * VBE 2.0 Linear Framebuffer Profiler * By Kendall Bennett and Brian Hook * Filename: LFBPROF.C * Language: ANSI C * Environment: Watcom C/C++ 10.0a with DOS4GW * Description: Simple program to profile the speed of screen clearing * and full screen BitBlt operations using a VESA VBE 2.0 * linear framebuffer from 32 bit protected mode. * For simplicity, this program only supports 256 color * SuperVGA video modes that support a linear framebuffer. ****************************************************************************/ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <conio.h> #include <dos.h> #include "lfbprof.h" /*---------------------------- Global Variables ---------------------------*/ int VESABuf_len = 1024; /* Length of VESABuf */ int VESABuf_sel = 0; /* Selector for VESABuf */ int VESABuf_rseg; /* Real mode segment of VESABuf */ short modeList[50]; /* List of available VBE modes */ float clearsPerSec; /* Number of clears per second */ float clearsMbPerSec; /* Memory transfer for clears */ float bitBltsPerSec; /* Number of BitBlt's per second */ float bitBltsMbPerSec; /* Memory transfer for bitblt's */ int xres,yres; /* Video mode resolution */ int bytesperline; /* Bytes per scanline for mode */ long imageSize; /* Length of the video image */ char far *LFBPtr; /* Pointer to linear framebuffer */ /*------------------------- DPMI interface routines -----------------------*/ void DPMI_allocRealSeg(int size,int *sel,int *r_seg) /**************************************************************************** * Function: DPMI_allocRealSeg * Parameters: size - Size of memory block to allocate * sel - Place to return protected mode selector * r_seg - Place to return real mode segment * Description: Allocates a block of real mode memory using DPMI services. * This routine returns both a protected mode selector and * real mode segment for accessing the memory block. 
****************************************************************************/ { union REGS r; r.w.ax = 0x100; /* DPMI allocate DOS memory */ r.w.bx = (size + 0xF) >> 4; /* number of paragraphs */ int386(0x31, &r, &r); if (r.w.cflag) FatalError("DPMI_allocRealSeg failed!"); *sel = r.w.dx; /* Protected mode selector */ *r_seg = r.w.ax; /* Real mode segment */ } void DPMI_freeRealSeg(unsigned sel) /**************************************************************************** * Function: DPMI_allocRealSeg * Parameters: sel - Protected mode selector of block to free * Description: Frees a block of real mode memory. ****************************************************************************/ { union REGS r; r.w.ax = 0x101; /* DPMI free DOS memory */ r.w.dx = sel; /* DX := selector from 0x100 */ int386(0x31, &r, &r); } typedef struct { long edi; long esi; long ebp; long reserved; long ebx; long edx; long ecx; long eax; short flags; short es,ds,fs,gs,ip,cs,sp,ss; } _RMREGS; #define IN(reg) rmregs.e##reg = in->x.reg #define OUT(reg) out->x.reg = rmregs.e##reg int DPMI_int86(int intno, RMREGS *in, RMREGS *out) /**************************************************************************** * Function: DPMI_int86 * Parameters: intno - Interrupt number to issue * in - Pointer to structure for input registers * out - Pointer to structure for output registers * Returns: Value returned by interrupt in AX * Description: Issues a real mode interrupt using DPMI services. 
****************************************************************************/ { _RMREGS rmregs; union REGS r; struct SREGS sr; memset(&rmregs, 0, sizeof(rmregs)); IN(ax); IN(bx); IN(cx); IN(dx); IN(si); IN(di); segread(&sr); r.w.ax = 0x300; /* DPMI issue real interrupt */ r.h.bl = intno; r.h.bh = 0; r.w.cx = 0; sr.es = sr.ds; r.x.edi = (unsigned)&rmregs; int386x(0x31, &r, &r, &sr); /* Issue the interrupt */ OUT(ax); OUT(bx); OUT(cx); OUT(dx); OUT(si); OUT(di); out->x.cflag = rmregs.flags & 0x1; return out->x.ax; } int DPMI_int86x(int intno, RMREGS *in, RMREGS *out, RMSREGS *sregs) /**************************************************************************** * Function: DPMI_int86 * Parameters: intno - Interrupt number to issue * in - Pointer to structure for input registers * out - Pointer to structure for output registers * sregs - Values to load into segment registers * Returns: Value returned by interrupt in AX * Description: Issues a real mode interrupt using DPMI services. ****************************************************************************/ { _RMREGS rmregs; union REGS r; struct SREGS sr; memset(&rmregs, 0, sizeof(rmregs)); IN(ax); IN(bx); IN(cx); IN(dx); IN(si); IN(di); rmregs.es = sregs->es; rmregs.ds = sregs->ds; segread(&sr); r.w.ax = 0x300; /* DPMI issue real interrupt */ r.h.bl = intno; r.h.bh = 0; r.w.cx = 0; sr.es = sr.ds; r.x.edi = (unsigned)&rmregs; int386x(0x31, &r, &r, &sr); /* Issue the interrupt */ OUT(ax); OUT(bx); OUT(cx); OUT(dx); OUT(si); OUT(di); sregs->es = rmregs.es; sregs->cs = rmregs.cs; sregs->ss = rmregs.ss; sregs->ds = rmregs.ds; out->x.cflag = rmregs.flags & 0x1; return out->x.ax; } int DPMI_allocSelector(void) /**************************************************************************** * Function: DPMI_allocSelector * Returns: Newly allocated protected mode selector * Description: Allocates a new protected mode selector using DPMI * services. This selector has a base address and limit of 0. 
****************************************************************************/ { int sel; union REGS r; r.w.ax = 0; /* DPMI allocate selector */ r.w.cx = 1; /* Allocate a single selector */ int386(0x31, &r, &r); if (r.x.cflag) FatalError("DPMI_allocSelector() failed!"); sel = r.w.ax; r.w.ax = 9; /* DPMI set access rights */ r.w.bx = sel; r.w.cx = 0x8092; /* 32 bit page granular */ int386(0x31, &r, &r); return sel; } long DPMI_mapPhysicalToLinear(long physAddr,long limit) /**************************************************************************** * Function: DPMI_mapPhysicalToLinear * Parameters: physAddr - Physical memory address to map * limit - Length-1 of physical memory region to map * Returns: Starting linear address for mapped memory * Description: Maps a section of physical memory into the linear address * space of a process using DPMI calls. Note that this linear * address cannot be used directly, but must be used as the * base address for a selector. ****************************************************************************/ { union REGS r; r.w.ax = 0x800; /* DPMI map physical to linear */ r.w.bx = physAddr >> 16; r.w.cx = physAddr & 0xFFFF; r.w.si = limit >> 16; r.w.di = limit & 0xFFFF; int386(0x31, &r, &r); if (r.x.cflag) FatalError("DPMI_mapPhysicalToLinear() failed!"); return ((long)r.w.bx << 16) + r.w.cx; } void DPMI_setSelectorBase(int sel,long linAddr) /**************************************************************************** * Function: DPMI_setSelectorBase * Parameters: sel - Selector to change base address for * linAddr - Linear address used for new base address * Description: Sets the base address for the specified selector. 
****************************************************************************/ { union REGS r; r.w.ax = 7; /* DPMI set selector base address */ r.w.bx = sel; r.w.cx = linAddr >> 16; r.w.dx = linAddr & 0xFFFF; int386(0x31, &r, &r); if (r.x.cflag) FatalError("DPMI_setSelectorBase() failed!"); } void DPMI_setSelectorLimit(int sel,long limit) /**************************************************************************** * Function: DPMI_setSelectorLimit * Parameters: sel - Selector to change limit for * limit - Limit-1 for the selector * Description: Sets the memory limit for the specified selector. ****************************************************************************/ { union REGS r; r.w.ax = 8; /* DPMI set selector limit */ r.w.bx = sel; r.w.cx = limit >> 16; r.w.dx = limit & 0xFFFF; int386(0x31, &r, &r); if (r.x.cflag) FatalError("DPMI_setSelectorLimit() failed!"); } /*-------------------------- VBE Interface routines -----------------------*/ void FatalError(char *msg) { fprintf(stderr,"%s\n", msg); exit(1); } static void ExitVBEBuf(void) { DPMI_freeRealSeg(VESABuf_sel); } void VBE_initRMBuf(void) /**************************************************************************** * Function: VBE_initRMBuf * Description: Initialises the VBE transfer buffer in real mode memory. * This routine is called by the VESAVBE module every time * it needs to use the transfer buffer, so we simply allocate * it once and then return. 
****************************************************************************/ { if (!VESABuf_sel) { DPMI_allocRealSeg(VESABuf_len, &VESABuf_sel, &VESABuf_rseg); atexit(ExitVBEBuf); } } void VBE_callESDI(RMREGS *regs, void *buffer, int size) /**************************************************************************** * Function: VBE_callESDI * Parameters: regs - Registers to load when calling VBE * buffer - Buffer to copy VBE info block to * size - Size of buffer to fill * Description: Calls the VESA VBE and passes in a buffer for the VBE to * store information in, which is then copied into the users * buffer space. This works in protected mode as the buffer * passed to the VESA VBE is allocated in conventional * memory, and is then copied into the users memory block. ****************************************************************************/ { RMSREGS sregs; VBE_initRMBuf(); sregs.es = VESABuf_rseg; regs->x.di = 0; _fmemcpy(MK_FP(VESABuf_sel,0),buffer,size); DPMI_int86x(0x10, regs, regs, &sregs); _fmemcpy(buffer,MK_FP(VESABuf_sel,0),size); } int VBE_detect(void) /**************************************************************************** * Function: VBE_detect * Parameters: vgaInfo - Place to store the VGA information block * Returns: VBE version number, or 0 if not detected. * Description: Detects if a VESA VBE is out there and functioning * correctly. If we detect a VBE interface we return the * VGAInfoBlock returned by the VBE and the VBE version number. ****************************************************************************/ { RMREGS regs; short *p1,*p2; VBE_vgaInfo vgaInfo; /* Put 'VBE2' into the signature area so that the VBE 2.0 BIOS knows * that we have passed a 512 byte extended block to it, and wish * the extended information to be filled in. 
*/ strncpy(vgaInfo.VESASignature,"VBE2",4); /* Get the SuperVGA Information block */ regs.x.ax = 0x4F00; VBE_callESDI(®s, &vgaInfo, sizeof(VBE_vgaInfo)); if (regs.x.ax != 0x004F) return 0; if (strncmp(vgaInfo.VESASignature,"VESA",4) != 0) return 0; /* Now that we have detected a VBE interface, copy the list of available * video modes into our local buffer. We *must* copy this mode list, since * the VBE will build the mode list in the VBE_vgaInfo buffer that we have * passed, so the next call to the VBE will trash the list of modes. */ p1 = LfbMapRealPointer(vgaInfo.VideoModePtr); p2 = modeList; while (*p1 != -1) *p2++ = *p1++; *p2 = -1; return vgaInfo.VESAVersion; } int VBE_getModeInfo(int mode,VBE_modeInfo *modeInfo) /**************************************************************************** * Function: VBE_getModeInfo * Parameters: mode - VBE mode to get information for * modeInfo - Place to store VBE mode information * Returns: 1 on success, 0 if function failed. * Description: Obtains information about a specific video mode from the * VBE. You should use this function to find the video mode * you wish to set, as the new VBE 2.0 mode numbers may be * completely arbitrary. 
****************************************************************************/
{
    RMREGS  regs;

    regs.x.ax = 0x4F01;             /* Get mode information         */
    regs.x.cx = mode;
    VBE_callESDI(&regs, modeInfo, sizeof(VBE_modeInfo));
    if (regs.x.ax != 0x004F)
        return 0;
    if ((modeInfo->ModeAttributes & vbeMdAvailable) == 0)
        return 0;
    return 1;
}

void VBE_setVideoMode(int mode)
/****************************************************************************
* Function:     VBE_setVideoMode
* Parameters:   mode    - VBE mode number to initialise
****************************************************************************/
{
    RMREGS  regs;

    regs.x.ax = 0x4F02;
    regs.x.bx = mode;
    DPMI_int86(0x10,&regs,&regs);
}

/*-------------------- Application specific routines ----------------------*/

void far *GetPtrToLFB(long physAddr)
/****************************************************************************
* Function:     GetPtrToLFB
* Parameters:   physAddr    - Physical memory address of linear framebuffer
* Returns:      Far pointer to the linear framebuffer memory
****************************************************************************/
{
    int     sel;
    long    linAddr,limit = (4096 * 1024) - 1;     /* Map a 4 Mb window */

    sel = DPMI_allocSelector();
    linAddr = DPMI_mapPhysicalToLinear(physAddr,limit);
    DPMI_setSelectorBase(sel,linAddr);
    DPMI_setSelectorLimit(sel,limit);
    return MK_FP(sel,0);
}

void AvailableModes(void)
/****************************************************************************
* Function:     AvailableModes
* Description:  Display a list of available LFB mode resolutions.
****************************************************************************/
{
    short           *p;
    VBE_modeInfo    modeInfo;

    printf("Usage: LFBPROF <xres> <yres>\n\n");
    printf("Available 256 color video modes:\n");
    for (p = modeList; *p != -1; p++) {
        if (VBE_getModeInfo(*p, &modeInfo)) {
            /* Accept only 8 bit linear framebuffer modes */
            if ((modeInfo.ModeAttributes & vbeMdLinear) == 0)
                continue;
            if (modeInfo.MemoryModel != vbeMemPK
                    || modeInfo.BitsPerPixel != 8
                    || modeInfo.NumberOfPlanes != 1)
                continue;
            printf(" %4d x %4d %d bits per pixel\n",
                modeInfo.XResolution, modeInfo.YResolution,
                modeInfo.BitsPerPixel);
            }
        }
    exit(1);
}

void InitGraphics(int x,int y)
/****************************************************************************
* Function:     InitGraphics
* Parameters:   x,y - Requested video mode resolution
* Description:  Initialise the specified video mode. We search through
*               the list of available video modes for one that matches
*               the resolution and color depth we are looking for.
****************************************************************************/
{
    short           *p;
    VBE_modeInfo    modeInfo;

    for (p = modeList; *p != -1; p++) {
        if (VBE_getModeInfo(*p, &modeInfo)) {
            /* Accept only 8 bit linear framebuffer modes */
            if ((modeInfo.ModeAttributes & vbeMdLinear) == 0)
                continue;
            if (modeInfo.MemoryModel != vbeMemPK
                    || modeInfo.BitsPerPixel != 8
                    || modeInfo.NumberOfPlanes != 1)
                continue;
            if (modeInfo.XResolution != x || modeInfo.YResolution != y)
                continue;
            xres = x;
            yres = y;
            bytesperline = modeInfo.BytesPerScanLine;
            imageSize = bytesperline * yres;
            VBE_setVideoMode(*p | vbeUseLFB);
            LFBPtr = GetPtrToLFB(modeInfo.PhysBasePtr);
            return;
            }
        }
    printf("Valid video mode not found\n");
    exit(1);
}

void EndGraphics(void)
/****************************************************************************
* Function:     EndGraphics
* Description:  Restores text mode.
****************************************************************************/
{
    RMREGS  regs;

    regs.x.ax = 0x3;
    DPMI_int86(0x10, &regs, &regs);
}

void ProfileMode(void)
/****************************************************************************
* Function:     ProfileMode
* Description:  Profiles framebuffer performance for simple screen clearing
*               and for copying from system memory to video memory (BitBlt).
*               This routine thrashes the CPU cache by cycling through
*               enough system memory buffers to invalidate the entire CPU
*               external cache before re-using the first memory buffer again.
****************************************************************************/
{
    int     i,numClears,numBlts,maxImages;
    long    startTicks,endTicks;
    void    *image[10],*dst;

    /* Profile screen clearing operation (182 ticks is about 10 seconds) */
    startTicks = LfbGetTicks();
    numClears = 0;
    while ((LfbGetTicks() - startTicks) < 182)
        LfbMemset(FP_SEG(LFBPtr),0,numClears++,imageSize);
    endTicks = LfbGetTicks();
    clearsPerSec = numClears / ((endTicks - startTicks) * 0.054925);
    clearsMbPerSec = (clearsPerSec * imageSize) / 1048576.0;

    /* Profile system memory to video memory copies */
    maxImages = ((512 * 1024U) / imageSize) + 2;
    for (i = 0; i < maxImages; i++) {
        image[i] = malloc(imageSize);
        if (image[i] == NULL)
            FatalError("Not enough memory to profile BitBlt!");
        memset(image[i],i+1,imageSize);
        }
    startTicks = LfbGetTicks();
    numBlts = 0;
    while ((LfbGetTicks() - startTicks) < 182)
        LfbMemcpy(FP_SEG(LFBPtr),0,image[numBlts++ % maxImages],imageSize);
    endTicks = LfbGetTicks();
    bitBltsPerSec = numBlts / ((endTicks - startTicks) * 0.054925);
    bitBltsMbPerSec = (bitBltsPerSec * imageSize) / 1048576.0;
}

void main(int argc, char *argv[])
{
    if (VBE_detect() < 0x200)
        FatalError("This program requires VBE 2.0; Please install UniVBE 5.1.");
    if (argc != 3)
        AvailableModes();                       /* Display available modes  */
    InitGraphics(atoi(argv[1]),atoi(argv[2]));  /* Start graphics           */
    ProfileMode();                              /* Profile the video mode   */
    EndGraphics();                              /* Restore text mode        */
    printf("Profiling results for %dx%d 8 bits per pixel.\n",xres,yres);
    printf("%3.2f clears/s, %2.2f Mb/s\n", clearsPerSec, clearsMbPerSec);
    printf("%3.2f bitBlt/s, %2.2f Mb/s\n", bitBltsPerSec, bitBltsMbPerSec);
}
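The rate arithmetic in ProfileMode can be checked on its own, away from the DOS extender and the video hardware. The following is a minimal sketch in portable C, assuming the same 0.054925-second BIOS tick and 512-Kbyte external cache figure that the listing uses. The function names (ticks_to_seconds, ops_per_second, megabytes_per_second, thrash_buffer_count) are hypothetical and are not part of LFBPROF.

```c
#include <assert.h>

#define SECONDS_PER_TICK 0.054925       /* BIOS timer tick length used in the listing */
#define CACHE_BYTES      (512 * 1024UL) /* external cache size assumed by the listing */

/* Convert a BIOS tick count to elapsed seconds. */
double ticks_to_seconds(long ticks)
{
    assert(ticks > 0);
    return ticks * SECONDS_PER_TICK;
}

/* Operations completed per second over a measured tick interval. */
double ops_per_second(long ops, long ticks)
{
    return ops / ticks_to_seconds(ticks);
}

/* Throughput in megabytes (2^20 bytes) per second. */
double megabytes_per_second(double opsPerSec, long bytesPerOp)
{
    return (opsPerSec * bytesPerOp) / 1048576.0;
}

/* Number of source images cycled through so that the assumed cache
   cannot hold all of them at once. */
unsigned long thrash_buffer_count(unsigned long imageSize)
{
    return (CACHE_BYTES / imageSize) + 2;
}
```

With these figures the 182-tick timing loop runs for just under 10 seconds, and a 320x200 mode with 64,000 bytes per image needs 10 source buffers, which is exactly the size of the image[10] array in ProfileMode.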
Listing Two
/****************************************************************************
*                   VBE 2.0 Linear Framebuffer Profiler
*                   By Kendall Bennett and Brian Hook
* Filename:     LFBPROF.H
* Language:     ANSI C
* Environment:  Watcom C/C++ 10.0a with DOS4GW
* Description:  Header file for the LFBPROF.C program.
****************************************************************************/

#ifndef __LFBPROF_H
#define __LFBPROF_H

/*---------------------- Macros and type definitions ----------------------*/

#pragma pack(1)

/* SuperVGA information block */
typedef struct {
    char    VESASignature[4];       /* 'VESA' 4 byte signature          */
    short   VESAVersion;            /* VBE version number               */
    long    OemStringPtr;           /* Pointer to OEM string            */
    long    Capabilities;           /* Capabilities of video card       */
    long    VideoModePtr;           /* Pointer to supported modes       */
    short   TotalMemory;            /* Number of 64kb memory blocks     */

    /* VBE 2.0 extensions */
    short   OemSoftwareRev;         /* OEM Software revision number     */
    long    OemVendorNamePtr;       /* Pointer to Vendor Name string    */
    long    OemProductNamePtr;      /* Pointer to Product Name string   */
    long    OemProductRevPtr;       /* Pointer to Product Revision str  */
    char    reserved[222];          /* Pad to 256 byte block size       */
    char    OemDATA[256];           /* Scratch pad for OEM data         */
    } VBE_vgaInfo;

/* SuperVGA mode information block */
typedef struct {
    short   ModeAttributes;         /* Mode attributes                  */
    char    WinAAttributes;         /* Window A attributes              */
    char    WinBAttributes;         /* Window B attributes              */
    short   WinGranularity;         /* Window granularity in k          */
    short   WinSize;                /* Window size in k                 */
    short   WinASegment;            /* Window A segment                 */
    short   WinBSegment;            /* Window B segment                 */
    long    WinFuncPtr;             /* Pointer to window function       */
    short   BytesPerScanLine;       /* Bytes per scanline               */
    short   XResolution;            /* Horizontal resolution            */
    short   YResolution;            /* Vertical resolution              */
    char    XCharSize;              /* Character cell width             */
    char    YCharSize;              /* Character cell height            */
    char    NumberOfPlanes;         /* Number of memory planes          */
    char    BitsPerPixel;           /* Bits per pixel                   */
    char    NumberOfBanks;          /* Number of CGA style banks        */
    char    MemoryModel;            /* Memory model type                */
    char    BankSize;               /* Size of CGA style banks          */
    char    NumberOfImagePages;     /* Number of images pages           */
    char    res1;                   /* Reserved                         */
    char    RedMaskSize;            /* Size of direct color red mask    */
    char    RedFieldPosition;       /* Bit posn of lsb of red mask      */
    char    GreenMaskSize;          /* Size of direct color green mask  */
    char    GreenFieldPosition;     /* Bit posn of lsb of green mask    */
    char    BlueMaskSize;           /* Size of direct color blue mask   */
    char    BlueFieldPosition;      /* Bit posn of lsb of blue mask     */
    char    RsvdMaskSize;           /* Size of direct color res mask    */
    char    RsvdFieldPosition;      /* Bit posn of lsb of res mask      */
    char    DirectColorModeInfo;    /* Direct color mode attributes     */

    /* VBE 2.0 extensions */
    long    PhysBasePtr;            /* Physical address for linear buf  */
    long    OffScreenMemOffset;     /* Pointer to start of offscreen mem*/
    short   OffScreenMemSize;       /* Amount of offscreen mem in 1K's  */
    char    res2[206];              /* Pad to 256 byte block size       */
    } VBE_modeInfo;

#define vbeMemPK        4           /* Packed Pixel memory model        */
#define vbeUseLFB       0x4000      /* Enable linear framebuffer mode   */

/* Flags for the mode attributes returned by VBE_getModeInfo. If
 * vbeMdNonBanked is set to 1 and vbeMdLinear is also set to 1, then only
 * the linear framebuffer mode is available.
 */
#define vbeMdAvailable  0x0001      /* Video mode is available          */
#define vbeMdColorMode  0x0008      /* Mode is a color video mode       */
#define vbeMdGraphMode  0x0010      /* Mode is a graphics mode          */
#define vbeMdNonBanked  0x0040      /* Banked mode is not supported     */
#define vbeMdLinear     0x0080      /* Linear mode supported            */

/* Structures for issuing real mode interrupts with DPMI */
struct _RMWORDREGS {
    unsigned short ax, bx, cx, dx, si, di, cflag;
    };

struct _RMBYTEREGS {
    unsigned char al, ah, bl, bh, cl, ch, dl, dh;
    };

typedef union {
    struct _RMWORDREGS x;
    struct _RMBYTEREGS h;
    } RMREGS;

typedef struct {
    unsigned short es;
    unsigned short cs;
    unsigned short ss;
    unsigned short ds;
    } RMSREGS;

/* Inline assembler block fill/move routines. LfbMemset replicates the
 * fill byte across all four bytes of EAX and stores n/4 dwords at sel:off;
 * LfbMemcpy copies n/4 dwords from src to sel:off.
 */
void LfbMemset(int sel,int off,int c,int n);
#pragma aux LfbMemset =             \
    "push   es"                     \
    "mov    es,ax"                  \
    "shr    ecx,2"                  \
    "xor    eax,eax"                \
    "mov    al,bl"                  \
    "shl    ebx,8"                  \
    "or     ax,bx"                  \
    "mov    ebx,eax"                \
    "shl    ebx,16"                 \
    "or     eax,ebx"                \
    "rep    stosd"                  \
    "pop    es"                     \
    parm [eax] [edi] [ebx] [ecx];

void LfbMemcpy(int sel,int off,void *src,int n);
#pragma aux LfbMemcpy =             \
    "push   es"                     \
    "mov    es,ax"                  \
    "shr    ecx,2"                  \
    "rep    movsd"                  \
    "pop    es"                     \
    parm [eax] [edi] [esi] [ecx];

/* Map a real mode pointer into address space (segment * 16 + offset).
 * The segment term is masked so that the high bits of the offset do not
 * leak into the low four bits of the result.
 */
#define LfbMapRealPointer(p)    (void*)((((unsigned)(p) >> 12) & 0xFFFF0) + ((unsigned)(p) & 0xFFFF))

/* Get the current timer tick count */
#define LfbGetTicks()           *((long*)0x46C)

#pragma pack()

#endif  /* __LFBPROF_H */
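The LfbMapRealPointer macro in Listing Two turns a real-mode segment:offset pointer, packed into 32 bits with the segment in the high word, into a flat address by computing segment * 16 + offset. The following is a minimal sketch of that arithmetic in portable C, operating on plain integers rather than pointers. The function names pack_real_pointer and real_to_linear are hypothetical and do not appear in LFBPROF.

```c
#include <assert.h>

/* Pack a 16-bit segment and 16-bit offset the way the VBE information
   blocks store real-mode pointers: segment in the high word. */
unsigned long pack_real_pointer(unsigned long segment, unsigned long offset)
{
    assert(segment <= 0xFFFFUL && offset <= 0xFFFFUL);
    return (segment << 16) | offset;
}

/* Split a packed real-mode pointer and return segment * 16 + offset. */
unsigned long real_to_linear(unsigned long realPtr)
{
    unsigned long segment = (realPtr >> 16) & 0xFFFFUL;
    unsigned long offset  = realPtr & 0xFFFFUL;
    return (segment << 4) + offset;
}
```

For example, the BIOS timer tick counter at 0040:006C maps to linear address 0x46C, the address that LfbGetTicks reads, and the start of the VGA window at A000:0000 maps to 0xA0000.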
Listing Three
# Very simple makefile for LFBPROF.C using Watcom C++ 10.0a with DOS4GW

lfbprof.exe: lfbprof.c lfbprof.h
	wcl386 -zq -s -d2 lfbprof.c
Copyright © 1995, Dr. Dobb's Journal