Some time ago, I read with interest an article about optimized bitmap rotation. Like many other game programmers, I started programming many years ago in the I can spin more cubes than you Amiga demo scene. I read that article hoping to find a new algorithm to beat the one I picked up some 10 years ago only to find an algorithm that was slower and less flexible.
I spent a good while searching the Web with no avail and found only a few side notes from the Usenet archives. Maybe what I thought was once well-known has died along with the Amiga. Since Im a die-hard fan, I decided to write this article to relight a piece of useful code from the Amiga.
Algorithms
There are several methods to scale and rotate a bitmap but generally not at the same time. One well known method for rotation is the shear and transpose [1] method where the image is first sheared, and then transposed, to produce the rotated effect.
One way to rotate and scale at the same time is to scan the target bitmap passing each pixels position through a rotation matrix. For each destination pixel at position (x, y) on the target bitmap, this transformation produces the source position (u, v) that supplies that pixel from the original source bitmap, as illustrated in this pseudo code:
for(y=0; y<height; y++) { for(x=0; x<width; x++) { u = cos(-angle) * x * (1.0 / scale) + sin(-angle) * y * (1.0 / scale); v = -sin(-angle) * x * (1.0 / scale) + cos(-angle) * y * (1.0 / scale); dstArray[x, y] = srcArray[u, v]; } }
A faster version of this includes passing the end points of the line through the matrix and using Bresenhams algorithm to generate the source pixel position.
This is very similar to the method Ill present in this article, except I go a stage further and calculate the x and y displacements between each pixel in the rotated bitmap. I can use these displacements to generate two vectors, one between each pixel for the horizontal (duRow, dvRow)
in the destination image, and one for the vertical (duCol, dvCol)
. These two vectors are at right angles to each other:
duCol = sin(-angle) * (1.0 / scale) dvCol = cos(-angle) * (1.0 / scale) duRow = dvCol dvRow =-duCol
Using these two vectors, scaling and rotating the bitmap is simply a case of holding a second set of coordinates for the source pixel, scanning the target image row by row, copying the pixel at the source position to the linear target position, and then adding the horizontal vector. Once at the end of a row, the coordinate for the source pixel is reset and the vertical vector added. Because the vectors are common over the complete operation, I only need to calculate them once. The original coordinate for the source pixel needs to be rotated and scaled so that it points to the correct pixel for the first pixel of the target image. I do this by rotating the coordinates of the center point of the target image and subtracting it from the coordinates of the center of the source:
startingu = fSrcRotCX - (fDstRotCX * dvCol + fDstRotCY * duCol); startingv = fSrcRotCY - (fDstRotCX * dvRow + fDstRotCY * duRow);
Given these calculated values, I use the following pseudo code to render the rotated and scaled image:
rowu = startingu rowv = startingv for(y=0; y<height; y++) { u = rowu v = rowv for(x=0; x<width; x++) { dstArray[x, y] = srcArray[u, v]; u += duRow v += dvRow } rowu += duCol rowv += dvCol }
As you can see from the above code, I still have the problem of what to do with out of bound u,v coordinates. This really only happens when the destination bitmap is much larger than the source bitmap, or the scale is small less than 1.0. There are two possible solutions; the first is to clip to the source image dimensions using the following pseudo code:
if(u>0 && v>0 && u<dstW && v<dstH) dstArray[x, y] = srcArray[u, v]; else dstArray[x,y] = 0;
This introduces four more comparisons to the main loop and is a considerable slowdown. The other solution is to wrap the image. If the source image dimensions are a power of two, you can just use a logical AND to produce the wrapping:
dstArray[x, y] = srcArray[u1 & srcW-1, v1 & srcH-1];
Alternatively, you could use modulus. Alas, modulus has the problem of negative values that you can fix with explicit checking and inversion:
u1 = ((u < 0.0 ? (int)-u : (int)u) + srcW) % srcW; v1 = ((v < 0.0 ? (int)-v : (int)v) + srcH) % srcH; dstArray[x, y] = srcArray[u1, v1];
or by the less correct, but faster method of loading the u,v coordinate with a high multiple of its modulus before dividing:
u1 = u + (srcW << 8) % srcW v1 = v + (srcH << 8) % srcH dstArray[x, y] = srcArray[u1, v1];
Obviously, this is a slowdown in itself and can still lead to problems if the scale goes too small. The multiple of the modulus can be made dependent on operating parameters. Suffice it to say that in the high-speed, cutthroat world of Amiga demos, the only version to see the light of day was the wrapping power-of-two version with source and target dimensions of 256, allowing for automatic wrapping by accessing bytes.
Implementation
In the world of Windows programming, the only real way of getting to the
pixels of a bitmap is to create the bitmap as a DIB (device-independent bitmap)
using CreateDIBSection()
or to define the LR_CREATEDIBSECTION
flag in LoadImage()
and use GetObject()
to get the bits. To
keep the implementation simple and fast, I decided to only use 16-bit DIBs.
It allows me to access the DIB as an array of WORD
s, whereas 24-bit
DIBs offer uneven alignment, and 8-bit DIBs would require extra code for palette
handling. The DIB creation code is in the CDIB class shown in cdib.h
(ListingOne) and cdib.cpp
(Listing
Two).
Listing One: cdib.h Declaration of CDIB
/////////////////////////////////////////////////////////////////// class CDIB { public: CDIB(); ~CDIB(); bool Create(HDC hdcSrc, int iSrcX, int iSrcY, int iWidth, int iHeight); void Release(); // Varibles HDC m_hdc; // HDC of the DIB HBITMAP m_hbm; // HBITMAP of the DIB HBITMAP m_hbmOld; // old HBITMAP from the hdc WORD* m_pSrcBits; // pointer to DIB pixel array int m_iWidth; // Width of the DIB int m_iHeight; // Height of the DIB int m_iSWidth; // Storage Width (in bytes) }; /* End of File */
Listing Two: cdib.cpp Definition of CDIB
////////////////////////////////////////////////////////////////// // CDIB - Small wrapper class to handle DIB`s #include <windows.h> #include "cdib.h" ////////////////////////////////////////////////////////////////// CDIB::CDIB() { m_hdc = 0; m_hbm = 0; m_hbmOld = 0; m_pSrcBits = NULL; m_iWidth = 0; m_iHeight = 0; m_iSWidth = 0; } ////////////////////////////////////////////////////////////////// CDIB::~CDIB() { Release(); } ////////////////////////////////////////////////////////////////// // Takes DC handle, creates a DIB from given location and size bool CDIB::Create(HDC hdcSrc, int iSrcX, int iSrcY, int iWidth, int iHeight) { // Release the old DIB Release(); // fill in the BITMAPINFO structure BITMAPINFO bmi; ZeroMemory(&bmi, sizeof(BITMAPINFO)); bmi.bmiHeader.biSize = sizeof(BITMAPINFOHEADER); bmi.bmiHeader.biWidth = iWidth; bmi.bmiHeader.biHeight = iHeight; bmi.bmiHeader.biPlanes = 1; bmi.bmiHeader.biBitCount = 16; bmi.bmiHeader.biCompression = BI_RGB; bmi.bmiHeader.biXPelsPerMeter = 72; bmi.bmiHeader.biYPelsPerMeter = 72; m_iWidth = iWidth; m_iHeight = iHeight; // Each line of the DIB is always quad aligned. m_iSWidth = (((iWidth*sizeof(WORD)) + 3) & ~3); // Get hdc of screen for CreateDIBSection and create the DIB HDC hdcScreen = GetDC(NULL); m_hbm = CreateDIBSection( hdcScreen, &bmi, DIB_RGB_COLORS, (void**)&m_pSrcBits, NULL, 0); // Create and select the bitmap into a DC m_hdc = CreateCompatibleDC(hdcScreen); m_hbmOld = (HBITMAP)SelectObject(m_hdc, m_hbm); // Copy the original image into the DIB BitBlt(m_hdc, 0, 0, iWidth, iHeight, hdcSrc, iSrcX, iSrcY, SRCCOPY); // all GDI functions must be flushed before we can use the DIB GdiFlush(); ReleaseDC(NULL, hdcScreen); return true; } ////////////////////////////////////////////////////////////////// void CDIB::Release() { if(m_hdc) { if(m_hbmOld) { SelectObject(m_hdc, m_hbmOld); } if(m_hbm) { DeleteObject(m_hbm); } DeleteDC(m_hdc); } m_hdc = NULL; m_hbm = NULL; m_hbmOld = NULL; m_pSrcBits = NULL; m_iWidth = 0; m_iHeight = 0; } ////////////////////////////////////////////////////////////////// //End of File
The example code uses two 16-bit DIBs, one for the source image and one for the window image. The window DIB is recreated each time the window receives a WM_SIZE
message, to keep the window DIB the same size as the window. The example uses WM_TIMER
messages to pass the rotate function the pixels of both the source and window DIBs to perform the rotation, then uses BitBlt()
to copy the window DIB to the window.
Another option would have been to use DirectX to get direct access to the screen display, thereby cutting out a render stage. However, for demonstration purposes, this would have overcomplicated the example.
I provide three versions of the rotate routine in rotate.h
(Listing
Three) and rotate.cpp
(Listing Four). Rotate()
is the slowest but supports any sized source image (by using modulus). FastRotate()
requires the source image to be of a power of two. Finally, RotateWithClip()
clips the u,v of the source image to the limits of the source dimensions.
This removes the need for wrapping the u,v coordinate. main.cpp
(Listing
Five) demonstrates how to use these functions.
Listing Three: rotate.h Interface to rotation functions
/////////////////////////////////////////////////////////////////// void Rotate( WORD *pDstBase, int dstW, int dstH, int dstDelta, WORD *pSrcBase, int srcW, int srcH, int srcDelta, float fDstRotCenterX, float fDstRotCenterY, float fSrcRotCenterX, float fSrcRotCenterY, float fAngle, float fScale); /////////////////////////////////////////////////////////////////// void FastRotate( WORD *pDstBase, int dstW, int dstH, int dstDelta, WORD *pSrcBase, int srcW, int srcH, int srcDelta, float fDstRotCenterX, float fDstRotCenterY, float fSrcRotCenterX, float fSrcRotCenterY, float fAngle, float fScale); /////////////////////////////////////////////////////////////////// void RotateWithClip( WORD *pDstBase, int dstW, int dstH, int dstDelta, WORD *pSrcBase, int srcW, int srcH, int srcDelta, float fDstRotCenterX, float fDstRotCenterY, float fSrcRotCenterX, float fSrcRotCenterY, float fAngle, float fScale); /* End of File */
Listing Four: rotate.cpp Implementation of rotation functions
#include <windows.h> #include <math.h> ////////////////////////////////////////////////////////////////// // Rotate - wrapping version // This version takes any dimension source bitmap and wraps. ////////////////////////////////////////////////////////////////// void Rotate( WORD *pDstBase, int dstW, int dstH, int dstDelta, WORD *pSrcBase, int srcW, int srcH, int srcDelta, float fDstCX, float fDstCY, float fSrcCX, float fSrcCY, float fAngle, float fScale) { srcDelta /= sizeof(WORD); dstDelta /= sizeof(WORD); float duCol = (float)sin(-fAngle) * (1.0f / fScale); float dvCol = (float)cos(-fAngle) * (1.0f / fScale); float duRow = dvCol; float dvRow = -duCol; float startingu = fSrcCX - (fDstCX * dvCol + fDstCY * duCol); float startingv = fSrcCY - (fDstCX * dvRow + fDstCY * duRow); float rowu = startingu; float rowv = startingv; for(int y = 0; y < dstH; y++) { float u = rowu; float v = rowv; WORD *pDst = pDstBase + (dstDelta * y); for(int x = 0; x < dstW ; x++) { int sx = ((u < 0.0 ? (int)-u : (int)u) + srcW) % srcW; int sy = ((v < 0.0 ? (int)-v : (int)v) + srcH) % srcH; WORD *pSrc = pSrcBase + sx + (sy * srcDelta); *pDst++ = *pSrc++; u += duRow; v += dvRow; } rowu += duCol; rowv += dvCol; } } ////////////////////////////////////////////////////////////////// // FastRotate - wrapping version // // This version assumes the dimensions of the source image to be a // power of two. For none-power2 dimensions, use Rotate // ////////////////////////////////////////////////////////////////// void FastRotate( WORD *pDstBase, int dstW, int dstH, int dstDelta, WORD *pSrcBase, int srcW, int srcH, int srcDelta, float fDstCX, float fDstCY, float fSrcCX, float fSrcCY, float fAngle, float fScale) { srcDelta /= sizeof(WORD); dstDelta /= sizeof(WORD); float duCol = (float)sin(-fAngle) * (1.0f / fScale); float dvCol = (float)cos(-fAngle) * (1.0f / fScale); float duRow = dvCol; float dvRow = -duCol; float startingu = fSrcCX - (fDstCX * dvCol + fDstCY * duCol); float startingv = fSrcCY - (fDstCX * dvRow + fDstCY * duRow); float rowu = startingu; float rowv = startingv; for(int y = 0; y < dstH; y++) { float u = rowu; float v = rowv; WORD *pDst = pDstBase + (dstDelta * y); for(int x = 0; x < dstW ; x++) { WORD *pSrc = pSrcBase + (((int)u) & srcW-1) + ((((int)v) & srcH-1) * srcDelta ); *pDst++ = *pSrc++; u += duRow; v += dvRow; } rowu += duCol; rowv += dvCol; } } ////////////////////////////////////////////////////////////////// // Rotate - nowrapping clipping version // // this version does not have the limitation of power of 2 // dimensions and will clip the source image instead of wrapping. ////////////////////////////////////////////////////////////////// void RotateWithClip( WORD *pDstBase, int dstW, int dstH, int dstDelta, WORD *pSrcBase, int srcW, int srcH, int srcDelta, float fDstCX, float fDstCY, float fSrcCX, float fSrcCY, float fAngle, float fScale) { srcDelta /= sizeof(WORD); dstDelta /= sizeof(WORD); float duCol = (float)sin(-fAngle) * (1.0f / fScale); float dvCol = (float)cos(-fAngle) * (1.0f / fScale); float duRow = dvCol; float dvRow = -duCol; float startingu = fSrcCX - (fDstCX * dvCol + fDstCY * duCol); float startingv = fSrcCY - (fDstCX * dvRow + fDstCY * duRow); float rowu = startingu; float rowv = startingv; for(int y = 0; y < dstH; y++) { float u = rowu; float v = rowv; WORD *pDst = pDstBase + (dstDelta * y); for(int x = 0; x < dstW ; x++) { if(u>0 && v>0 && u<srcW && v<srcH) { WORD *pSrc = pSrcBase + (int)u + ((int)v * srcDelta); *pDst++ = *pSrc++; } else { *pDst++ = 0; } u += duRow; v += dvRow; } rowu += duCol; rowv += dvCol; } } ////////////////////////////////////////////////////////////////// //End of File
Listing Five: main.cpp Demonstration program
/* ************************************************************** ** NAME: Fast Bitmap Rotation and Scaling ** AUTHOR: Steven M Mortimer ** COMPILER: Visual C++ V6.0 Enterprise SP3 ** Target Platform: Win32 ** RIGHTS: Stevem Mortimer ** Date: 18th Jan 2000 ************************************************************** */ #include <windows.h> #include <windowsx.h> #include <stdio.h> #include "cdib.h" #include "rotate.h" ///////////////////////////////////////////////////////////////// // defines #define _MINSCALE 0.4f #define _MAXSCALE 5.0f #define SZIMAGE "test1.bmp" ///////////////////////////////////////////////////////////////// // Globals CDIB* gDibSrc = NULL; CDIB* gDibDst = NULL; float gdScale = _MAXSCALE; float gdAngle = 0.0f; float gdScaleDir = 0.1f; double gdTicksPerSec = 0.0; bool gbTimeFunction = false; ///////////////////////////////////////////////////////////////// // Function Prototypes LRESULT CALLBACK WndProc (HWND, UINT, WPARAM, LPARAM) ; ///////////////////////////////////////////////////////////////// // WinMain int WINAPI WinMain (HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR szCmdLine, int iCmdShow) { HWND hwnd; MSG msg; // Register the window class WNDCLASS wndclass; ZeroMemory(&wndclass, sizeof(WNDCLASS)); wndclass.style = CS_HREDRAW | CS_VREDRAW; wndclass.lpfnWndProc = WndProc; wndclass.hInstance = hInstance; wndclass.hCursor = LoadCursor(NULL, IDC_ARROW); wndclass.lpszClassName = __FILE__; RegisterClass(&wndclass); // Check if we can use the high performace counter // for timing the rotation function LARGE_INTEGER perfreq; if(QueryPerformanceFrequency(&perfreq)) { gdTicksPerSec = (double)perfreq.LowPart; gbTimeFunction = true; } else { MessageBox(NULL, "High resolution timer not available", "Warning", MB_OK); } // Create our source and destination DIB`s gDibSrc = new CDIB(); gDibDst = new CDIB(); if(gDibSrc && gDibDst) { // Create Window hwnd = CreateWindow( __FILE__, "Fast Bitmap Rotation", WS_OVERLAPPEDWINDOW, CW_USEDEFAULT, CW_USEDEFAULT, 400, 400, NULL, NULL, hInstance, NULL); ShowWindow (hwnd, iCmdShow) ; UpdateWindow (hwnd) ; while (GetMessage (&msg, NULL, 0, 0)) { TranslateMessage (&msg) ; DispatchMessage (&msg) ; } // cleanup DIB`s delete gDibSrc; delete gDibDst; } return msg.wParam; } ///////////////////////////////////////////////////////////////// double GetTimer() { if(gbTimeFunction) { LARGE_INTEGER time; QueryPerformanceCounter(&time); return (double)time.QuadPart / gdTicksPerSec; } return 0.0; } ///////////////////////////////////////////////////////////////// void Update(HDC hdc) { double dStartT = GetTimer(); // Call Rotate routine, using center of the window and the // center of the source image as the points to rotate around FastRotate( gDibDst->m_pSrcBits, gDibDst->m_iWidth, gDibDst->m_iHeight, gDibDst->m_iSWidth, gDibSrc->m_pSrcBits, gDibSrc->m_iWidth, gDibSrc->m_iHeight, gDibSrc->m_iSWidth, (float)gDibDst->m_iWidth/2, (float)gDibDst->m_iHeight/2, (float)gDibSrc->m_iWidth/2, (float)gDibSrc->m_iHeight/2, gdAngle, gdScale); // Change direction of the scale if(gdScale <= _MINSCALE || gdScale >= _MAXSCALE) { gdScaleDir *= -1.0; } // Update angle and scale gdScale += gdScaleDir; gdAngle += 0.02f; double dUpdateT = GetTimer(); // Copy our rotated image to the screen BitBlt(hdc, 0, 0, gDibDst->m_iWidth, gDibDst->m_iHeight, gDibDst->m_hdc, 0, 0, SRCCOPY); double dRenderT = GetTimer(); // Print function timing satistics if(gbTimeFunction) { char szBuffer[256]; TextOut(hdc, 5, 5, szBuffer, sprintf(szBuffer, "Update took %3.6f (~%3.2ffps)", dUpdateT-dStartT, 1.0 / (dUpdateT-dStartT))); TextOut(hdc, 5, 20, szBuffer, sprintf(szBuffer, "Render took %3.6f (~%3.2ffps)", dRenderT-dUpdateT, 1.0 / (dRenderT-dUpdateT))); } } //////////////////////////////////////////////////////////////// BOOL OnCreate(HWND hwnd, CREATESTRUCT FAR* lpCreateStruct) { BOOL bSuccess = FALSE; // Load test bitmap HBITMAP hbm = (HBITMAP)LoadImage(NULL, SZIMAGE, IMAGE_BITMAP, 0, 0, LR_LOADFROMFILE); if(hbm) { // Map the bitmap into a dc HDC hdc = CreateCompatibleDC(NULL); HBITMAP hbmOld = (HBITMAP)SelectObject(hdc, hbm); // Get info about this bitmap BITMAP bm; if(GetObject(hbm, sizeof(BITMAP), &bm) != 0) { // Convert the bitmap into DIB of known colour depth if(gDibSrc->Create(hdc, 0, 0, bm.bmWidth, bm.bmHeight)) { // Start the update timer SetTimer(hwnd, 0, 100, NULL); bSuccess = TRUE; } // cleanup hdc SelectObject(hdc, hbmOld); DeleteDC(hdc); } // delete the loaded image DeleteObject(hbm); } else { char szError[512]; wsprintf(szError, "Error loading image %s", SZIMAGE); MessageBox(hwnd, szError, "oops", MB_OK); } return bSuccess; } ///////////////////////////////////////////////////////////////// void OnSize(HWND hwnd, UINT state, int x, int y) { // Recreate the window DIB to match the size of the window gDibDst->Create(NULL, 0, 0, x, y); } ///////////////////////////////////////////////////////////////// BOOL OnEraseBkGnd(HWND hwnd, HDC hdc) { // clearing the bk results in tears. return TRUE; } ///////////////////////////////////////////////////////////////// void OnPaint(HWND hwnd) { HDC hdc; PAINTSTRUCT ps; hdc = BeginPaint (hwnd, &ps); Update(hdc); EndPaint (hwnd, &ps) ; } ///////////////////////////////////////////////////////////////// BOOL OnTimer(HWND hwnd, UINT id) { HDC hdc = GetDC(hwnd); Update(hdc); ReleaseDC(hwnd, hdc); return TRUE; } ///////////////////////////////////////////////////////////////// void OnDestroy(HWND hwnd) { PostQuitMessage(0); } ///////////////////////////////////////////////////////////////// LRESULT CALLBACK WndProc(HWND hwnd, UINT iMsg, WPARAM wParam, LPARAM lParam) { switch (iMsg) { HANDLE_MSG( hwnd, WM_CREATE, OnCreate ); HANDLE_MSG( hwnd, WM_SIZE, OnSize ); HANDLE_MSG( hwnd, WM_PAINT, OnPaint ); HANDLE_MSG( hwnd, WM_ERASEBKGND, OnEraseBkGnd ); HANDLE_MSG( hwnd, WM_TIMER, OnTimer ); HANDLE_MSG( hwnd, WM_DESTROY, OnDestroy ); } return DefWindowProc (hwnd, iMsg, wParam, lParam) ; } ///////////////////////////////////////////////////////////////// //End of File
Optimizations
Although reasonably quick, the example code is far from optimal. However, to increase its speed, its necessary to start optimizing for specific cases. Most optimizations come from restricting the source image size to a specific power of two. An optimization to consider is using fixed-point integers as the u,v coordinate and row and column deltas. A fixed-point integer is an integer that is logically shifted upwards, allowing the bottom bits to represent the fractional part of the number. This would cut out float
to int
conversions, and you could use the size of the source image as the shift component. This means that simply using a logical AND to mask off the integer part of the fixed-point integer would leave the u,v coordinates.
Conclusion
This procedure has come a long way since I first hacked the 68000 assembler out of someone elses Amiga demo some years ago. Sadly, I cannot remember who made the demo and therefore cannot give credit where credit is due thanks, whoever you are. Since finding the routine, Ive used it in games, in commercial screen savers, and in numerous applications. Hopefully, another generation of programmers will find it as useful as I have.
Note
[1] See Computer Graphics: Principles and Practice, Foley, Van dam et al.
Steven Mortimer has been working for Argonaut Software as a games developer for the past three years. He previously was a development manager at Aptecsoft, a small company, writing screensavers and multimedia software. He can be reached at [email protected].
The Shear and Transpose Method
The idea of the shear and transpose method of bitmap rotation is to first take a bitmap and compose the shear and transpose functions to produced the desired rotation.
First the bitmap is sheared from:
a----b | | | | c----d
to the given x component of the angle as such:
a----b \ \ > \ \ c----d
The bitmap is then transposed:
b / \ / \ a d \ / \ / c
Transformations around the diagonals can be used to increase the range of rotation to the full 360 degrees, however this method requires several passes in order to achieve the final rotation.
S.M.