Thiadmer develops computer animation software for his company CompuPhase. He can be contacted at http://www.compuphase.com/.
Image scaling has been the subject of graphics research for quite some time. While some of this research focuses on achieving the best image quality, algorithms that produce high-quality images are usually (and unfortunately) slow. Fast algorithms, on the other hand, are usually based on nearest-neighbor sampling (also called "point sampling") and lack quality. In this article, I present a compromise an algorithm for real-time scaling of bitmapped graphics that delivers more than adequate quality.
Real-time scaling is particularly relevant in animation, where (movie) frames must be scaled from the encoded size to the display size many times per second. Likewise, quick scaling is useful where large sets of sprites must be scaled on the fly each one to a different size, perhaps to reflect a different view point.
The algorithm presented here, which I call "smooth Bresenham," can best be described as a nearest-neighbor interpolation on a zoomed grid using a Bresenham algorithm. The algorithm is quick, produces a quality equivalent to that of linear interpolation, and can zoom up and down. On the downside, it is only suitable for a zoom factor within a fairly small range; say from 67 to 200 percent. To offset this, I propose the use of the smooth Bresenham algorithm in combination with a MIP-mapping scheme.
While many scaling algorithms basically work on single-channel image data (gray-scale images) and process the three channels of color images independently, my approach is to scale color images in a single pass without unpacking/repacking pixels into separate channels. Apart from the increased performance, this approach also makes it feasible to smoothly resize palette-indexed images in real time.
A Bresenham Background
J.E. Bresenham published an algorithm in 1965 to approximate polynomial functions on a grid of discrete values. In computer graphics, the polynomial functions typically refer to lines and elliptical curves, and the pixels of the display form the grid. Bresenham's algorithm is based on two concepts:
- Rewriting the equation so that it expresses the difference between one outcome and the next.
- Keeping the (accumulated) fractional part of the outcome in a separate variable and updating the integral part as soon as the fractional part exceeds unity.
For example, the general equation of a line is y=ax+b. If you fix (x0,y0) at some arbitrary location and decide that xi+1=xi+1 so that you step one pixel horizontally at each iteration, then yi+1=yi+a. The value of a typically has a fractional part, so you introduce another symbol to keep the accumulation of these fractional parts, yielding:
yi+1=yi+int(a)
Di+1=Di+frac(a)
with one additional (overflow) rule: If D exceeds 1.0, you increase y by 1 and subtract 1.0 from D.
You can avoid floating-point arithmetic by using a scaled integer for frac(a). For example, if a=y/x, where x and y are integers (and assuming x>y), you could replace D by E=xD and state Ei+1=Ei+y. The overflow rule must be scaled accordingly, of course.
These reformulations of polynomials are widely used to draw lines, circles, and ellipses, but the same algorithm can be used in resampling (scaling an image up or down, rotating an image) or other areas where sampling on a discrete grid is needed. For example, scaling a horizontal scan line the Bresenham way can be done using Listing One.
When enlarging a picture, IntPart is always zero, and when shrinking an image, IntPart is only nonzero when shrinking to less than half the original size. Also, if IntPart is zero, FractPart is just SrcWidth. (These observations are useful when you restrict the zoom range used with the smooth Bresenham algorithm.)
Smooth Scaling with Bresenham
When zooming with Bresenham, pixels are picked up from discrete positions in the source image and placed on discrete positions in the destination image. This is known as "nearest-neighbor sampling" when magnifying, pixels from the source image are replicated, when minifying (scaling down), pixels from the source image are dropped. Both pixel replication and pixel dropping give artifacts in the destination image. For accurate resampling, pixels should be read from fractional positions in the source image and cover a fractional area rather than just one pixel. In practice, the size of a pixel is just one pixel and the coordinates of a pixel are confined to integer values in both the source and the destination images; there is no pixel at coordinate pair (3.12, 6.41). To improve quality, many resampling techniques approximate the value of a source pixel at a fractional coordinate pair by taking a weighted average of a cluster of source pixels around that particular fractional coordinate pair. The diverse resampling algorithms differ mostly in how many pixels from the source image they consider and how they weigh these values.
As you can see in Listing One, the error accumulator in the Bresenham algorithm serves to adjust the source pointer on an overflow. Tim Kientzle observed in "Scaling Bitmaps With Bresenham" (C/C++ User's Journal, October 1995) that the error accumulator can, at the same time, function as a weighting criterion for interpolating a destination pixel between two neighboring source pixels. Unfortunately, the algorithm requires the following for each color channel: one multiplication per source pixel, plus another multiplication and one division per destination pixel. For a color image, you need to process three channels. I am focusing on multiplications and divisions because they are still relatively slow CPU instructions. If your images are not already in 24-bit RGB format, the algorithm requires you to convert each pixel to RGB before calculating the weighted average and convert it back to the proper pixel format afterwards.
Smooth Bresenham is a lightweight alternative a compromise between the linear interpolation (which is what Tim Kienztle's algorithm does) and the coarse Bresenham scaling. The algorithm sets each destination pixel to either the value of the closest pixel or the (unweighted) average of the two neighboring pixels. The decision on whether to replicate or average depends on the position of the destination pixel relative to the grid of the source pixels. If the destination pixel is on top of (or close to) a source pixel, that source pixel is replicated. If the destination pixel is closer to the midpoint between two source pixels, those two source pixels are averaged.
Listing Two is the modified scan line scaling algorithm. There are now two criteria for the accumulated error: When it exceeds the midpoint, it must start calculating the average between two neighboring pixels; when it exceeds unity, the source position must be adjusted.
Another change is that the loop stops one pixel before the final destination pixel is reached if the scan line is enlarged. When processing the last destination pixel on a scan line, the source pointer arrives at the last pixel in the source scan line. If, at that point, the algorithm decides to average pixels, it attempts to average the last source pixel with the pixel one beyond the last. To avoid this error, the algorithm always replicates the last pixel on a scan line. This works for enlarging up to two times; for a larger zoom factor, you need to end the loop earlier still. Of course, you could check for an overflow of the source pointer inside the loop, but for performance, I've kept a maximum of code outside the inner loop.
A restricted zoom range simplifies the code. Again, the IntPart and FractPart variables (Listing One) are not needed when the zoom factor is known to be higher than 0.5. Consequently, I removed them from Listing Two in the smooth Bresenham scaling routine. You can create a smooth Bresenham scaling routine that does not have any limits on the scaling factor: Compare Listing Two with the template-based smooth Bresenham (available electronically; see "Resource Center," page 5).
In contrast to the coarse Bresenham scaling routine, averaging Bresenham scaling needs to know the pixel format of the data it zooms. This is hidden in the average() function; Listing Two assumes 8-bit gray-scale pixels. In the average() function, I use a shift-right operator rather than a division by two to make it explicit that the algorithm requires only addition, subtraction, and bit shifting. In real production code, I would write a division by two and trust that the compiler would optimize that particular integer division to a bit shift. (In truth, I would probably optimize the entire inner loop of the scaling function in assembler myself.)
The merit of the proposed algorithm is that a standard (unweighted) average is simpler to calculate than a weighted average, not only for gray scale but especially for other pixel formats. Simplicity translates to increased performance here.
Another way of seeing this algorithm is that I put imaginary source pixels between each pair of actual source pixels, and the colors of these imaginary pixels are averaged from the neighboring actual pixels. This creates a virtual grid with twice the resolution, and I am applying the coarse Bresenham scaling algorithm on this new virtual grid.
Figures 1 and 2 show that smooth Bresenham is not as smooth as bilinear or bicubic interpolation, but far better than coarse Bresenham scaling, while costing only little more than coarse Bresenham scaling. In particular, the most objectionable properties of nearest-neighbor interpolation dropped pixels and excessive jaggies due to pixel splatting are absent from smooth Bresenham scaling, provided that the zoom range stays within the range 2/3 to 2. While the bilinearly interpolated image is smoother, it is actually blurred. In some areas of Figures 1 and 2, smooth Bresenham looks better than bilinear interpolation, which is why I claim that the quality of smooth Bresenham is equivalent to linear interpolation.
The result of bicubic interpolation depends on the particular polynomial that is used. The linear interpolation and cubic interpolation examples in Figures 1 and 2 were produced with Adobe Photoshop 5.5. Photoshop's bicubic interpolation filter has an implicit sharpening effect. This may augment the perceived quality of the interpolation process, but creates artifacts at (steep) edges, as Figure 3 shows. To emphasize the effect, the bottom row shows two representative cut-outs of the pictures zoomed by 400 percent. Because of these artifacts, I regard the sharpening effect as a disadvantage of Photoshop's bicubic interpolator, rather than an asset.
For small scaling factors, the interpolating filters cause the output picture to be blurred. This is apparent in Figure 3, notwithstanding the implied sharpening of the bicubic interpolation polynomial used. For these small scaling factors, you can reduce the blurring effect by snapping the target pixel to the nearest source pixel if it is close enough, and interpolating it otherwise. A blend between nearest neighbor and interpolation, so to speak. With the smooth Bresenham algorithm, you achieve this by altering the midpoint criterion. In the ScaleLineAvg() function, Listing Two, the local variable Mid is set to TgtWidth/2; increasing its value makes the target pixels snap earlier to source pixels (when Mid exceeds the value of TgtWidth, smooth Bresenham reduces to nearest neighbor).
Scaling in Two Dimensions
The routines presented so far scale a single scan line horizontally and are easy to adapt to scale column-wise. Then, by first looping over all raster lines and consecutively looping over all raster columns of the image, you can scale it both horizontally and vertically.
From the standpoint of performance, a better approach is to combine horizontal and vertical scaling in a single pass. To this end, the vertical scaling routine processes complete scan lines and it calls the horizontal scaling routine to scale one scan line. The coarse 2D scaling routine in Listing Three is similar to the horizontal scan line scaling routine. I have made one optimization: If the source pointer does not change in an iteration, rather than scaling the same source scan line again, I just copy the result of the previous iteration.
Smooth scaling in two dimensions is a bit more involved because you must also average two scan lines. The smooth Bresenham scaling needs a read-ahead of one one pixel in the horizontal case and one smoothly scaled scan line in the vertical case. Since you also want to avoid scaling the same scan line twice, it is easiest to use a second temporary scan line buffer. In fact, the routine is getting much more convenient to create if you use two of those temporary buffers.
The smooth 2D scaling algorithm is available electronically; see "Resource Center," page 5. Keep two things in mind when reading through this routine:
- The routine calls malloc() for two buffers but lacks any error checking. Use any way to allocate memory and any way to handle errors that suits you.
- In general, I care less about optimizing this routine than I did for the scan line scaling routine because the latter routine contains the inner loop (performance bottlenecks are typically in inner loops). The exception is the for loop that averages two scan lines. That for loop is also an inner loop so, in production code, keep an eye on it and check whether it benefits from hand optimization.
The special cases for the two scan line caches make the code fairly long and tricky. If you need to scale multiple pixel formats, it is probably a good idea to create optimized versions for the smooth scan line scaling routines (one for each pixel format) and a single universal smooth vertical scaling function. The vertical scaling function then invokes the appropriate scan line scaling function through function pointers, polymorphism, or whatever other technique you have at your disposal. In the example code available electronically, I have implemented the scaling routines as template functions for convenience.
In the routines presented here, one horizontal scan line from the source image is mapped to a horizontal scan line in the (scaled) destination image. The idea is more generally applicable, however: You could walk over the source image along a skewed line and write the output pixels on a horizontal scan line in the destination. This results in combined rotation and scaling. The only new issue to solve is how to walk along a skewed line over a discrete grid formed by the pixels of the source image. If you have followed the article this far, you know how to solve that Bresenham.
MIP-Mapping
The quality of the smooth Bresenham scaling algorithm quickly deteriorates when the zoom factor drops below 2/3 or raises beyond 2. Within that range, every source pixel is represented, fully or partially, in the destination image and no source pixel is replicated twice in exactly the same way (a source pixel may attribute to two destination pixels, but with different weights). Outside that range, some pixels quite noticeably get dropped or replicated.
There is another well-known technique for scaling images (with a better quality than nearest neighbor) that has low complexity MIP-mapping. Originally, a MIP-map is a one-channel image split into four equally sized quadrants. Three of the quadrants each hold one channel of the original color image; the fourth quadrant is a copy of the first three quadrants scaled by a factor of 1/2. This process is applied recursively: The fourth quadrant of the subimage is a sub-subimage of the first three quadrants scaled by 1/4. MIP, short for "Multum In Parvo" and freely translated as "many in little," means that one bitmap holds many images, whose sizes are powers of two, while the total amount of memory space needed to keep these many images is just 4/3 times that of the largest image. In the original MIP-mapping algorithm (see "Pyramidal Parametrics," by Lance Williams, Computer Graphics, July 1983), you obtained an output image with the requested size by first finding the section in the MIP-map with the nearest size and then interpolating from that.
Diverse implementations have deviated from the original MIP-mapping algorithms and bitmap layouts. That is why I prefer using a looser definition of the concept: A MIP-map is a multiresolution image where the relative resolutions are powers of two. How the multiresolution image is implemented whether a single bitmap with the color channels split into separate quadrants or a list of separate bitmaps for the same scene at different scales doesn't matter in this context. The MIP-map is a basis for interpolation. Several computer games and multimedia titles use MIP-mapping in combination with a nearest neighbor interpolator (like the coarse Bresenham scaler in Listing Three) to scale the appropriate MIP-map section to the final size. I suggest that you use the smooth Bresenham scaler instead, since it offers higher quality at a low-performance cost.
Using MIP-mapping in combination with smooth Bresenham circumvents the limitation that the smooth Bresenham algorithm is only suitable for a fairly small zoom range. When you can choose between image resolutions where each section has twice the size of its predecessor, the zoom factor that the final interpolator must handle lies between 3/4 and 3/2. This is neatly inside the range of smooth Bresenham.
The MIP-mapping scheme assumes that the prescaled bitmaps to select amongst are already available. You are supposed to have prepared them in one way or another, but typically by using a program such as Adobe Photoshop. The idea is that you can now select the best possible interpolator regardless of its performance, as the interpolator does its work at design time, not at run time.
It does not always hold that you are able to create prescaled bitmaps for every image that needs to be scaled. In some situations (for example, when playing a movie at a zoomed size) you have only one bitmap per frame. Fortunately, for the special case of scaling by 2, there exist directionally interpolating algorithms that keep edges sharp and smooth areas smooth. These algorithms can serve as your MIP-map factory at run time.
DDJ
Listing One
void ScaleLine(PIXEL *Target, PIXEL *Source, int SrcWidth, int TgtWidth) { int NumPixels = TgtWidth; int IntPart = SrcWidth / TgtWidth; int FractPart = SrcWidth % TgtWidth; int E = 0; while (NumPixels-- > 0) { *Target++ = *Source; Source += IntPart; E += FractPart; if (E >= TgtWidth) { E -= TgtWidth; Source++; } /* if */ } /* while */ }
Listing Two
#define average(a, b) (PIXEL)(( (int)(a) + (int)(b) ) >> 1) void ScaleLineAvg(PIXEL *Target, PIXEL *Source, int SrcWidth, int TgtWidth) { int NumPixels = TgtWidth; int Mid = TgtWidth / 2; int E = 0; PIXEL p; if (TgtWidth > SrcWidth) NumPixels--; while (NumPixels-- > 0) { p = *Source; if (E >= Mid) p = average(p, *(Source+1)); *Target++ = p; E += SrcWidth; if (E >= TgtWidth) { E -= TgtWidth; Source++; } /* if */ } /* while */ if (TgtWidth > SrcWidth) *Target = *Source; }
Listing Three
void ScaleRect(PIXEL *Target, PIXEL *Source, int SrcWidth, int SrcHeight, int TgtWidth, int TgtHeight) { int NumPixels = TgtHeight; int IntPart = (SrcHeight / TgtHeight) * SrcWidth; int FractPart = SrcHeight % TgtHeight; int E = 0; PIXEL *PrevSource = NULL; while (NumPixels-- > 0) { if (Source == PrevSource) { memcpy(Target, Target-TgtWidth, TgtWidth*sizeof(*Target)); } else { ScaleLine(Target, Source, SrcWidth, TgtWidth); PrevSource = Source; } /* if */ Target += TgtWidth; Source += IntPart; E += FractPart; if (E >= TgtHeight) { E -= TgtHeight; Source += SrcWidth; } /* if */ } /* while */ }