Dr. Dobb's is part of the Informa Tech Division of Informa PLC


Inside Xbox Graphics


Aug00: Xbox Memory Bandwidth

Xbox Memory Bandwidth

UMA (unified memory architecture) doesn't slow Xbox down because the Xbox GPU has several bandwidth-saving features. Also, Xbox has extremely high memory bandwidth, thanks to its 128-bit-wide 200 MHz double-data-rate memory, which yields 6.4 GBps. To illustrate how the bandwidth budget works out, let's try to push the GPU to the rarefied level of 100 million Gouraud-shaded, two-texture triangles per second (tris/sec.). Note that this is a simplified discussion; there are many internal factors that can have an impact, such as caching, memory access patterns, and texture format, but that's far too complex to get into here.
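The 6.4 GBps figure follows directly from the bus width, the clock, and the two transfers per clock that double-data-rate memory provides. A minimal sketch of that arithmetic, with function and constant names invented here for illustration:

```python
# Peak memory bandwidth from the figures in the text.
# Names are hypothetical; only the three input numbers come from the article.
BUS_WIDTH_BITS = 128          # 128-bit-wide memory interface
CLOCK_HZ = 200_000_000        # 200 MHz memory clock
TRANSFERS_PER_CLOCK = 2       # double data rate: two transfers per clock


def peak_bandwidth_bytes_per_sec(bus_bits, clock_hz, transfers_per_clock):
    """Bytes moved per second if every transfer slot is used."""
    bytes_per_transfer = bus_bits // 8
    return bytes_per_transfer * clock_hz * transfers_per_clock


peak = peak_bandwidth_bytes_per_sec(BUS_WIDTH_BITS, CLOCK_HZ, TRANSFERS_PER_CLOCK)
print(peak / 1e9, "GBps")     # 6.4 GBps
```

This is a theoretical peak; the rest of the discussion works from a lower, derated number.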

For starters, memory usage won't be perfectly efficient, so let's assume 5.1 GBps is actually available. Next, as shown in Figure 1, a maximum of 1 GBps is allocated to the CPU, with CPU requests getting higher priority than the graphics pipeline. The CPU will rarely use its full allocation (which is considerably more than most current PCs support), but let's allocate the full 1 GBps anyway.

That leaves 4.1 GBps for the graphics and audio controllers. We'll allocate 25 MBps for audio and 75 MBps for video output, leaving 4.0 GBps for the graphics pipeline.
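The allocation so far can be written out as a simple budget. This is only a bookkeeping sketch; the dictionary layout and names are made up here, and the numbers are the ones assumed in the text, in MBps:

```python
# Bandwidth budget in MBps, using the allocations assumed in the text.
AVAILABLE = 5100              # assumed usable share of the 6.4 GBps peak

allocations = {
    "cpu": 1000,              # full CPU allocation, rarely used in practice
    "audio": 25,
    "video_output": 75,
}

graphics_budget = AVAILABLE - sum(allocations.values())
print(graphics_budget, "MBps left for the graphics pipeline")   # 4000
```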

We'll have to make some assumptions here. Let's say the triangles to be drawn are stored as indexed meshes, with a vertex:triangle ratio of 0.625:1 (the GPU peaks at 62.5 Mverts/sec. with this many vertex attributes, so we're maxing out vertex processing), 32 bytes/vertex (x, y, z, four texture coordinates, one color), and 6 bytes/triangle for indices, all fetched directly by the GPU. This yields about 26 bytes/triangle, so drawing 100 Mtris/sec. requires 2.6 GBps just to fetch the triangles, leaving 1.4 GBps unallocated.
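The 26 bytes/triangle figure can be reproduced as follows. The text gives only the 32-byte total per vertex; the per-attribute sizes below (4-byte floats and a 4-byte packed color) are an assumption that happens to add up to it, and all names are illustrative:

```python
# Geometry fetch bandwidth for indexed meshes, per the text's assumptions.
# Assumption (not stated in the article): each coordinate is a 4-byte float
# and the color is a packed 4-byte value, which sums to the quoted 32 bytes.
POSITION_BYTES = 3 * 4        # x, y, z
TEXCOORD_BYTES = 4 * 4        # four texture coordinates (two u,v pairs)
COLOR_BYTES = 4               # one color
BYTES_PER_VERTEX = POSITION_BYTES + TEXCOORD_BYTES + COLOR_BYTES   # 32

VERTS_PER_TRI = 0.625         # vertex:triangle ratio from the text
INDEX_BYTES_PER_TRI = 6       # index bytes per triangle from the text
MTRIS_PER_SEC = 100           # target triangle rate, in millions
GRAPHICS_BUDGET_MBPS = 4000   # graphics pipeline budget from the text

bytes_per_tri = VERTS_PER_TRI * BYTES_PER_VERTEX + INDEX_BYTES_PER_TRI
geometry_mbps = bytes_per_tri * MTRIS_PER_SEC
remaining_mbps = GRAPHICS_BUDGET_MBPS - geometry_mbps

print(bytes_per_tri, geometry_mbps, remaining_mbps)   # 26.0 2600.0 1400.0
```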

Texturing is the next bandwidth sink. For simplicity, let's assume a texel:pixel ratio of 1:1, with 32-bit pixels, texels, and z. Let's further assume that the screen is overdrawn 10 times/frame, so each triangle is roughly 1.8 pixels in size. (That might sound small, but when you're dealing with more than 1.5 Mtris/frame, the individual triangles won't be very large.) Then the raw texture bandwidth would be twice the 720 MBps drawn-pixel rate (because there are two textures). However, the GPU supports DirectX texture compression, so textures will typically be compressed by 75 percent. This leaves us with texture bandwidth at 360 MBps, which reduces to 90 MBps due to early z processing that avoids unneeded texture reads assuming z rejection three-quarters of the time (reasonable with 10X overdraw). The same assumption reduces pixel bandwidth to 180 MBps. Finally, there's 900 MBps for the z-buffer (again assuming z rejection three-quarters of the time), putting us pretty close to our bandwidth budget.
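The chain of reductions in this paragraph is easier to follow as arithmetic. The sketch below uses the text's figures in MBps. Two things are assumptions made here rather than statements from the article: the 900 MBps z-buffer figure is modeled as a z read for every drawn pixel plus a z write for the quarter that pass, and the triangle-size check assumes a 640x480 screen at 60 frames/sec.

```python
# Texture, pixel, and z-buffer bandwidth in MBps, following the text.
DRAWN_PIXEL_RATE = 720        # drawn-pixel rate quoted in the text
TEXTURES = 2
COMPRESSED_FRACTION = 0.25    # textures compressed by 75 percent
Z_PASS_FRACTION = 0.25        # z rejection three-quarters of the time

raw_texture = TEXTURES * DRAWN_PIXEL_RATE                 # 1440
compressed_texture = raw_texture * COMPRESSED_FRACTION    # 360
texture = compressed_texture * Z_PASS_FRACTION            # 90
pixel = DRAWN_PIXEL_RATE * Z_PASS_FRACTION                # 180

# Assumption: z is read for every drawn pixel and written only for survivors.
z_buffer = DRAWN_PIXEL_RATE + pixel                       # 900

total = texture + pixel + z_buffer
print(texture, pixel, z_buffer, total)    # 90.0 180.0 900.0 1170.0

# Assumption: 640x480 at 60 frames/sec (the article does not state either).
pixels_per_frame = 640 * 480 * 10         # 10x overdraw
tris_per_frame = 100_000_000 / 60
pixels_per_tri = pixels_per_frame / tris_per_frame
print(f"{pixels_per_tri:.2f} pixels per triangle")        # 1.84
```

Under these assumptions the total comes to 1,170 MBps against the 1,400 MBps left after geometry.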

There's a trick, though -- automatic z compression that cuts z-buffer bandwidth by 60 to 65 percent. This reduces z-buffer bandwidth to about 330 MBps, for a total of 600 MBps used by this part of the pipeline -- making it with 800 MBps to spare. So Xbox has plenty of bandwidth to render 100 Mtris/sec. But what happens when we turn on antialiasing?
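As a rough check on these numbers, the sketch below applies the quoted 60 to 65 percent savings to the 900 MBps z-buffer figure and then totals the pipeline using the text's rounded 330 MBps. All values are in MBps, and the variable names are illustrative only:

```python
# Effect of z compression on the no-antialiasing budget, in MBps.
Z_UNCOMPRESSED = 900
TEXTURE = 90
PIXEL = 180
REMAINING_AFTER_GEOMETRY = 1400

# Integer percentages keep the arithmetic exact.
z_at_60_percent = Z_UNCOMPRESSED * (100 - 60) // 100      # 360
z_at_65_percent = Z_UNCOMPRESSED * (100 - 65) // 100      # 315

Z_COMPRESSED = 330            # the text's "about 330 MBps", inside that range
total = TEXTURE + PIXEL + Z_COMPRESSED
spare = REMAINING_AFTER_GEOMETRY - total

print(z_at_65_percent, z_at_60_percent)   # 315 360
print(total, spare)                       # 600 800
```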

The bandwidth requirements for antialiasing are the same as mentioned earlier, except for pixel and z-buffer bandwidth. Textures don't need to be any more detailed, because they ultimately relate to actual pixel size and are already filtered. The GPU supports two- and four-sample antialiasing, so there are two analyses to perform.

For the two-sample case, the pixel buffer stays the same size, because 16-bit dithered pixels can be used; filtered together, these provide adequate information for a 32-bit pixel. Z-buffer bandwidth doubles, to 660 MBps, and 150 MBps is needed for the blt that filters the samples into pixels, but we still have 320 MBps to spare.
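The two-sample budget can be tallied the same way. A small sketch using the text's figures in MBps (names are illustrative):

```python
# Two-sample antialiasing budget, in MBps, using the text's figures.
REMAINING_AFTER_GEOMETRY = 1400
TEXTURE = 90                  # unchanged by antialiasing
PIXEL = 180                   # unchanged: two 16-bit samples per 32-bit pixel
Z_ONE_SAMPLE = 330            # compressed z-buffer traffic without antialiasing
FILTER_BLT = 150              # blt that filters samples into pixels

z_two_sample = 2 * Z_ONE_SAMPLE                           # 660
total_2x = TEXTURE + PIXEL + z_two_sample + FILTER_BLT
spare_2x = REMAINING_AFTER_GEOMETRY - total_2x

print(total_2x, spare_2x)     # 1080 320
```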

Finally, we come to four-sample antialiasing, for which we pull another rabbit out of our hat: a switch from 32-bit z-buffering to 16-bit w-buffering, at the cost of losing the stencil buffer. W-buffering keeps the z-buffer bandwidth at two-sample levels, letting us sneak under the wire, despite an extra 75 MBps for the filter blt and 180 MBps for pixel writes.
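To see why halving the depth format holds z traffic steady, and how close "under the wire" is, here is a sketch built from the text's figures in MBps. The per-sample byte counts follow from the 32-bit and 16-bit formats named above; the names are illustrative:

```python
# Four-sample antialiasing budget, in MBps, using the text's figures.
REMAINING_AFTER_GEOMETRY = 1400
TEXTURE = 90

# Depth bytes per pixel: four 16-bit w samples match two 32-bit z samples.
depth_bytes_2x = 2 * 4        # two samples, 32-bit z
depth_bytes_4x = 4 * 2        # four samples, 16-bit w
Z_TWO_SAMPLE = 660
z_four_sample = Z_TWO_SAMPLE * depth_bytes_4x // depth_bytes_2x   # 660

pixel_4x = 180 + 180          # extra 180 MBps for pixel writes
filter_blt_4x = 150 + 75      # extra 75 MBps for the filter blt

total_4x = TEXTURE + pixel_4x + z_four_sample + filter_blt_4x
spare_4x = REMAINING_AFTER_GEOMETRY - total_4x

print(total_4x, spare_4x)     # 1335 65
```

With these numbers the margin is only 65 MBps, which matches the "sneak under the wire" description.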

In general, though, bandwidth is not nearly as tight as it seems. I intentionally chose an unusually tough scenario and didn't employ vertex compression, both to be conservative and because it's similar to what current games do. However, high-end Xbox programming will be different from current 3D. Because of the power of the shaders and the vast bandwidth difference between the CPU and GPU, the key to Xbox graphics performance will be getting the CPU out of the loop by using static meshes and moving calculations from the CPU to the vertex and pixel shaders. With this approach, a vertex might consist of a compressed 6- or 12-byte coordinate and a compressed 4- or 6-byte normal, with lighting and texture coordinates calculated by the GPU; there's a gigabyte or more of bandwidth to spare with the resulting 10- to 18-byte vertices.
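For a sense of scale, the geometry-fetch calculation can be rerun with the smaller vertices. This sketch assumes the vertex:triangle ratio, index size, and triangle rate stay as in the earlier scenario, which the text does not state explicitly; the function name is made up for illustration:

```python
# Geometry fetch bandwidth for different vertex sizes, in MBps.
# Assumption: same 0.625 vertex:triangle ratio, 6 index bytes per triangle,
# and 100 Mtris/sec. as in the uncompressed scenario.
VERTS_PER_TRI = 0.625
INDEX_BYTES_PER_TRI = 6
MTRIS_PER_SEC = 100


def geometry_mbps(bytes_per_vertex):
    """Bandwidth needed to fetch vertices and indices at the target rate."""
    bytes_per_tri = VERTS_PER_TRI * bytes_per_vertex + INDEX_BYTES_PER_TRI
    return bytes_per_tri * MTRIS_PER_SEC


baseline = geometry_mbps(32)              # uncompressed 32-byte vertex
for size in (10, 18):                     # 6+4 and 12+6 byte vertices
    needed = geometry_mbps(size)
    print(size, needed, baseline - needed)
# 10 1225.0 1375.0
# 18 1725.0 875.0
```

Under these assumptions the compressed vertices free up roughly 0.9 to 1.4 GBps relative to the 32-byte case.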

-- M.A.

