Intra Prediction
Intra frames by their nature don't depend on earlier or later frames for reconstruction. However, in H.264 the encoder can use previously encoded blocks from within the same frame as a reference for new blocks. This process, intra prediction, provides additional compression for intra macroblocks, and is particularly effective when a closely matching reference can be found.
The reference blocks are not used the way inter prediction blocks are, where the encoder takes the pixel-by-pixel difference of actual blocks in adjacent frames. Instead, a prediction of the current block is calculated from some of the pixels bordering it, either by extending them across the block along a direction or by averaging them. Which pixels are chosen and how they are used to calculate the block depends on the intra prediction mode. Figure 4 shows the directions in which pixels may be used, along with the mode numbers as defined in the H.264 specification.
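As a rough illustration of how a prediction is built from bordering pixels, the sketch below covers three of the 4x4 luma modes (0 = vertical, 1 = horizontal, 2 = DC). The function name `predict_4x4` and its arguments are hypothetical, the remaining directional modes are omitted, and the code assumes that both the row above and the column to the left are available.

```python
def predict_4x4(top, left, mode):
    """Build a 4x4 prediction from bordering pixels (illustrative only).

    top  -- the 4 reconstructed pixels directly above the block
    left -- the 4 reconstructed pixels directly to the left of the block
    mode -- 0 (vertical), 1 (horizontal) or 2 (DC)
    """
    if mode == 0:
        # Vertical: each column repeats the pixel above it.
        return [list(top) for _ in range(4)]
    if mode == 1:
        # Horizontal: each row repeats the pixel to its left.
        return [[left[row]] * 4 for row in range(4)]
    if mode == 2:
        # DC: every pixel is the rounded mean of the 8 bordering pixels.
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0, 1 and 2 are sketched here")


top = [10, 20, 30, 40]
left = [50, 60, 70, 80]
vertical = predict_4x4(top, left, 0)
horizontal = predict_4x4(top, left, 1)
dc = predict_4x4(top, left, 2)
```

The encoder would then code only the difference between the actual block and the chosen prediction.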
Intra prediction can also be one of the most computationally intensive parts of the encoding process. For the encoder to exhaustively search through all options, it would have to compare each 16x16 luma or 8x8 chroma block against four candidate predictions, and each 4x4 or 8x8 luma block against nine candidate predictions.
Because the encoder can consider a variety of block sizes, a scheme that optimizes the trade-off between the number of bits necessary to represent the video and the fidelity of the result is desirable.
Transformation
Instead of the DCT, the H.264 algorithm uses an integer transform as its primary transform to translate the difference data between the spatial and frequency domains. The transform is an approximation of the DCT that is both lossless, because it works entirely in integer arithmetic and so introduces no rounding error of its own, and computationally simpler. The core transform, illustrated in Figure 5, can be implemented using only shifting and adding.
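To make the "shifting and adding" claim concrete, here is a minimal sketch of the 4x4 forward core transform, written once as a plain matrix product and once as a butterfly that uses only additions, subtractions and one-bit shifts. The function names are hypothetical, and the scaling that normally accompanies the transform is left out because it is folded into quantization.

```python
# Forward core transform matrix for 4x4 blocks.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def forward_4x4_matrix(block):
    """Reference version: Y = CF * X * CF^T using integer multiplies."""
    cf_t = [list(col) for col in zip(*CF)]
    return _matmul(_matmul(CF, block), cf_t)


def _butterfly(x0, x1, x2, x3):
    """One 1-D pass of the transform using only add, subtract and shift."""
    s03, s12 = x0 + x3, x1 + x2
    d03, d12 = x0 - x3, x1 - x2
    return (s03 + s12, (d03 << 1) + d12, s03 - s12, d03 - (d12 << 1))


def forward_4x4(block):
    """Apply the butterfly to every row, then to every column."""
    rows = [_butterfly(*row) for row in block]
    cols = [_butterfly(*col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


residual = [
    [5, 11, 8, 10],
    [9, 8, 4, 12],
    [1, 10, 11, 4],
    [19, 6, 15, 7],
]
coefficients = forward_4x4(residual)
```

Both versions produce identical integer output; the butterfly form simply avoids the multiplications.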
This 4x4 transform is only one flavor of the H.264 transform. H.264 defines transformations on 2x2 and 4x4 blocks in the baseline profile, and additional profiles support transforms on larger block sizes, rectangular or square, with dimensions that are also powers of two.
The algorithm applies separate transforms to the first, or DC, coefficient of each chroma and luma block. In the baseline profile, H.264 uses a 2x2 transform for chroma DC coefficients, a 4x4 transform for luma DC coefficients, and the main 4x4 transform for all other coefficients.
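The 2x2 chroma DC step can be sketched as a Hadamard-style transform applied to the DC terms gathered from the four 4x4 blocks of an 8x8 chroma block. The function name `chroma_dc_2x2` and the input layout are assumptions made for this example, and any scaling is again left to the quantization stage.

```python
def chroma_dc_2x2(dc):
    """Transform a 2x2 array of DC terms with the matrix [[1, 1], [1, -1]].

    dc -- [[a, b], [c, d]], the DC coefficients of the four 4x4 blocks
    that make up one 8x8 chroma block (assumed layout).
    """
    (a, b), (c, d) = dc
    return [
        [a + b + c + d, a - b + c - d],
        [a + b - c - d, a - b - c + d],
    ]


flat = chroma_dc_2x2([[4, 4], [4, 4]])    # energy collapses into one term
mixed = chroma_dc_2x2([[1, 2], [3, 4]])
```

When the four DC terms are similar, as in the first call, the result has a single nonzero value, which is the point of transforming them a second time.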
Quantization
The quantization stage reduces the amount of information by dividing each coefficient by a particular number, which reduces the number of distinct values the coefficient can take. Because the quantized values fall into a narrower range, entropy coding can express them more compactly.
Quantization in H.264 is arithmetically expressed as a two-stage operation. The first stage is multiplying each coefficient in the 4x4 block by a fixed coefficient-specific value. This stage allows the coefficients to be scaled unequally according to importance or information. The second stage is dividing by a step size selected by an adjustable quantization parameter (QP). This stage provides a single "knob" for adjusting the quality and resultant bitrate of the encoding. The two operations can be combined into a single multiplication and a single shift operation.
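The idea of folding the scale and the division into one multiply and one shift can be sketched with ordinary fixed-point arithmetic. This is not the table-driven form used by real encoders: `make_multiplier`, `quantize`, the 15-bit precision and the round-to-nearest offset are all assumptions chosen to keep the example small, and the scale of 0.25 with a step size of 1.0 is a placeholder.

```python
def make_multiplier(scale, qstep, qbits=15):
    """Precompute scale / qstep as an integer with `qbits` fractional bits."""
    return round(scale / qstep * (1 << qbits))


def quantize(coeff, multiplier, qbits=15):
    """Quantize one coefficient with a single multiply and a single shift."""
    sign = -1 if coeff < 0 else 1
    rounding = 1 << (qbits - 1)  # round to nearest (a simplification)
    return sign * ((abs(coeff) * multiplier + rounding) >> qbits)


# Placeholder values: per-coefficient scale 0.25, step size 1.0.
mf = make_multiplier(0.25, 1.0)   # 0.25 * 2**15 = 8192
levels = [quantize(c, mf) for c in (100, -100, 7, 1)]
```

Because the multiplier is computed once per scale and step size, the per-coefficient work needs no division at all.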
The QP is expressed as an integer from 0 to 51. This integer is converted to a quantization step size (QStep) nonlinearly. Every six QP steps double the step size, and between each pair of power-of-two step sizes N and 2N there are five intermediate steps: 1.125N, 1.25N, 1.375N, 1.625N, 1.75N.
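The mapping from QP to step size can be sketched as a six-entry base table that doubles every six QP values. The base values below are the commonly cited step sizes for QP 0 through 5; the function name `qstep` is hypothetical.

```python
# Step sizes for QP 0..5; every further 6 QP steps double these values.
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Return the quantization step size for a QP in the range 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


# The five intermediate steps between N = 1 (QP 4) and 2N = 2 (QP 10).
between = [qstep(qp) for qp in range(5, 10)]
```

With N = 1 at QP 4, the list `between` reproduces the sequence 1.125, 1.25, 1.375, 1.625, 1.75 given above.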
Reordering
When encoding the coefficients of each macroblock using entropy coding, the codec processes the coefficients of each block in a particular order. The order is chosen to lengthen the runs of consecutive zeros, which entropy coding can then represent compactly.
It's natural to handle this ordering when writing the output of the transform and quantization stage.
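The effect of reordering can be sketched with a zigzag scan of a 4x4 block. The text above does not name the scan, so the order used here, the zigzag pattern commonly described for 4x4 blocks in frame-coded macroblocks, is an assumption, as are the names `zigzag_scan` and `trailing_zeros`.

```python
# Raster indices of a 4x4 block, visited in zigzag order (assumed scan).
ZIGZAG_4X4 = (0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15)


def zigzag_scan(block):
    """Flatten a 4x4 block into a 16-element list in zigzag order."""
    flat = [value for row in block for value in row]
    return [flat[index] for index in ZIGZAG_4X4]


def trailing_zeros(sequence):
    """Count the zeros at the end of a sequence."""
    count = 0
    for value in reversed(sequence):
        if value != 0:
            break
        count += 1
    return count


# Typical quantized block: nonzero values cluster at low frequencies.
quantized = [
    [7, 3, 0, 0],
    [2, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
raster = [value for row in quantized for value in row]
scanned = zigzag_scan(quantized)
```

For this block the raster order ends in 7 zeros while the zigzag order ends in 11, so the scan gathers the zeros into one longer run.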