Transformation and Quantization
In the Intel IPP, transform and quantization functionality is merged for greater efficiency. There are four functions for H.264 decoding:
- ippiTransformDequantLumaDC_H264_16s_C1I
- ippiTransformDequantChromaDC_H264_16s_C1I
- ippiDequantTransformResidual_H264_16s_C1I
- ippiDequantTransformResidualAndAdd_H264_16s_C1I
There are analogous functions for encoding:
- ippiTransformQuantLumaDC_H264_16s_C1I
- ippiTransformQuantChromaDC_H264_16s_C1I
- ippiTransformQuantResidual_H264_16s_C1I
Additional functions handle 8x8 blocks.
Listing 3 shows a block of code from the H.264 decoder that uses these functions.
The cbp4x4 variable is a bitmask indicating whether any DC coefficients within the macroblock carry data and, individually, whether each residual (AC) block within the macroblock carries data. The QP variable holds the quantization parameter, which specifies the degree of quantization.
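For example, testing whether the macroblock carries any luma DC data, or whether the 4x4 block at index i carries AC data, reduces to bit tests against that mask. This is only an illustrative sketch; the constants are the ones used in Listing 3, and the index variable i is hypothetical:

/* Illustrative bit tests on cbp4x4, mirroring the masks used in Listing 3. */
Ipp32s hasLumaDC  = (cbp4x4 & IPPVC_CBP_LUMA_DC) != 0;
Ipp32s hasBlockAC = (cbp4x4 & (1 << (IPPVC_CBP_1ST_LUMA_AC_BITPOS + i))) != 0;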
If the bitmask indicates that there is any DC luma data, the code transforms and dequantizes it with the ippiTransformDequantLumaDC function. The code then iterates over the 16 blocks within the macroblock. For each block that has either DC data or residual data, the code transforms and dequantizes the block, passing in the decoded DC coefficient (which may be zero), the buffer of residual data, a flag indicating whether the residual data is valid, and the quantization parameter.
if ((cbp4x4 & (IPPVC_CBP_LUMA_AC | IPPVC_CBP_LUMA_DC)) != 0)
{
    Ipp16s *pDC;
    Ipp16s DCCoeff;
    Ipp16s *tmpbuf;
    /* bit var to isolate cbp for block being decoded */
    Ipp32u uCBPMask = (1 << IPPVC_CBP_1ST_LUMA_AC_BITPOS);

    if ((cbp4x4 & IPPVC_CBP_LUMA_DC) != 0)
    {
        luma_dc = (*ppSrcCoeff);
        *ppSrcCoeff += 16;
        ippiTransformDequantLumaDC_H264_16s_C1I(luma_dc, QP);
    }

    tmpbuf = 0;              /* init as no ac coeffs */
    pDC = 0;                 /* init as no dc */
    ac_coeffs = pDstCoeff;

    for (Ipp32s uBlock = 0; uBlock < 16; uBlock++, uCBPMask <<= 1)
    {
        DCCoeff = (Ipp16s) luma_dc[block_subblock_mapping[uBlock]];
        if (DCCoeff != 0)
            pDC = &DCCoeff;  /* dc coeff present */

        if ((cbp4x4 & uCBPMask) != 0)
        {
            /* copy this block's AC coefficients into the working buffer */
            memcpy(pDstCoeff, *ppSrcCoeff, 16 * sizeof(Ipp16s));
            tmpbuf = pDstCoeff;
            pDstCoeff += 16;
            *ppSrcCoeff += 16;
        }

        Ipp32s hasAC = (tmpbuf != 0);

        if (tmpbuf || pDC)
        {
            if (!pDC)
            {
                if (tmpbuf)
                {
                    if (dc_present)
                        tmpbuf[0] = 0;
                }
            }
            else
            {
                if (!tmpbuf)
                {
                    /* DC only: use the next output block as the working
                       buffer and mark this block as coded */
                    tmpbuf = pDstCoeff;
                    pDstCoeff += 16;
                    cbp4x4 |= uCBPMask;
                }
            }
            ippiDequantTransformResidual_H264_16s_C1I(tmpbuf, 8, pDC, hasAC, QP);
            tmpbuf = 0;
            pDC = 0;
        }
    }
}
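Stripped of the bookkeeping, the underlying flow for one intra-16x16 luma macroblock is a two-stage process. The following sketch is illustrative only: the buffer names dc and coeffs are hypothetical, the argument forms mirror the calls in Listing 3, and for simplicity the DC coefficient is always passed even when it is zero (the real code passes a null pointer in that case).

/* Hypothetical, simplified decode flow for one intra-16x16 luma macroblock. */
Ipp16s dc[16];           /* 4x4 block of luma DC coefficients, one per sub-block */
Ipp16s coeffs[16][16];   /* 16 AC coefficients for each of the 16 sub-blocks     */

/* Stage 1: inverse Hadamard transform and dequantization of the DC block. */
ippiTransformDequantLumaDC_H264_16s_C1I(dc, QP);

/* Stage 2: inverse transform and dequantization of each 4x4 residual,
   feeding in the dequantized DC coefficient for that sub-block. */
for (Ipp32s blk = 0; blk < 16; blk++)
{
    Ipp16s dcCoeff = dc[block_subblock_mapping[blk]];
    ippiDequantTransformResidual_H264_16s_C1I(coeffs[blk], 8, &dcCoeff,
                                              1 /* AC present */, QP);
}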
Deblocking Filter
The Intel IPP functions that perform filtering on the edges of macroblocks are divided according to edge direction (horizontal or vertical), luma or chroma data, block size, bit depth, and chroma sampling format. They are the following:
- ippiFilterDeblockingLuma_VerEdge_H264_[8u|16u]_C1IR
- ippiFilterDeblockingLuma_HorEdge_H264_[8u|16u]_C1IR
- ippiFilterDeblockingChroma_HorEdge[422|444]_H264_[8u|16u]_C1IR
- ippiFilterDeblockingChroma_VerEdge[422|444]_H264_[8u|16u]_C1IR
- ippiFilterDeblockingLuma_VerEdge_MBAFF_H264_[8u|16u]_C1IR
- ippiFilterDeblockingChroma_VerEdge_MBAFF_H264_[8u|16u]_C1IR
The MBAFF versions of the functions filter 16x8 blocks instead of 16x16 blocks and are intended for use with interlaced video.
Variations of some of these functions take a structure of parameters instead of pushing all of the parameters onto the stack. These variants provide a slight performance improvement because of decreased stack usage.
Listing 4 shows a code snippet that executes a deblocking filter. The behavior of the filters is determined by the alpha, beta, and clipping thresholds, and by the filter-strength array. The alpha parameter is the threshold for the gradient across an edge, while the beta parameter is the threshold for the gradient on one side of an edge. The clipping thresholds, held in the array Clipping and called tc0 in the standard, limit the effect of the filter. The threshold parameters come from fixed tables indexed by the quantization parameter (QP) plus a tuning offset, clipped to the range 0 to 51; for example, with QP equal to 28 and BetaOffset equal to zero, the index is 28 and Beta becomes BETA_TABLE[28], or 7. The strength parameter pStrength, referred to as bS in the standard, affects the deblocking filter in several ways, including the choice of the basic algorithm. Both the tables and the formulas used to calculate the indices are taken from the H.264 standard.
For simplicity, this code uses simple wrapper functions around each of the Intel IPP functions. The wrappers adapt the arguments and provide a uniform prototype for all of the deblocking filters, but do no computation of their own. Because they share a uniform prototype, the code calls them indirectly through a table of function pointers set up elsewhere.
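A minimal sketch of one such wrapper might look like the following. The argument order of the underlying ippiFilterDeblockingLuma_VerEdge_H264_8u_C1IR primitive is assumed here and should be checked against the Intel IPP reference; the bitDepth parameter exists only to keep the prototype uniform and is ignored by the 8-bit variant.

/* Illustrative wrapper; the ippi call's argument order is an assumption. */
IppStatus FilterDeblockingLuma_VerEdge(Ipp8u  *pSrcDst,
                                       Ipp32s  pitch,
                                       Ipp8u  *pAlpha,
                                       Ipp8u  *pBeta,
                                       Ipp8u  *pThresholds,
                                       Ipp8u  *pStrength,
                                       Ipp32s  bitDepth)
{
    (void) bitDepth;   /* always 8 for this variant */
    return ippiFilterDeblockingLuma_VerEdge_H264_8u_C1IR(pSrcDst, pitch,
                                                         pAlpha, pBeta,
                                                         pThresholds, pStrength);
}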
Ipp8u BETA_TABLE[52] =
{
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 6, 6, 7, 7, 8, 8,
    9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16,
    17, 17, 18, 18
};

...

{
    ...
    IppStatus (*(IppDeblocking[]))(Ipp8u *, Ipp32s, Ipp8u *, Ipp8u *,
                                   Ipp8u *, Ipp8u *, Ipp32s) =
    {
        &(FilterDeblockingLuma_VerEdge),
        &(FilterDeblockingLuma_HorEdge),
        &(FilterDeblockingChroma_VerEdge),
        &(FilterDeblockingChroma_HorEdge),
        &(FilterDeblockingChroma422_VerEdge),
        &(FilterDeblockingChroma422_HorEdge),
        &(FilterDeblockingChroma444_VerEdge),
        &(FilterDeblockingChroma444_HorEdge),
        &(FilterDeblockingLuma_VerEdge_MBAFF),
        &(FilterDeblockingChroma_VerEdge_MBAFF)
    };

    IppStatus (*(IppDeblocking16u[]))(Ipp16u *, Ipp32s, Ipp8u *, Ipp8u *,
                                      Ipp8u *, Ipp8u *, Ipp32s) =
    {
        &(FilterDeblockingLuma_VerEdge),
        &(FilterDeblockingLuma_HorEdge),
        &(FilterDeblockingChroma_VerEdge),
        &(FilterDeblockingChroma_HorEdge),
        &(FilterDeblockingChroma422_VerEdge),
        &(FilterDeblockingChroma422_HorEdge),
        &(FilterDeblockingChroma444_VerEdge),
        &(FilterDeblockingChroma444_HorEdge),
        &(FilterDeblockingLuma_VerEdge_MBAFF),
        &(FilterDeblockingChroma_VerEdge_MBAFF)
    };

    // internal edge variables
    QP = pmq_QP;

    index = IClip(0, 51, QP + BetaOffset);
    Beta[1] = (Ipp8u) (BETA_TABLE[index]);

    index = IClip(0, 51, QP + AlphaC0Offset);
    Alpha[1] = (Ipp8u) (ALPHA_TABLE[index]);
    pClipTab = CLIP_TAB[index];

    // create clipping values
    {
        Ipp32s edge;

        for (edge = 1; edge < 4; edge += 1)
        {
            if (*((Ipp32u *) (pStrength + edge * 4)))
            {
                // create clipping values
                Clipping[edge * 4 + 0] = (Ipp8u) (pClipTab[pStrength[edge * 4 + 0]]);
                Clipping[edge * 4 + 1] = (Ipp8u) (pClipTab[pStrength[edge * 4 + 1]]);
                Clipping[edge * 4 + 2] = (Ipp8u) (pClipTab[pStrength[edge * 4 + 2]]);
                Clipping[edge * 4 + 3] = (Ipp8u) (pClipTab[pStrength[edge * 4 + 3]]);
            }
        }
    }

    if (pParams->bitDepthLuma > 8)
    {
        IppDeblocking16u[dir]((Ipp16u *) pY, pic_pitch, Alpha, Beta,
                              Clipping, pStrength, pParams->bitDepthLuma);
    }
    else
    {
        IppDeblocking[dir](pY, pic_pitch, Alpha, Beta,
                           Clipping, pStrength, pParams->bitDepthLuma);
    }
}
Threading and Video Coding
H.264, and MPEG-4 coding in general, is amenable to threading. Listing 5 shows the key piece of code from the Intel IPP H.264 codec sample, which uses a single OpenMP pragma to parallelize the encoder.
The key aspect of this code is the slice. A slice is an independent segment of the picture, one that neither uses other slices for reference in prediction nor is used for reference by other slices. That makes it the ideal level for parallelization, because the codec can process multiple slices simultaneously without being serialized by prediction dependencies between them.
template <class PixType, class CoeffsType>
Status H264CoreEncoder<PixType, CoeffsType>::CompressFrame(
    EnumPicCodType &ePictureType,
    EnumPicClass   &ePic_Class,
    MediaData      *dst)
{
    Status status = UMC_OK;
    Ipp32s slice;

    for (m_field_index = 0;
         m_field_index <= (Ipp8u) (m_pCurrentFrame->m_PictureStructureForDec < FRM_STRUCTURE);
         m_field_index++)
    {
        ...
#if defined _OPENMP
        vm_thread_priority mainTreadPriority = vm_get_current_thread_priority();
#pragma omp parallel for private(slice)
#endif // _OPENMP
        for (slice = (Ipp32s) m_info.num_slices * m_field_index;
             slice < m_info.num_slices * (m_field_index + 1);
             slice++)
        {
#if defined _OPENMP
            vm_set_current_thread_priority(mainTreadPriority);
#endif // _OPENMP
            UpdateRefPicList(m_Slices + slice,
                             m_pCurrentFrame->GetRefPicLists(slice),
                             m_SliceHeader,
                             &m_ReorderInfoL0, &m_ReorderInfoL1);

            // Compress one slice
            if (m_is_cur_pic_afrm)
                m_Slices[slice].status = Compress_Slice_MBAFF(m_Slices + slice);
            else {
                m_Slices[slice].status =
                    Compress_Slice(m_Slices + slice,
                                   slice == m_info.num_slices * m_field_index);
            }
            ...
        }
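The essential pattern is simply a parallel loop over independent work items. The following standalone sketch strips away the codec details; SliceTask and EncodeSlice are hypothetical stand-ins for m_Slices[] and Compress_Slice() in the sample, and the program assumes it is compiled with OpenMP enabled.

#include <omp.h>
#include <cstdio>
#include <vector>

/* Hypothetical stand-in for one slice's encoding state (m_Slices[] in the sample). */
struct SliceTask
{
    int firstMB;   /* index of the first macroblock in the slice */
    int numMB;     /* number of macroblocks in the slice         */
    int status;    /* result of encoding this slice              */
};

/* Hypothetical stand-in for Compress_Slice(): encode one slice independently. */
static int EncodeSlice(SliceTask &s)
{
    /* ... predict, transform, quantize, and entropy-code s.numMB macroblocks ... */
    return 0;
}

int main()
{
    std::vector<SliceTask> slices(8);
    for (int i = 0; i < (int) slices.size(); i++)
    {
        slices[i].firstMB = i * 99;
        slices[i].numMB   = 99;
    }

    /* Because slices are independent, the iterations may run concurrently. */
    #pragma omp parallel for
    for (int i = 0; i < (int) slices.size(); i++)
        slices[i].status = EncodeSlice(slices[i]);

    printf("encoded %d slices on up to %d threads\n",
           (int) slices.size(), omp_get_max_threads());
    return 0;
}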