
Win32 Performance Measurement Options

By Matthew Wilson, May 01, 2003

Win32 provides six main timing functions for profiling. We'll analyze them and present six corresponding performance counter classes that wrap these functions. Also included is a template class that manipulates instances of any of the six timing classes in order to provide scoped timing operations.

Initialized Counters

If you need your counter class to be initialized to meaningful values from the point of its construction, you can derive from the class you are interested in and call its start() and stop() members in the constructor of your derived class. Alternatively, you can use the WinSTL performance_counter_init class template:

template <class C>
class performance_counter_init
  : public C
{
public:
  typedef C     counter_type;

// Construction
public:
  performance_counter_init()
  {
    counter_type  &counter  = *this;

    counter.start();
    counter.stop();
  }
};

which does this for you for any counter class with which you parameterize it, as in:

  performance_counter_init<tick_counter>  counter;

  some_timed_operation();

  counter.stop();

  dump(counter.get_milliseconds());
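
As a rough illustration of why the wrapper is useful, the sketch below pairs the template with a hypothetical chrono_counter class. This class is not part of WinSTL; it is an assumption made here so that the example builds on any platform, with std::chrono standing in for the Win32 timing call. Its members are deliberately left uninitialized, so reading an interval from a plain chrono_counter before calling start() and stop() would yield garbage, whereas the wrapped instance is valid as soon as it is constructed.

```cpp
#include <chrono>

// Hypothetical stand-in for a WinSTL counter class (an assumption, not the
// real library code). std::chrono replaces the Win32 timing call so that the
// sketch is portable. The members are deliberately not initialized.
class chrono_counter
{
public:
  typedef long long interval_type;

  void start() { m_start = now_ns(); }
  void stop()  { m_end   = now_ns(); }

  interval_type get_microseconds() const { return (m_end - m_start) / 1000; }
  interval_type get_milliseconds() const { return (m_end - m_start) / 1000000; }

private:
  // Current reading of a monotonic clock, in nanoseconds
  static interval_type now_ns()
  {
    using namespace std::chrono;
    return duration_cast<nanoseconds>(
        steady_clock::now().time_since_epoch()).count();
  }

  interval_type m_start; // indeterminate until start() is called
  interval_type m_end;   // indeterminate until stop() is called
};

// Same shape as the template shown above: the constructor calls start() and
// stop() so that the measured interval is meaningful from construction.
template <class C>
class performance_counter_init
  : public C
{
public:
  typedef C counter_type;

// Construction
public:
  performance_counter_init()
  {
    counter_type &counter = *this;

    counter.start();
    counter.stop();
  }
};
```

With these definitions, an instance declared as performance_counter_init<chrono_counter> reports a small, non-negative interval immediately, and a later call to stop() extends the measurement from the point of construction.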

Call Costs

Any measurements on a system affect the behavior being measured. Therefore, an important characteristic of the performance classes (and their underlying timing functions) is the cost of the timing function calls. The first part of the analysis is to quantify the call costs of the functions.

Listing Eight shows the essentials of the counter_cost application. For each of the counter classes, the template test_cost() function is called, and the returned timing results, representing the total call costs, are printed to stdout.

The test_cost() function consists of an outer loop, which is executed twice in order to eliminate any caching effects (only the value from the second iteration is used), and an inner loop within which start() and stop() are called 1,000,000 times on an instance of the counter type being examined. The main application counter (which is an instance of highperformance_counter) measures the cost of the inner loop using the performance_counter_scope template.
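
The structure just described might look roughly like the following. This is a sketch based only on the description, not the code from Listing Eight. The chrono_counter class is a hypothetical portable stand-in (built on std::chrono) for both the counter under test and the main application counter, and explicit start() and stop() calls are used in place of the performance_counter_scope template.

```cpp
#include <chrono>

// Hypothetical portable stand-in for a WinSTL counter class (an assumption,
// not the real library code).
class chrono_counter
{
public:
  typedef long long interval_type;

  void start() { m_start = clock_type::now(); }
  void stop()  { m_end   = clock_type::now(); }

  interval_type get_microseconds() const
  {
    using namespace std::chrono;
    return duration_cast<microseconds>(m_end - m_start).count();
  }

private:
  typedef std::chrono::steady_clock clock_type;

  clock_type::time_point m_start;
  clock_type::time_point m_end;
};

const int C_ITERATIONS = 1000000;

// Returns the total cost, in microseconds, of C_ITERATIONS start()/stop()
// pairs on the given counter.
template <class C>
long long test_cost(C &counter)
{
  chrono_counter app_counter; // plays the role of the main application counter
  long long      cost_us = 0;

  // Two passes: the first warms the caches, the second provides the result
  for(int pass = 0; pass < 2; ++pass)
  {
    app_counter.start();

    for(int i = 0; i < C_ITERATIONS; ++i)
    {
      counter.start();
      counter.stop();
    }

    app_counter.stop();

    cost_us = app_counter.get_microseconds(); // second pass overwrites first
  }

  return cost_us;
}
```

Calling test_cost() once per counter class and printing the returned values gives the raw figures from which a table of call costs can be built.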

Because the operating systems are on machines with widely different hardware, comparisons of the actual time costs over different systems are not meaningful. Since the call costs of GetTickCount() were lower than those of any other timing function (except GetSystemTimeAsFileTime() on XP), the results are expressed as a percentage of the GetTickCount() time on each platform to provide meaningful comparisons. The results are shown in Table 4.
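
The normalization is simple arithmetic, sketched below. The function name relative_cost_percent() is hypothetical and is not part of the counter_cost application.

```cpp
// Expresses a measured call cost as a percentage of the GetTickCount() cost
// measured on the same machine. Both arguments must use the same time unit.
inline double relative_cost_percent(double cost, double baseline_cost)
{
  return 100.0 * cost / baseline_cost;
}
```

By construction, GetTickCount() itself scores 100 percent on every platform, and a function whose calls cost four times as much scores 400 percent.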

The results clearly demonstrate that GetTickCount() has the lowest performance cost on all operating systems, except the single case of GetSystemTimeAsFileTime() on XP. Also clear is the fact that timeGetTime() costs between four and 69 times that of GetTickCount().

Table 4: Call cost of timing functions (as percentage of GetTickCount()).

On NT operating systems, GetSystemTimeAsFileTime() has barely any additional cost over GetTickCount(). It is also notable that GetSystemTimeAsFileTime() has a relatively better performance on later operating-system variants. However, on Windows 98, this call has an exceedingly high cost, nearly 8000 times that of GetTickCount(). QueryPerformanceCounter() has a high call cost on all operating systems, ranging from 49 to 2080 times that of GetTickCount().

The cost of GetThreadTimes() and GetProcessTimes() is very consistent over all flavors of NT operating systems (between 296 and 924 times that of GetTickCount()). Note that the figures are not shown for Windows 98, since these two functions are not implemented on 9x.

One final point is that QueryPerformanceCounter() has a higher cost than GetThreadTimes()/GetProcessTimes() on single-processor machines, but a lower cost on multiprocessor machines. Presumably this is because access to the thread/system time infrastructure on multiprocessor machines requires synchronization, whereas access to the performance counter hardware does not.

Call Resolution

The other characteristic examined is that of the resolution of the various timing functions. Their documented resolutions are listed in Table 5. The second part of the analysis quantifies the actual resolutions of the functions.

Table 5: Resolution of timing functions.

Listing Nine shows the implementation of the counter_resolution application. For each of the counter classes, the test_resolution() template function is called, and the returned results, representing the minimum measured resolution for the counter class, are printed to stdout.

Listing Nine: Extract from counter_resolution.cpp


/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from counter_resolution.cpp
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

#include <stdio.h>

#define _WINSTL_NO_NAMESPACES

#include <winstl.h>
#include <winstl_tick_counter.h>
#include <winstl_multimedia_counter.h>
#include <winstl_systemtime_counter.h>
#include <winstl_highperformance_counter.h>
#include <winstl_threadtimes_counter.h>
#include <winstl_processtimes_counter.h>
#include <winstl_performance_counter.h>

#include <stlsoft_limit_traits.h>

/* ////////////////////////////////////////////////////////////////////// */

const int   C_ITERATIONS    =   1000000;

/* ////////////////////////////////////////////////////////////////////// */

template <ws_typename_param_k C>
inline ws_typename_type_k C::interval_type test_resolution(C &counter)
{
  typedef ws_typename_type_k C::interval_type interval_type;

  interval_type   min_inc = stlsoft::limit_traits<interval_type>::maximum();

  for(volatile int i = 0; i < C_ITERATIONS; ++i)
  {
    counter.start();

    // Execute a short inner loop of at most 2047 (0x7ff) iterations
    for(volatile int j = 0; j < (i & 0x7ff); ++j)
    {}

    counter.stop();

    interval_type   interval = counter.get_microseconds();

    if( interval != 0 &&
        interval < min_inc)
    {
      min_inc = interval;
    }
  }

  return min_inc;
}

int main(int /* argc */, char* /* argv */[])
{

#if defined(_STLSOFT_COMPILER_IS_BORLAND) || \
    defined(_STLSOFT_COMPILER_IS_INTEL) || \
    defined(_STLSOFT_COMPILER_IS_MSVC)    
 #define _counter_test_fmt	"%I64d"
#else
 #define _counter_test_fmt	"%lld"
#endif /* compiler */

#define _test_counter(_x)   \
  do \
  { \
    _x x; \
   \
    printf( #_x ": " _counter_test_fmt "us\n", \
            test_resolution(x)); \
  } \
  while(0)

  _test_counter(tick_counter);
  _test_counter(multimedia_counter);
  _test_counter(systemtime_counter);
  _test_counter(highperformance_counter);
  _test_counter(threadtimes_counter);
  _test_counter(processtimes_counter);
  _test_counter(performance_counter);

  return 0;
}

The test_resolution() function takes the form of an outer loop, which executes C_ITERATIONS times (1,000,000 in Listing Nine). Within that loop, an inner loop of at most 2047 iterations (i & 0x7ff) is executed, and its execution time is measured. The minimum nonzero interval is recorded (since it is likely that some intervals will be reported as 0) and returned as the result of the function. The results are shown in Table 5.

The results mainly illustrate that every timing function save QueryPerformanceCounter() (between 1µs and 5µs) has a significantly lower actual resolution than stated. The three exceptions are GetTickCount() and timeGetTime() on Windows 98, and timeGetTime() on one particular dual-processor Windows 2000 machine (though the other SMP 2000 machine does not show this). In all other cases, the best resolution ranges from 10ms to 20ms.

It is also interesting to note that for most machines, the resolutions obtainable from GetThreadTimes(), GetProcessTimes(), GetSystemTimeAsFileTime(), and timeGetTime() are (roughly) equivalent to that of GetTickCount(), suggesting that all these functions derive their timing information from a common low-resolution source.
