Most real-world programs have performance requirements, and developers use profiling to determine where any bottlenecks may lie. Developers are notoriously bad at intuiting which parts of their code need optimization and which do not, and so are advised to profile their code before attempting optimizations. However, they are often left without adequate guidance as to the best way of obtaining accurate performance data, or indeed which performance measurement functions are appropriate for a particular scenario. By knowing the costs and benefits of the available timing options, developers can better judge which profiling techniques to use, and can factor profiling side effects into their data.
The Win32 API provides a number of different functions for eliciting timing information that may be useful in determining performance metrics. These functions, and the timing information they provide, vary in call cost, resolution, accuracy, scope, and availability, and the choice of which should be used depends on a number of factors, particularly the requirements for efficiency, accuracy, and the targeted platform(s).
This article will describe the six main Win32 timing functions and present corresponding performance counter classes that wrap these functions. The classes provide a simple and consistent interface, allowing developers to swap timing functionality as needed. I'll also present a template class that manipulates instances of any of the six timing classes in order to provide scoped timing operations.
All measurement affects that which is being measured, and software profiling is no exception. Indeed, it is often the case that such profiling has a deleterious and misleading effect on the system being measured. I'll compare the various timing functions available by qualitative analyses of documented resolution and availability (OS support), and quantitative analyses of their practical resolutions (how accurately they measure intervals) and call costs (how much the act of measurement costs).
I will discuss the costs and benefits of each approach and offer advice on when each is most suitable, as well as highlighting some techniques for increasing accuracy by reducing the impact of the measurement process. Finally, a seventh performance counter class will be presented that provides an optimal blend of the examined timing functionality, attempting to use the high-resolution functions but defaulting to a less accurate, but ubiquitous, function when the best timer is not available.
Win32 API Timing Functions
The five Win32 timing functions provided by the base API (as implemented in KERNEL32.DLL) are GetTickCount(), GetSystemTime()/GetSystemTimeAsFileTime() (see the section "System Time"), QueryPerformanceCounter(), GetThreadTimes(), and GetProcessTimes(); these are shown in Table 1, along with the commonly used timeGetTime() function provided by the Windows MultiMedia API, in WINMM.DLL. (The KERNEL32 functions require linking to kernel32.lib, and timeGetTime() requires linking to winmm.lib.)
Table 1: Win32 timing functions.
GetTickCount() takes no argument, and simply returns the number of milliseconds that have elapsed since the system was started. Of the timing functions used here (see the section "System Time"), GetTickCount() is the only one that is provided by all operating systems and on all hardware. Table 2 lists the functions and their support on Windows 9x (95, 98, and Me), Windows NT (NT 3.5, 3.51, 4, 2000, and XP), and Windows CE operating systems.
timeGetTime() has the same signature and semantics as GetTickCount(). On Windows 9x systems its resolution is 1ms, whereas on Windows NT systems it is usually 5ms or more, but can be modified by calling the timeBeginPeriod() function. In the tests described here, it was left at its default behavior.
Table 2: Functions and their support.
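Both GetTickCount() and timeGetTime() return a 32-bit millisecond count, which wraps around to zero after roughly 49.7 days of uptime. The following is a minimal sketch of computing an interval that remains correct across a single wrap; the helper name elapsed_ticks_ms is hypothetical and is not part of any Win32 or WinSTL interface.

```cpp
#include <cstdint>

// Hypothetical helper (not a Win32 or WinSTL function): returns the number
// of milliseconds between two 32-bit tick readings, such as the values
// returned by GetTickCount() or timeGetTime().
//
// Unsigned subtraction is performed modulo 2^32, so the result is still
// correct if the tick count wrapped once between the two readings.
inline std::uint32_t elapsed_ticks_ms(std::uint32_t start_ms,
                                      std::uint32_t end_ms)
{
    return static_cast<std::uint32_t>(end_ms - start_ms);
}
```

On Windows, the two arguments would simply be the values obtained from a pair of GetTickCount() calls. The subtraction cannot detect more than one wrap, so it is only suitable for intervals shorter than about 49.7 days.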
System Time
The documentation for GetSystemTimeAsFileTime() states that it is equivalent to consecutive calls to GetSystemTime() and SystemTimeToFileTime(), as in:
void GetSystemTimeAsFileTime(LPFILETIME lpft)
{
    SYSTEMTIME st;
    GetSystemTime(&st);
    SystemTimeToFileTime(&st, lpft);
}
While this is true from a functional perspective, it is certainly not the case that it is actually implemented in this way on all operating systems, as can be seen in Table 3.
Table 3: Call cost of system time functions (as percentage of GetSystemTimeAsFileTime()).
On Windows 98, the call costs are roughly equivalent. However, on all the NT-family operating systems, the cost of gleaning time in the form of an intermediate SYSTEMTIME is around 400 times that of GetSystemTimeAsFileTime().
While in almost all cases the multimedia timer offers no advantage over GetTickCount(), it still finds popular use since its measurement resolution is configurable. In addition, the fact that its resolution was 10 times better than GetTickCount() on one of the machines examined shows that it is worth having in one's toolbox. The timeGetSystemTime() function was not examined since its documentation states it has a higher cost than timeGetTime(). Also, its use would result in a more complicated class implementation.
GetSystemTime() retrieves the current system time and fills in a caller-supplied SYSTEMTIME structure, which is composed of a number of separate fields including year, month, day, hours, minutes, seconds, and milliseconds. A peer function, GetSystemTimeAsFileTime(), retrieves the current system time as a single 64-bit value (in the form of the Win32 FILETIME structure), measured in 100ns intervals. See the previous section "System Time" for a discussion of their implementation relationship.
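A FILETIME holds its 64-bit value as two 32-bit halves (dwLowDateTime and dwHighDateTime), so interval arithmetic is easiest after combining them. The sketch below shows that step and the conversion from 100ns units to microseconds. The function names are hypothetical, and the halves are passed as plain integers so that the arithmetic can be read independently of the Windows headers.

```cpp
#include <cstdint>

// Hypothetical helper: combines the two 32-bit halves of a FILETIME
// (dwLowDateTime and dwHighDateTime) into one 64-bit count of
// 100ns intervals.
inline std::uint64_t filetime_to_uint64(std::uint32_t low_part,
                                        std::uint32_t high_part)
{
    return (static_cast<std::uint64_t>(high_part) << 32) | low_part;
}

// Hypothetical helper: converts the difference between two 100ns counts
// into whole microseconds (10 units of 100ns make up 1 microsecond).
inline std::uint64_t filetime_interval_us(std::uint64_t start_100ns,
                                          std::uint64_t end_100ns)
{
    return (end_100ns - start_100ns) / 10;
}
```

With a real FILETIME named ft, the first call would be filetime_to_uint64(ft.dwLowDateTime, ft.dwHighDateTime). The division truncates, so intervals shorter than one microsecond are reported as zero.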
If a system has a high-resolution counter, then QueryPerformanceCounter() may be used to obtain the current (64-bit) value of the high-performance counter, in the form of a Win32 LARGE_INTEGER structure. The value returned is the current count of the hardware counter and does not, in and of itself, represent a specific time unit. Because different hardware counters may use different counting frequencies, QueryPerformanceFrequency() must be called (once per host session) to determine the high-performance counter frequency in order to convert the performance counter values into time intervals. For example, if the high-performance counter frequency is 1,000,000 counts per second, and two successive calls to QueryPerformanceCounter() yield a difference of 2000, then 2ms have elapsed. When no hardware counter is available, both QueryPerformanceCounter() and QueryPerformanceFrequency() return FALSE. In practice, I have not encountered a laptop or desktop machine (running 9x or NT) on which a high-performance counter is not available.
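The conversion described above can be sketched as follows. The names counts_to_microseconds and time_with_qpc_us are hypothetical and are not taken from the WinSTL classes; the Win32 calls are guarded by _WIN32 so that the arithmetic can be checked on any platform.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: converts a difference between two counter readings
// into microseconds, given the counter frequency in counts per second.
// Whole seconds and the remainder are converted separately so that the
// multiplication does not overflow for long intervals.
inline std::int64_t counts_to_microseconds(std::int64_t count_delta,
                                           std::int64_t frequency)
{
    assert(frequency > 0);
    const std::int64_t whole_seconds = count_delta / frequency;
    const std::int64_t remainder     = count_delta % frequency;
    return whole_seconds * 1000000 + (remainder * 1000000) / frequency;
}

#ifdef _WIN32
// Hypothetical helper: times a callable with the high-performance counter.
// Returns -1 if no high-resolution counter is available.
template <typename F>
std::int64_t time_with_qpc_us(F work)
{
    LARGE_INTEGER frequency, start, stop;
    if (!::QueryPerformanceFrequency(&frequency) ||
        !::QueryPerformanceCounter(&start))
    {
        return -1;
    }
    work();
    ::QueryPerformanceCounter(&stop);
    return counts_to_microseconds(stop.QuadPart - start.QuadPart,
                                  frequency.QuadPart);
}
#endif
```

With the figures from the example in the text, counts_to_microseconds(2000, 1000000) gives 2000 microseconds, that is, 2ms.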
Note that while I have not seen it documented that the value returned by QueryPerformanceFrequency() is fixed for a particular processor, I have never encountered a machine on which this does not hold true. Indeed, experiments showed that while the processor frequency for one of the laptops used in these tests is affected by running in battery mode, the performance frequency is unaffected (3,579,545 in both cases). I am, therefore, reasonably confident that this assumption holds in all cases.
GetTickCount(), timeGetTime(), GetSystemTime()/GetSystemTimeAsFileTime(), and QueryPerformanceCounter() all yield values on a systemwide basis. In other words, they measure absolute times on the system, so if the system has other busy processes, the measured values will reflect that activity. While it is commonly the case that one can run performance tests on a system where all other processes are in a quiescent state, sometimes it is not possible. Furthermore, it is sometimes desirable to get a finer-grained look into a process's activities, in terms of the individual performance costs of the kernel and user components.
On Windows NT operating systems, the GetThreadTimes() and GetProcessTimes() functions provide this information on a per-thread and per-process basis, respectively. These Win32 functions provide four 64-bit values (of type FILETIME) to the caller for the creation time, exit time, current kernel time, and current user time for the given thread/process, measured in 100ns intervals.
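As a rough illustration of separating kernel and user costs, the sketch below takes two snapshots of a thread's kernel and user times and reports the difference in microseconds. The types cpu_times and cpu_interval_us and the functions cpu_interval and current_thread_times are hypothetical names introduced for this example; only GetThreadTimes() and GetCurrentThread() are Win32 calls, and they are guarded by _WIN32.

```cpp
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical snapshot of kernel and user times, in 100ns units.
struct cpu_times
{
    std::uint64_t kernel_100ns;
    std::uint64_t user_100ns;
};

// Hypothetical result type: kernel and user costs in microseconds.
struct cpu_interval_us
{
    std::uint64_t kernel;
    std::uint64_t user;
    std::uint64_t total() const { return kernel + user; }
};

// Computes the kernel and user time consumed between two snapshots.
inline cpu_interval_us cpu_interval(const cpu_times& start,
                                    const cpu_times& stop)
{
    cpu_interval_us result;
    result.kernel = (stop.kernel_100ns - start.kernel_100ns) / 10;
    result.user   = (stop.user_100ns - start.user_100ns) / 10;
    return result;
}

#ifdef _WIN32
// Combines the two halves of a FILETIME into a 64-bit value.
inline std::uint64_t to_uint64(const FILETIME& ft)
{
    return (static_cast<std::uint64_t>(ft.dwHighDateTime) << 32) |
           ft.dwLowDateTime;
}

// Takes a snapshot for the calling thread; returns false on failure
// (for example, on systems where GetThreadTimes() is not supported).
inline bool current_thread_times(cpu_times& out)
{
    FILETIME creation, exit_time, kernel, user;
    if (!::GetThreadTimes(::GetCurrentThread(), &creation, &exit_time,
                          &kernel, &user))
    {
        return false;
    }
    out.kernel_100ns = to_uint64(kernel);
    out.user_100ns   = to_uint64(user);
    return true;
}
#endif
```

A per-process variant would presumably differ only in calling GetProcessTimes() with a process handle in place of GetThreadTimes().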
The Performance Counter Classes
The six classes presented here (tick_counter, multimedia_counter, systemtime_counter, highperformance_counter, threadtimes_counter, and processtimes_counter) are from the WinSTL performance library, and are based on the six Win32 timing functions described in Table 1. The essentials of each implementation are shown in Listings One, Two, Three, Four, Five, and Six.
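Because the classes share the start()/stop() interface, generic code can work with any of them. The following is a minimal sketch of a scoped helper written against that interface. The names scoped_timing and manual_counter are hypothetical: this is not the WinSTL template itself, and manual_counter is only a stand-in for a real counter class, driven by a variable rather than a system clock.

```cpp
#include <cstdint>

// Hypothetical RAII helper: starts the counter when constructed and stops
// it when destroyed, so the measured interval is the enclosing scope.
// Works with any class that provides start() and stop().
template <typename Counter>
class scoped_timing
{
public:
    explicit scoped_timing(Counter& counter)
        : m_counter(counter)
    {
        m_counter.start();
    }
    ~scoped_timing()
    {
        m_counter.stop();
    }

    scoped_timing(const scoped_timing&) = delete;
    scoped_timing& operator=(const scoped_timing&) = delete;

private:
    Counter& m_counter;
};

// Stand-in counter for illustration: reads "time" from a variable supplied
// by the caller instead of calling a Win32 timing function.
class manual_counter
{
public:
    typedef std::int64_t interval_type;

    explicit manual_counter(const interval_type& clock_ms)
        : m_clock(clock_ms), m_start(0), m_end(0)
    {}

    void start() { m_start = m_clock; }
    void stop()  { m_end = m_clock; }

    interval_type get_milliseconds() const { return m_end - m_start; }

private:
    const interval_type& m_clock;
    interval_type        m_start;
    interval_type        m_end;
};
```

In use, a counter is declared outside the block of interest and a scoped_timing object is declared as the first statement inside it; once the block exits, the counter's get_milliseconds() reports the elapsed time. Swapping the timing function then only means changing the counter type.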
Listing One: Extract from winstl_tick_counter.h
/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from winstl_tick_counter.h
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

// Operations
inline void tick_counter::start()
{
    m_start = ::GetTickCount();
}

inline void tick_counter::stop()
{
    m_end = ::GetTickCount();
}

// Attributes
inline tick_counter::interval_type tick_counter::get_period_count() const
{
    return static_cast<interval_type>(m_end - m_start);
}

inline tick_counter::interval_type tick_counter::get_seconds() const
{
    return get_period_count() / interval_type(1000);
}

inline tick_counter::interval_type tick_counter::get_milliseconds() const
{
    return get_period_count();
}

inline tick_counter::interval_type tick_counter::get_microseconds() const
{
    return get_period_count() * interval_type(1000);
}
Listing Two: Extract from winstl_multimedia_counter.h
/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from winstl_multimedia_counter.h
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

// Operations
inline void multimedia_counter::start()
{
    m_start = ::timeGetTime();
}

inline void multimedia_counter::stop()
{
    m_end = ::timeGetTime();
}