Initialized Counters
If you need a counter to hold meaningful values from the point of its construction, you can derive from the counter class you are interested in and call its start() and stop() members in the constructor of your derived class. Alternatively, you can use the WinSTL performance_counter_init class template:
template <class C>
class performance_counter_init
    : public C
{
public:
    typedef C counter_type;

// Construction
public:
    performance_counter_init()
    {
        counter_type &counter = *this;

        counter.start();
        counter.stop();
    }
};
which basically does this for you for any class on which you parameterize it, as in:
performance_counter_init<tick_counter> counter;

some_timed_operation();
counter.stop();
dump(counter.get_milliseconds());
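The effect of the initializing wrapper can be tried without the WinSTL headers. The following is a minimal sketch, not the WinSTL implementation: chrono_counter is a hypothetical stand-in built on std::chrono::steady_clock, its is_valid() member is an addition made purely for illustration, and initialized_counter is a hypothetical name for a wrapper of the same shape as performance_counter_init.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical portable counter, standing in for a WinSTL counter class.
// The two flags are an illustrative addition: they record whether start()
// and stop() have been called, so that an unmeasured counter can be detected.
class chrono_counter
{
public:
    typedef long long interval_type;

    void start() { m_start = clock_type::now(); m_started = true; }
    void stop()  { m_stop  = clock_type::now(); m_stopped = true; }

    // True only once both start() and stop() have been called
    bool is_valid() const { return m_started && m_stopped; }

    interval_type get_microseconds() const
    {
        assert(is_valid());
        return std::chrono::duration_cast<std::chrono::microseconds>(
                    m_stop - m_start).count();
    }

private:
    typedef std::chrono::steady_clock clock_type;

    clock_type::time_point m_start;
    clock_type::time_point m_stop;
    bool                   m_started = false;
    bool                   m_stopped = false;
};

// Wrapper of the same shape as the template shown in the article:
// it calls start() and stop() once, in its constructor
template <class C>
class initialized_counter
    : public C
{
public:
    typedef C counter_type;

    initialized_counter()
    {
        counter_type &counter = *this;

        counter.start();
        counter.stop();
    }
};
```

Under these assumptions, a plain chrono_counter reports is_valid() as false until it has been started and stopped, whereas an initialized_counter<chrono_counter> can be queried immediately after construction.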
Call Costs
Any measurements on a system affect the behavior being measured. Therefore, an important characteristic of the performance classes (and their underlying timing functions) is the cost of the timing function calls. The first part of the analysis is to quantify the call costs of the functions.
Listing Eight shows the essentials of the counter_cost application. For each of the counter classes, the test_cost() template function is called, and the returned timing results, representing the total call costs, are printed to stdout.
The test_cost() function takes the form of an outer loop (executed twice to eliminate any caching effects, with the value from the second iteration being used) and an inner loop within which start() and stop() are called 1,000,000 times on an instance of the counter type being examined. The main application counter (an instance of highperformance_counter) measures the cost of the inner loop using the performance_counter_scope template.
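The structure just described can be sketched in portable C++. This is not the code of Listing Eight: steady_counter and counter_scope are hypothetical stand-ins (built on std::chrono::steady_clock) for highperformance_counter and performance_counter_scope, and only the loop structure follows the description above.

```cpp
#include <chrono>

// Hypothetical stand-in for a WinSTL counter class
class steady_counter
{
public:
    typedef long long interval_type;

    void start() { m_start = clock_type::now(); }
    void stop()  { m_stop  = clock_type::now(); }

    interval_type get_microseconds() const
    {
        return std::chrono::duration_cast<std::chrono::microseconds>(
                    m_stop - m_start).count();
    }

private:
    typedef std::chrono::steady_clock clock_type;

    clock_type::time_point m_start;
    clock_type::time_point m_stop;
};

// Hypothetical scoping class: starts the counter on construction and
// stops it on destruction
template <class C>
class counter_scope
{
public:
    explicit counter_scope(C &counter) : m_counter(counter) { m_counter.start(); }
    ~counter_scope() { m_counter.stop(); }

    counter_scope(counter_scope const &) = delete;
    counter_scope &operator =(counter_scope const &) = delete;

private:
    C &m_counter;
};

const int C_ITERATIONS = 1000000;

// Returns the total cost, in microseconds, of C_ITERATIONS start()/stop()
// pairs on the counter under test
template <class C>
long long test_cost(C &counter)
{
    steady_counter main_counter;
    long long      cost = 0;

    // Two passes: the first warms things up, the second supplies the result
    for(int pass = 0; pass < 2; ++pass)
    {
        {
            counter_scope<steady_counter> scope(main_counter);

            for(int i = 0; i < C_ITERATIONS; ++i)
            {
                counter.start();
                counter.stop();
            }
        } // scope ends here, stopping main_counter

        cost = main_counter.get_microseconds();
    }

    return cost;
}
```

Using a scoping class keeps the measured region tied to a block, so the main counter cannot be left running if the inner loop is later edited.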
Because the operating systems run on machines with widely different hardware, comparisons of the actual time costs across different systems are not meaningful. Since the call costs of GetTickCount() were lower than those of any other timing function (except GetSystemTimeAsFileTime() on XP), the results are expressed as a percentage of the GetTickCount() time on each platform to provide meaningful comparisons. The results are shown in Table 4.
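The normalization described above is simple arithmetic. As a minimal sketch (percent_of_baseline is a hypothetical helper name, and the figures in the note that follows are made up for illustration, not taken from Table 4):

```cpp
#include <cassert>

// Expresses a measured call cost as a percentage of a baseline cost
// measured on the same machine (in the article, the baseline is the
// GetTickCount() cost). Both values must be in the same units.
inline double percent_of_baseline(double cost, double baseline)
{
    assert(baseline > 0.0);

    return 100.0 * cost / baseline;
}
```

For example, if the baseline loop took 100ms and another counter's loop took 400ms on the same machine, percent_of_baseline(400.0, 100.0) gives 400, which is to say four times the baseline cost. Because each platform is normalized against its own baseline, the percentages can be compared across machines even though the raw times cannot.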
The results clearly demonstrate that GetTickCount() has the lowest performance cost on all operating systems, except the single case of GetSystemTimeAsFileTime() on XP. Also clear is that timeGetTime() costs between four and 69 times that of GetTickCount().
Table 4: Call cost of timing functions (as percentage of GetTickCount()).
On NT operating systems, GetSystemTimeAsFileTime() has barely any additional cost over GetTickCount(). It is also notable that GetSystemTimeAsFileTime() performs relatively better on later operating-system variants. However, on Windows 98, this call has an exceedingly high cost, nearly 8000 times that of GetTickCount(). QueryPerformanceCounter() has a high call cost on all operating systems, ranging from 49 to 2080 times that of GetTickCount().
The cost of GetThreadTimes() and GetProcessTimes() is very consistent over all flavors of NT operating systems (between 296 and 924 times that of GetTickCount()). Note that the figures are not shown for Windows 98, since these two functions are not implemented on 9x.
One final point is that QueryPerformanceCounter() has a higher cost than GetThreadTimes()/GetProcessTimes() on single-processor machines, but a lower cost on multiprocessor machines. Presumably this is because access to the thread/system time infrastructure on multiprocessor machines requires synchronization, whereas access to the performance counter hardware does not.
Call Resolution
The other characteristic examined is that of the resolution of the various timing functions. Their documented resolutions are listed in Table 5. The second part of the analysis quantifies the actual resolutions of the functions.
Table 5: Resolution of timing functions.
Listing Nine shows the implementation of the counter_resolution application. For each of the counter classes, the test_resolution() template function is called, and the returned results, representing the minimum measured resolution for the counter class, are printed to stdout.
Listing Nine: Extract from counter_resolution.cpp
/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from counter_resolution.cpp
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

#include <stdio.h>

#define _WINSTL_NO_NAMESPACES

#include <winstl.h>
#include <winstl_tick_counter.h>
#include <winstl_multimedia_counter.h>
#include <winstl_systemtime_counter.h>
#include <winstl_highperformance_counter.h>
#include <winstl_threadtimes_counter.h>
#include <winstl_processtimes_counter.h>
#include <winstl_performance_counter.h>

#include <stlsoft_limit_traits.h>

/* ////////////////////////////////////////////////////////////////////// */

const int C_ITERATIONS = 1000000;

/* ////////////////////////////////////////////////////////////////////// */

template <ws_typename_param_k C>
inline ws_typename_type_k C::interval_type test_resolution(C &counter)
{
    typedef ws_typename_type_k C::interval_type interval_type;

    interval_type min_inc = stlsoft::limit_traits<interval_type>::maximum();

    for(volatile int i = 0; i < C_ITERATIONS; ++i)
    {
        counter.start();

        // Execute a short inner loop, capping at 2048 repeats
        for(volatile int j = 0; j < (i & 0x7ff); ++j)
        {}

        counter.stop();

        interval_type interval = counter.get_microseconds();

        if( interval != 0 &&
            interval < min_inc)
        {
            min_inc = interval;
        }
    }

    return min_inc;
}

int main(int /* argc */, char* /* argv */[])
{
#if defined(_STLSOFT_COMPILER_IS_BORLAND) || \
    defined(_STLSOFT_COMPILER_IS_INTEL) || \
    defined(_STLSOFT_COMPILER_IS_MSVC)
 #define _counter_test_fmt "%I64d"
#else
 #define _counter_test_fmt "%lld"
#endif /* compiler */

#define _test_counter(_x) \
    do \
    { \
        _x x; \
        \
        printf( #_x ": " _counter_test_fmt "us\n", \
                test_resolution(x)); \
    } \
    while(0)

    _test_counter(tick_counter);
    _test_counter(multimedia_counter);
    _test_counter(systemtime_counter);
    _test_counter(highperformance_counter);
    _test_counter(threadtimes_counter);
    _test_counter(processtimes_counter);
    _test_counter(performance_counter);

    return 0;
}
The test_resolution() function takes the form of an outer loop, which executes C_ITERATIONS (1,000,000 in Listing Nine) times. Within that loop, an inner loop of limited length (fewer than 2048 iterations) is executed, and its execution time is measured. The minimum nonzero interval is recorded (nonzero, since it is likely that some intervals will be reported as 0) and returned as the result of the function. The results are shown in Table 5.
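To see why the minimum nonzero interval reveals a timer's update granularity, it can help to run the same search against a simulated clock whose granularity is known in advance. The sketch below is an illustration only: quantized_counter is a hypothetical class that advances a made-up "true" time by 3µs per reading and reports it rounded down to a multiple of a chosen quantum, and min_nonzero_interval is a simplified, portable variant of test_resolution() without the inner delay loop.

```cpp
#include <limits>

// Hypothetical simulated counter with a known update granularity
class quantized_counter
{
public:
    typedef long long interval_type;

    explicit quantized_counter(interval_type quantum_us)
        : m_quantum(quantum_us)
        , m_true_time(0)
        , m_start(0)
        , m_stop(0)
    {}

    void start() { m_start = read(); }
    void stop()  { m_stop  = read(); }

    interval_type get_microseconds() const { return m_stop - m_start; }

private:
    // Each reading advances the simulated "true" time by 3us, then
    // reports it rounded down to a multiple of the quantum
    interval_type read()
    {
        m_true_time += 3;

        return (m_true_time / m_quantum) * m_quantum;
    }

private:
    interval_type m_quantum;
    interval_type m_true_time;
    interval_type m_start;
    interval_type m_stop;
};

// Records the smallest nonzero interval seen over a number of
// start()/stop() pairs
template <class C>
typename C::interval_type min_nonzero_interval(C &counter, int iterations)
{
    typedef typename C::interval_type interval_type;

    interval_type min_inc = std::numeric_limits<interval_type>::max();

    for(int i = 0; i < iterations; ++i)
    {
        counter.start();
        counter.stop();

        interval_type interval = counter.get_microseconds();

        if( interval != 0 &&
            interval < min_inc)
        {
            min_inc = interval;
        }
    }

    return min_inc;
}
```

With a quantum of 10,000µs and 10,000 iterations, almost every interval is reported as 0 because both readings fall within the same quantum. The few pairs that straddle an update report exactly 10,000µs, so the function returns the quantum itself. A real low-resolution timer behaves analogously, which is why the measured figures cluster around the timer's update period rather than its nominal units.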
The results mainly illustrate that every timing function save QueryPerformanceCounter() (between 1µs and 5µs) has a significantly lower actual resolution than stated. The three exceptions are GetTickCount() and timeGetTime() on Windows 98, and timeGetTime() on one particular dual-processor Windows 2000 machine (though the other SMP 2000 machine does not show this). In all other cases, the best resolution ranges from 10ms to 20ms.
It is also interesting to note that for most machines, the resolutions obtainable from GetThreadTimes(), GetProcessTimes(), GetSystemTimeAsFileTime(), and timeGetTime() are (roughly) equivalent to that of GetTickCount(), suggesting that all these functions derive their timing information from a common low-resolution source.