Win32 64-Bit Integers
The structure layouts of LARGE_INTEGER and FILETIME are such that it is safe to cast them to and from a 64-bit integer, ws_sint64_t (defined to be signed __int64 for the Borland, Digital Mars, Visual C++, and Watcom compilers, and signed long long for the Comeau, GCC, and Metrowerks compilers), so long as the platform is little-endian (e.g., Intel), since the LowPart/dwLowDateTime member precedes the HighPart/dwHighDateTime member. I do not have access to the Win32 headers for any big-endian systems (e.g., PowerPC, Alpha), so I cannot presume that the layout of the structure members would be reversed (although I hope it would be), which would maintain compatibility with the 64-bit integers. If it is not, then the systemtime_counter, highperformance_counter, threadtimes_counter, processtimes_counter, and highperformance_counter classes are not valid for systems that are not little-endian.
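As a point of comparison with the cast described above, here is a minimal sketch of a conversion that composes the 64-bit value from its two halves explicitly, so it does not depend on member order or byte order. The filetime_like struct is a hypothetical stand-in that mirrors FILETIME's two 32-bit members so the example compiles without the Win32 headers; to_sint64() is likewise an illustrative name and not part of WinSTL.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in mirroring the two 32-bit members of FILETIME.
struct filetime_like
{
    std::uint32_t dwLowDateTime;
    std::uint32_t dwHighDateTime;
};

// Composes the 64-bit value from its halves. Because it shifts and ORs
// rather than reinterpreting memory, the result is the same on little-
// and big-endian platforms.
inline std::int64_t to_sint64(filetime_like const &ft)
{
    std::uint64_t const value =
        (static_cast<std::uint64_t>(ft.dwHighDateTime) << 32) | ft.dwLowDateTime;

    return static_cast<std::int64_t>(value);
}

// Usage: the high part lands in the upper 32 bits, the low part in the lower.
inline void to_sint64_example()
{
    filetime_like ft = { 1u, 2u };

    assert(to_sint64(ft) == 0x0000000200000001LL);
}
```

The trade-off is a shift and an OR per conversion in place of a plain cast, which is small next to the cost of the system calls being timed.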
highperformance_counter
highperformance_counter records the LARGE_INTEGER values obtained from QueryPerformanceCounter() in its start() and stop() methods, converting to ws_sint64_t (see the previous section, "Win32 64-Bit Integers"). get_seconds() is implemented by dividing the value returned from get_period_count() by the frequency (obtained from QueryPerformanceFrequency()). This frequency is hardware dependent, but is commonly the processor frequency or a small factor thereof. The _frequency() method obtains the frequency via a once-only (since the s_frequency variable is static) call to _query_frequency(). _query_frequency() is implemented such that if QueryPerformanceFrequency() returns FALSE, indicating the absence of high-performance counter support, the value returned is the maximum value for its type, so that subsequent divisions evaluate to 0 rather than crashing with a divide-by-zero error.
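The once-only query and the maximum-value fallback can be sketched roughly as follows. This is not the WinSTL source: frequency_query_fn is a hypothetical stand-in for QueryPerformanceFrequency() so that the example compiles without the Win32 headers, and query_frequency(), cached_frequency(), and the two sample queries are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for QueryPerformanceFrequency(): writes the
// frequency and returns false when no high-performance counter exists.
typedef bool (*frequency_query_fn)(std::int64_t *);

// Returns the queried frequency, or the maximum value of the type on
// failure, so that count / frequency evaluates to 0 instead of faulting.
inline std::int64_t query_frequency(frequency_query_fn query)
{
    std::int64_t frequency = 0;

    if (!query(&frequency))
    {
        frequency = std::numeric_limits<std::int64_t>::max();
    }

    return frequency;
}

// Once-only evaluation: the function-local static is initialized on the
// first call and reused afterwards, whatever is passed in later.
inline std::int64_t cached_frequency(frequency_query_fn query)
{
    static std::int64_t const s_frequency = query_frequency(query);

    return s_frequency;
}

// Illustrative queries: one with no counter support, one fixed at 1 MHz.
inline bool unsupported_query(std::int64_t *) { return false; }
inline bool fixed_query(std::int64_t *f) { *f = 1000000; return true; }

// Usage: with no counter support, any period count divides down to 0.
inline void frequency_fallback_example()
{
    std::int64_t const period_count = 123456;

    assert(period_count / query_frequency(unsupported_query) == 0);
    assert(query_frequency(fixed_query) == 1000000);
}
```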
get_milliseconds() and get_microseconds() are implemented by multiplying get_period_count() by 1000 and 1,000,000, respectively, and dividing by the frequency. In order to avoid truncation of the result when the period count is low, or overflow when it is high, the multiplication is carried out first if overflow will not occur, and afterwards if it will.
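The ordering rule just described might be expressed along these lines. scale_period() is a hypothetical helper name, and the sketch assumes a non-negative count, a positive factor and frequency, and a final result that fits in 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Scales a period count to a time unit: factor is 1000 for milliseconds
// and 1000000 for microseconds.
inline std::int64_t scale_period(std::int64_t count,
                                 std::int64_t factor,
                                 std::int64_t frequency)
{
    assert(count >= 0 && factor > 0 && frequency > 0);

    if (count <= std::numeric_limits<std::int64_t>::max() / factor)
    {
        // The product cannot overflow, so multiply first. This keeps
        // precision when the count is smaller than the frequency.
        return (count * factor) / frequency;
    }
    else
    {
        // The product would overflow, so divide first. The truncation
        // is negligible relative to a count this large.
        return (count / frequency) * factor;
    }
}
```

For example, a count of 3 at a frequency of 2000 gives 1 millisecond when multiplied first, where dividing first would give 0.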
Since the frequency is required only for calculating the period, rather than measuring it, I employed a combination of late evaluation and statics to defer the expensive call to QueryPerformanceFrequency() until after the measurement is complete, as well as, of course, only doing it once per process. Indeed, if you only use get_period_count() and not get_seconds(), get_milliseconds(), or get_microseconds(), then this cost is not incurred at all.
processtimes_counter and threadtimes_counter
As well as providing the four period methods that all the other counter classes provide, these two classes also provide four corresponding methods each for kernel time and user time. processtimes_counter and threadtimes_counter record the values for kernel and user time obtained from GetProcessTimes() and GetThreadTimes(), respectively, in their start() and stop() methods into the m_kernelStart, m_kernelEnd, m_userStart, and m_userEnd members (casting in the same way as in systemtime_counter). In addition to this being a finer-grained level of measurement, the figures obtained from these classes are not affected by other processes, unlike those obtained from the three other classes. (I have included a program, counter_isolation, in the archive that demonstrates this behavior for the threadtimes_counter class.)
The thread/process handles are specified as the current ones, via GetCurrentThread()/GetCurrentProcess(). threadtimes_counter records the current thread handle in a member variable to avoid unnecessarily repeating this small but nonzero cost. processtimes_counter uses a static technique such that GetCurrentProcess() is called only once per process. The creation time and exit time values obtained from GetThreadTimes()/GetProcessTimes() are ignored. (They are, in fact, fixed values, and the exit time is not actually valid until the given thread has exited.)
These two classes have the following attribute methods in addition to those they share with the three other classes:
  ...
// Attributes
public:
  ...
  interval_type get_kernel_period_count() const;
  interval_type get_kernel_seconds() const;
  interval_type get_kernel_milliseconds() const;
  interval_type get_kernel_microseconds() const;
  interval_type get_user_period_count() const;
  interval_type get_user_seconds() const;
  interval_type get_user_milliseconds() const;
  interval_type get_user_microseconds() const;
  ...
};
get_kernel_period_count() and get_user_period_count() are implemented as returning the difference of the kernel members and user members, respectively. get_period_count() is implemented as the sum of get_kernel_period_count() and get_user_period_count(). The calculations of all the seconds, milliseconds, and microseconds are performed in the same way as those of systemtime_counter.
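The relationship between the kernel, user, and total period counts can be sketched as follows. The times_snapshot struct is a hypothetical stand-in rather than the WinSTL class: the member and method names follow the text, but the struct is illustrative only. The microsecond conversion assumes that the counts are in the 100-nanosecond units that GetProcessTimes() and GetThreadTimes() report.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the kernel/user bookkeeping described above.
struct times_snapshot
{
    typedef std::int64_t interval_type;

    interval_type m_kernelStart;
    interval_type m_kernelEnd;
    interval_type m_userStart;
    interval_type m_userEnd;

    // Difference of the kernel members.
    interval_type get_kernel_period_count() const
    {
        return m_kernelEnd - m_kernelStart;
    }
    // Difference of the user members.
    interval_type get_user_period_count() const
    {
        return m_userEnd - m_userStart;
    }
    // Total period: kernel time plus user time.
    interval_type get_period_count() const
    {
        return get_kernel_period_count() + get_user_period_count();
    }
    // Assumes 100 ns units, so ten counts make one microsecond.
    interval_type get_microseconds() const
    {
        return get_period_count() / 10;
    }
};

// Usage: 250 kernel counts plus 500 user counts is 750 counts, or 75 us.
inline void times_snapshot_example()
{
    times_snapshot t = { 100, 350, 1000, 1500 };

    assert(t.get_kernel_period_count() == 250);
    assert(t.get_user_period_count() == 500);
    assert(t.get_period_count() == 750);
    assert(t.get_microseconds() == 75);
}
```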
counter_scope
The similar public interface of each class facilitates the use of a scoping template class, performance_counter_scope (shown in Listing Seven), which implements the "Resource Acquisition Is Initialization" idiom and may be parameterized on a particular counter class. The constructor takes a reference to a counter class instance and calls start(). stop() is called in the destructor, providing a scoped timing operation. The class also provides access to the stop() method, in order to support intermediate staged timings, and a reference-to-const to the managed counter, such that intermediate timing values can be obtained. An example of its use is shown in Listing Eight.
Listing Seven: Extract from winstl_performance_counter_scope.h
/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from winstl_performance_counter_scope.h
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

// class performance_counter_scope
template <ws_typename_param_k T>
class performance_counter_scope
{
public:
    typedef T                               counter_type;
    typedef performance_counter_scope<T>    class_type;

public:
    ws_explicit_k performance_counter_scope(counter_type &counter)
        : m_counter(counter)
    {
        m_counter.start();
    }
    ~performance_counter_scope()
    {
        m_counter.stop();
    }

    void stop()
    {
        m_counter.stop();
    }

    // This method is const, to ensure that only the stop operation
    // (via performance_counter_scope::stop()) is accessible
    // on the managed counter.
    const counter_type &get_counter() const
    {
        return m_counter;
    }

// Members
protected:
    T &m_counter;

// Not to be implemented
private:
    performance_counter_scope(class_type const &rhs);
    class_type const &operator =(class_type const &rhs);
};
Listing Eight: Extract from counter_cost.cpp
/* /////////////////////////////////////////////////////////////
 * ...
 *
 * Extract from counter_cost.cpp
 *
 * Copyright (C) 2002, Synesis Software Pty Ltd.
 * (Licensed under the Synesis Software Standard Source License:
 *  http://www.synesis.com.au/licenses/ssssl.html)
 *
 * ...
 * ////////////////////////////////////////////////////////// */

#include <stdio.h>

#define _WINSTL_NO_NAMESPACES

#include <winstl.h>
#include <winstl_tick_counter.h>
#include <winstl_multimedia_counter.h>
#include <winstl_highperformance_counter.h>
#include <winstl_systemtime_counter.h>
#include <winstl_threadtimes_counter.h>
#include <winstl_processtimes_counter.h>
#include <winstl_performance_counter.h>
#include <winstl_performance_counter_scope.h>
#include <winstl_performance_counter_init.h>

/* //////////////////////////////////////////////////////////// */

const int C_ITERATIONS = 1000000;

/* //////////////////////////////////////////////////////////// */

typedef highperformance_counter application_counter_type;

template<   ws_typename_param_k C1
        ,   ws_typename_param_k C2
        >
inline ws_typename_type_k C1::interval_type
test_cost(C1 &app_counter, C2 &counter)
{
    for(int i = 0; i < 2; ++i)
    {
        performance_counter_scope<C1> scope(app_counter);

        for(int j = 0; j < C_ITERATIONS; ++j)
        {
            counter.start();
            counter.stop();
        }
    }

    return app_counter.get_milliseconds();
}

int main(int /* argc */, char* /* argv */[])
{
    performance_counter_init<application_counter_type> app_counter;

#if defined(_STLSOFT_COMPILER_IS_BORLAND) || \
    defined(_STLSOFT_COMPILER_IS_INTEL) || \
    defined(_STLSOFT_COMPILER_IS_MSVC)
#define _counter_test_fmt "%I64d"
#else
#define _counter_test_fmt "%lld"
#endif /* compiler */

#define _test_counter(_x)                               \
    do                                                  \
    {                                                   \
        _x x;                                           \
                                                        \
        printf( #_x ": " _counter_test_fmt "us\n",      \
                test_cost(app_counter, x));             \
    }                                                   \
    while(0)

    _test_counter(tick_counter);
    _test_counter(multimedia_counter);
    _test_counter(systemtime_counter);
    _test_counter(highperformance_counter);
    _test_counter(threadtimes_counter);
    _test_counter(processtimes_counter);
    _test_counter(performance_counter);

    return 0;
}
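Listing Eight uses the scope class only for whole-scope timing. The intermediate staged timing mentioned earlier might look something like the following sketch. Everything in it is an assumption made for illustration: steady_counter is a hypothetical counter built on std::chrono so that the example compiles without the WinSTL or Win32 headers, counter_scope is a reduced version of the class in Listing Seven, and staged_timing() is an invented function name.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical counter with the same start()/stop() shape as the WinSTL
// classes, built on std::chrono so that it compiles anywhere.
class steady_counter
{
public:
    typedef std::int64_t interval_type;

    steady_counter() : m_start(), m_end() {}

    void start() { m_start = std::chrono::steady_clock::now(); }
    void stop()  { m_end = std::chrono::steady_clock::now(); }

    interval_type get_microseconds() const
    {
        return std::chrono::duration_cast<std::chrono::microseconds>(
                   m_end - m_start).count();
    }

private:
    std::chrono::steady_clock::time_point m_start;
    std::chrono::steady_clock::time_point m_end;
};

// Reduced version of the scoping class shown in Listing Seven.
template <typename T>
class counter_scope
{
public:
    explicit counter_scope(T &counter) : m_counter(counter) { m_counter.start(); }
    ~counter_scope() { m_counter.stop(); }

    void stop() { m_counter.stop(); }
    T const &get_counter() const { return m_counter; }

private:
    counter_scope(counter_scope const &);               // Not implemented
    counter_scope &operator =(counter_scope const &);   // Not implemented

    T &m_counter;
};

// Takes an intermediate reading after stage one. The destructor then calls
// stop() again without restarting, so the final reading covers both stages.
inline void staged_timing(steady_counter &counter,
                          steady_counter::interval_type &stage1)
{
    counter_scope<steady_counter> scope(counter);

    // ... stage one work would go here ...
    scope.stop();
    stage1 = scope.get_counter().get_microseconds();

    // ... stage two work would go here ...
}

// Usage: the total can never be smaller than the intermediate reading.
inline void staged_timing_example()
{
    steady_counter counter;
    steady_counter::interval_type stage1 = -1;

    staged_timing(counter, stage1);

    assert(stage1 >= 0 && counter.get_microseconds() >= stage1);
}
```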
The original proprietary implementations of the performance classes called their start() methods in their constructors, as well as initializing their member variables, as a syntactic convenience, such that the following would produce meaningful results:
performance_counter counter;

some_operation();
counter.stop();
printf("...", counter.get_xxx());
However, the way these classes are used in almost all cases, along with the strong requirement for them to be as efficient as possible, has shown this to be a mistake. Because instances are often used in a number of start()-stop() cycles, as can be seen in the test program, having start() called in the constructor complicates the semantics for no net benefit. Nor does it ensure that the instance has a coherent state, since only when a subsequent stop() call is made do the attribute calls have well-defined behavior (see the section "Initialized Counters").
Performance Analysis
The test scenarios described here were executed on the following platforms: Windows 98 (233 MHz), NT 4 (400 MHz), 2000 (650-MHz laptop), 2000 (dual 550 MHz), NT 4 (dual 933 MHz), 2000 (dual 933 MHz), and XP (2 GHz). (All program code and supporting files are included in the archive, along with Visual C++ 6 and Metrowerks CodeWarrior 8 projects.)