Generally speaking, benchmarking is the process of comparing two or more systems to determine which is more efficient or provides better performance, with the ultimate goal of improving one or more of the systems being measured. It's no surprise that few processes in software development are more controversial than benchmarking, as tests are often tweaked to make one product appear better than it actually is, or better than its competitors. In addition, some benchmarks are simply meaningless or irrelevant in what they measure and how they apply to the system under scrutiny. Clearly, for a benchmark to have any value, you need to understand what is actually being measured and under what conditions.
With all of these caveats in place (along with the familiar cop-out "your mileage may vary"), it is worth noting that there are valid reasons for benchmarking systems, not the least of which is that benchmarking is in itself interesting. When it comes to computer systems, most benchmarking seems to occur when measuring CPU performance, network throughput, 3D graphics cards, and the like. In this article, however, I compare programming languages with their associated compilers and runtime environments, in particular C++, C#, and Java. To do so, I've modified, extended, and fixed code portions from tests implemented by Christopher W. Cowell-Shah [1] and Doug Bagley [2], whose original shootouts are now out of date or had some flaws [3].
I refer to the tests presented here as "micro benchmarks," which means they perform elementary operations that every programming language should offer. I've split the end results into two categories, simple data processing and arithmetic functions, because these are the two areas where applications spend most of their time (80/20 rule [4]). These tests give a good picture of commonly used algorithms and data structures found in most applications today; that's the reason I chose them. I haven't addressed other issues, such as multithreading on multiple CPUs, because they deserve an article in their own right.
The goal of this micro benchmark is to evaluate the performance characteristics of Java, C#, and C++ on Windows and Linux. The source code files I use can be downloaded from ToMMTi-Systems [5].
Table 1 lists the settings I use for the different language platforms. For some unknown reason(s), compiling with the Intel C/C++ Compiler -fast option failed under Linux; that's why I used -O3 instead. I would have liked to include results for the Intel C/C++ Compiler on Windows, but STLPort 5.0 RC2 failed to compile under Windows. I used both the GCC STL and the STLPort 5.0 RC2 STL to get comparable results for Linux and Windows, because Microsoft Visual C 7.1 has a slow HashMap STL implementation.
| Language | Compiler Settings | Runtime Settings |
|----------|-------------------|------------------|
| Java | -g:none -target:1.4 (JRE 1.4); -target:5 (JRE 5) | -server or -client (IBM JRE supports no client/server option) |
| C# | /optimize+ /debug- /checked- /unsafe+ /target:exe | .NET: none; Mono: --optimize=all |
| C++ | GCC: -O3 -march=pentium4 -msse -msse2 -mfpmath=sse; Intel C/C++: -O3 -march=pentium4 -mcpu=pentium4; Microsoft Visual C 7.1: /Ox /Og /Ob2 /Oi /Ot /Oy /GT /GL /G7 /GA /GF /FD /EHsc /ML /arch:SSE2 | none |
Table 1: Compiler settings used for different language platforms.
Because the .NET Framework 1.1 lacks a LinkedList implementation, I used Rodney S. Foley's [6] source code to fill this gap. The .NET Framework 2.0 (currently at beta 2 status) includes a LinkedList implementation under System.Collections.Generic, but to keep the C# comparison more accurate, it is not used in this benchmark; it is, however, included in the second file of the C# benchmark. My own tests showed that it is a bit slower (by approximately 6 percent) than Foley's implementation, possibly due to the use of generics.
I didn't use Java 5's StringBuilder implementation for the same reason: I wanted the 1.4.2 baseline comparison to be more accurate. On the client VM, StringBuilder is 150 percent (dynamic memory allocation) and 310 percent (fixed-memory allocation) faster than StringBuffer. On the server VM, it's even faster: 170 percent and 470 percent, respectively.
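The StringBuffer/StringBuilder gap is easy to reproduce with a small sketch. The snippet below is not the article's benchmark code; the token and iteration count are illustrative assumptions. The only semantic difference between the two classes is that StringBuffer synchronizes every call, which is pure overhead in single-threaded code:

```java
// Minimal sketch comparing StringBuffer (Java 1.4) with StringBuilder
// (Java 5). TOKEN and the iteration count are illustrative choices.
public class StringAppendSketch {
    static final String TOKEN = "hello"; // assumed payload

    // Appends TOKEN n times with the synchronized StringBuffer.
    static int appendWithBuffer(int n) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < n; i++) {
            sb.append(TOKEN);
        }
        return sb.length(); // return a result so the loop isn't optimized away
    }

    // Same loop with the unsynchronized StringBuilder.
    static int appendWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(TOKEN);
        }
        return sb.length();
    }

    public static void main(String[] args) {
        int n = 1000000; // assumed iteration count
        long t0 = System.currentTimeMillis();
        appendWithBuffer(n);
        long t1 = System.currentTimeMillis();
        appendWithBuilder(n);
        long t2 = System.currentTimeMillis();
        System.out.println("StringBuffer:  " + (t1 - t0) + " ms");
        System.out.println("StringBuilder: " + (t2 - t1) + " ms");
    }
}
```

The exact ratio you observe will depend on the VM and on whether the buffer's capacity is preallocated, which is why the article distinguishes dynamic from fixed-memory allocation.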
Most of these micro benchmarks are straightforward and have just enough code to hopefully not get optimized away. They should show a direction, but they should not be taken as absolute values. I've spent time making them as solid as I could in terms of optimization safety, repeatability, and equality. Every test, except the arithmetic and the trigonometric, is run 10 times to let the JIT do its work and to compensate for small errors due to the operating system's background tasks. I won't give any rating because I simply can't rate a platform. It all depends on what you want to do with a platform and which benefits and compromises you can accept. Furthermore, I can't explain the results. I could speculate, but that would not help. There are more knowledgeable people out there who can perform this task.
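The run-it-10-times scheme can be sketched as a small harness like the one below. This is not the article's actual harness; the placeholder workload and the choice to report the fastest run (a common convention once the JIT has warmed up) are my assumptions:

```java
// Sketch of a repeat-N timing harness: run the workload several times
// so the JIT compiles the hot path and OS background noise averages out.
// The workload and the best-of-N reporting are illustrative assumptions.
public class RepeatHarness {
    static final int RUNS = 10;

    // Placeholder workload; a real benchmark substitutes its own body.
    static long workload() {
        long sum = 0;
        for (int i = 0; i < 1000000; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long best = Long.MAX_VALUE;
        for (int run = 0; run < RUNS; run++) {
            long start = System.currentTimeMillis();
            workload();
            long elapsed = System.currentTimeMillis() - start;
            if (elapsed < best) {
                best = elapsed; // keep the fastest (warmed-up) run
            }
        }
        System.out.println("best of " + RUNS + " runs: " + best + " ms");
    }
}
```

On a JIT-compiled VM the first one or two runs are typically dominated by compilation, which is exactly the effect the repetition is meant to absorb.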
Configuration-wise, the hardware for my testbed is a 3.20-GHz Intel Pentium 4 (Stepping 09) with HyperThreading Technology, 512-KB L2 cache, and 1256 MB of PC2700 (333 MHz) memory. There are two software configurations. For Linux, I ran Mandrake Linux 10.1, Vanilla Kernel 2.4.11 (SMP enabled), Glibc 2.3.3, GNU C++ Compiler (GCC) 3.4.1, the Intel C++ Compiler 8.1 (build 024), STLPort 5.0 RC2, Mono 1.0.6, Sun Java 1.4.2_07 (build 05), Sun Java 1.5.0_02 (build 09), and IBM Java 1.4.2 (build cxia32142sr1a-20050209). On the Windows side, I used Windows XP Professional + SP2, Microsoft Visual Studio 7.1 C++ Compiler (build 7.10.3077), .NET Framework 1.1 + SP1 (build 7.10.6001.4), .NET Framework 2.0 Beta 2 (build 8.00.50110.28), the Intel C++ Compiler 8.1 (build 20050201Z), Sun Java 1.4.2_07 (build 05), Sun Java 1.5.0_02 (build 09), IBM Java 1.4.2 (build cxia32142sr1a-20050209), and JIT Excelsior 3.6 (mp3) + Sun Java 1.4.2_06 (build 03).
Before proceeding with the benchmarks, it is important to note that, in contrast to ordinary benchmarks, a lower value in these tests is always better because it is the measured time to perform an operation.
Figure 1 shows the results of the 32-bit integer arithmetic test. It simply does the basic arithmetic operations with 32-bit integer variables in a loop. Here is the pseudocode:
    while (i < Max) {
        Result -= i++
        Result *= i++
        Result += i++
        Result /= i++
    }
Figure 1: 32-bit integer arithmetic.
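Rendered in Java, the loop above might look like the following sketch. The iteration count and starting values are assumptions; the benchmark sources [5] define the real ones. Returning and printing the result keeps the compiler from eliminating the loop as dead code:

```java
// Java rendering of the 32-bit integer arithmetic pseudocode.
// max and the initial values are illustrative assumptions.
public class IntArithmetic {
    // One pass of the benchmark body; the result is returned so the
    // optimizer cannot discard the computation.
    static int run(int max) {
        int result = 1;
        int i = 1;
        while (i < max) {
            result -= i++;
            result *= i++;
            result += i++;
            result /= i++; // i >= 4 here, so no division by zero
        }
        return result;
    }

    public static void main(String[] args) {
        final int max = 1000000000; // assumed iteration bound
        long start = System.currentTimeMillis();
        int result = run(max);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("result=" + result + ", elapsed=" + elapsed + " ms");
    }
}
```

The C++ and C# versions are structurally identical; only the integer overflow semantics differ (C# can check overflow, which is why the benchmark compiles with /checked-).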
The 64-bit double arithmetic test in Figure 2 is the same as the 32-bit integer arithmetic test, but uses 64-bit double variables instead of 32-bit integers. Here is the pseudocode:
    while (i < Max) {
        Result -= i++
        Result *= i++
        Result += i++
        Result /= i++
    }
Figure 2: 64-bit double arithmetic.