Advanced PM Features
Traditionally, performance monitoring hardware counts the occurrence or duration of events. However, recent CPUs, such as the Intel Itanium2 and IBM Power4/5, include more advanced features. The Itanium2 contains such features as:
- Address and opcode matching. With address matching, event counting can be constrained to events triggered by instructions that either occupy a certain address range in the code or reference data in a certain address range. The first case enables you to measure, for instance, the number of mispredicted branches within a procedure in a program. In the second case, you can measure, for example, all the cache misses caused by references to a particular array.
  Opcode matching constrains counting to events caused by instructions that match a given binary pattern. This allows you to count instructions of a certain type or instructions that execute on a particular functional unit.
- Branch traces. A CPU with a branch trace buffer (BTB) can store the source and target addresses of branch instructions in the PMU as they happen. On the Itanium2, the branch trace buffer consists of a circular buffer of eight registers in which source and target addresses are stored. Address capturing can be constrained by the type of branch, whether it was correctly predicted and whether the branch was taken.
- Event addresses. With event address capturing, addresses and latencies associated with certain events are stored in Event Address Registers (EARs). This means that the PM hardware captures the data and instruction addresses that cause, for example, an L1 D-cache miss together with the latency (in cycles) of the associated memory access.
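To make the filtering ideas above more concrete, here is a minimal software model in Python. It is only a sketch of the concepts, not the Itanium2 register interface: the `Event` record, the function names, the 8-bit opcode values and the mask/match convention are all assumptions made for illustration. It shows counting constrained by a code address range and by an opcode bit pattern, plus a circular branch trace buffer that keeps only the eight most recent source/target pairs.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical PMU event, tagged with the instruction that caused it."""
    kind: str        # e.g. "branch_mispredict" or "l1d_miss" (made-up names)
    code_addr: int   # address of the triggering instruction
    opcode: int      # instruction encoding (illustrative, not real IA-64)


def in_range(addr, start, end):
    """Address matching: true if addr lies in the half-open range [start, end)."""
    return start <= addr < end


def opcode_matches(opcode, match, mask):
    """Opcode matching: compare only the bits selected by mask."""
    return (opcode & mask) == (match & mask)


def count_events(events, kind, code_range=None, opcode_filter=None):
    """Count events of one kind, optionally constrained by address and opcode."""
    count = 0
    for ev in events:
        if ev.kind != kind:
            continue
        if code_range is not None and not in_range(ev.code_addr, *code_range):
            continue
        if opcode_filter is not None and not opcode_matches(ev.opcode, *opcode_filter):
            continue
        count += 1
    return count


class BranchTraceBuffer:
    """Circular buffer holding the most recent (source, target) branch pairs."""

    def __init__(self, size=8):
        # A deque with maxlen silently drops the oldest entry when full.
        self._entries = deque(maxlen=size)

    def record(self, source, target):
        self._entries.append((source, target))

    def snapshot(self):
        """Return the stored pairs, oldest first."""
        return list(self._entries)
```

With this model, calling `count_events(events, "branch_mispredict", code_range=(start, end))` corresponds to counting mispredicted branches inside one procedure, where `start` and `end` would be that procedure's address bounds. Recording more than eight branches in a `BranchTraceBuffer` overwrites the oldest entries, which is why a real profiler has to read such a buffer out regularly.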
P.M., N.S., and P.E.