Performance Analysis
We studied the performance of the AES-CBC + Elephant diffuser (the original implementation and the one recommended here for maximum performance) and the proposed AES-ECB + Elephant diffuser on a single processor, then estimated their performance on a dual-core processor.
The measurements we report are processor clock cycles on a 3-GHz Pentium 4 (PIV) processor running Windows Vista. The programming environment is Microsoft Visual C++.
Single Processor
Our optimized implementation of Diffuser A and Diffuser B (using loop unrolling) gives the following costs for the AES-CBC + Elephant diffuser:
- 4,560 clock cycles for the current implementation of the diffusion layer.
- 256 clock cycles for XOR operations (128 for XORing the sector key and 128 for the XOR in CBC mode, using 32-bit XOR operations).
- 13,888 clock cycles for the AES encryption (using optimized assembly language).
That is, it takes 18,704 clock cycles to encrypt a 512-byte sector using the AES-CBC + Elephant diffuser. This value can be reduced to 15,854 with AC=2 and BC=1, which makes encryption about 18 percent faster.
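The totals above can be reproduced with a small cost model. The sketch below is illustrative only: the function and constant names are hypothetical, and it assumes the diffuser cost scales linearly with the number of diffuser cycles AC + BC (4,560 clock cycles for AC=5 and BC=3, that is, 570 per cycle). That assumption is not stated in the measurements, but it is consistent with the 15,854 figure.

```python
# Illustrative cost model for AES-CBC + Elephant diffuser, in clock cycles
# per 512-byte sector. Component costs are the measurements quoted in the text.
DIFFUSER_BASELINE = 4560      # diffusion layer, measured with AC=5 and BC=3
BASELINE_CYCLES = 5 + 3       # AC + BC of the baseline configuration
SECTOR_KEY_XOR = 128          # XOR with the sector key
CBC_XOR = 128                 # XOR chaining step of CBC mode
AES = 13888                   # AES encryption of the whole sector


def diffuser_cost(ac, bc):
    # Assumption: cost is linear in the total number of diffuser cycles.
    return DIFFUSER_BASELINE * (ac + bc) // BASELINE_CYCLES


def cbc_sector_cost(ac, bc):
    return diffuser_cost(ac, bc) + SECTOR_KEY_XOR + CBC_XOR + AES


original = cbc_sector_cost(5, 3)
reduced = cbc_sector_cost(2, 1)
speedup_percent = (original / reduced - 1) * 100

print(original, reduced, round(speedup_percent))
```

Under these assumptions, the percentage is the ratio of the two cycle counts minus one, so it expresses how much faster a sector is processed rather than the fraction of cycles saved.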
With the AES-ECB + Elephant diffuser:
- 4,560 clock cycles for the current implementation of the diffusion layer.
- 128 clock cycles for XORing the sector key.
- 32 clock cycles for the addition of the counter.
- 13,888 clock cycles for the AES encryption.
In this case, it takes 18,608 clock cycles to encrypt a 512-byte sector using the AES-ECB + Elephant diffuser. This value can be reduced to 16,328 with the minimum recommended values for maximum performance (AC=2 and BC=2), which makes encryption about 14 percent faster.
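As a cross-check of the AES-ECB figures, the following sketch sums the quoted component costs. The names are hypothetical, and the per-cycle diffuser cost of 570 clock cycles is an assumption derived by dividing the measured 4,560 cycles evenly over AC + BC = 8 diffuser cycles.

```python
# Fixed per-sector costs (clock cycles) quoted in the text for
# AES-ECB + Elephant diffuser.
FIXED_COSTS = {
    "sector_key_xor": 128,
    "counter_add": 32,
    "aes": 13888,
}

# Assumption: every Diffuser A or Diffuser B cycle costs the same,
# so the measured 4,560 cycles (AC=5, BC=3) split into 570 per cycle.
DIFFUSER_COST_PER_CYCLE = 4560 // (5 + 3)


def ecb_sector_cost(ac, bc):
    return DIFFUSER_COST_PER_CYCLE * (ac + bc) + sum(FIXED_COSTS.values())


for ac, bc in [(5, 3), (2, 2)]:
    print(f"AC={ac} BC={bc}: {ecb_sector_cost(ac, bc)} clock cycles")
```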
Dual Processor
To take advantage of dual-core processors, we investigated the AES-ECB + Elephant diffuser, whose AES-ECB layer can be easily parallelized. Here, we estimate the processing time on a dual-core processor. For simplicity, we divide the processing time of a step by two whenever that step can be parallelized.
In the case of the AES-CBC + Elephant diffuser, XORing with the sector key can be parallelized, so it takes only 64 clock cycles. Neither the diffusion layer nor the AES-CBC can be parallelized (by definition, they are serial). So the estimated processing times are 15,696 (when AC=5 and BC=3) and 12,846 (when AC=2 and BC=1) clock cycles for encrypting a 512-byte sector.
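The difference between the two modes comes from their data dependencies, which the following sketch tries to make visible. It is not AES: `toy_block_fn` is a hypothetical stand-in built from SHA-256, used only so the example runs with the standard library. The thread pool illustrates that ECB blocks can be handed to independent workers; it is not meant as a timing benchmark.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16  # AES block size in bytes


def toy_block_fn(block, key):
    """Stand-in for one AES block encryption (NOT a real cipher)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def split_blocks(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


def ecb_serial(data, key):
    return [toy_block_fn(b, key) for b in split_blocks(data)]


def ecb_parallel(data, key, workers=2):
    # Each output block depends only on its own input block, so the blocks
    # can be distributed over workers in any order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: toy_block_fn(b, key), split_blocks(data)))


def cbc_serial(data, key, iv):
    # Each block is XORed with the previous output block before the block
    # function is applied, so block i cannot start until block i-1 is done.
    out, prev = [], iv
    for b in split_blocks(data):
        mixed = bytes(x ^ y for x, y in zip(b, prev))
        prev = toy_block_fn(mixed, key)
        out.append(prev)
    return out


sector = bytes(range(256)) * 2        # one 512-byte sector
key, iv = b"k" * BLOCK, bytes(BLOCK)  # arbitrary illustrative values

assert ecb_parallel(sector, key) == ecb_serial(sector, key)
print(len(cbc_serial(sector, key, iv)), "chained blocks")
```

Changing the first plaintext block changes only the first output block in the ECB functions, but every output block in `cbc_serial`, which is the chaining that prevents parallel encryption.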
For the AES-ECB + Elephant diffuser, the counter addition can be parallelized, so it takes only 16 clock cycles, and XORing with the sector key can be parallelized, so it takes only 64 clock cycles. The diffusion layer cannot be parallelized, but the AES-ECB layer can, which halves its cost to 6,944 clock cycles. The estimated processing times are therefore 11,584 (when AC=5 and BC=3) and 9,304 (when AC=2 and BC=2) clock cycles for encrypting a 512-byte sector. This is about 60-100 percent faster than the original AES-CBC + Elephant diffuser implementation on a single processor (18,704 clock cycles), depending on the values of AC and BC.
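The dual-core estimate for the AES-ECB variant follows from the halving rule and can be sketched as below. The names are hypothetical, the 570 clock cycles per diffuser cycle is an assumption (4,560 divided evenly over AC + BC = 8), and the comparison baseline is the 18,704 clock cycles measured for AES-CBC + Elephant diffuser on a single processor.

```python
# Each step is (cost in clock cycles, can it be parallelized?).
def ecb_steps(ac, bc):
    return [
        (570 * (ac + bc), False),  # diffusion layer: serial (570/cycle assumed)
        (128, True),               # XOR with the sector key
        (32, True),                # counter addition
        (13888, True),             # AES-ECB: blocks are independent
    ]


def dual_core_estimate(steps):
    # Simplification used in the text: a parallelizable step takes half
    # the time on two cores, a serial step takes the full time.
    return sum(cost // 2 if parallel else cost for cost, parallel in steps)


SINGLE_CORE_CBC = 18704  # original AES-CBC + Elephant diffuser, AC=5, BC=3

for ac, bc in [(5, 3), (2, 2)]:
    cycles = dual_core_estimate(ecb_steps(ac, bc))
    faster = (SINGLE_CORE_CBC / cycles - 1) * 100
    print(f"AC={ac} BC={bc}: {cycles} clock cycles, about {faster:.0f}% faster")
```

The model ignores synchronization and scheduling overhead between the two cores, so it is best read as an optimistic bound under the stated simplification.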