Algorithm Three (Cache Line Optimized Light Pipe)
The mechanism described above has a few performance problems. One of them is cache thrashing (false sharing), which can occur when Writer and Reader write to the same cache line (see [1] for more details about false sharing). In this case the cache coherence mechanism has to pass the same cache line back and forth between processor caches several times. To avoid this behaviour the mechanism needs to be updated:
- Internal data of the Reader and Writer processes should be placed in different cache lines. If the system has several levels of cache, the longest cache line should be taken into account.
- Shared memory needs to be aligned to the cache line length. If the system has several levels of cache, the longest cache line should be used.
- The Reader algorithm needs to be updated:
    int cacheLineLengthInWords = 8; // Platform dependent
    // N (the length of SharedMemory in words) must be a multiple of cacheLineLengthInWords
    int i = 0;
    int j;
    while( true )
    {
        word theWord;
        while( (theWord = SharedMemory[i]) == 0 )
        {
            DoSomethingUseful(); // Or just wait or spin
        }
        // If it is the last word in the cache line
        if( (i + 1) mod cacheLineLengthInWords == 0 )
        {
            // Clean up the whole cache line
            for( j = 0; j < cacheLineLengthInWords; ++j )
            {
                // It is important to write words backward
                SharedMemory[i - j] = 0;
            }
        }
        ProcessDataReceivedFromTheFirstProcess(theWord);
        i = (i + 1) mod N;
    }
With this update Reader no longer zeroes each word as soon as it has read it. Instead it postpones writing zero values back until the whole cache line has been read. It deliberately writes the zero values backwards, from the end of the cache line to its beginning, so that the first word of the line is cleared last. This avoids the situation where Writer starts writing into the cache line before Reader has completely cleared it.
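The three changes above can be sketched together in C11. This is only an illustration under stated assumptions, not the article's reference implementation: a 64-byte cache line and 64-bit words are assumed, the names `CACHE_LINE_BYTES`, `side_state`, `writer_try_write` and `reader_try_read` are hypothetical, the Writer is modelled as in the earlier algorithms (it writes a non-zero word only into a slot that currently holds zero), and C11 atomics with release/acquire ordering are substituted for the plain memory accesses of the pseudocode.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t word;

#define CACHE_LINE_BYTES 64                               /* assumption: platform dependent */
#define WORDS_PER_LINE (CACHE_LINE_BYTES / sizeof(word))  /* 8 words per line here */
#define N 32                                              /* pipe length in words */

static_assert(N % WORDS_PER_LINE == 0, "N must be a whole number of cache lines");

/* Shared memory, aligned to the cache line length. */
static alignas(CACHE_LINE_BYTES) _Atomic word shared_memory[N];

/* Private data of each side; the alignment pads each instance to its own line. */
struct side_state {
    alignas(CACHE_LINE_BYTES) size_t index;
};
static struct side_state writer_state;
static struct side_state reader_state;

/* Writer step (assumed behaviour): store a non-zero word into an empty slot.
   Returns 1 on success, 0 if the slot is still occupied or w is zero. */
static int writer_try_write(word w)
{
    size_t i = writer_state.index;
    if (w == 0 || atomic_load_explicit(&shared_memory[i], memory_order_acquire) != 0)
        return 0;
    atomic_store_explicit(&shared_memory[i], w, memory_order_release);
    writer_state.index = (i + 1) % N;
    return 1;
}

/* Reader step: returns 1 and stores the word in *out, or 0 if nothing is there yet. */
static int reader_try_read(word *out)
{
    size_t i = reader_state.index;
    word w = atomic_load_explicit(&shared_memory[i], memory_order_acquire);
    if (w == 0)
        return 0;
    if ((i + 1) % WORDS_PER_LINE == 0) {
        /* Last word of the line: clear the whole line, writing backward
           so that the first word of the line becomes zero last. */
        for (size_t j = 0; j < WORDS_PER_LINE; ++j)
            atomic_store_explicit(&shared_memory[i - j], 0, memory_order_release);
    }
    reader_state.index = (i + 1) % N;
    *out = w;
    return 1;
}
```

In a real pipe the two functions would be called from different threads or processes, each spinning or doing other useful work while its call returns 0. Returning instead of spinning inside the function is a choice made here only so that a single step can be exercised in isolation.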