Observations
After using the synchronization mechanisms implemented by the QLock class to solve the Singleton problem on Windows (and more generally, the problem with the static initialization of mutexes on Windows), I discovered that QLock has some other important properties that made it attractive for other applications (including on UNIX). QLock::Mutex has a small memory footprint (one pointer), and a small initialization cost (that of initializing a pointer with 0). Additionally, when there is no contention, QLock is fast. If no wait is necessary, locking a QLock requires just one atomic instruction and no system calls. If there are no waiting threads, unlocking is also just one atomic instruction.
Furthermore, when there is contention, the cost of constructing event objects (or semaphores on UNIX) can be virtually eliminated by using a lock-free pool of allocated event objects. This pool works particularly well because the number of event objects required never exceeds the number of waiting threadsan inherently small number.
One common scenario in which QLock appears to be of great use is when you have a large collection of some kind of items that are accessed from multiple threads. To increase concurrency, it may be logically possible and desirable to maintain a mutex per item of the collection (instead of maintaining a single mutex for the entire collection). It is often the case, though, that the memory footprint and the initialization cost of a standard mutex or critical section would be prohibitive due to the (sometimes very large) size of the collection. QLock, because of its small size and negligible initialization cost, may be able to provide a viable solution to fine-grain locking.