"..in all honesty, I would suggest that people who want a **MODERN** "free" OS look around for a microkernel-based, portable OS, like maybe GNU.."-- The Tanenbaum-Torvalds Debate
This famous quote comes from Andrew Tanenbaum's January 1992 exchange with Linus Torvalds, the often-cited "Linux is obsolete" debate. Tanenbaum argues in part that the Linux kernel has a monolithic design and is therefore "old-fashioned". In 1992 there were few compelling reasons to introduce the complexity of a microkernel into mainstream production-ready operating systems, and even today most operating systems in use are of the monolithic variety.
Linus Torvalds largely agreed with Tanenbaum on the merits of the microkernel design:
True, Linux is monolithic, and I agree that microkernels are nicer. From a theoretical (and aesthetical) standpoint Linux looses. If the GNU kernel had been ready last spring, I'd not have bothered to even start my project: the fact is that it wasn't and still isn't. Linux wins heavily on points of being available now.
-- The Tanenbaum-Torvalds Debate
As an aside, the GNU Hurd is now finally in the alpha stages of development--13 years later! The Linux kernel is available for a wide variety of architectures, and it is powerful, flexible, and compact. It also supports symmetric multiprocessing for running user space applications in parallel. However, Linux still carries the monolithic design legacy of its earliest design decisions.
Until now, it has not mattered much whether an OS had a monolithic or microkernel design, so research in this area has been left to "academics". The important features required of a production-ready operating system include software and hardware support, reliability and security. Examples of software support are real-time scheduling for mission-critical applications and a TCP/IP protocol stack. Hardware support covers not only CPUs and hard drives but also flash memory and network chipsets. Reliability matters greatly to users because it prevents service interruptions, data loss and the like. Security is critical in today's world of viruses and hackers.
The exponential improvement in CPU clock speed, on-chip cache and pipelining has been a "free lunch" for monolithic operating systems and the applications they support. This idea is clearly presented in Herb Sutter's article "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software" in the March 2005 issue of Dr. Dobb's Journal, in which Sutter argues that concurrency will be the next revolution in software development. Developers could simply focus on support and reliability and ride the wave of the performance explosion. That worked for decades, but times have finally changed.
What has changed is that chip manufacturers have hit a wall: they can no longer convert Moore's Law transistor budgets into ever-higher clock speeds. This is well documented, and since roughly 2003 CPU performance has stopped growing exponentially by the traditional clock speed and MIPS measures. The industry's response has been to implement hardware multi-threading and chip multiprocessors--also known as "multi-core processors."
Chip multiprocessors allow more than one thread of execution to run concurrently with other threads. The category's defining feature is more than one processor core embedded in a single package. Whatever the physical implementation, they present the same potential performance improvement to the programmer: they allow multiple threads of execution to run concurrently.
Today, and into the foreseeable future, software will garner additional performance improvements through algorithmic optimization. The optimizations fall into two main categories: efficiency and concurrency. Efficiency improvements include streamlining the traversal of data structures or hand-optimizing "hot spots", but this will only get you so far. The big improvements will come from exploiting concurrency.
Concurrent programming at the application level means implementing parallel algorithms. The same goes for the operating system: it must exploit concurrency so that its performance improves significantly as the number of cores in chip multiprocessors increases.
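As a minimal illustration of recasting a computation as a parallel algorithm (this sketch and its function names are ours, not from the original article), the same CPU-bound work can be expressed once and then farmed out to separate worker processes, one chunk per core:

```python
from multiprocessing import Pool

def count_primes(limit):
    """Count primes below `limit` by trial division (deliberately CPU-bound)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def sequential(limits):
    # One thread of execution: each chunk runs after the previous one finishes.
    return [count_primes(limit) for limit in limits]

def parallel(limits, workers=4):
    # The same algorithm recast so independent chunks can run on separate cores.
    with Pool(workers) as pool:
        return pool.map(count_primes, limits)

if __name__ == "__main__":
    limits = [20000, 30000, 40000, 50000]
    # Identical answers; the parallel version can use one core per chunk.
    assert sequential(limits) == parallel(limits)
```

The algorithm itself is unchanged; only the scheduling of its independent pieces differs, which is exactly the kind of restructuring concurrent programming demands.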
The operating system will need to support concurrency at a fine-grained level to take advantage of the scalability of chip multiprocessors. Today, dual-core CPUs are available from a handful of vendors and even quad-core parts can be had. In the near future we can expect 8, 16, and possibly even more cores.
A worthwhile question to ask is: what is the point of having 16 cores if most of the time I am able to run only seven threads of execution concurrently? The unfortunate answer is that the unused cores sit idle with nothing to do.
Put simply, if you cannot find enough concurrent threads of execution to fully utilize a chip multiprocessor, then the leftover cores are as useful to the performance of the system as the cup of coffee sitting on your desk!
It is up to the operating system and user applications to efficiently exploit the high throughput of this new design paradigm. Symmetric multiprocessing only partially addresses the problem. To be able to fully scale to the concurrency available in future CPUs we will need highly granular operating systems. Thankfully, this topic has been discussed in detail and solutions have been presented by those very same "academics" that in the past were too far on the bleeding edge.
Threads of Execution, Concurrency and Chip Multiprocessors
A thread of execution runs on a single processor core; it is the discrete unit we use when discussing concurrency. Concurrency is concerned with more than one thread of execution running in parallel. Threads of execution running concurrently means that more than one "thread" is running at the same time--one "thread" per processor core. Threads of execution running sequentially means that each "thread" runs only after the previous "thread" has stopped, so that only one is running at any point in time.
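The distinction shows up directly in wall-clock time. In this sketch (ours, using Python's threading module with sleep standing in for waiting work), running the same tasks sequentially takes roughly the sum of their durations, while running them concurrently takes roughly the longest single duration:

```python
import threading
import time

def waiting_task(delay, results, index):
    # Stands in for a thread of execution that spends its time waiting.
    time.sleep(delay)
    results[index] = delay

def run_sequentially(delays):
    # Each "thread" runs only after the previous one has stopped.
    results = [None] * len(delays)
    for i, d in enumerate(delays):
        waiting_task(d, results, i)
    return results

def run_concurrently(delays):
    # All "threads" are in flight at once; wall time is about max(delays).
    results = [None] * len(delays)
    threads = [threading.Thread(target=waiting_task, args=(d, results, i))
               for i, d in enumerate(delays)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    delays = [0.2, 0.2, 0.2]
    start = time.time()
    run_sequentially(delays)
    sequential_time = time.time() - start
    start = time.time()
    run_concurrently(delays)
    concurrent_time = time.time() - start
    assert concurrent_time < sequential_time
```

On a single core the concurrent version still interleaves by time-sharing; only with multiple cores (or waiting-dominated work, as here) does it finish genuinely faster.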
Multi-tasking operating systems have several threads of execution, each in a well-defined state. Some are called processes or tasks, others are threads within a process, and even the kernel is a thread of execution. On a single processor core, such as most users have at home today, all of these threads of execution run sequentially: they time-share the one processor and do not run concurrently.
Chip multiprocessors are a category of processors with multiple processor cores in a single package. They can be multi-chip modules, where multiple processor cores are mounted on a single substrate within one package; soft-core microprocessors instantiated in field-programmable gate arrays; or multiple processor cores etched on the same silicon die.
Chip multiprocessors are not the only way to run threads of execution concurrently. In fact, for several years parallel super-computers have been at the bleeding edge of high performance computing. More recently, Beowulf clusters and even symmetric multiprocessing servers have been able to run threads concurrently. These systems can all benefit from software designed to run concurrently, and they often do, but it is only recently that the critical mass of the microprocessor industry has pushed the need for concurrency toward mainstream computing applications.
Monolithic versus the Microkernel Operating System Design
The monolithic kernel design is a single program executable that performs all kernel functions. In the simplest case, this means the entire operating system has a single thread of execution running in kernel mode. The monolithic kernel contains all of the features of the operating system, including hardware drivers, protocol stacks, process scheduling, filesystem support and memory management.
A microkernel operating system is designed as a set of cooperating user space processes that interact through some form of inter-process communication. The minimal set of functionality that must be performed sequentially--such as certain interrupt handlers and the critical sections of the inter-process communication mechanism--is performed by a single thread. This minimal set must be chosen carefully during the design phase of operating system development to gain the optimal benefit.
The microkernel operating system inherently has more threads performing work concurrently than its monolithic cousin. This provides more opportunities for concurrent execution and therefore superior utilization of the multiple processor cores in modern and future chip multiprocessors.
With this understanding we can now take a look at our example operating systems. Linux, being of the monolithic variety, has a single executable, usually called vmlinux in its uncompressed form. The vmlinux binary is loaded at system bring-up by the boot loader. This is the operating system kernel, and it contains the system services required by all of the applications executed on the system. This set of services is for the most part chosen at kernel build time and can be extended using loadable modules. Depending on the system hardware, the kernel may support symmetric multiprocessing so that it can run multiple user space processes and the threads of multi-threaded applications concurrently.
The GNU Hurd is a microkernel operating system. It is a collection of software programs rather than a single binary executable. It is designed to provide applications with services similar to those Linux provides, but it does so using a bare-bones microkernel. The microkernel itself is named Mach, with user space processes performing most of the work. The primary design differentiator is that these services are performed outside of the kernel, in contrast to the services that Linux performs within the kernel itself.
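The structure is easy to sketch. In this toy example (ours; it mimics the message-passing shape of a microkernel service, not The Hurd's actual translator API), a "filesystem" runs as an ordinary user space process and clients talk to it purely by exchanging messages:

```python
from multiprocessing import Process, Queue

def filesystem_server(requests, replies):
    # A user-space "server" process: in a microkernel design, a service like
    # the filesystem runs outside the kernel and answers IPC messages instead
    # of executing as in-kernel code on the caller's behalf.
    store = {}
    while True:
        op, key, value = requests.get()
        if op == "stop":
            break
        elif op == "write":
            store[key] = value
            replies.put("ok")
        elif op == "read":
            replies.put(store.get(key))

def client_session():
    # A "client" requests work by sending messages, not by making direct calls.
    requests, replies = Queue(), Queue()
    server = Process(target=filesystem_server, args=(requests, replies))
    server.start()
    requests.put(("write", "motd", "hello"))
    ack = replies.get()
    requests.put(("read", "motd", None))
    value = replies.get()
    requests.put(("stop", None, None))
    server.join()
    return ack, value

if __name__ == "__main__":
    assert client_session() == ("ok", "hello")
```

Because each service is its own schedulable process, several of them can be making progress at once--the property the next section argues matters on chip multiprocessors.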
Examples
Let's walk through a couple of examples to get a solid understanding of the potential performance benefit that can be realized by using a microkernel operating system on a chip multiprocessor based computing platform. For these examples we will assume we have a brand new platform from our favorite vendor that sports a cutting edge 16-core chip multiprocessor. The processor implements some well-supported instruction set architecture so that it can run your favorite applications--all you need is an operating system. The operating system is required to provide system services such as a filesystem, network connectivity, process scheduling and memory management so that applications can get work done. In both examples the system performs identical tasks on identical hardware under exactly the same conditions, except for the operating system used.
In this first example, assume I am running a web server and the web server application is multi-threaded. Additionally, most of the time I have six other processes that are independent and often have work to get done. For argument's sake, I will say that on average I am able to run five threads of the multi-threaded web server and six other processes concurrently for a total of 11 concurrent threads of execution.
Under Linux, you can run 12 concurrent threads when including the kernel's thread of execution. This assumes the kernel has work to do while the other threads of execution are busy--reasonable, since it is handling all of the system services.
Now let's look at the same example using The Hurd. We will assume that four operating system services are handled as concurrent user space processes and run in parallel most of the time. This yields 15 total concurrent threads of execution: just as in the Linux case we have five web server threads and six other processes, but now we have more operating system threads of execution thanks to the microkernel design.
In a nutshell, The Hurd utilizes 15 of the 16 processor cores on average, whereas Linux was able to use 12.
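The arithmetic behind these counts can be made explicit. The sketch below is purely illustrative--the thread counts are the assumptions stated above, not measurements--and simply caps the busy-core count at the number of cores available:

```python
CORES = 16  # the hypothetical chip multiprocessor from the example

def cores_used(app_threads, os_threads, cores=CORES):
    """Cores doing useful work, capped at the cores physically present."""
    busy = min(app_threads + os_threads, cores)
    return busy, busy / cores

# Server scenario: 5 web-server threads + 6 other processes = 11 application
# threads. Linux contributes a single kernel thread of execution; the Hurd
# figure assumes 4 concurrent user-space service processes.
linux_busy, linux_util = cores_used(11, 1)   # 12 cores busy, 75% utilization
hurd_busy, hurd_util = cores_used(11, 4)     # 15 cores busy, ~94% utilization
```

The cap matters: with only 16 cores, adding service threads beyond the idle-core count would buy nothing, which is why the examples deliberately stay below it.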
The second example is not a server but the desktop scenario, in which a user has the computing platform sitting on their desk and uses it to read documents, write emails, surf the Net, and so on.
A similar performance improvement can be gained by using microkernels in the desktop scenario. A user running their favorite web browser may have two threads of execution running concurrently on average. The system services they need for this include the filesystem for caching data, the memory manager and network services.
In this example, Linux provides one additional concurrent thread of execution for a total of three running on average: one thread of execution for the Linux kernel and two for the web browser.
The Hurd uses more than one concurrent thread to provide the three system services we are using. For argument's sake, we will say it has a total of three concurrent threads handling those services. So, we have a total of five threads of execution on average.
In this example The Hurd continues to provide more concurrent threads of execution than Linux. Yet again, more processing performance is squeezed from the chip multiprocessor due to The Hurd's advanced design.
It should now be clear that on a 16-core chip multiprocessor you make better use of the available processor resources using The Hurd. Because concurrency within the microkernel is fine-grained, The Hurd benefits more from chip multiprocessors than a monolithic operating system does.
These examples are intentionally oversimplified to emphasize the benefit of the microkernel operating system design when used in conjunction with a chip multiprocessor. A real world system is constantly changing states and is extremely complex.
Conclusion
It is difficult for programmers to think about multiple things happening at the same time. Problems that do not occur in sequential program flow, such as shared memory corruption, race conditions and deadlocks, become commonplace. Many programming languages are designed for sequential program flow, and writing multi-threaded software is not simple. Certain algorithms simply are not designed to run concurrently. Legacy applications are often not written using a multi-threaded design, and until they are rewritten they will not have more than a single thread of execution. For these reasons and others, concurrency is difficult to take advantage of, so programmers are better off with as many opportunities to exploit it as possible. This is why we must consider using a microkernel operating system now that concurrency will be the primary path to performance gains.
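To make the shared-memory hazard concrete, here is a small sketch (ours, with illustrative names) of the classic lost-update race and its lock-protected fix:

```python
import threading

COUNTER = 0
ITERATIONS = 100_000

def unsafe_increment():
    # Read-modify-write without a lock: two threads can interleave between
    # the read and the write, silently losing updates (a race condition).
    global COUNTER
    for _ in range(ITERATIONS):
        COUNTER += 1

def safe_increment(lock):
    global COUNTER
    for _ in range(ITERATIONS):
        with lock:  # the lock makes the read-modify-write effectively atomic
            COUNTER += 1

def run(worker, *args):
    # Run two copies of `worker` concurrently and return the final count.
    global COUNTER
    COUNTER = 0
    threads = [threading.Thread(target=worker, args=args) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return COUNTER

if __name__ == "__main__":
    lock = threading.Lock()
    # The locked version always reaches the full count; the unlocked version
    # may come up short, and worse, it fails nondeterministically.
    assert run(safe_increment, lock) == 2 * ITERATIONS
```

The insidious part is the nondeterminism: the unsynchronized version can pass a test run and still corrupt data in production, which is exactly why sequential-minded programmers find concurrency hard.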
Using a microkernel operating system design allows better utilization of the available resources of chip multiprocessors. This operating system design inherently provides more concurrent threads of execution than a monolithic design and therefore is the best choice for the new direction in processor technology. Finding ways to exploit concurrency across a broad range of applications is important if we are to continue to enjoy the improvements in processor technology that we have enjoyed for the past few decades.
On a chip multiprocessor, the microkernel operating system has the advantage of finer granularity in its concurrent threads of execution that provide system services than does the monolithic kernel. For this reason, the benefit gained by running a microkernel applies to a wide range of applications that depend on using these system services. In many cases a microkernel will have more concurrency available for a chip multiprocessor to leverage.
It is not certain that using a microkernel on a chip multiprocessor will translate directly into a performance improvement. The result is application dependent: even though the microkernel is able to run more than one thread of execution to perform a system service, a given thread may have nothing to do or may be blocked for some reason. At that point it provides no additional performance benefit from concurrency. This is why it is very important to invest the proper amount of time in the design of the microkernel operating system, so that the performance gain from concurrency is maximized.
Memory bandwidth, bus contention and other hardware-related issues can also become bottlenecks for overall system performance. When a system suffers these ailments, concurrency is not the limiting factor, and we cannot expect a chip multiprocessor to solve the problem.
Hopefully, we have shed light on the desirability of using a microkernel operating system on chip multiprocessors. Unfortunately, few studies directly comparing the performance of the microkernel and monolithic designs on chip multiprocessors are available. The developer community needs to perform some real-world experiments on The Hurd and Linux running on chip multiprocessors to quantify the actual performance benefit achieved. This may lead to design improvements in both operating systems.
In closing, we urge the open source developer and user community to give The Hurd some of their time--even just a small amount--to the benefit of us all. After all, it may just end up being the open source operating system of the future.
Slade Maurer is a software developer in San Francisco. He can be contacted at [email protected] .