The Challenge
Data-management code that operates inside the kernel faces considerable challenges, whether it is custom-made or off-the-shelf. Kernel-mode drivers and application components have to be nonintrusive to the system. The integrated data management therefore cannot write data to a hard disk, even on systems with a large filesystem cache, because committing transactions through disk I/O would have too great an impact on the system. Data management can't monopolize or make extensive use of any of the system's resources, increase interrupt latencies, or noticeably affect the system's overall responsiveness to outside events. It has to provide simultaneous access to data for all parts of the system, including multiple user-mode processes and kernel threads. Further, this access must cover both reads and writes and be efficient enough to avoid stalling the kernel.
The kernel-mode database is made available to user-mode applications through a set of public interfaces implemented via system calls; see Figure 1.
For the kernel-mode database integration in Figure 1 to work, the kernel-based database runtime must address many of the same challenges as kernel-based programs and embedded systems generally. The issues and solutions involved in deploying a database in kernel space are also highly applicable to device drivers, filesystems, and other kernel modules:
- To provide multithreaded, simultaneous data access, the database runtime must use synchronization mechanisms. Nearly every driver requires them as well, because any shared data accessed by multiple threads needs synchronization unless the access is read-only. Synchronization is also required when a set of operations must be performed atomically, inside a transaction. OS synchronization mechanisms can cause performance bottlenecks. To improve locking performance, the kernel-mode database uses locks based on the atomic exchange instructions provided by many CPU architectures. When the nature of a resource requires mutual exclusion between threads, the database runtime claims that resource only for a short time, so a low-overhead spinlock-based mechanism is enough to protect it.
- Efficient memory use is often a key to application performance, especially in the kernel where the nonpaged memory pool is limited and frequent paging could increase kernel latencies. To address this, in-memory databases often use a number of custom allocation algorithms to take advantage of problem-specific allocation patterns, such as infrastructure for the data layout, internal database runtime heap management, and so on.
- The kernel-mode stack is a limited storage area that is often used for information passed between functions, as well as for local variable storage. Running out of stack space causes the OS to crash. Therefore, the database runtime integrated with the kernel module, like any other driver, must watch its stack usage. It must never allocate large aggregate structures on the stack, and it must avoid deeply nested or recursive calls. If recursion must be used, the number of recursive calls must be strictly limited.
- Not all of the standard libraries (C and especially C++) are present in kernel mode. Moreover, versions of standard libraries for use in kernel mode are not necessarily the same as those in user mode, as they are written to conform to kernel-mode requirements. Kernel-mode implementations of standard libraries usually have limited functionality and are constrained by other properties of kernel mode. (The eXtremeDB-KM database runtime does not use the C runtime. For instance, instead of relying on the C runtime for memory management, the database replaces those functions with custom allocators.)
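To make the locking and allocation points above more concrete, here is a minimal sketch in portable C11, not the eXtremeDB-KM implementation. It assumes a spinlock built on an atomic exchange and a fixed-size block pool whose free list is threaded through the free blocks themselves, so that neither allocation nor locking calls into the C runtime heap or an OS synchronization primitive. All names (db_spinlock, db_pool, DB_BLOCK_SIZE, and so on) are hypothetical and chosen only for illustration. A real kernel build would additionally have to deal with preemption and interrupt levels, which this user-mode sketch ignores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical spinlock: one flag flipped with an atomic exchange. */
typedef struct { atomic_bool held; } db_spinlock;

/* Returns true if the caller acquired the lock. The exchange writes
 * "true" and hands back the previous value in one atomic step. */
static bool db_spin_trylock(db_spinlock *l) {
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

static void db_spin_lock(db_spinlock *l) {
    while (!db_spin_trylock(l)) {
        /* Busy-wait; acceptable only because holders release quickly. */
    }
}

static void db_spin_unlock(db_spinlock *l) {
    assert(atomic_load_explicit(&l->held, memory_order_relaxed));
    atomic_store_explicit(&l->held, false, memory_order_release);
}

/* Hypothetical fixed-size block pool (sizes are assumptions). */
#define DB_BLOCK_SIZE  64u
#define DB_BLOCK_COUNT 8u

typedef union db_block {
    union db_block *next;                 /* valid only while the block is free */
    unsigned char payload[DB_BLOCK_SIZE]; /* valid only while it is allocated */
} db_block;

typedef struct {
    db_spinlock lock;
    db_block *free_list;
    size_t in_use;
    db_block blocks[DB_BLOCK_COUNT];      /* backing store, no heap involved */
} db_pool;

static void db_pool_init(db_pool *p) {
    atomic_init(&p->lock.held, false);
    p->free_list = NULL;
    p->in_use = 0;
    /* Push blocks in reverse so the first allocation returns blocks[0]. */
    for (size_t i = DB_BLOCK_COUNT; i-- > 0; ) {
        p->blocks[i].next = p->free_list;
        p->free_list = &p->blocks[i];
    }
}

/* Pops one block in constant time; returns NULL when the pool is empty. */
static void *db_pool_alloc(db_pool *p) {
    db_spin_lock(&p->lock);
    db_block *b = p->free_list;
    if (b != NULL) {
        p->free_list = b->next;
        p->in_use++;
    }
    db_spin_unlock(&p->lock);
    return b != NULL ? b->payload : NULL;
}

/* Pushes a block back; ptr must have come from db_pool_alloc on this pool. */
static void db_pool_free(db_pool *p, void *ptr) {
    db_block *b = (db_block *)ptr;
    assert(b >= p->blocks && b < p->blocks + DB_BLOCK_COUNT);
    db_spin_lock(&p->lock);
    b->next = p->free_list;
    p->free_list = b;
    p->in_use--;
    db_spin_unlock(&p->lock);
}
```

A caller would run db_pool_init once on a statically allocated db_pool and then pair each db_pool_alloc with a db_pool_free. The critical sections contain only a few pointer updates, which is the property that makes spinning cheaper than a blocking OS lock in this sketch. Keeping the pool in static storage also keeps the large array off the limited kernel-mode stack.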