Checkpointing, CHPOX, and Linux

Checkpointing lets you grab a snapshot of the current state of a program in execution, then save it on disk.


April 21, 2006
URL:http://drdobbs.com/open-source/checkpointing-chpox-and-linux/186500457

Checkpointing is a mechanism that lets you grab a snapshot of the current state of a program in execution, then save it on disk. Later the state of the program can be resumed, even after the machine has been rebooted or turned off.

Why do you need checkpointing? Consider these scenarios:

All of these scenarios (and there are many more) lead to one conclusion: you need a checkpointing solution so you won't lose a task's progress. Even if the task has a native save/resume feature, checkpointing is still useful as a fallback in case rolling back with the native feature fails. Keep in mind that neither approach guarantees a flawless resume, and neither supports every type of task.

In this article, we examine an open-source checkpointing project called CHPOX, short for "CHeckPOinter for Linux". CHPOX is a Linux kernel module that transparently dumps specified processes into a disk file and later restarts them. CHPOX was created by Olexander Sudakov and Eugeniy Meshcheryakov at the National Taras Shevchenko University, Kiev (Ukraine). Because CHPOX is implemented as a kernel module, it works with 2.4.x kernels and can be dynamically inserted into or removed from kernel space, so it is loaded only when you need it.

For more information on checkpointing tools, go to www.checkpointing.org.

How Checkpointing Works

Here's a general description of how CHPOX works:

  1. After CHPOX's module (chpox_mod) is successfully compiled, it must be loaded into the kernel before any checkpointing can be done.

  2. To begin checkpointing, you must register the process (its children are registered automatically), either by writing the parameters directly to /proc/chpox or by using chpoxctl (a userspace tool). If needed, you can also register the linked library so CHPOX can resume the process successfully. (In my experience, this can be skipped, depending on the type of program you have.)

    You also need to define a signal number for CHPOX to use. This signal can be set per registered process tree. You usually assign an unused signal number (like "signal number 31--SIGSYS") to avoid conflict with standard signals.

  3. To make a checkpoint, you can simply send the signal to the target process (normally using kill). The target process receives the signal and autonomously does the following:

    1. Save the header of the dump file, which contains information about the hardware architecture, kernel version, dump file creation time, and number of processes in the file.

    2. Save the process name into the dump file. The names of the child processes are saved later if the parameter passed to CHPOX says to do so.

    3. Via the task_struct info, CHPOX finds each related virtual memory area (using the find_vma() procedure) and dumps its content from vm_start to vm_end. To do this, CHPOX uses a modified version of VMADump (from BProc and Scyld).

    4. CHPOX also records additional information, such as open UNIX domain sockets, open files, the status of pipes, and the current working directory. All of this is saved into a single dump file. Checkpointing can be done as often as you like, and the dump file contains the latest successful checkpoint.

    5. If the parameter passed to CHPOX says to save child processes, the above steps (except for the first) are repeated for each child process.

  4. To resume from the dump file, pass the dump file as an argument to the ld-chpox tool. To avoid conflicts, make sure the old process has completed or has been killed; otherwise, unexpected errors can occur.
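The advice in step 2 about picking an unused signal can be checked before you register a process. The following is a minimal sketch, not part of CHPOX: it assumes a Linux /proc/<pid>/status file that exposes the SigCgt field (a hexadecimal bitmask of caught signals in which bit n-1 stands for signal n), and the function names caught_mask_of and sig_is_caught are invented for illustration.

```shell
# Hypothetical helpers (not part of CHPOX).

# caught_mask_of PID: print the SigCgt hex mask from /proc/PID/status.
caught_mask_of() {
    awk '/^SigCgt:/ { print $2 }' "/proc/$1/status"
}

# sig_is_caught MASK SIGNO: succeed if bit (SIGNO - 1) is set in hex MASK,
# which means the process has installed a handler for signal SIGNO.
sig_is_caught() {
    [ $(( (0x$1 >> ($2 - 1)) & 1 )) -eq 1 ]
}

# Example (4500 is a placeholder PID):
#   if sig_is_caught "$(caught_mask_of 4500)" 31; then
#       echo "signal 31 already has a handler; pick another one"
#   fi
```

If the check succeeds for the signal you had in mind, the target process already handles it, and a different signal number is the safer choice.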

Using CHPOX

To begin using CHPOX, you need to download and install the CHPOX package. The latest CHPOX release has been tested on Linux kernel 2.4.28 and should be compatible with the latest 2.4.x Linux kernel releases. CHPOX is released in source-code format and as a Debian package. Because this article explains how to compile, install, and use CHPOX, we assume that you use the source-code format, which you need to compile against the Linux kernel source. CHPOX supports the IA32, PowerPC, s390, and s390x (64-bit version of s390) architectures.

You can either use the kernel headers prepackaged with your Linux distribution or download the kernel source from www.kernel.org. Just remember that CHPOX is made to be fully compatible with 2.4.x releases; a port to the Linux kernel 2.6 series is still a work in progress. You need to unpack the kernel source and at least do:

$ make [config|menuconfig|xconfig]
$ make dep

To prevent problems when loading the CHPOX module, configure the kernel source exactly like the running kernel. On most distributions, the kernel configuration is kept in the /boot directory under a name like config-2.4.20-19.9. If you have one, simply copy it to the kernel source directory, rename it to .config, and do:

# make oldconfig

Doing so produces the kernel configuration that CHPOX needs to detect things such as which architecture the kernel runs on and whether openMosix support is needed.
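The copy-and-oldconfig step can be scripted. This is only a sketch under stated assumptions: it assumes your distribution stores the configuration as /boot/config-<release>, where <release> is the output of uname -r (common, but not universal), and the function names and the /usr/src/linux path are placeholders rather than anything CHPOX provides.

```shell
# running_config_path [RELEASE]: print the path where many distributions keep
# the configuration of kernel RELEASE (default: the running kernel).
running_config_path() {
    echo "/boot/config-${1:-$(uname -r)}"
}

# seed_kernel_config SRCDIR: copy that file to SRCDIR/.config, then let
# "make oldconfig" and "make dep" prepare the 2.4 source tree.
seed_kernel_config() {
    cfg=$(running_config_path)
    [ -r "$cfg" ] || { echo "no config found at $cfg" >&2; return 1; }
    cp "$cfg" "$1/.config" && (cd "$1" && make oldconfig && make dep)
}

# Example: seed_kernel_config /usr/src/linux
```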

Configuring and installing CHPOX is straightforward. After downloading the latest tarball from the CHPOX Web site, unpack it and do:

# ./configure --with-linux=/path/to/your/linux-kernel-source

or:

# ./configure --with-linux=/path/to/your/linux-kernel-source \
   --with-sysmap=/path/to/your/System.map

Providing the path to System.map is necessary if you plan to use CHPOX on top of a kernel patched with openMosix, because some of the symbols (functions) that CHPOX needs aren't exported there. By looking up a symbol's address in System.map, CHPOX can call the function directly through that address. The first form of configuration is usually the one you need.
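To see what such an address lookup involves, here is a small sketch that reads an address out of a System.map-style file, whose lines have the form "<address> <type> <name>". The function name sysmap_addr is made up for this example, and the sketch says nothing about which symbols CHPOX actually looks up.

```shell
# sysmap_addr MAPFILE SYMBOL: print the hex address of SYMBOL from a
# System.map-style file; return non-zero if the symbol is not listed.
sysmap_addr() {
    awk -v sym="$2" '$3 == sym { print $1; found = 1; exit }
                     END { exit !found }' "$1"
}

# Example (both arguments are placeholders):
#   sysmap_addr /path/to/your/System.map some_symbol
```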

To compile and install the actual CHPOX module and the userspace tools, type:

#make
#make install

Testing Your CHPOX Installation

Once you have finished, you can test your first CHPOX installation using Listing One (chpoxtest.c).

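#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>      /* memset() */
#include <signal.h>      /* sigaction(), sigprocmask(), sig_atomic_t */
#include <sys/types.h>   /* pid_t */
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/wait.h>
#include <errno.h>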

static volatile sig_atomic_t print_count = 0;

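/* Demo handler: it prints and sleeps on purpose so that there is time to
 * checkpoint while the handler runs. printf() is not async-signal-safe,
 * so do not copy this pattern into production code. */
void sighandler(int signo)
{
     (void)signo;
     printf("Executing signal handler\n");
     sleep(10);
     print_count = 1;
}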
void reader_process(int fd, const char *fname, int delay)
{
     FILE *input, *pfile;
     char buf[256];
     
     printf("reader enter\n");
     pfile = fdopen(fd, "w");
     input = fopen(fname, "r");
     if (!input) {
          perror(fname);
          exit(EXIT_FAILURE);
     }
     while (fgets(buf, sizeof(buf), input) != NULL) {
          printf("reader\n");
          sleep(delay);
          fputs(buf, pfile);
          fflush(pfile);
     }
     printf("reader exit\n");
}
void writer_process(int fd, const char *fname)
{
     FILE *output, *sfile;
     char buf[256];

     printf("writer enter\n");
     sfile = fdopen(fd, "r");
     output = fopen(fname, "w");
     if (!output) {
          perror(fname);
          exit(EXIT_FAILURE);
     }
     while (fgets(buf, sizeof(buf), sfile) != NULL) {
          printf("writer\n");
          fputs(buf, output);
          fflush(output);
     }
     printf("writer exit\n");
}
int main(int argc, char **argv)
{
     /* Pipe descriptors */
     int pfd[2];
     /* Socketpair descriptors */
     int sfd[2];
     int delay;
     pid_t pid;
     FILE *input, *output;
     char buf[256];
     int val;
     struct sigaction sa;
     int count = 0;
     sigset_t mask;
     
     if (argc != 4) {
          fprintf(stderr, "Usage: %s <delay> <input> <output>\n", argv[0]);
          exit(EXIT_FAILURE);
     }
     delay = atoi(argv[1]);

     /* Let's create communication channels ...*/
     if (pipe(pfd) == -1) {
          perror("pipe");
          exit(EXIT_FAILURE);
     }
     if (socketpair(PF_UNIX, SOCK_STREAM, 0, sfd) == -1) {
          perror("socketpair");
          exit(EXIT_FAILURE);
     }
     /* ... and fork children */
     pid = fork();
     switch (pid) {
          case -1:
               perror("fork 1");
               exit(EXIT_FAILURE);
          case 0:
               close(pfd[0]);
               close(sfd[0]);
               close(sfd[1]);
               reader_process(pfd[1], argv[2], delay);
               exit(EXIT_SUCCESS);
     }
     close(pfd[1]);
     pid = fork();
     switch (pid) {
          case -1:
               perror("fork 2");
               exit(EXIT_FAILURE);
          case 0:
               close(pfd[0]);  /* pfd[1] was already closed before this fork */
               close(sfd[1]);
               writer_process(sfd[0], argv[3]);
               exit(EXIT_SUCCESS);
     }
     close(sfd[0]);
     
     /* And let's do our hard work */
     input = fdopen(pfd[0], "r");
     output = fdopen(sfd[1], "w");
     
     memset(&sa, 0, sizeof(sa));
     sa.sa_handler = sighandler;
     sigaction(SIGUSR1, &sa, NULL);
     sigemptyset(&mask);
     sigaddset(&mask, SIGUSR1);
     
     for (;;) {
          if (print_count) {
               printf("numbers processed: %d\n", count);
               print_count = 0;
          }
          sigprocmask(SIG_BLOCK, &mask, NULL);
          if (fgets(buf, sizeof(buf), input) == NULL)
               break;
          printf("main\n");
          val = atoi(buf);
          fprintf(output, "%d\n", val * val);
          fflush(output);
          sigprocmask(SIG_UNBLOCK, &mask, NULL);
          count++;
     }
     fclose(input);
     fclose(output);
     wait(NULL); wait(NULL);
     exit(EXIT_SUCCESS);
}
Listing One: chpoxtest.c

The program creates a pipe and a UNIX socket pair for IPC, forks two children that communicate with the parent through them, and sets a signal handler for the parent. The handler lets the parent print its current progress to standard output every time it receives a SIGUSR1 signal.

Listing Two is the Makefile for compiling chpoxtest.c.

CFLAGS = -Wall -O2
CC = gcc

all: chpoxtest

chpoxtest: chpoxtest.o

clean:
	rm -f -- chpoxtest *.o

.PHONY: all clean

Listing Two: Makefile.
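To exercise the listing, it helps to have a numeric input file and a way to check the result. The sketch below is one way to do that. The helper names make_input and expected_squares and the file names are invented for illustration, and it assumes the input holds small integers so that squaring never overflows.

```shell
# make_input N FILE: write the integers 1..N to FILE, one per line.
make_input() {
    i=1
    : > "$2"
    while [ "$i" -le "$1" ]; do
        echo "$i" >> "$2"
        i=$((i + 1))
    done
}

# expected_squares FILE: print the square of each line of FILE, which is what
# the parent process in Listing One computes with val * val.
expected_squares() {
    awk '{ print $1 * $1 }' "$1"
}

# Example run (file names are placeholders):
#   make_input 100 input.txt
#   ./chpoxtest 1 input.txt output.txt
#   expected_squares input.txt | cmp - output.txt
```

With a delay of 1 second and 100 input lines, the run lasts long enough to leave time for registering and checkpointing the process.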

An Example

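This example demonstrates all the current CHPOX features. To compile the program, you can use the Makefile provided in Listing Two. The program takes three arguments: the delay in seconds that the reader waits before passing on each line, the name of an input file containing one integer per line, and the name of the output file that receives the squares of those integers.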

Here are the steps to do checkpointing with CHPOX:

  1. Make sure you have loaded the CHPOX kernel module. If you haven't, type modprobe chpox_mod. Confirm it with lsmod.

  2. Create the appropriate special character file in the /dev directory. Skip this if you use devfs. To automate the work, the CHPOX package provides a script: simply execute the mkchpoxdev script from the tools directory. Remember to do step #1 first, because the script checks for the existence of /proc/chpox and bails out if it does not find it.

  3. Run the target application. In this case, you can execute the above "chpoxtest" program for a quick start.

  4. Register the target program with chpoxctl. The syntax is:

    
    #chpoxctl add <target-pid> <assigned signal number> <child flag> <checkpoint filename>
    
    

    For example, if the PID of the process is 4500, and you want to use 31 as the signal number assigned to the CHPOX handler and also checkpoint the process's children, you type:

    #chpoxctl add 4500 31 9 /tmp/test.dmp
    

    (See the chpoxctl(1) man page for a complete explanation of each parameter.)

    If you need to checkpoint only a single process, replace the child flag (the fourth parameter) with "1". For the checkpoint filename, you can give any valid pathname.

    For the signal number, it is best to pick an unused one. The safe choices are SIGUSR1, SIGUSR2, or SIGUNUSED, but if you know that the target process installs its own handler for one of these signals, you must select another. Check /usr/include/bits/signum.h or type kill -l to see the actual signal number for a given signal name.

    Chpoxtest already installs a signal handler for SIGUSR1, so you are forced to pick another one. Here we assume you pick SIGSYS (31). Note that you need the PID of the parent process, not of a child. You can use "ps" or "pgrep" to look for "chpoxtest" instances and pick the process with the lowest PID (normally the parent). Another way is to use "ps auf" or "ps auxf" to get a tree showing the parent-child relationships.

  5. Now you are ready to checkpoint. During the execution of chpoxtest, send it a SIGSYS:

    # kill -31 4500
    

    If this succeeds, a new file "test.dmp" is created in the /tmp directory. Repeat as needed if you want to checkpoint chpoxtest at different times. Note that only the latest checkpoint is kept in the specified dump file, so if you want to keep a previous checkpoint state, you need to copy or move the file before executing kill again. CHPOX can't do this automatically for you.

  6. Wait until chpoxtest finishes, or just terminate it (using Ctrl-C). Now you can restore the task using "ld-chpox". The syntax is:

    # ld-chpox <path and filename of the dump file>
    

    In this case, to restore chpoxtest, you can type:

    # ld-chpox /tmp/test.dmp
    

    As you can see from the console output, it continues from the exact spot where you checkpointed it. Make sure the paths and names of the input and output files are still the same, or you will get an error that the files can't be found.

    At this stage, you won't need CHPOX to remember the PID (process id) you have registered before. Therefore, you can safely delete it from CHPOX list using chpoxctl. This time, the syntax is:

    # chpoxctl del <pid of process>
    

    Because we registered the process with PID 4500 (remember, this PID is only an example), you type:

    # chpoxctl del 4500
    

    To remove all the PIDs from the CHPOX checkpoint list at once, you can also type:

    # chpoxctl clear
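Because CHPOX overwrites the dump file on every checkpoint, keeping older checkpoints is up to you. One possible wrapper is sketched below; the names rotated_name and checkpoint_keep are hypothetical, and the sketch assumes the process has already been registered with chpoxctl for the given signal and dump file. The signal is handled asynchronously, so the fresh dump may appear a moment after the function returns.

```shell
# rotated_name FILE STAMP: name under which the previous dump is kept.
rotated_name() {
    echo "$1.$2"
}

# checkpoint_keep PID SIGNO FILE: preserve the previous dump (if any) under a
# timestamped name, then send the registered signal so a fresh dump is written.
checkpoint_keep() {
    if [ -e "$3" ]; then
        mv "$3" "$(rotated_name "$3" "$(date +%Y%m%d-%H%M%S)")" || return 1
    fi
    kill "-$2" "$1"
}

# Example, matching the registration used in step 4:
#   checkpoint_keep 4500 31 /tmp/test.dmp
```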

You can also try another scenario. As you can see in the code, the SIGUSR1 handler calls sleep() to suspend the parent process for 10 seconds. This gives you the chance to checkpoint while the code is inside the signal handler:

  1. Send the parent process the USR1 signal:

    # kill -USR1 4500
    

  2. The application prints the status that it is now inside the signal handler. Quickly checkpoint it:

    # kill -31 4500
    

    Later, when you restore from the dump file, the parent process resumes in its sleeping state while both children keep running. Remember that the SIGUSR1 signal handler is installed ONLY in the parent process, so only the parent reacts to SIGUSR1.

As another test case, you can pick grep. To make grep run long enough to checkpoint it, you can grep over something large; for example, the Linux kernel source. Here we assume you have extracted Linux kernel source in the /usr/src/linux directory. You can use any version of Linux kernel source. Try:

$ grep -I -r -i -n schedule /usr/src/linux/* > /tmp/grep-result.txt

The term "schedule" was chosen because it returns a large number of hits. You are free to pick any term you like; just pick one that is likely to be found many times, so that grep is kept busy and you have plenty of time to checkpoint.

As before, while grep is running, register it and send the checkpoint signal to the grep PID. At this point, you might wonder how to prove that the restoration was actually successful. Here is the trick:

  1. Do grep and let it finish. Here we assume that you save the output as grep-complete.txt.

  2. Do grep again. This time redirect the output as grep-partial.txt.

  3. Press Ctrl-Z to stop the grep--remember, stop it, not terminate it.

  4. While grep is in the stopped state, register and checkpoint it. Notice that the dump file is not created yet, because a stopped process does not act on the signal until it is resumed.

  5. Copy grep-partial.txt as grep-partial-a.txt.

  6. Resume grep work by typing:

    $ fg %1
    

    You might need to adjust the parameter by looking at "jobs" output.

  7. Terminate it by pressing Ctrl-C as soon as possible. Notice that the dump file now exists.

  8. Delete and recreate grep-partial.txt. Simply use touch:

    # rm -f grep-partial.txt
    # touch grep-partial.txt
    

  9. Now restore the process by using the dump file. Let it finish. Now you have grep-partial.txt and grep-partial-a.txt. Concatenate them using cat:

    $ cat grep-partial-a.txt grep-partial.txt > ./grep-cat.txt
    

  10. Compare the content of grep-cat.txt with grep-complete.txt. By observing the result, you can draw your own conclusion. Hint: to get a thorough view of a file's content, use a viewer that can dump the raw content of the file, such as the hexdump utility.
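Steps 9 and 10 can also be checked mechanically. The helper below is a sketch (the name restore_matches is invented here) that concatenates the two partial files and compares the result byte for byte with the complete output, using only cat and cmp.

```shell
# restore_matches PART_A PART_B COMPLETE: succeed if PART_A followed by PART_B
# is byte-for-byte identical to COMPLETE.
restore_matches() {
    cat "$1" "$2" | cmp -s - "$3"
}

# Example with the file names used in the steps above:
#   if restore_matches grep-partial-a.txt grep-partial.txt grep-complete.txt
#   then echo "identical"
#   else echo "different"
#   fi
```

If the files differ, dropping the -s option from cmp reports the first differing byte, which is a useful starting point for a closer look with hexdump.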

Conclusion

CHPOX still needs a lot of improvement. As noted on its web site, support for Internet sockets, shared memory, System V IPC, and multithreaded processes is still on the to-do list. So don't expect CHPOX to work flawlessly in every situation. Always experiment before using CHPOX in a real production environment.

Reference

Understanding the Linux Kernel, Second Edition. Daniel P. Bovet and Marco Cesati, O'Reilly and Associates, ISBN 0-596-00213-0.

Linux Kernel Development. Robert Love, Sams Publishing, ISBN 0-672-32512-8.

"Comparison of the Existing Checkpoint Systems," Byoung-Jip Kim, IBM Watson.

"Process Checkpointing and Restarting System for Linux," Sudakov O.O., Boyko Yu.V., Tretyak O.V., Korotkova T.P., Meshcheryakov E.S., Mathematical Machines and Systems, 2003. N.2, p.146-153.

Mulyadi Santosa is a software developer in Indonesia. Eugeniy Meshcheryakov is a developer in Ukraine.
