The Foremost Open Source Forensic Tool
Ray Strubinger
Several open source forensic tools have been created for the Unix platform; most notably, The Coroner's Toolkit (TCT) by Dan Farmer and Wietse Venema, as well as The @stake Sleuth Kit (TASK) and its browser-based front end, Autopsy, both created by Brian Carrier. While TCT and TASK/Autopsy have been written about frequently, there is another very useful application known as "foremost" that has received relatively little attention outside of a few computer forensics mailing lists.
Introduction to the Foremost Forensic Tool
Foremost is a free forensics tool created for the Linux platform and developed by Special Agents Kris Kendall and Jesse Kornblum of the U.S. Air Force Office of Special Investigations. (In accordance with 17 USC 105, this tool is not afforded any copyright protection because it is a work of the U.S. government.) The tool was inspired by, and designed to imitate the functionality of, the DOS program CarvThis, written by the Defense Computer Forensics Lab. Foremost enables forensic examiners to automatically recover files or partial files from a bit image (or the media itself) based on file header and footer types specified in a user-defined configuration file.
Foremost works by reading into memory a portion of the media or media image under examination. This portion is searched for a file header contained within the foremost configuration file. If a matching header is found, foremost writes the header and the data following the header to a file. Foremost will append data to the file until the footer is found (which signifies the end of the file) or until the file size limit listed in the configuration file is reached. Foremost will read additional data into memory from the media or media image if necessary to find the file footer or to satisfy the file size limit set within the configuration file. Using a file size limit serves as a means to stop foremost from adding data to a recovered file if the appropriate file footer is not found.
The foremost configuration file also allows the forensic examiner to customize the types of files that will be recovered and enables the use of wildcards for pattern matching. Customizing the configuration file and sources for file headers and footers will be covered later in the article.
As the capacity of storage media grows larger and larger, automated processing tools will become more necessary to supplement hand processing of forensic images. Although a hex editor can be used to recover files from a forensic image (and sometimes may be the best choice), the use of a hex editor can be tedious if a great many files must be recovered. Foremost is useful as a recovery tool when dealing with unfragmented files, because the tool in its current form will simply read data until a condition is met to cause it to stop extracting data. Newly formatted disks or partitions that have data written to them would be unfragmented until files are deleted and new files are written. (This type of situation is often found in a document or image archive.)
Foremost can be used to perform a cursory examination of media or media images for headers specified in the configuration file. When using this mode, foremost will only search the media; it will not extract any files it encounters. This technique is useful in situations where many drives are involved to determine whether a drive should be given a through examination. Foremost can recover files from images made by the Unix utility dd (as shown later in this article), as well as images created by commercial applications such as EnCase and Safeback.
Obtaining and Compiling Foremost
Foremost is available from SourceForge at:
http://sourceforge.net/projects/foremost/The source of the current version, 0.64, is less than 100,000 bytes in size. The program is written in C, and compilation is straightforward on a Linux system. Once you have obtained the archive, the following commands can be used to build the application:
$: tar xvfz foremost-0.64.tar.gz $: cd foremost-0.64 $: makeYou can install the tool to /usr/local/bin with the command:
$: make installThe tool can be uninstalled from the system with the command:
$: make uninstallI have successfully compiled and executed foremost on several versions of Red Hat Linux (7.x-9.x). The documentation included with foremost indicates that some systems using older versions of glibc may produce harmless warnings that can be ignored regarding fseeko and ftello. Using Foremost Foremost opens bit images or devices in read-only mode, which is important for maintaining the forensic integrity and the validity of the investigation -- especially when the unfortunate situation arises whereby a forensic image of the media cannot be obtained. (Generally, forensic examiners only want to use the original media to produce a forensically sound image -- they do not want to conduct an examination on the original media to avoid the risk of contaminating the original in some way.) If you are not a computer programmer or do not want to examine the foremost source code, a simple way to verify that foremost does not alter the image is to compute an MD5 checksum of the forensic image before and after running the tool on the image. If the checksums match, then the forensic image cannot have been altered. The image is read into memory in 10-MB chunks and any matching files are obtained from the image in memory, if possible, in order to increase processing speed by limiting the reads from the disk. Large files (those greater than 2 GB in size) are supported under Linux. As an additional safety precaution, foremost will only extract data to an empty directory. Foremost requires a configuration file that contains, at a minimum, the header, the amount of data to extract for a file type (referred to as "size" in the configuration file), a "y" or "n" denoting the case sensitivity of the header and footer, and the file extension. Header and footer values can be specified in either hexadecimal or octal values and are decoded before use. A maximum of 100 file types may be specified in a single configuration file. Lines in the configuration file that begin with a pound sign (#) are treated as comments. The pound sign can be used at the beginning of a line containing a file type to prevent foremost from searching for that file type. An excerpt from a configuration file is shown in Figure 1. In the example in Figure 1, the default foremost.conf configuration file is used. I encourage you to examine the configuration file and modify it to best suit the needs of your investigation. You may want to create custom configuration files that only contain files of one type, such as documents, graphical images, or binaries. A great Web resource for finding file specifications is Wotsit's Format at:
http://www.wotsit.orgWotsit's Format contains information on hundreds of file formats and makes a nice complement to the foremost application by bringing together many file formats (which contain header and footer information) that can be used to populate the foremost configuration file. Users of the Wotsit site are also encouraged to submit information on file formats that are not listed on the site. A hex editor may also be used to obtain header and footer information from a given file type by opening the file and copying short hex strings that occur at the beginning and end of the file. These extracted strings would then be copied into the foremost configuration file. If you choose to use a hex editor, keep in mind that a machine utilizing little-endian architecture may require you to reverse the byte order of the header and footer to get an accurate representation.
Running foremost without any options will produce an error message, along with a brief explanation of the function of the program and the parameters that may be used with the application as seen in Figure 2. Running foremost with the -h option will also produce the list of parameters.
If the examiner creates an image directory, then copies the forensic image to be analyzed and the configuration file to that directory, the command foremost -v -c foremost.conf ext2-home.dd will produce output similar to that shown in Figure 3 (the output has been edited to save space). With the options shown in the command-line below, foremost will generate verbose output and use the foremost.conf configuration file found in the current directory against the image named ext2-home.dd. Since the output directory was not specified with the -o option, foremost defaults to using a directory called foremost-output for any files that it locates and recovers. This output directory will be created in the current working directory. As foremost recovers files, it assigns them numerical names starting at 00000000 and adds the relevant extension. Figure 3 shows output from a foremost run where the verbose (-v) option has been used.
While foremost executes, it produces an audit.txt file containing the command line options used, path information, and the name of the forensic image. The name foremost is assigned to the recovered file, then offset into the image where the file was found, and the file length is displayed. The column labeled "Interior" within the audit file denotes whether the recovered file was found off the start of a sector. An excerpt from the audit file produced on the above run is shown in Figure 4.
To recover entire files (assuming that the entire file is still intact), you should pay attention to the file sizes specified in the foremost.conf file. In this example, the audit.txt file shows that two recovered files reached the maximum size allowed by the configuration file. The default maximum size for a PDF file is 100000 bytes, and the default for an html file is 50000 bytes. Adjusting the file sizes in the foremost.conf file to support the recovery of larger PDF and html files and using the information in the "Found at Byte" column with the -s option, makes it easy to skip through the image and make another recovery attempt on the PDF and html file.
Conclusion
Although foremost has been used in various investigations, I am unaware at the time of writing of any court cases where the tool has been judicially validated or utilized in the gathering of submitted evidence. Bug reports are taken seriously by the tool's creators, due to the fact that it could be used to produce evidence. I have used foremost to recover data from forensic images from several file system types including ext2/3, Linux swap, UFS, JFS, NTFS, and various flavors of FAT including FAT 12, 16, and 32. I have also used foremost to extract data from single-disk elements in a multi-disk RAID-5 array and from partial bit images. Although no single tool can serve all needs because of the variations found in each situation, overall I have found foremost to be a useful addition to the growing collection of open source forensics tools.
References
Foremost Web Site -- http://sourceforge.net/projects/foremost/
Wotsit's Format (file specifications) -- http://www.wotsit.org
Ray Strubinger is a SANS GIAC Certified Forensic Analyst with Dacrobatics, Inc. (http://www.dacrobatics.com), a data recovery and computer forensics firm. Ray has a decade of administration experience on multiple platforms and has worked in the IT field in the financial, educational, and private sector.