When Dr. Sally Ride and NASA created the ISS EarthKAM program in 1996 as a way to help middle school students learn about the Earth, they soon realized that they had also created a significant data management challenge. EarthKAM, a loose acronym for Earth Knowledge Acquired by Middle school students, lets participating students remotely control a digital camera mounted on the International Space Station (ISS). The resulting images of the Earth are downloaded from the ISS each hour and stored in the ISS EarthKAM Datasystem, an archive that has grown to contain over five thousand cataloged images with corresponding metadata.
For the team behind the project (a group of professors, university students, teachers, and NASA officials), managing the archive has been a unique technical challenge, complete with lessons in database management, disk allocation, and bandwidth optimization.
Dr. Ride, who in 1983 became the first American woman in space, conceived of the program as a way to let students share her experience of looking down on the Earth from space. Originally dubbed KidSat when the camera was mounted on select Space Shuttle missions, ISS EarthKAM has evolved over the past seven years to include more schools and more frequent missions now that the camera is permanently mounted on the space station.
Students plan the images they want to take by tracking the camera's movement above the Earth. After entering coordinates and other related data into the camera control form online, they log on to the Datasystem to see how their images turned out. The students use the images to complete a variety of classroom "missions" in history, geography, geology, physics, oceanography, mathematics, and current events.
Bringing Images Home
EarthKAM images are all taken by a Kodak DCS 460 camera mounted in a window onboard the space station. Each photo would normally result in an 18MB file if it were stored as a standard TIFF image. However, space is critical in space. The larger each file is, the fewer files the camera can store and the longer it takes to download the images to the data center on Earth. To cope with this difficulty, the camera immediately compresses each image into Kodak's proprietary DCR format, which incorporates a lossless compression algorithm, reducing each picture to about 6.5MB. The resulting files are saved via a SCSI connection to an IBM ThinkPad computer on the space station. There, the files are further compressed to approximately 2.6MB using gzip compression.
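To put those figures in perspective, the short Python sketch below works out the size reduction at each stage. It uses only the approximate sizes quoted above; the names STAGES and reduction_ratios are illustrative and are not part of the EarthKAM software.

```python
# Approximate per-image sizes quoted in the article, in megabytes.
STAGES = [
    ("uncompressed TIFF", 18.0),
    ("Kodak DCR (lossless)", 6.5),
    ("DCR + gzip", 2.6),
]


def reduction_ratios(stages):
    """Return (name, size_mb, ratio versus the first stage) for each stage."""
    baseline = stages[0][1]
    return [(name, size, baseline / size) for name, size in stages]


for name, size, ratio in reduction_ratios(STAGES):
    print(f"{name:22s} {size:5.1f} MB  {ratio:4.1f}x smaller than the TIFF")
```

By these numbers, the file that leaves the station is roughly one seventh the size of the equivalent TIFF.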
The gzip files are transferred via a Tracking and Data Relay Satellite, capable of 600 to 800Mbps data transfers, to a ground tracking station at NASA's White Sands Complex in New Mexico. From there, the digital pictures are downloaded to NASA's Johnson Space Center (JSC) in Houston, TX. Finally, an application at the University of California San Diego (UCSD) pulls the data to a RAID array on a Sun Ultra 60 server. At this point, it becomes part of the EarthKAM Datasystem, where each image is processed and made available online to students and the public at datasystem.earthkam.ucsd.edu. (See a screenshot of the Datasystem and an image from the Datasystem of England's coast.)
Sometimes, all of this happens quickly enough that students can access the Datasystem and view or download pictures within as little as five or ten minutes. Typically, though, the process takes closer to an hour. The most important variable is the time it takes to download images to JSC. Images aren't always immediately downloaded after they're taken because data transfers relating to the primary mission of astronauts on ISS take precedence over "hitchhikers" like EarthKAM.
File Formats and Storage
A DCR file with gzip compression may be efficient for downloading, but it isn't convenient for the typical Mac or Windows machine that a middle school student is most likely using. Likewise, an 18MB TIFF isn't exactly a Web-friendly format either, especially given that middle schools seldom own state-of-the-art computers or have high-speed Internet connections. So staff members at NASA's Jet Propulsion Lab (JPL) in Pasadena, CA, created a system that converts incoming images to Portable Pixel Map (PPM) format at resolutions of 3060 x 2036 pixels and 768 x 512 pixels. The system performs these conversions automatically when it detects and downloads new images from the JSC server.
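The 18MB figure follows directly from the full image resolution. The sketch below is a rough check that assumes three color channels at one byte per sample (as in a binary PPM with 8-bit samples) and ignores the small file header; the function name is hypothetical.

```python
def ppm_payload_bytes(width, height, channels=3, bytes_per_sample=1):
    """Size of the raw pixel data, excluding the short PPM header.

    Assumes 8-bit RGB samples by default; this is an assumption,
    not something the article states.
    """
    return width * height * channels * bytes_per_sample


for width, height in [(3060, 2036), (768, 512)]:
    size = ppm_payload_bytes(width, height)
    print(f"{width} x {height}: {size:,} bytes ({size / 1e6:.1f} MB)")
```

Under those assumptions the full-resolution image holds about 18.7 million bytes of pixel data, in line with the quoted TIFF size, while the smaller version needs only about 1.2 million.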
Paul Andres, a technical staff member at JPL and the Datasystem lead, optimized the application by splitting it into two components. The first program handles the file transfer, and the second handles the processing. In this way, both steps can be performed at the same time on different images.
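The idea of overlapping transfer and processing can be sketched as a two-stage pipeline. The Python example below is only an illustration: the Datasystem uses two separate programs, whereas this sketch uses two threads and a queue in one process, and the transfer and conversion steps are placeholders with hypothetical names.

```python
import queue
import threading


def transfer_worker(image_ids, handoff):
    """Stage 1: 'download' each image, then hand it to the processing stage."""
    for image_id in image_ids:
        # Placeholder for the real file transfer.
        handoff.put(image_id)
    handoff.put(None)  # Sentinel: no more images are coming.


def processing_worker(handoff, results):
    """Stage 2: 'process' each image as soon as it arrives."""
    while True:
        image_id = handoff.get()
        if image_id is None:
            break
        # Placeholder for the real format conversion.
        results.append(f"{image_id}.ppm")


def run_pipeline(image_ids):
    """Run both stages concurrently and return the processed file names."""
    handoff = queue.Queue()
    results = []
    stages = [
        threading.Thread(target=transfer_worker, args=(image_ids, handoff)),
        threading.Thread(target=processing_worker, args=(handoff, results)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return results


print(run_pipeline(["img001", "img002", "img003"]))
```

Because the second stage starts work as soon as the first image is handed over, one image can be converted while the next is still being transferred.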
After the initial processing, C and Perl scripts create three differently sized JPEG files, each composed of 384 x 256 pixel tiles. When a student or teacher requests an image in TIFF, PICT, or GIF format, a script creates the file on the fly based on the PPM image. TIFFs and PICTs are the most often requested image formats because students can examine and modify them in Adobe Photoshop. Some schools also use NIH Image, a public domain image processing and analysis program developed for the Macintosh at the Research Services Branch of the National Institute of Mental Health.
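As a rough illustration of the tiling, the sketch below counts how many 384 x 256 tiles are needed to cover an image. The article does not give the three JPEG sizes, so the two PPM resolutions mentioned earlier are used as stand-in inputs, and partial tiles at the edges are assumed to count as whole tiles. The function name is hypothetical.

```python
TILE_WIDTH, TILE_HEIGHT = 384, 256  # tile size quoted in the article


def tile_grid(width, height, tile_w=TILE_WIDTH, tile_h=TILE_HEIGHT):
    """Return (columns, rows) of tiles needed to cover the image.

    Uses ceiling division so a partial tile at an edge counts as one tile
    (an assumption; the article does not say how edges are handled).
    """
    columns = -(-width // tile_w)
    rows = -(-height // tile_h)
    return columns, rows


for width, height in [(768, 512), (3060, 2036)]:
    columns, rows = tile_grid(width, height)
    print(f"{width} x {height}: {columns} x {rows} = {columns * rows} tiles")
```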
To help track images, the camera embeds metadata in the header of each file, including a unique identifier associated with the photo request. When the images are downloaded to the Datasystem servers, they are run through scripts that retrieve the embedded data. From the time associated with the photo ID, the script calculates the space station's position relative to the surface of the Earth, which is valuable in determining what part of the Earth was captured in the photo.
The metadata for each image is stored in an Oracle 8.1.5 database on the Sun Ultra 60 server located at UCSD's Science and Engineering Research Facility. The system has been at the facility for less than a year. Until early 1998, the Datasystem was housed in the Digital Image Animation Laboratory at JPL. The system was also located at the San Diego Supercomputer Center (SDSC) for a few years beginning in 1998, after the fourth EarthKAM mission.
Today, after more than seven years of acquiring images, the Datasystem has come within 5GB of maxing out a 140GB RAID array. There are more than 5,000 images, each in multiple formats. Clearly, more storage is needed. Prior to an upcoming ISS mission in the fall of 2002, the Datasystem will receive a second RAID system, built with inexpensive IDE drives and a SCSI host interface. "We are currently looking at a RAID box that will give us about three quarters of a Terabyte of Level 5 RAID storage for approximately $5,000," says CTO Alann Lopes. "It's not the fastest RAID in the world, but it's a good value."
Value is certainly an important factor in any NASA-sponsored project. The Ultra 60 is one of Sun's least expensive dual-processor workstations (UCSD's is a single-processor configuration). And storage has been selected more for value, in cost per megabyte, than for speed.
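The storage figures quoted above allow a back-of-the-envelope check on that value. The sketch below uses only the article's approximate numbers (about 135GB used for roughly 5,000 images, and about 750GB for $5,000) and assumes decimal units (1GB = 1,000MB); the variable names are illustrative.

```python
MB_PER_GB = 1000  # decimal units, an assumption made for simplicity

# Approximate figures quoted in the article.
used_gb = 140 - 5          # array is within 5GB of its 140GB capacity
image_count = 5000         # "more than 5,000 images"
new_raid_gb = 750          # "about three quarters of a Terabyte"
new_raid_cost_usd = 5000   # "approximately $5,000"

avg_mb_per_image = used_gb * MB_PER_GB / image_count
cost_per_gb = new_raid_cost_usd / new_raid_gb
cost_per_mb_cents = cost_per_gb / MB_PER_GB * 100
extra_images = int(new_raid_gb * MB_PER_GB / avg_mb_per_image)

print(f"Average storage per image (all formats): {avg_mb_per_image:.0f} MB")
print(f"New RAID cost: ${cost_per_gb:.2f} per GB ({cost_per_mb_cents:.2f} cents per MB)")
print(f"Room for roughly {extra_images:,} more images at the current average")
```

These are rough estimates: the image count is a lower bound, so the true average per image is somewhat smaller, and usable RAID 5 capacity may differ from the quoted total.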