CL-GODB: A Common Lisp GO Database Manipulation Library
Name: Samantha Kleinberg
Contact: [email protected]
School: New York University
Major: Physics and Computer Science
Project: CL-GODB
Project Page: http://common-lisp.net/project/cl-godb/
Mentor: Marco Antoniotti
Mentoring Organization: LispNYC (http://www.lispnyc.org/)
CL-GODB is a new interface to the GO Database (http://www.geneontology.org/) written in Common Lisp. The Gene Ontology (GO) is a collection of terms organized in a taxonomy representing a controlled vocabulary used to describe genes, gene products, their functions, and the processes they are involved in for a variety of organisms. The GO Database (GODB) represents the ontological information and gene product annotations in a convenient relational database format (the GO database uses MySQL).
Until now, no interface to the database has been available in Common Lisp. This is inconvenient because several bioinformatics and systems biology tools use the language (BioLingua, GOALIE, and the BioCYC suite, for instance).
GOALIE, developed by Marco Antoniotti and Bud Mishra in NYU's Bioinformatics Group (http://bioinformatics.nyu.edu/~marcoxa/work/GOALIE/), analyzes time-course data from microarray clustering experiments. The CL-GODB library will be integrated into GOALIE, improving the tool's functionality and efficiency.
The library works by building an incremental, as-needed, internal image of the GO database contents in core. This improves the speed of queries and facilitates the construction of more complex predicates that may be needed in an application such as GOALIE.
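The as-needed caching idea can be illustrated with a small sketch. It is written in Python for brevity and is not CL-GODB's actual code or API: the class name TermCache, the fetch_term callback, and the FAKE_DB sample data are all assumptions made for illustration. A term is fetched from the backing store the first time it is requested and served from an in-memory hash table afterward.

```python
class TermCache:
    """Incremental in-memory cache in front of a slower term lookup.

    Hypothetical illustration only; CL-GODB itself is written in Common Lisp.
    """

    def __init__(self, fetch_term):
        self._fetch_term = fetch_term  # callable: accession -> record (or None)
        self._by_acc = {}              # hash index, filled only as terms are requested
        self.backend_calls = 0         # counts trips to the backing store

    def get(self, acc):
        # Go to the backing store only on a cache miss.
        if acc not in self._by_acc:
            self.backend_calls += 1
            self._by_acc[acc] = self._fetch_term(acc)
        return self._by_acc[acc]


# Stand-in for the database: two sample terms keyed by accession.
FAKE_DB = {
    "GO:0008150": {"name": "biological_process"},
    "GO:0009987": {"name": "cellular process"},
}

cache = TermCache(FAKE_DB.get)
first = cache.get("GO:0009987")   # miss: fetched from the backing store
second = cache.get("GO:0009987")  # hit: served from the in-memory index
```

After the two calls above, the backing store has been consulted only once, which is the effect the in-core image is meant to achieve for repeated queries.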
Users start by creating a handle that identifies their session and is linked to several hash indexes used for the in-core caching. Once they have connected to their copy of the GO database, they have access to a variety of built-in SQL queries, which take advantage of the indexing and add to the stored data. The queries range from getting basic information about a term to finding a term's lineage using a choice of hierarchies.
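A rough sketch of this workflow follows, again in Python rather than Common Lisp. Everything in it is an assumption for illustration: the GOHandle class, its method names, and the two-table schema (term, term_link) are simplified stand-ins, not the real GO database schema or the CL-GODB interface, and an in-memory SQLite database takes the place of MySQL. The handle owns the connection and the hash indexes; each query consults the indexes first and stores whatever it had to fetch.

```python
import sqlite3


class GOHandle:
    """Session handle: a database connection plus per-session hash indexes."""

    def __init__(self, conn):
        self.conn = conn
        self.names = {}    # accession -> term name
        self.parents = {}  # (accession, relation) -> list of parent accessions

    def term_name(self, acc):
        if acc not in self.names:
            row = self.conn.execute(
                "SELECT name FROM term WHERE acc = ?", (acc,)).fetchone()
            self.names[acc] = row[0] if row else None
        return self.names[acc]

    def parent_accs(self, acc, relation):
        key = (acc, relation)
        if key not in self.parents:
            rows = self.conn.execute(
                "SELECT parent_acc FROM term_link "
                "WHERE child_acc = ? AND relation = ? ORDER BY parent_acc",
                key).fetchall()
            self.parents[key] = [r[0] for r in rows]
        return self.parents[key]

    def lineage(self, acc, relation="is_a"):
        """Ancestors of acc, nearest first, following one relation type."""
        seen, queue, result = {acc}, [acc], []
        while queue:
            current = queue.pop(0)
            for parent in self.parent_accs(current, relation):
                if parent not in seen:
                    seen.add(parent)
                    result.append(parent)
                    queue.append(parent)
        return result


# Simplified, hypothetical schema with three sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE term (acc TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE term_link (child_acc TEXT, parent_acc TEXT, relation TEXT);
    INSERT INTO term VALUES ('GO:0008150', 'biological_process');
    INSERT INTO term VALUES ('GO:0009987', 'cellular process');
    INSERT INTO term VALUES ('GO:0007049', 'cell cycle');
    INSERT INTO term_link VALUES ('GO:0009987', 'GO:0008150', 'is_a');
    INSERT INTO term_link VALUES ('GO:0007049', 'GO:0009987', 'is_a');
""")

handle = GOHandle(conn)
ancestors = handle.lineage("GO:0007049")         # follow the is_a hierarchy
none_found = handle.lineage("GO:0007049", "part_of")  # a different hierarchy
```

Keying the parent index on both the accession and the relation type is one way to support a choice of hierarchies without mixing their cached results.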
As a testbed for the CL-GODB library, we built a GUI application that is available as a standalone executable. The CL-GODB Viewer lets users browse the hierarchy with a graphical tree view and provides information about each term and its associated genes, in a manner similar to that of several other GO viewer applications available online.
Creating CL-GODB was challenging at times, as it was my first project in Common Lisp. The biggest hurdle was handling case sensitivity consistently, since Common Lisp and MySQL behave differently under Windows and UNIX. In the end it worked, and I learned more about the intricacies of SQL syntax than I ever wanted to know.
Figure 1
DDJ