The match() Function
Here's where things get tricky. Consider the query:
SELECT * FROM fs WHERE path='/var/log' OR path='/usr/lib';
What happens here? Well, you might think that you simply intercept the path column in xBestIndex() and pass it on to xFilter(), but as it turns out SQLite never calls xBestIndex() here.
Why? Because xBestIndex() is subject to SQLite's index conditions just like any other query. There are times when SQLite will use indexes, and times when it will use sequential scans. When you use OR for multiple constraints on a given column, SQLite uses a sequential scan. That puts us right back into the situation where we have to search the whole filesystem again, instead of just two directories.
Fortunately, there is a way around thisthe match() function. If you notice , one of the defined relational operators is SQLITE_INDEX_CONSTRAINT_MATCH. So for example, if we recast the aforementioned query to
SELECT * from fs WHERE path match('/var/log', '/usr/lib');
then SQLite calls xBestIndex() with the path column, and specifies that the SQLITE_INDEX_CONSTRAINT_MATCH operator was used. This is our solution.
To do this, however, you must register a function to handle the match() SQL function. By default it is unimplemented and will trigger an error if you try to use it. To register it, we use the xFindFunction() callback, which we implement as vt_find_function() (Listing Seven, also available online).
When a SQL function uses a column from a virtual table, SQLite calls xFindFunction() to give the virtual table an opportunity to overload the function. The first three parameters are inputs: the virtual table, the number of arguments to the function, and the name of the function. To overload the function, xFindFunction() passes a function pointer back via *pxFunc (and any user data via *ppArg) and returns 1. If it doesn't want to overload it, then xFindFunction() simply returns 0.
In this case, you just want SQLite to see match() as a valid function so it will use xBestIndex(). You don't really care what match() does. In fact, our implementation of match always returns true (meaning everything matches no matter what). What matters here is that you get the values of match() into xFilter() so you can narrow your search path. So what you look for in xBestIndex() is any column whose ordinal is 1 (p_info->aConstraint[x].iColumn == 1) and whose relational operators are equality (SQLITE_INDEX_CONSTRAINT_EQ) or match (SQLITE_INDEX_CONSTRAINT_MATCH). If you see that, then you pass that column on to xFilter().
So xFilter() simply looks for a string containing one or more paths in argv. In the example, argc is 1 and argv will be:
argv[0] = "'/var/log','/usr/lib'"
It is your responsibility to interpret the argument passed to match(). Thus, you have to parse the string looking for paths. In this implementation, I use the convention that paths are separated by commas. I parse everything between commas and build a list of directories to search. This is then stored in the cursor structure. The search works just as before, except instead of recursively searching one directory, I search multiple directories. The search logic is the same, I just repeat it for every directory passed into match().
Again, for each row in the result set, SQLite calls xColumn() to get at the values for each row in the result set. It passes xColumn() the ordinal of the column whose value is to return. To fulfill the request, you just implement a switch statement covering all ordinals in our virtual table, passing back the appropriate value for each column. This works similar to SQLite's user-defined function interface.
This then is the basic workings of an optimized virtual table. The full implementation is given in the file fs.c (available online; see www.ddj.com/code/). Additionally, there is a stripped down, bare-bones virtual table that does nothing in the file example.c.
Conclusion
While powerful in themselves, virtual tables are the foundation of some of SQLite's larger features, such as full text search (FTS). Even more powerful is the idea that you can join virtual tables like any other tables, thus cross referencing disparate information. Also, you can use virtual tables as a way to aggregate information. You could, for example, use the filesystem table to generate a reference list of all files in the filesystem, and then issue queries each day to scan for changes. The possibilities abound. With virtual tables, the world is your database.