Sidebar
High-Traffic Recording
On popular Web sites, traffic is increasing exponentially. Traditional Web-log analysis tools can take too long to read and process the resulting large log files, and even recording the raw data in a database is too slow to keep up.
Data-cube recorders don't store the raw event data at all. Instead, the recorder creates new visitor and content categories on the fly and assembles a statistical model of visitor behavior as event data flows in (see Figure 2). In OLAP lingo, this statistical model is called a complete or partial "data cube." It lets marketers rapidly roll up and drill down to see different views of the data. Andromedia Aria has adopted this method, and it accounts for the product's unique analysis and reporting capabilities.
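To make the idea concrete, here is a minimal Python sketch of a data-cube recorder. It is not Aria's implementation; the two categorizer functions, the two-dimensional cube, and the event fields (`path`, `cookie`) are hypothetical assumptions chosen for illustration. Each event increments one cell of the cube (creating the cell the first time a category combination appears) and is then discarded. Rolling up sums over one dimension; drilling down breaks one category out by the other dimension.

```python
from collections import Counter
from typing import Dict

Event = Dict[str, str]


def visitor_category(event: Event) -> str:
    """Hypothetical rule: visitors with a cookie are 'returning', others 'new'."""
    return "returning" if event.get("cookie") else "new"


def content_category(event: Event) -> str:
    """Hypothetical rule: the first path segment, e.g. /products/x -> 'products'."""
    return event["path"].strip("/").split("/")[0] or "home"


class DataCubeRecorder:
    """Keeps only aggregate counts per (visitor, content) cell; raw events are dropped."""

    def __init__(self) -> None:
        self.cells: Counter = Counter()  # new cells appear as new categories are seen

    def record(self, event: Event) -> None:
        key = (visitor_category(event), content_category(event))
        self.cells[key] += 1  # the event itself is not stored

    def roll_up(self, dim: int) -> Counter:
        """Keep dimension `dim` (0 = visitor, 1 = content) and sum over the other."""
        totals: Counter = Counter()
        for key, n in self.cells.items():
            totals[key[dim]] += n
        return totals

    def drill_down(self, visitor: str) -> Dict[str, int]:
        """Break one visitor category out by content category."""
        return {content: n for (v, content), n in self.cells.items() if v == visitor}


cube = DataCubeRecorder()
for event in [
    {"path": "/products/a", "cookie": "x1"},
    {"path": "/products/b", "cookie": ""},
    {"path": "/news/today", "cookie": "x2"},
    {"path": "/", "cookie": ""},
]:
    cube.record(event)
```

Four events produce four cells here, but on a busy site millions of events would still map onto a cube whose size depends only on the number of category combinations.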
The advantage of data-cube recorders is that reporting on preanalyzed data can be very fast. Furthermore, the statistical behavior model is typically much smaller than a collection of raw events, so it requires less disk space. Finally, because a data-cube recorder writes less data and commits to the database less often, it can keep up with extremely popular sites where other recording techniques have difficulty.
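The disk-space and commit-frequency advantages can be sketched with Python's standard `sqlite3` module. This is an assumed design, not Aria's: cell increments are buffered in memory and written in one transaction per batch, so 2,500 events become three commits and only as many rows as there are cube cells. The table layout, the `BatchedCubeWriter` class, and its `batch_size` parameter are hypothetical.

```python
import sqlite3
from collections import Counter
from typing import Tuple


class BatchedCubeWriter:
    """Buffers cube-cell increments in memory and commits them once per batch."""

    def __init__(self, conn: sqlite3.Connection, batch_size: int = 1000) -> None:
        self.conn = conn
        self.batch_size = batch_size
        self.pending: Counter = Counter()
        self.buffered_events = 0
        self.commits = 0
        conn.execute(
            "CREATE TABLE IF NOT EXISTS cube ("
            "visitor TEXT, content TEXT, hits INTEGER NOT NULL, "
            "PRIMARY KEY (visitor, content))"
        )

    def add(self, cell: Tuple[str, str]) -> None:
        self.pending[cell] += 1
        self.buffered_events += 1
        if self.buffered_events >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        with self.conn:  # one transaction, committed when the block exits
            for (visitor, content), n in self.pending.items():
                self.conn.execute(
                    "INSERT OR IGNORE INTO cube VALUES (?, ?, 0)", (visitor, content))
                self.conn.execute(
                    "UPDATE cube SET hits = hits + ? WHERE visitor = ? AND content = ?",
                    (n, visitor, content))
        self.commits += 1
        self.pending.clear()
        self.buffered_events = 0


conn = sqlite3.connect(":memory:")
writer = BatchedCubeWriter(conn, batch_size=1000)
for i in range(2500):  # 2,500 events, but only two distinct cells
    writer.add(("new" if i % 2 else "returning", "products"))
writer.flush()
rows = conn.execute("SELECT visitor, hits FROM cube ORDER BY visitor").fetchall()
```

A raw-event recorder would insert 2,500 rows; this sketch stores two and commits three times. The trade-off is that up to `batch_size` buffered increments can be lost if the process dies before a flush.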
The downside of data-cube recorders is that the raw data isn't saved. If the recorder was not set up in advance to generate the desired visitor or content categories automatically, it can be impossible to go back and regenerate the statistical behavior model after the fact.
To enable regeneration, Aria also provides a log reader and a log recorder that creates compressed output files (optionally deleting files older than a preconfigured retention period). Compressed logs are inappropriate as the primary input for most production traffic analysis, because decompressing them consumes precious processor time during the analysis phase, which is already processor-intensive.
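A compressed-log recorder with a retention period, plus a matching reader, might look like the following Python sketch. The file naming (`<name>.log.gz`), the JSON-lines record format, and the use of file modification times to judge a log's age are all assumptions for illustration, not Aria's format. Note that the reader must decompress every file it replays, which is the processor cost described above.

```python
import gzip
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Dict, Iterator, List, Optional


class CompressedLogRecorder:
    """Writes events as gzip-compressed JSON lines, one file per batch (hypothetical layout)."""

    def __init__(self, directory: Path, retention_seconds: Optional[float] = None) -> None:
        self.directory = directory
        self.retention_seconds = retention_seconds  # None means keep files forever
        directory.mkdir(parents=True, exist_ok=True)

    def write_batch(self, name: str, events: List[Dict[str, str]]) -> Path:
        path = self.directory / f"{name}.log.gz"
        with gzip.open(path, "wt", encoding="utf-8") as f:
            for event in events:
                f.write(json.dumps(event) + "\n")
        return path

    def purge_expired(self, now: Optional[float] = None) -> List[str]:
        """Delete log files whose modification time is older than the retention period."""
        if self.retention_seconds is None:
            return []
        now = time.time() if now is None else now
        removed = []
        for path in sorted(self.directory.glob("*.log.gz")):
            if now - path.stat().st_mtime > self.retention_seconds:
                path.unlink()
                removed.append(path.name)
        return removed


def read_logs(directory: Path) -> Iterator[Dict[str, str]]:
    """Log reader: decompress each file in name order and yield its events."""
    for path in sorted(directory.glob("*.log.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                yield json.loads(line)


# Usage: keep logs for 7 days; a backdated file is purged and the rest is replayed.
with tempfile.TemporaryDirectory() as tmp:
    rec = CompressedLogRecorder(Path(tmp), retention_seconds=7 * 24 * 3600)
    old = rec.write_batch("day01", [{"path": "/news/a"}])
    rec.write_batch("day09", [{"path": "/products/b"}, {"path": "/"}])
    ten_days_ago = time.time() - 10 * 24 * 3600
    os.utime(old, (ten_days_ago, ten_days_ago))  # simulate an expired file
    removed = rec.purge_expired()
    replayed = [event["path"] for event in read_logs(Path(tmp))]
```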
However, a site can run a data-cube recorder and a compressed-log recorder simultaneously, getting the best of both worlds. The data-cube recorder analyzes data on the fly for real-time reporting, while the compressed-log recorder lets a Webmaster restructure categories and regenerate the statistical model afterward, if necessary. In practice, the combination isn't used often: most Webmasters set up category generation correctly in advance and don't want to spend disk storage and processor time creating compressed log files. -- DG
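Running both recorders side by side, and later regenerating the model with categories nobody anticipated, can be sketched in Python as follows. Everything here is hypothetical: the `TeeRecorder` class, the two categorizers, and the in-memory gzip buffer standing in for log files on disk. The live cube is keyed by site section; afterward, the logged raw events are replayed to build a cube keyed by file type, which the section-keyed cube alone could not produce.

```python
import gzip
import io
import json
from collections import Counter
from typing import Callable, Dict, Iterable, List

Event = Dict[str, str]
Categorizer = Callable[[Event], str]


def by_section(event: Event) -> str:
    """Original (hypothetical) content categories: the first path segment."""
    return event["path"].strip("/").split("/")[0] or "home"


def by_file_type(event: Event) -> str:
    """A (hypothetical) category scheme the Webmaster only thinks of later."""
    name = event["path"].rsplit("/", 1)[-1]
    return name.rsplit(".", 1)[-1] if "." in name else "page"


def build_cube(events: Iterable[Event], categorize: Categorizer) -> Counter:
    return Counter(categorize(event) for event in events)


class TeeRecorder:
    """Feeds every event to a live data cube and to a compressed log at the same time."""

    def __init__(self, categorize: Categorizer) -> None:
        self.categorize = categorize
        self.cube: Counter = Counter()
        self._buffer = io.BytesIO()
        self._log = gzip.GzipFile(fileobj=self._buffer, mode="wb")

    def record(self, event: Event) -> None:
        self.cube[self.categorize(event)] += 1  # real-time model
        self._log.write((json.dumps(event) + "\n").encode("utf-8"))  # raw copy for later

    def replay(self) -> List[Event]:
        """Finish the gzip stream and decode all logged events (call after recording stops)."""
        self._log.close()  # closing GzipFile leaves the BytesIO buffer open
        data = gzip.decompress(self._buffer.getvalue())
        return [json.loads(line) for line in data.decode("utf-8").splitlines()]


tee = TeeRecorder(by_section)
for path in ["/products/a.html", "/products/b.pdf", "/news/c.html"]:
    tee.record({"path": path})
live = dict(tee.cube)
regenerated = dict(build_cube(tee.replay(), by_file_type))
```

The cost of this flexibility is visible in `record`: every event is both counted and serialized and compressed, which is the extra processor time and storage that leads most sites to skip the log once their categories are settled.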