Derek lives in New York City and works for azurance.com, an open source and security consulting firm that he cofounded. He is the author of the upcoming book, Managing RAID on Linux (O'Reilly and Associates, 2003), and can be contacted at [email protected].
In my article, "Parsing RSS Files with XML::RSS," (TPJ, Fall 2002), I covered using XML::RSS to locate, parse, and reuse dynamic content found on the World Wide Web. But what if you want to provide original material for others? XML::RSS can also be used to generate a properly formatted RSS file. You can create feeds on the fly using a database back end, or generate a static RSS file that gets updated at regular intervals.
The <channel> Element
RSS files are composed of various elements that describe a channel (or feed) and its dynamic content. Each item contained within a channel should contain a <title> and <link> element and may also contain the optional <description> element. Likewise, the <channel> element, which stores metadata about the channel, has its own <title>, <link>, and <description> subelements. In addition, the <channel> structure also contains an <items> subelement that provides a table of contents for the RSS document. So a bare-bones RSS <channel> element might look something like Example 1.
In Example 1, the RSS feed contains three items (typical feeds contain about 10), and they are indexed by their URL or, in RSS-speak, by each item's <link> element. Begin creating your RSS file by initializing a new RSS object and using the channel() method to populate the required elements within the <channel> element, as in Example 2.
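As a minimal sketch of this step, assuming XML::RSS is installed from CPAN and borrowing the title, link, and description values that appear in Listing 1, creating the object and filling in the required channel elements might look like this:

```perl
use strict;
use warnings;
use XML::RSS;

# Create an RSS 1.0 object (the RDF-based format used throughout this article).
my $rss = XML::RSS->new(version => '1.0');

# Populate the three required <channel> subelements.
$rss->channel(
    title       => "azurance.com",
    link        => "http://www.azurance.com",
    description => "Open Source and Security Consulting",
);

# Print the feed built so far; it has a channel but no items yet.
print $rss->as_string;
```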
Only a title, link, and description are required, but other elements are also available. For example, <image> and <textinput> may be used to provide links to a site logo or newsletter subscription form. In addition to these elements, several modules that extend the base RSS schema are available. These modules, which are part of the RSS specification, provide elements that carry metadata about a site's topic, authors, and update frequency. The Dublin Core module (http://web.resource.org/rss/1.0/modules/dc/), for example, includes provisions for information about copyright, publisher, publication date, and language. The Syndication module (http://web.resource.org/rss/1.0/modules/syndication/) provides elements that describe how often a feed is updated. I'll cover a few elements from each module; the specification for each module contains a comprehensive list of options. A complete list of modules is available from http://web.resource.org/rss/1.0/.
Use second-level hashes to compartmentalize RSS module metadata. In Listing 1, I have added a few elements from the Dublin Core and Syndication modules to my channel element.
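Listing 1 shows the XML that results. A sketch of a channel() call that could produce it follows, assuming the dc and syn hash keys that XML::RSS documents for the Dublin Core and Syndication modules. The values are adapted from Listing 1, with the creator address left out and the copyright symbol dropped to keep the example plain ASCII.

```perl
use strict;
use warnings;
use XML::RSS;

my $rss = XML::RSS->new(version => '1.0');

$rss->channel(
    title       => "azurance.com",
    link        => "http://www.azurance.com",
    description => "Open Source and Security Consulting",

    # Dublin Core metadata goes in a second-level hash under the "dc" key.
    dc => {
        language  => "en-us",
        rights    => "Copyright 1999-2002, Azurance.com",
        publisher => "Azurance",
        subject   => "Open Source, Security",
    },

    # Syndication metadata goes in a second-level hash under the "syn" key.
    syn => {
        updatePeriod    => "hourly",
        updateFrequency => "4",
        updateBase      => "1999-11-05T09:00:00-05:00",
    },
);

print $rss->as_string;
```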
The Dublin Core elements that I have added are straightforward, but the Syndication elements require a short explanation. The <syn:updatePeriod> element specifies the time interval over which updates are counted. In this case, like many RSS feeds, I have chosen one hour. Possible values are hourly, daily, weekly, monthly, and yearly. The <syn:updateFrequency> element specifies how many times the feed is updated during each period. So in this example, the feed is updated four times per hour, or every 15 minutes. The <syn:updateBase> element, though it looks a bit confusing, simply represents the first time the feed was published: in this case, November 5, 1999 at 9:00am Eastern Standard Time. This information, combined with the update frequency and update period, allows users and applications to determine a publishing schedule.
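The arithmetic a feed reader might perform with these two values can be sketched in a few lines of plain Perl. The update_interval function and the %period_seconds table are hypothetical helpers written for this illustration, not part of XML::RSS, and the monthly and yearly lengths are rough approximations.

```perl
use strict;
use warnings;

# Approximate length of each syn:updatePeriod value, in seconds.
my %period_seconds = (
    hourly  => 60 * 60,
    daily   => 24 * 60 * 60,
    weekly  => 7 * 24 * 60 * 60,
    monthly => 30 * 24 * 60 * 60,     # approximation
    yearly  => 365 * 24 * 60 * 60,    # approximation
);

# Return the number of seconds between updates for a given
# updatePeriod and updateFrequency.
sub update_interval {
    my ($period, $frequency) = @_;

    die "Unknown updatePeriod '$period'\n"
        unless exists $period_seconds{$period};
    die "updateFrequency must be a positive integer\n"
        unless defined $frequency && $frequency =~ /^[1-9][0-9]*$/;

    return $period_seconds{$period} / $frequency;
}

# The channel in Listing 1: hourly, four times per period.
my $seconds = update_interval("hourly", 4);
printf "Updated every %d minutes\n", $seconds / 60;   # prints "Updated every 15 minutes"
```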
Some module extensions may be applied to individual items in addition to the <channel> element. For example, specifying a <dc:creator> for each item is useful for sites that have articles written by more than one author. Add a second-level hash to each item for the modules and subelements that you want to include. Just follow the same examples I used for the <channel> element.
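For instance, a per-item <dc:creator> could be attached as sketched below. The author name is a made-up placeholder, and the item title and link are borrowed from Listing 2.

```perl
use strict;
use warnings;
use XML::RSS;

my $rss = XML::RSS->new(version => '1.0');

$rss->channel(
    title       => "azurance.com",
    link        => "http://www.azurance.com",
    description => "Open Source and Security Consulting",
);

# The "dc" hash on an item works the same way as on the channel.
$rss->add_item(
    title => "FBI investigates major web slowdown",
    link  => "http://www.vnunet.com/News/1136204",
    dc    => {
        creator => "Jane Example",    # hypothetical author name
    },
);

print $rss->as_string;
```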
Adding Items
Next, I'll add some <item> elements using the add_item() method (see Listing 2).
In compliance with the RSS specification, the add_item() method requires a title and link, but may also include an optional description. Notice how each item corresponds to an entry in the <rdf:Seq> metadata from my previous examples. That information is automatically generated by XML::RSS as items are added to the RSS object.
Combining DBI and XML::RSS
At this point, it's probably obvious that generating an RSS file using Perl is not much easier than creating one by hand using a text editor. Therefore, creating a reusable program that can automatically generate and update your RSS feed is desirable. While you could use nearly anything on your system as the data source (like text files, DB files, an LDAP server, or a combination of sources), using a back-end SQL database is a popular choice. My rss_items function queries a SQL back end (MySQL in my case) and calls XML::RSS's add_item() method to populate the RSS object (see Listing 3).
rss_items takes a positive integer as input. This number is used to determine how many entries are added to the RSS object. Since I want to extract rows from the end of the table, I count the number of rows in the table and use this number to generate an offset (lines 20-23) for the first row that I want to return. The LIMIT portion of my second query (line 24) uses that offset and returns rows between that number ($offset) and the end of the table (-1). Depending on which back-end RDBMS you are using and what your schema looks like, you might need to perform some different steps to achieve this effect.
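One edge case is worth a sketch: if the table holds fewer rows than the number requested, the subtraction yields a negative offset. The two hypothetical helpers below (tail_offset and tail_limit_clause are names invented for this illustration) clamp the offset at zero and build a LIMIT clause with an explicit row count in place of -1, which some database versions may not accept. Like the query in Listing 3, this assumes the rows come back in insertion order.

```perl
use strict;
use warnings;

# Offset of the first row to return when we want the last $item_count
# rows out of $row_count; never negative.
sub tail_offset {
    my ($row_count, $item_count) = @_;
    my $offset = $row_count - $item_count;
    return $offset > 0 ? $offset : 0;
}

# Build a MySQL-style "LIMIT offset, row_count" clause for the last
# $item_count rows. %d forces both values to integers.
sub tail_limit_clause {
    my ($row_count, $item_count) = @_;
    return sprintf "LIMIT %d, %d",
        tail_offset($row_count, $item_count), $item_count;
}

print tail_limit_clause(25, 10), "\n";   # prints "LIMIT 15, 10"
print tail_limit_clause(3, 10), "\n";    # prints "LIMIT 0, 10"
```

The resulting clause can be appended to the SELECT statement in place of the hand-built LIMIT.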
Finally, the while loop (line 27) iterates through each row of data returned, and calls the XML::RSS add_item() method using the title and link that was returned from my database.
Generating the File
Now I can call the as_string() or save() methods to output the data to standard out or to a file. For example:
    print $rss->as_string;
    $rss->save("/var/www/azurance.rss");
Calling either as_string() or save() results in the output shown in Listing 4.
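If the static file is regenerated at regular intervals, a reader could fetch it while it is only half written. One possible way around that, sketched here under the assumption that the temporary file and the final file live on the same filesystem, is to save to a temporary name and rename it into place. The write_feed function is a hypothetical helper, and the default output path is a placeholder in the current directory.

```perl
use strict;
use warnings;
use XML::RSS;
use File::Temp qw(tempfile);
use File::Basename qw(dirname);

# Save the feed under a temporary name in the target directory, then
# rename it over the real file so readers never see a partial feed.
sub write_feed {
    my ($rss, $path) = @_;

    my ($fh, $tmp) = tempfile("rssXXXXXX", DIR => dirname($path));
    close($fh) or die "Cannot close $tmp: $!\n";

    $rss->save($tmp);

    # File::Temp creates private files; make the feed world-readable.
    chmod(0644, $tmp)    or die "Cannot chmod $tmp: $!\n";
    rename($tmp, $path)  or die "Cannot rename $tmp to $path: $!\n";
    return $path;
}

my $rss = XML::RSS->new(version => '1.0');
$rss->channel(
    title       => "azurance.com",
    link        => "http://www.azurance.com",
    description => "Open Source and Security Consulting",
);
$rss->add_item(
    title => "Baltimore launches Trusted Business apps",
    link  => "http://www.theregister.co.uk/content/55/27734.html",
);

# Output path: first command-line argument, or a file in the current directory.
my $path = @ARGV ? $ARGV[0] : "azurance.rss";
write_feed($rss, $path);
print "Wrote $path\n";
```

Run from cron or a similar scheduler, a script along these lines would refresh the file on whatever schedule the feed advertises.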
Checking Your Work
After you're done creating a feed, you might want to check whether it complies with current RSS standards. Mark Pilgrim and Sam Ruby have made available an RSS validator (http://feeds.archive.org/validator/check). An invaluable tool, the validator lets you enter the URL of an RSS file to be checked for errors.
TPJ
Listing 1
<channel rdf:about="http://www.azurance.com">
  <title>azurance.com</title>
  <link>http://www.azurance.com</link>
  <description>Open Source and Security Consulting</description>
  <dc:language>en-us</dc:language>
  <dc:rights>Copyright © 1999-2002, Azurance.com</dc:rights>
  <dc:publisher>Azurance</dc:publisher>
  <dc:creator>[email protected]</dc:creator>
  <dc:subject>Open Source, Security</dc:subject>
  <syn:updatePeriod>hourly</syn:updatePeriod>
  <syn:updateFrequency>4</syn:updateFrequency>
  <syn:updateBase>1999-11-05T09:00:00-05:00</syn:updateBase>
  <items>
    <rdf:Seq>
      <rdf:li rdf:resource="http://www.theregister.co.uk/content/55/27734.html" />
      <rdf:li rdf:resource="http://www.vnunet.com/News/1136204" />
      <rdf:li rdf:resource="http://www.internetnews.com/infra/article.php/1486121" />
    </rdf:Seq>
  </items>
</channel>
Listing 2
$rss->add_item(
    title => "Baltimore launches Trusted Business apps",
    link  => "http://www.theregister.co.uk/content/55/27734.html"
);

$rss->add_item(
    title => "FBI investigates major web slowdown",
    link  => "http://www.vnunet.com/News/1136204"
);

$rss->add_item(
    title => "Cisco Boosts Security, Caters To Small Business",
    link  => "http://www.internetnews.com/infra/article.php/1486121"
);
Listing 3
 1  sub rss_items {
 2
 3      use DBI;
 4
 5      my $itemCount = shift @_;
 6      my ($dsn, $dbh, $sth, $rv, @row, @count, $offset);
 7
 8      my $driver   = "mysql";
 9      my $database = "rss_news";
10      my $hostname = "localhost";
11      my $port     = "3306";
12      my $user     = "username";
13      my $pw       = "password";
14      my $table    = "news";
15
16      $dsn = "DBI:$driver:database=$database;host=$hostname;port=$port";
17      $dbh = DBI->connect($dsn, $user, $pw);
18      $dbh->{PrintError} = 0; # turn off errors, we'll deal with it ourselves
19
20      $sth = $dbh->prepare("SELECT COUNT(*) FROM $table");
21      $rv = $sth->execute;
22      @count = $sth->fetchrow_array;
23      $offset = $count[0] - $itemCount;
24      $sth = $dbh->prepare("SELECT title, link FROM $table LIMIT $offset, -1");
25      $rv = $sth->execute;
26
27      while (@row = $sth->fetchrow_array) {
28
29          my ($title, $link) = @row;
30
31          $rss->add_item(
32
33              title => "$title",
34              link  => "$link"
35          );
36      }
37  }
Listing 4
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://purl.org/rss/1.0/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/"
 xmlns:syn="http://purl.org/rss/1.0/modules/syndication/">

<channel rdf:about="http://www.azurance.com">
  <title>azurance.com</title>
  <link>http://www.azurance.com</link>
  <description>Open Source and Security Consulting</description>
  <dc:language>en-us</dc:language>
  <dc:rights>Copyright © 1999-2002, Azurance.com</dc:rights>
  <dc:publisher>Azurance</dc:publisher>
  <dc:creator>[email protected]</dc:creator>
  <dc:subject>Open Source, Security</dc:subject>
  <syn:updatePeriod>hourly</syn:updatePeriod>
  <syn:updateFrequency>4</syn:updateFrequency>
  <syn:updateBase>1999-11-05T09:00:00-05:00</syn:updateBase>
  <items>
    <rdf:Seq>
      <rdf:li rdf:resource="http://www.infoworld.com/articles/hn/xml/02/10/24/021024hnnpcwest.xml?s=IDGNS" />
      <rdf:li rdf:resource="http://www.idg.net/ic_959380_1794_9-10000.html" />
      <rdf:li rdf:resource="http://www.infoworld.com/articles/hn/xml/02/10/25/021025hnsecurelinux.xml?s=IDGNS" />
      <rdf:li rdf:resource="http://www.cnn.com/2002/TECH/internet/10/23/net.attack/index.html" />
      <rdf:li rdf:resource="http://www.infoworld.com/articles/hn/xml/02/10/23/021023hnopteron.xml?s=IDGNS" />
      <rdf:li rdf:resource="http://www.businessweek.com/technology/cnet/stories/963054.htm" />
      <rdf:li rdf:resource="http://www.itweb.co.za/sections/internet/2002/0210240947.asp?A=HOME&amp;O=FPIN" />
      <rdf:li rdf:resource="http://www.internetwk.com/security02/INW20021023S0001" />
      <rdf:li rdf:resource="http://www.pcw.co.uk/News/1136211" />
      <rdf:li rdf:resource="http://zdnet.com.com/2100-1105-963087.html" />
    </rdf:Seq>
  </items>
</channel>

<item rdf:about="http://www.infoworld.com/articles/hn/xml/02/10/24/021024hnnpcwest.xml?s=IDGNS">
  <title>Network chip makers focus on security</title>
  <link>http://www.infoworld.com/articles/hn/xml/02/10/24/021024hnnpcwest.xml?s=IDGNS</link>
</item>

<item rdf:about="http://www.idg.net/ic_959380_1794_9-10000.html">
  <title>'The Golden Age of Hacking rolls on'</title>
  <link>http://www.idg.net/ic_959380_1794_9-10000.html</link>
</item>

<item rdf:about="http://www.infoworld.com/articles/hn/xml/02/10/25/021025hnsecurelinux.xml?s=IDGNS">
  <title>Secure Linux maker teams with IBM in U.S.</title>
  <link>http://www.infoworld.com/articles/hn/xml/02/10/25/021025hnsecurelinux.xml?s=IDGNS</link>
</item>

<item rdf:about="http://www.cnn.com/2002/TECH/internet/10/23/net.attack/index.html">
  <title>FBI seeks to trace massive Net attack</title>
  <link>http://www.cnn.com/2002/TECH/internet/10/23/net.attack/index.html</link>
</item>

<item rdf:about="http://www.infoworld.com/articles/hn/xml/02/10/23/021023hnopteron.xml?s=IDGNS">
  <title>RSA, AMD team up on security for Opteron chips</title>
  <link>http://www.infoworld.com/articles/hn/xml/02/10/23/021023hnopteron.xml?s=IDGNS</link>
</item>

<item rdf:about="http://www.businessweek.com/technology/cnet/stories/963054.htm">
  <title>Encryption method getting the picture</title>
  <link>http://www.businessweek.com/technology/cnet/stories/963054.htm</link>
</item>

<item rdf:about="http://www.itweb.co.za/sections/internet/2002/0210240947.asp?A=HOME&amp;O=FPIN">
  <title>Internet banking security revolutionised with SMS-based cross-checking</title>
  <link>http://www.itweb.co.za/sections/internet/2002/0210240947.asp?A=HOME&amp;O=FPIN</link>
</item>

<item rdf:about="http://www.internetwk.com/security02/INW20021023S0001">
  <title>Vendor Warns Of New IE Holes; Microsoft Calls Reports Irresponsible</title>
  <link>http://www.internetwk.com/security02/INW20021023S0001</link>
</item>

<item rdf:about="http://www.pcw.co.uk/News/1136211">
  <title>PGP poised for major comeback</title>
  <link>http://www.pcw.co.uk/News/1136211</link>
</item>

<item rdf:about="http://zdnet.com.com/2100-1105-963087.html">
  <title>P2P hacking bill may be rewritten</title>
  <link>http://zdnet.com.com/2100-1105-963087.html</link>
</item>

</rdf:RDF>