Dr. Dobb's is part of the Informa Tech Division of Informa PLC




Keeping Up With the World


Simon is a freelance programmer and author, whose titles include Beginning Perl (Wrox Press, 2000) and Extending and Embedding Perl (Manning Publications, 2002). He's the creator of over 30 CPAN modules and a former Parrot pumpking. Simon can be reached at simon@simon-cozens.org.


I love the holiday period—a chance to go back and see family, and to get away from it all in an isolated cottage in rural Wales. Of course, there's a downside to this: being in an isolated cottage in rural Wales. I still like to keep up with what's going on in the outside world, with the latest tech news and what people are saying on their blogs. Unfortunately, there's only dial-up available and that is time-metered, so I'm very much on a bandwidth budget.

Normally, I'd use an RSS reader. We've met RSS before—it's a way for sites to publish a machine-readable file of their latest news and items, so we can suck down these XML files and see if there's anything we haven't seen yet. Unfortunately, these XML files can get quite big, averaging around 10K, and if you're following a lot of them, that soon adds up. One possible solution is to run a console-based RSS reader on my server, and use a text-mode interface to that: This drastically cuts down the amount of data I need to download.

One last stipulation—when I get back from my holiday and back to bandwidth and my desktop, I don't want to lose track of what I've already read; I want to be able to carry on where I left off. At college, I use NetNewsWire, a Mac RSS client, so I want my text-based reader to read the NNW history format as well. Naturally, there aren't many of these around; several come close, but nothing was quite right. So I did the obvious thing: I wrote one. It's called "press," from "Perl RSS."

We'll look at how I did it this month and next: This month, we'll concentrate on getting a basic RSS aggregator working, and next month, we'll see how to interface that to NetNewsWire.

Scanning CPAN

The first port of call for any application like this is CPAN. What modules can we find that will do the work for us? I already knew about POE::Component::RSSAggregator, which does most of the job: It polls news sites, downloads their RSS, and alerts the POE event loop if there are any new headlines. Next, we need a UI to display it all: Curses is the right idea, but it's a bit low level, so I decided to check out Curses::UI. We'll need to format the stories contained in the RSS files, which are usually in HTML, for display as plain text, so we pick up HTML::FormatText as well. Finally, for dealing with the Macintosh property list files used by NetNewsWire, we can grab the aptly named Mac::PropertyList.

If it sounds like we've got most of the application done already, we have—the finished product was just over 200 lines of code.

Hacking the UI

Next, I got myself familiar with Curses::UI by creating a mock-up of the application. Curses::UI allows you to create "widgets," such as windows, list boxes, text areas, and so on, and stick them all together in a nice object-oriented way. So for instance, I wanted the display to have three panels, just like NetNewsWire (and many other readers), one for the list of the feeds I'm following, one for the headlines in an individual feed, and one for the story behind the headline. Figure 1 is a picture of what we're aiming toward.

I started by creating the three windows and adding them to the main Curses::UI object:

my $cui = new Curses::UI(-color_support => 1);
my $win1 = $cui->add( "feedlist", "Window",
    -titlereverse => 0, -htmltext => 1, 
    -border => 1, -bfg=>"blue",
    -title => "Feed List", -width => 30);

my $win2 = $cui->add( "headlist", "Window",
    -titlereverse => 0, -htmltext => 1, 
    -border => 1, -bfg=>"blue",
    -title => "Headline List", 
    -x => ($win1->width+1), -height => 10);

my $win3 = $cui->add( "datawin", "Window",
    -titlereverse => 0, -htmltext => 1, 
    -border => 1, -bfg=>"blue",
    -x => ($win1->width+1), 
    -y => ($win2->height+1) );

Each call to $cui->add takes a name for the widget so that we can get at it later, the type of the widget (here we're starting with windows), and then some options: we don't want the title to be reversed, we want to be able to use simple markup in titles, we want a border, and so on. We declare the sizes and positions of the latter two windows relative to the first, so that if we change the width of that, everything else will still be in the right place. Incidentally, the widget type and the first four options are always the same, so we factor them out to avoid repeating code:

my @mywindow = ( "Window",
    -titlereverse => 0, -htmltext => 1, 
    -border => 1, -bfg=>"blue");
my $win1 = $cui->add( "feedlist", @mywindow, ...);
my $win2 = $cui->add( "headlist", @mywindow, ...);
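
To see why this works without needing Curses::UI installed, here is a minimal sketch in plain Perl. The add_widget function is a hypothetical stand-in for $cui->add, not part of Curses::UI; it only records its arguments, so we can check that the shared options in @mywindow are flattened into the argument list ahead of the per-window ones.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $cui->add: takes a name, a widget type,
# and a flat list of option/value pairs, and returns them as a hash.
sub add_widget {
    my ($name, $type, %opts) = @_;
    return { name => $name, type => $type, %opts };
}

# Shared widget type and options, as in the article.
my @mywindow = ( "Window",
    -titlereverse => 0, -htmltext => 1,
    -border => 1, -bfg => "blue" );

# Perl flattens @mywindow into the argument list, so the per-window
# options simply follow the shared ones.
my $feedlist = add_widget( "feedlist", @mywindow,
    -title => "Feed List", -width => 30 );

printf "%s: %s, border=%s, width=%s\n",
    $feedlist->{type}, $feedlist->{-title},
    $feedlist->{-border}, $feedlist->{-width};
```

Because the array is flattened in place, the widget type has to stay the first element of @mywindow, which is where the function expects to find it.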

Now we can place some widgets inside the windows—a "Listbox" widget in each of the feed and headline lists, and a "TextViewer" in the data window:

my $feedbox = $win1->add('feedlistbox', "Listbox",
                         -vscrollbar => 1);
my $headbox = $win2->add('headlistbox', "Listbox",
                         -vscrollbar => 1);
my $viewer  = $win3->add('data', "TextViewer");

And now we can tell the main Curses event loop to start:

$cui->mainloop;

When we run this, we get presented with a nice three-panel interface, two empty lists, and...no way to quit. Oops. We'd better add some simple navigation. First, we tell Curses::UI that the "^C" and "q" keys can be used to quit:

$cui->set_binding(sub { exit }, $_) for "\cC", "q";

Next, we'll tell each window that the tab key can be used to move to the next window, just as we'd expect. We do this by moving the focus to the next window in sequence:

$win1->set_binding(sub { $win2->focus }, "\cI");
$win2->set_binding(sub { $win3->focus }, "\cI");
$win3->set_binding(sub { $win1->focus }, "\cI");

Now our interface looks a bit better, and we can start thinking about how we're going to implement the logic.

What's POE?

This application is going to have multiple sources of input, and they could happen at any time: The user could hit a key on the keyboard and we'd have to update the screen, or an updated RSS feed could come in and we'd possibly have to deal with adding new headlines to the display. We also want to fire off HTTP requests occasionally to ask for new news. All of these things are called "events," and we're now entering the world of event-driven programming.

In an ordinary, procedural program, we'd have to deal with the user's keystrokes, then send off the HTTP request and block there until we got a response, and the user couldn't use the application's UI until we cede control back to her and wait for another keystroke. This leads to a horrible user experience. In the event world, we have a main event loop that watches for things happening. We tell the event loop that we want to schedule web requests periodically. When something happens, such as a keystroke or a response from the HTTP request, the event loop calls a routine. In this sense, the event loop is a bit like an operating system—it looks after scheduling events and dispatching them to the appropriate bit of code that's currently listening for them.
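
The idea can be sketched in a few lines of plain Perl. This is only a toy, assuming a simple in-memory queue and a dispatch table; none of the names below (post_event, run_loop, the event names) are POE's API. It shows a keystroke being handled between a feed request and its response, rather than the program blocking on the request.

```perl
use strict;
use warnings;

# Toy event loop: a queue of pending events plus a dispatch table.
# All names here are illustrative and are not POE's API.
my @queue;       # pending [event_name, @args] records
my %handlers;    # event name => code ref
my @log;         # what the handlers did, in order

sub post_event { push @queue, [@_] }

sub run_loop {
    while (my $event = shift @queue) {
        my ($name, @args) = @$event;
        my $handler = $handlers{$name} or next;   # ignore unknown events
        $handler->(@args);
    }
}

%handlers = (
    keystroke => sub { push @log, "key $_[0]" },
    # A handler schedules further events instead of blocking.
    poll_feed => sub {
        push @log, "request $_[0]";
        post_event(feed_response => $_[0]);
    },
    feed_response => sub { push @log, "response $_[0]" },
);

post_event(poll_feed => "Planet Perl");
post_event(keystroke => "q");
run_loop();

print "$_\n" for @log;
```

A real event loop also watches file handles and timers, which this sketch leaves out entirely.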

POE is one such event loop, and an award-winning one at that. Part of the beauty of POE is that major chunks of event-generating and event-responding code are packaged up as "components," like our POE::Component::RSSAggregator. There are also POE components that act as HTTP servers or IRC clients, watch for changes to files, or even control MP3 players. POE can look fiendish if you've never seen it before, but we'll explain it as we go along.

Subclassing CPAN

While I'm a big fan of doing as much as possible with CPAN modules, I'm not going to pretend that CPAN modules are always a perfect fit for the job in hand. However, they're usually pretty good, and if they're not a perfect fit, you can usually get some use out of them by subclassing and bending them to your will.

So with the current program, there are two slight mismatches that we need to fix. First, POE::Component::RSSAggregator uses XML::RSS::Feed objects, which are great, but unfortunately they assume that every time a new article appears, the old articles aren't new any more; this is perfectly good behavior if you're writing a news ticker where you only want to display each new item once. However, we're writing a news reader, and we want articles to stay new until the user has read them. So we get subclassing!

Looking at XML::RSS::Feed, we find that every time an XML file is parsed, it calls _mark_all_headlines_seen, and this method puts the ID of each headline object in the rss_headline_ids hash. So we create a new class that doesn't _mark_all_headlines_seen automatically, but does allow us to specify manually when a headline has been read:

package XML::RSS::Feed::Manual;
use base 'XML::RSS::Feed';
sub _mark_all_headlines_seen {} 
sub mark_read {
    my ($self, $head) = @_;
    $self->{rss_headline_ids}{$head->id} =1;
}
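
The effect of overriding a method with an empty one can be tried out in isolation. The sketch below uses two toy classes, Toy::Feed and Toy::Feed::Manual, which are assumptions made up for illustration and only mimic the behavior described above; they are not XML::RSS::Feed.

```perl
use strict;
use warnings;

# Toy base class standing in for XML::RSS::Feed (not the real module):
# every call to parse() first marks all known headlines as seen.
package Toy::Feed;
sub new {
    my $class = shift;
    my $self  = { seen => {}, ids => [] };
    return bless $self, $class;
}
sub parse {
    my ($self, @ids) = @_;
    $self->_mark_all_seen;              # old items stop being "new"
    push @{ $self->{ids} }, @ids;
    return $self;
}
sub _mark_all_seen {
    my $self = shift;
    $self->{seen}{$_} = 1 for @{ $self->{ids} };
}
sub unseen {
    my $self = shift;
    return grep { !$self->{seen}{$_} } @{ $self->{ids} };
}

# Subclass: switch off the automatic marking, add a manual one.
package Toy::Feed::Manual;
our @ISA = ('Toy::Feed');
sub _mark_all_seen { }                  # deliberately does nothing
sub mark_read { my ($self, $id) = @_; $self->{seen}{$id} = 1 }

package main;

my $auto   = Toy::Feed->new->parse("a")->parse("b");
my $manual = Toy::Feed::Manual->new->parse("a")->parse("b");
my @before = $manual->unseen;           # nothing marked yet
$manual->mark_read("a");
my @after  = $manual->unseen;

print "auto:   @{[ $auto->unseen ]}\n";
print "manual: @before, then @after\n";
```

In the base class the first headline stops being new as soon as the second parse happens; in the subclass it stays new until mark_read is called.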

The second mismatch is that we want to use POE as our event loop so that POE::Component::RSSAggregator can post events about new articles. Unfortunately, we're also using Curses::UI, which has its own event loop. There's a Curses loop for POE, POE::Wheel::Curses, but we still need to glue it all together. So we look at Curses::UI's mainloop and see how it works:

sub mainloop ()
{
    my $this = shift;

    # Draw the initial screen.
    $this->focus(undef, 1); # 1 = forced focus
    $this->draw;
    doupdate();

    # Infinite event loop.
    for(;;)
    {
        $this->do_one_event
    }
}

We also find that do_one_event reads a key from the keyboard, unless $this->{-feedkey} is set to a pending keystroke. POE::Wheel::Curses also reads a key from the keyboard, so we can use that as our main loop, then have it feed the key that it has just read into $cui->{-feedkey} and call do_one_event: This will cause Curses::UI to dispatch the key to the appropriate widget and do the right thing. So we convert our dummy UI to use POE:

POE::Session->create(inline_states => {
    _start => sub {
        my ($heap) = $_[HEAP];
        $heap->{console} = 
            POE::Wheel::Curses->new(
               InputEvent => "got_keystroke"
            );
        $cui->focus(undef, 1);
        $cui->draw;
        Curses::doupdate();
    },
    got_keystroke => sub {
        $cui->{-feedkey} = $_[ARG0];
        $cui->do_one_event;
    }
});
POE::Kernel->run;

If POE is like an operating system, then POE::Session objects are its processes. An operating system with no processes is boring, so we create a new one. This session will respond to two events: the _start event is called when the session begins, and the got_keystroke event will be called every time POE::Wheel::Curses sees a keystroke. Inside the _start handler, we set up POE::Wheel::Curses and put it on the "heap"; this is just a storage area that the POE kernel sets up for us, and keeping the wheel there means that the Curses handler is going to stick around for the whole of the application. We also tell the Curses wheel what event to fire when a key is pressed, and then we copy in the initialization code from Curses::UI's mainloop.

When a key is pressed, we feed the key into the Curses::UI object and run one event. Once we're all set up, we tell the POE kernel to run, and now we can test that our application correctly dispatches keystrokes from POE to Curses::UI.
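
The hand-off between the two loops can be sketched without either library. Toy::UI below is a made-up stand-in, not Curses::UI: its do_one_event only consumes a key left in -feedkey, and an ordinary for loop plays the part of the got_keystroke handler.

```perl
use strict;
use warnings;

# Toy UI object (not Curses::UI): do_one_event handles one pending key.
package Toy::UI;
sub new {
    my $class = shift;
    my $self  = { '-feedkey' => undef, handled => [] };
    return bless $self, $class;
}
sub do_one_event {
    my $self = shift;
    my $key  = $self->{-feedkey};
    $self->{-feedkey} = undef;          # consume the pending key
    # With no pending key, the real thing would read the keyboard here.
    return unless defined $key;
    push @{ $self->{handled} }, $key;
}

package main;

my $ui = Toy::UI->new;

# The outer loop owns the input source and hands each key to the UI,
# one event at a time, the way got_keystroke does.
for my $key (qw(j j q)) {
    $ui->{-feedkey} = $key;
    $ui->do_one_event;
}

print "UI handled: @{ $ui->{handled} }\n";
```

The point of the design is that the UI never blocks waiting for input; whoever owns the event loop decides when it gets to run.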

Adding the News

So far so good, but now we need to think about the RSS handling part. For this month, we'll assume that we have a static list of feeds, like so:

my %feeds = (
    "http://planet.perl.org/rss10.xml" =>
        "Planet Perl",
    "http://slashdot.org/slashdot.rss" =>
        "Slashdot",
    "http://interglacial.com/~sburke/torgo_x_upo.rss" =>
        "Torgo-X zhoornal",
    # ...
);

Now we'll create our XML::RSS::Feed::Manual objects:

my @values;
my %labels;
while (my ($rss, $name) = each %feeds) {
    my $feed = XML::RSS::Feed::Manual->new(
        rss => $rss, name => $name
    );
    push @values, $feed;
    $labels{$feed} = $name;
}

We now have a list of feed objects and a hash that turns the feed object into a name. We can use these as the values and labels of our list box:

$feedbox->values(\@values);
$feedbox->labels(\%labels);

This means that the feed list will be full of items labeled according to the name of the feed: We will see a list "Planet Perl," "Slashdot," and so on, as we might expect. However, when one of these items is selected, we can call a method on the list box and get back the underlying XML::RSS::Feed::Manual object for that item.
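
It is worth seeing why we need both the array and the hash. A hash key is always a plain string, so using an object as a key stores its stringified form; the object itself survives only in the values array. The sketch below uses blessed hashes in a made-up Toy::Feed package rather than real feed objects.

```perl
use strict;
use warnings;

# Toy feed objects (plain blessed hashes, not XML::RSS::Feed).
my (@values, %labels);
for my $name ("Planet Perl", "Slashdot") {
    my $feed = bless { name => $name }, 'Toy::Feed';
    push @values, $feed;
    $labels{$feed} = $name;     # key is the stringified reference
}

# A list box can display $labels{$value} for each value ...
my @shown = map { $labels{$_} } @values;
print "shown: ", join(", ", @shown), "\n";

# ... and hand back the object itself when the item is selected.
my $selected = $values[1];
print "selected: $selected->{name}\n";

# The keys are strings such as "Toy::Feed=HASH(0x...)", so the
# hash alone cannot be turned back into objects.
my $ref_keys = grep { ref $_ } keys %labels;
print "keys that are still references: $ref_keys\n";
```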

For instance, as we select each feed in the feed list, we want the list of headlines to change. To do this, we add an onselchange handler to the feed list:

my $feedbox = $win1->add('feedlistbox', 
    "Listbox", -vscrollbar => 1, 
    -onselchange => \&select_feed );

Now as we cursor up and down in that box, select_feed is called:

sub select_feed {
    my $feed = $feedbox->get_active_value;
    my @headlines = $feed->headlines;
    $headbox->values(\@headlines);
    $headbox->labels({
        map { $_ => $_->headline } @headlines
    });
    $headbox->layout_content->draw(1);
}

As mentioned, we can ask the feed box for the currently selected feed object. As this changes, we ask the feed for its headline objects, which are in the XML::RSS::Headline class. In the same way as with the feed box, we fill the headline box with these objects, and then give each object a label that is the human-readable headline. Since we have changed the data in the headline box, we need to force it to redraw, which we do with an incantation stolen from Curses::UI::Dialog::Filebrowser.

Next, as we select a headline, we want to display the story in the data window. So, in exactly the same way, we attach a handler to the headlines box:

my $headbox = $win2->add('headlistbox', "Listbox",
    -vscrollbar => 1, 
    -onselchange => \&select_story );

And this does much the same thing, looking at the headline object this time instead of the feed object:

sub select_story {
    my $head = shift->get_active_value;
    if ($head) {
        $viewer->text(
            HTML::FormatText->format_string(
                $head->description, 
                lm=>0, rm => $viewer->width -2
            )
        );
        my $feed = $feedbox->get_active_value;
        if (!$feed->seen_headline($head->id)) {
           $feed->mark_read($head);
        }
    } else { $viewer->text("Nothing selected"); }
    $viewer->layout_content->draw(1);
}

We take the description from the headline object and format that up as plain text, putting the result in the viewer. Now we need to tell the feed that this item has been read—this currently doesn't make any difference, since we don't distinguish between read and unread items in the aggregator yet. But we will.

We're very nearly there. The only thing we don't have is data. But once we plug in POE::Component::RSSAggregator, we'll get that automatically. So in our session _start event, we add the following code:

$heap->{rssagg} = 
   POE::Component::RSSAggregator->new(
      alias => "rssagg"
   );
$kernel->post('rssagg', add_feed => $_) 
                           for @values;

This is all we need to do. Having told the aggregator about the feed objects that we want to watch, it will arrange with the POE scheduler to update them for us. When the program starts, the aggregator component will fire off a set of HTTP requests; when it sees a response to one of them, it will create the appropriate headline objects in the feed. The next time the user selects a new feed, the new headlines will already be there. Then the aggregator schedules another bunch of requests, 10 minutes later. We don't need to do any work.

Finishing Touches

The whole point of this exercise was for me to be able to quickly and easily check RSS news with a minimum of bandwidth. Unfortunately, at present, I have to check every single feed since there's no visual indication of which feeds have new items, or even which items are new and which have been seen before.

Here's one subroutine that will help:

sub bolden_news {
    my $labels = $feedbox->labels;
    for my $elem (@values) {
        if ($elem->late_breaking_news and $labels->{$elem} !~ /<bold>/) {
            $labels->{$elem} = "<bold>".$labels->{$elem}."</bold>"
        } elsif (!$elem->late_breaking_news) {
            $labels->{$elem} =~ s{</?bold>}{}g;
        }
    }
    $feedbox->labels($labels);
    $feedbox->layout_content->draw(1);
}

The key here is the late_breaking_news method on a feed object. It returns true if the feed contains any articles we haven't yet marked as read. We look at each of the feeds in the feed box; if there are some new articles, and the label doesn't have bold tags around it (this is the simple markup we were talking about when we created the windows), then we add some tags. If we've read everything in this feed, then we take the bold tags out again.
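
The labelling logic is easy to check away from the UI. In this sketch, plain string keys stand in for the feed objects and a hash of flags stands in for late_breaking_news; the bolden function is a hypothetical, UI-free cousin of bolden_news, written only to show that the markup is added once and removed cleanly.

```perl
use strict;
use warnings;

# Toy version of the labelling logic: %$labels maps a feed key to its
# display label, %$has_news says whether that feed has unread items.
sub bolden {
    my ($labels, $has_news) = @_;
    for my $key (keys %$labels) {
        if ($has_news->{$key}) {
            # Add the markup only once, however often we are called.
            $labels->{$key} = "<bold>" . $labels->{$key} . "</bold>"
                unless $labels->{$key} =~ /<bold>/;
        } else {
            $labels->{$key} =~ s{</?bold>}{}g;
        }
    }
    return $labels;
}

my %labels = (perl => "Planet Perl", slash => "Slashdot");

bolden(\%labels, { perl => 1 });
bolden(\%labels, { perl => 1 });    # second call changes nothing
my $with_news = $labels{perl};
print "$labels{perl} | $labels{slash}\n";

bolden(\%labels, {});               # everything has been read
print "$labels{perl} | $labels{slash}\n";
```

Checking for an existing tag before adding one matters because the routine runs every time a feed updates, not just once.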

We can do the same trick for headlines in the headlines list:

sub select_feed {
    my $feed = $feedbox->get_active_value;
    my @headlines = $feed->headlines;
    $headbox->values(\@headlines);
    $headbox->labels({ map {
            my $head = $_->headline;
            $_ => (!$feed->seen_headline($_->id)
                    ? "<bold>$head</bold>" : $head)
        } @headlines });

    $headbox->layout_content->draw(1);
}

If we haven't seen this headline before, we "bolden" it. Now we just need to arrange for bolden_news and select_feed to be called when the POE::Component::RSSAggregator sees new news:

my ($kernel, $heap, $session) = @_[KERNEL, HEAP, SESSION];
...
$heap->{rssagg} = POE::Component::RSSAggregator->new(
    alias    => 'rssagg',
    callback => $session->postback("handle_feed"),
);

sub handle_feed { bolden_news(); select_feed() }

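When a new news item comes in, the aggregator invokes the postback, which posts a handle_feed event back to our session. For that event to reach the subroutine, handle_feed also has to be registered as one of the session's states (for instance, as handle_feed => \&handle_feed in inline_states). In the handler, bolden_news will highlight the feed that the new item is in, and select_feed will highlight the unread items.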

Now we have a useful news aggregator that will keep track of new news articles for us while requiring minimal bandwidth. Next month, we'll see how to pull the feeds out of the NetNewsWire preferences and how to read and store history so that the information about which articles we've read becomes persistent.

TPJ

