Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Web Development

Watching a Logfile in an IRC Channel


February, 2004: Watching a Logfile In an IRC Channel

Randal is a coauthor of Programming Perl, Learning Perl, Learning Perl for Win32 Systems, and Effective Perl Programming, as well as a founding board member of the Perl Mongers (perl.org). Randal can be reached at [email protected].


The other day, I was hanging out on IRC when Keven Lenzo (creator of the YAPC conference and current director of The Perl Foundation) popped in and asked if anyone had written a tool to dribble a slowly growing file into an IRC channel. I jumped right in and said, "That can be done rather simply with POE," then realized that I hadn't actually done anything quite like that yet.

In this column last year (May 2003), I had written a file watcher that provided a web interface, but not a file watcher that would write to IRC. Since I hadn't yet played much with the POE IRC component, I grabbed the ball and ran with it.

As an interesting part of the problem, I had to figure out how to deal with IRC's antiflood provisions. Most IRC servers these days will boot off any IRC client that sends too much text at once, and the POE IRC component throttles messages by default, dribbling them out at a rate that appears to be about three messages every 10 seconds. But a logfile that is being watched might grow faster than that. It wouldn't make sense for my bot to just spew all the lines blindly, getting more and more behind as time went on, so I had to have my code figure out when I was being throttled. Luckily, since the code is all open source, I was able to peer inside and figure out how to adapt. The result is in Listing 1.

Lines 1 through 3 are my standard preamble, turning on warnings, enabling "big program" compiler restrictions, and disabling buffering on standard output.

Lines 5 through 22 delimit the "user serviceable parts" from the rest of the file. Line 7 is the IRC nickname to be used by this bot. Lines 8 through 12 define to which server the bot connects, and the bot's real name in a manner consistent with the POE IRC component's initialization parameters. Line 13 defines the channel that the bot will join, sending public messages to that channel. Line 14 defines an internal alias for the virtual IRC client created by the POE IRC Component.

Lines 16 to 20 define the files that the bot will be watching. Each key should be a short identifying string that will show up in the bot's output and the value of the corresponding file to be tailed. In this case, I'm watching my web server's access and error log files.

Lines 24 and 25 are used in the throttling process, described later. They have to be defined outside the handler that uses them so that they will persist between handler invocations.

Line 27 pulls in POE, as well as the POE components for IRC and for following a file.

Line 29 creates the virtual IRC client, giving the session the name defined in $IRC_ALIAS.

Lines 31 through 119 define a second session. If the first session is like the IRC client, the second session being created here is like the virtual human running the client. As such, we'll have steps here to cause the IRC client to connect to the server, then react as the IRC client notices various conditions. This session doesn't have a name. For the most part, the session's behavior is defined by a series of inline states, beginning in line 33.

Lines 33 to 36 define the actions taken by this session as it is starting up. The session pokes at the virtual IRC client to have it respond to all IRC events, then it tries to connect the IRC client to its server.

If the connection is successful, eventually an irc_255 event is posted back to the virtual-human session (at the end of all the connecting messages), triggering the handler beginning in line 37. This handler tells the IRC client to join a particular channel (line 38), and then start up the heartbeat and listeners (lines 39 and 40). The details of the startup are given later.

Lines 42 through 49 cause null handlers to be installed for many common but uninteresting IRC events. This list started out empty, but after running my bot for a while, I figured out more things that I should ignore (because they were making noise in my default handler) and added them here.

Similarly, line 50 keeps _child events from hitting the default handler because I have nothing to do for that either.

Lines 51 to 57 define a _default event handler, mostly so that I can diagnose all the events being sent to my virtual human, including dumping the data in a nice way.

Up to this point, we have a fairly generic but stupid IRC bot that knows how to connect to a server and join a channel. The remaining events give the bot its personality. The my_add handler (defined starting in line 58) tells the bot to start watching a particular file. We'll do this by adding a watcher session using POE::Wheel::FollowTail.

First, lines 59 and 60 set up two closure variables: the key into the %FOLLOWS hash, and the session ID of the virtual-human session. Lines 61 to 76 create the watcher session, watching the filename from the %FOLLOWS hash (in line 67) and handling events at each new noticed line via the got_line handler (line 68).

The got_line handler posts an event back to the virtual-human session to the my_tailed handler, including the timestamp, short identifying word, and the line of text that was seen. As each new line is noticed by the POE wheel, we'll get another event in sequence, carefully timestamped and labeled appropriately.

Although this strategy makes some logical sense (having a separate session for each wheel), at the cost of some complexity, I could have also done the work entirely with one session. I usually go for the dirt-simple methods first, though, and it worked for me this way.

Now that we have line events flowing in, it's time to handle them using the handler starting in line 79. This handler is normally called with the timestamp (integer), the file, and the line of text, which are extracted in line 80. However, under circumstances described shortly, this handler may also be called with no parameters, leaving these three variables undef.

Because I wanted this virtual human to be aware of the built-in throttling of the virtual IRC client (to avoid the heavy-handed throttling of the real IRC server), I have to reach under the hood in lines 85 and 86. By staring at the source code for POE::Component::IRC, I was able to determine that the arrayref stored in its heap at send_queue contained one element per deferred IRC message. If the array is empty, then the next message I send will most likely be sent immediately to the IRC server. I cache this arrayref in the outer $SEND_QUEUE variable just to avoid the bizarre sequences of steps to find it each time.

Lines 89 to 111 implement the throttling-aware transmission. I chose to implement a bit of hysteresis in the design. When the queued elements reach three or more, I start discarding and keep discarding until the queue is completely empty. With this strategy, the discarded lines tend to come in clumps, rather than throwing away every other line or every third line. I chose three as the threshold because it appears that it takes about 10 seconds to dump three average-sized messages.

The variable $SKIPPING keeps track of this state. If it's 0, we're just pushing things through as they arrive. Otherwise, it's the number of messages that we've had to discard because we're skipping.

Lines 89 to 93 handle the skipping to nonskipping transition. If we're currently skipping, but the queue is now empty, then it reports how many messages were discarded by sending a message to the channel (lines 90 and 91) and then resets the $SKIPPING variable.

Lines 96 to 106 execute only if we have a line to display, verified by a nonfalse $time value in line 96.

Lines 97 to 98 handle the transition from nonskipping to skipping, if needed. If we're already skipping, or the queue has now exceeded three messages, then we discard the line and note that it has been discarded (line 98).

Otherwise, it's time to say something. The timestamp is expanded to the time components (seconds, minutes, hours, and so on) in line 100. Line 101 sends a public message to the channel, labeled with the timestamp in HH:MM:SS 12-hour format and the file from which the line comes. The fancy bit of math in line 103 converts 0-23 into 12,1-12,1-11, as needed.

Lines 109 to 111 cause this handler to be reinvoked automatically every half second if we're currently skipping. The purpose of this action is to display the discarded N messages message as quickly as possible, instead of waiting for more input to come from the file. Note that we reinvoke the handler but pass no parameters, which will make $time false and will cause the middle portion of this handler to be skipped. Using $_[STATE] here is a shortcut meaning, "The same handler as the one we're in."

Finally, lines 114 to 117 define the heartbeat task. Every 10 seconds, a line that looks like "HH:MM:SS: heartbeat: beep" is shown in the channel. I did this for testing before I had the file-tailing added, but I like how it keeps channel occupants aware that there's stuff coming in from the bot.

Once the POE event handlers and sessions are created, the final line of this program (line 121) tells POE to start the event loop, and away we go. This program never exits because there are always sessions that are alive, but a quick INT character (usually Control-C) takes care of the appropriate termination. The bot rudely disconnects from the IRC server: If I had been more concerned, I could have set up a _signal handler to send a proper quit message to the IRC server with an explanation.

Hopefully, you've seen how easy it is to construct an IRC bot. I must caution you, though, that many people (myself included) despise noisy or spooky bots, so have some common sense when you set these up. Until next time, enjoy!

TPJ



Listing 1

=1=     #!/usr/bin/perl -w
=2=     use strict;
=3=     $|++;
=4=     
=5=     ## CONFIG
=6=     
=7=     my $NICK = 'weblogger';
=8=     my $CONNECT =
=9=       {Server => 'irc.some.borg',
=10=       Nick => $NICK,
=11=       Ircname => 'weblogger: see [email protected]',
=12=      };
=13=    my $CHANNEL = '#weblogger';
=14=    my $IRC_ALIAS = "irk";
=15=    
=16=    my %FOLLOWS =
=17=      (
=18=       ACCESS => "/var/log/access_log",
=19=       ERROR => "/var/log/error_log",
=20=      );
=21=    
=22=    ## END CONFIG
=23=    
=24=    my $SKIPPING = 0;               # if skipping, how many we've done
=25=    my $SEND_QUEUE;                 # cache
=26=    
=27=    use POE qw(Component::IRC Wheel::FollowTail);
=28=    
=29=    POE::Component::IRC->new($IRC_ALIAS);
=30=    
=31=    POE::Session->create
=32=      (inline_states =>
=33=       {_start => sub {      
=34=          $_[KERNEL]->post($IRC_ALIAS => register => 'all');
=35=          $_[KERNEL]->post($IRC_ALIAS => connect => $CONNECT);
=36=        },
=37=        irc_255 => sub {            # server is done blabbing
=38=          $_[KERNEL]->post($IRC_ALIAS => join => $CHANNEL);
=39=          $_[KERNEL]->yield("heartbeat"); # start heartbeat
=40=          $_[KERNEL]->yield("my_add", $_) for keys %FOLLOWS;
=41=        },
=42=        (map
=43=         {
=44=           ;"irc_$_" => sub { }}
=45=         qw(join public
=46=            connected snotice ctcp_action ping notice mode part quit
=47=            001 002 003 004 005
=48=            250 251 252 253 254 265 266
=49=            332 333 353 366 372 375 376)),
=50=        _child => sub {},
=51=        _default => sub {
=52=          printf "%s: session %s caught an unhandled %s event.\n",
=53=            scalar localtime(), $_[SESSION]->ID, $_[ARG0];
=54=          print "The $_[ARG0] event was given these parameters: ",
=55=            join(" ", map({"ARRAY" eq ref $_ ? "[@$_]" : "$_"} @{$_[ARG1]})), "\n";
=56=          0;                        # false for signals
=57=        },
=58=        my_add => sub {
=59=          my $trailing = $_[ARG0];
=60=          my $session = $_[SESSION];
=61=          POE::Session->create
=62=              (inline_states =>
=63=               {_start => sub {
=64=                  $_[HEAP]->{wheel} =
=65=                    POE::Wheel::FollowTail->new
=66=                        (
=67=                         Filename => $FOLLOWS{$trailing},
=68=                         InputEvent => 'got_line',
=69=                        );
=70=                },
=71=                got_line => sub {
=72=                  $_[KERNEL]->post($session => my_tailed =>
=73=                                   time, $trailing, $_[ARG0]);
=74=                },
=75=               },
=76=              );
=77=        }
=78=       },
=79=        my_tailed => sub {
=80=          my ($time, $file, $line) = @_[ARG0..ARG2];
=81=          ## $time will be undef on a probe, or a time value if a real line
=82=    
=83=          ## PoCo::IRC has throttling built in, but no external visibility
=84=          ## so this is reaching "under the hood"
=85=          $SEND_QUEUE ||=
=86=            $_[KERNEL]->alias_resolve($IRC_ALIAS)->get_heap->{send_queue};
=87=    
=88=          ## handle "no need to keep skipping" transition
=89=          if ($SKIPPING and @$SEND_QUEUE < 1) {
=90=            $_[KERNEL]->post($IRC_ALIAS => privmsg => $CHANNEL =>
=91=                             "[discarded $SKIPPING messages]");
=92=            $SKIPPING = 0;
=93=          }
=94=    
=95=          ## handle potential message display
=96=          if ($time) {
=97=            if ($SKIPPING or @$SEND_QUEUE > 3) { # 3 msgs per 10# seconds
=98=              $SKIPPING++;
=99=            } else {
=100=             my @time = localtime $time;
=101=             $_[KERNEL]->post($IRC_ALIAS => privmsg => $CHANNEL =>
=102=                              sprintf "%02d:%02d:%02d: %s: %s",
=103=                              ($time[2] + 11) % 12 + 1, $time[1], $time[0],
=104=                              $file, $line);
=105=           }
=106=         }
=107=   
=108=         ## handle re-probe/flush if skipping
=109=         if ($SKIPPING) {
=110=           $_[KERNEL]->delay($_[STATE] => 0.5); # $time will be undef
=111=         }
=112=   
=113=       },
=114=       my_heartbeat => sub {
=115=         $_[KERNEL]->yield(my_tailed => time, "heartbeat", "beep");
=116=         $_[KERNEL]->delay($_[STATE] => 10);
=117=       }
=118=      },
=119=     );
=120=   
=121=   POE::Kernel->run;
Back to article


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.