brian has been a Perl user since 1994. He is founder of the first Perl Users Group, NY.pm, and Perl Mongers, the Perl advocacy organization. He has been teaching Perl through Stonehenge Consulting for the past five years, and has been a featured speaker at The Perl Conference, Perl University, YAPC, COMDEX, and Builder.com. Contact brian at [email protected].
Plain Old Documentation, or simply Pod, is a simple text format for embedded documentation. The pod format is described in the perlpod man page, so if you aren't already familiar with it, check out its documentation before you read on. You can translate the pod format to man pages, LaTeX, rich text, and several other formats with programs that come with Perl. You don't have to stop there, though, if you want to write your own pod translator. In this article, I'll write a simple translator to illustrate the process. The trick is to start with something that somebody has already done.
Sean Burke recently rewrote the pod tools as modules, making it extremely easy to write your own translator. His Pod::Simple module handles all of the parsing. If you want to do something different, you subclass Pod::Simple by following the instructions in Pod::Simple::Subclassing. Sean certainly went the extra mile by providing three ways to parse pod: event-based using Pod::Simple::Methody, token-based using Pod::Simple::PullParser, and tree-based (XML-like) using Pod::Simple::SimpleTree. One of these is going to work for you, and if none does, there are some specialized subclasses you may want to use, but I don't discuss those here.
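To give a feel for the token-based interface, here is a minimal sketch that uses Pod::Simple::PullParser to collect the text of every =head1 in a string of pod. The head1_titles() function and the sample pod are my own illustrations, not something from the Pod::Simple distribution.

```perl
use strict;
use warnings;
use Pod::Simple::PullParser;

# Collect the text of every =head1 in a string of pod.
# head1_titles() is a hypothetical helper written for this sketch.
sub head1_titles {
    my ($pod) = @_;

    my $parser = Pod::Simple::PullParser->new;
    $parser->set_source( \$pod );    # a scalar reference means "parse this string"

    my @titles;
    my $in_head1 = 0;
    while ( my $token = $parser->get_token ) {
        if ( $token->is_start and $token->tagname eq 'head1' ) {
            $in_head1 = 1;
            push @titles, '';        # start a new, empty title
        }
        elsif ( $token->is_end and $token->tagname eq 'head1' ) {
            $in_head1 = 0;
        }
        elsif ( $token->is_text and $in_head1 ) {
            $titles[-1] .= $token->text;
        }
    }

    return @titles;
}

my @titles = head1_titles(
    "=pod\n\n=head1 File Tests\n\nSome text.\n\n=head1 File Test Operators\n\nMore text.\n\n=cut\n"
);
print "$_\n" for @titles;
```

With the pull parser, my code asks for the next token and decides what to do with it. With the event-based parsers, the parser calls my methods instead.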
Randal Schwartz, Tom Phoenix, and I recently updated Learning Perl to its fourth edition. Because Randal is the original author, the sources are in Pod, although they are in a special sort of Pod called PseudoPod that O'Reilly Media uses. It has extra features to handle footnotes (you'll notice quite a few of those in Learning Perl), cross-references, index entries, and a few other things mentioned in Pod::PseudoPod::Tutorial. O'Reilly editor Allison Randal wrote the Pod::Simple subclass Pod::PseudoPod as a base class for PseudoPod translators.
Let's start with a basic PseudoPod document. I've taken this pod directly from the Learning Perl sources. It's part of Chapter 11, which covers the file test operators. Notice the N<> sequence which denotes a footnote.
    # $Id: ch11.pod 108 2005-04-04 21:31:46Z brian $

    =pod

    =head0 File Tests

    Earlier, we showed how to open a filehandle for output. Normally, that will create a new file, wiping out any existing file with the same name. Perhaps you want to check that there isn't a file by that name. Perhaps you need to know how old a given file is. Or perhaps you want to go through a list of files to find which ones are larger than a certain number of bytes and not accessed for a certain amount of time. Perl has a complete set of tests you can use to find out information about files.

    =head1 File Test Operators

    The third example is more complex. Here, let's say that disk space is filling up and rather than buy more disks, we've decided to move any large, useless files to the backup tapes. So let's go through our list of filesN<It's more likely that, instead of having the list of files in an array, as our example shows, you'll read it directly from the filesystem using a glob or directory handle, as we show in Chapter 12. Since you haven't seen that yet, we'll just start with the list and go from there.> to see which of them are larger than 100 K. But even if a file is large, we shouldn't move it to the backup tapes unless it hasn't been accessed in the last 90 days (so we know that it's not used too often):N<There's a way to make this example more efficient, as you'll see by the end of the chapter.>

    =cut
I want to translate this to something else. If I want to translate it to HTML, like I did when I wanted to provide the reviewers with something a bit easier to read, most of my work is already done. I use Pod::PseudoPod::HTML, set a few options, and tell it where to send the output.
    #!/usr/bin/perl
    use strict;

    use Pod::PseudoPod::HTML;

    foreach my $file ( @ARGV ) {
        my $parser = Pod::PseudoPod::HTML->new();

        $parser->no_errata_section(1);  # don't put errors in doc output
        $parser->complain_stderr(1);    # output errors on STDERR instead

        unless( -e $file ) {
            warn "Unable to open '$file': $!\n";
            next;
        }

        $parser->output_fh( *STDOUT );
        $parser->parse_file( $file );
    }
Using the basic script, I get some simple HTML. It's nothing really fancy, but it gets the job done. Notice that there isn't any HTML <HEAD> section or opening <BODY> tag, the footnotes are actually inline with the body text, and there is nothing at the end (there are Pod::PseudoPod::HTML options to fix this, but I'm going to change all that so I'll skip talking about those).
    <h1>File Tests</h1>

    <p>Earlier, we showed how to open a filehandle for output. Normally, that will create a new file, wiping out any existing file with the same name. Perhaps you want to check that there isn't a file by that name. Perhaps you need to know how old a given file is. Or perhaps you want to go through a list of files to find which ones are larger than a certain number of bytes and not accessed for a certain amount of time. Perl has a complete set of tests you can use to find out information about files.</p>

    <h2>File Test Operators</h2>

    <p>The third example is more complex. Here, let's say that disk space is filling up and rather than buy more disks, we've decided to move any large, useless files to the backup tapes. So let's go through our list of files (footnote: It's more likely that, instead of having the list of files in an array, as our example shows, you'll read it directly from the filesystem using a glob or directory handle, as we show in Chapter 12. Since you haven't seen that yet, we'll just start with the list and go from there.) to see which of them are larger than 100 K. But even if a file is large, we shouldn't move it to the backup tapes unless it hasn't been accessed in the last 90 days (so we know that it's not used too often): (footnote: There's a way to make this example more efficient, as you'll see by the end of the chapter.)</p>
Now I want to change the output. I don't want the stuff that Pod::PseudoPod::HTML gives me, so I need to override some of its behavior. First, I want to change the header and the footer. I'll create my own subclass, Pod::PseudoPod::MyHTML, that does this. My class will inherit from Pod::PseudoPod::HTML and replace just the bits that I want. Anything I don't replace in my new subclass still does it the Pod::PseudoPod::HTML way.
The beginning and ending portions of Pod::PseudoPod::HTML's output are decided by the two methods start_Document and end_Document. It's using the event-like processing where each event has its own method to handle it, and it treats the start and end of a document as events too. I pulled the method sources directly from Pod::PseudoPod::HTML 0.12. Each method adds text to a scratchpad called 'scratch,' then sends it to the output channel by calling emit(), which also clears the scratchpad.
    # Pod::PseudoPod::HTML
    sub start_Document {
        my ($self) = @_;
        if ($self->{'body_tags'}) {
            $self->{'scratch'} .= "<html>\n<body>";
            $self->{'scratch'} .= "\n<link rel='stylesheet' href='style.css' type='text/css'>"
                if $self->{'css_tags'};
            $self->emit('nowrap');
        }
    }

    sub end_Document {
        my ($self) = @_;
        if ($self->{'body_tags'}) {
            $self->{'scratch'} .= "</body>\n</html>";
            $self->emit('nowrap');
        }
    }
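If you want to play with the event-and-scratchpad idea without installing Pod::PseudoPod, here is a minimal sketch that uses only Pod::Simple::Methody. The My::HeadingLister class and its emit() method are hypothetical names I made up for the illustration; this emit() only imitates the behavior described above and is not the Pod::PseudoPod::HTML implementation.

```perl
package My::HeadingLister;    # hypothetical class name, for illustration only
use strict;
use warnings;
use parent 'Pod::Simple::Methody';

# Each event gets its own method. The =head1 start event opens the
# scratchpad, text events append to it, and the end event emits it.
sub start_head1 { $_[0]{'scratch'} = '<h2>' }

sub handle_text {
    # Ignore text outside a heading, where there is no scratchpad
    $_[0]{'scratch'} .= $_[1] if defined $_[0]{'scratch'};
}

sub end_head1 { $_[0]{'scratch'} .= '</h2>'; $_[0]->emit }

# A stand-in for the emit() described in the article: send the
# scratchpad to the output filehandle, then clear it.
sub emit {
    my ($self) = @_;
    print { $self->{'output_fh'} } $self->{'scratch'}, "\n";
    delete $self->{'scratch'};
}

package main;

my $html   = '';
my $parser = My::HeadingLister->new;
$parser->output_string( \$html );    # collect the output in $html
$parser->parse_string_document( "=head1 File Tests\n\nBody text.\n" );
print $html;
```

Only the heading reaches the output here because handle_text() throws away any text that arrives while there is no scratchpad. A real translator would define start_Para() and end_Para() in the same way.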
I'll change start_Document to output something that's a bit better. I'll include a document type declaration, a proper <HEAD> section, and some other goodies.
    # Pod::PseudoPod::MyHTML
    sub start_Document {
        my ($self) = @_;

        $self->{'scratch'} .= <<"HTML";
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Strict//EN">
    <html>
    <head>
    <title>This is a page</title>
    </head>
    <body>
    HTML

        $self->emit('nowrap');
    }
I'll change end_Document so I can add a "last modified" and copyright statement to the end of each page.
    # Pod::PseudoPod::MyHTML
    sub end_Document {
        my ($self) = @_;

        $self->{scratch} .= "<hr />\n";
        $self->{scratch} .= "Last Modified: " . localtime() . "<br/>\n";
        $self->{scratch} .= "Copyright (c) brian d foy\n";
        $self->{scratch} .= "</body></html>\n";

        $self->emit('nowrap');
    }
Good enough. I changed a couple of short methods to do what I wanted, and I get different output. I can do the same for the other pod parts I run into. For instance, by default, Pod::PseudoPod::HTML turns the =head directives into the equivalent HTML header tags. Text after =head1 is wrapped by an <H2> tag in the output (and yes, that's off by one since an HTML document should only have one <H1> tag, but Pod usually has many =head1 directives, so everything moves down a level for the HTML). The start_head1() method builds up text in the scratchpad, and that text sticks around until the end of the =head1 event when end_head1() emits the text.
    # Pod::PseudoPod::HTML
    sub start_head1 { $_[0]{'scratch'} = '<h2>' }
    sub end_head1   { $_[0]{'scratch'} .= '</h2>'; $_[0]->emit() }
Instead of a simple <H2> tag, I want to add some stylesheet information. It doesn't really matter how I add the style information: if you don't like my way you now know how to make it happen your way.
    # Pod::PseudoPod::MyHTML
    sub start_head1 { $_[0]{'scratch'} = '<h2 class="main_header">' }
    sub end_head1   { $_[0]{'scratch'} .= '</h2>'; $_[0]->emit() }
That works for the pod directives, but what about the escape sequences like E<>, L<>, and so on? The PseudoPod format defines a new N<> escape sequence for footnotes. When the parser encounters that sequence, the start_N() method gets control. When the sequence ends, the end_N() method takes over. It doesn't emit the text because we're in the middle of handling a paragraph. Whichever method ends the enclosing paragraph will decide what to do with it. The default action of Pod::PseudoPod::HTML is to simply put the footnotes inline with the text.
    # Pod::PseudoPod::HTML
    sub start_N {
        my ($self) = @_;
        $self->{'scratch'} .= '<font class="footnote">' if ($self->{'css_tags'});
        $self->{'scratch'} .= ' (footnote: ';
    }

    sub end_N {
        my ($self) = @_;
        $self->{'scratch'} .= ')';
        $self->{'scratch'} .= '</font>' if $self->{'css_tags'};
    }
I want to put the footnotes at the end of the text. I already know how to change the ending of the document, so I know I can handle the footnotes there. To make them show up at the end of the page, I need to store them until I am ready for them. I'll create a new object data member (which I probably shouldn't be looking at since it breaks encapsulation) to hold the footnotes. I'll initialize this accumulator in the constructor and add a method to add the footnotes.
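    # Pod::PseudoPod::MyHTML
    sub new {
        my $self = shift;
        my $new  = $self->SUPER::new();

        $new->{'footnotes'} = [];    # accumulator for the footnote text

        return $new;
    }

    # Save the footnote text collected so far for the end of the document
    sub push_footnote {
        my $self = shift;

        push @{ $self->{'footnotes'} }, $self->{'footnote_text'};
    }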
Once I have that, I modify start_N() and end_N() to set and clear a flag telling me that I'm in the middle of a footnote. I'll insert a footnote link into the scratchpad in start_N(). It gets a bit tricky here since handle_text() is going to get control inside the N<> sequence, but I need to handle the paragraph and footnote text separately. I'll have to look at the flag for the footnote and use that in handle_text() to decide what to do, and do that without interfering with the normal paragraph processing. I'll leave the paragraph handling as is and build up the footnotes in a separate scratchpad. When I end the N<> sequence, I'll push the footnote onto my stack for later and clear the footnote flag and scratchpad.
    sub start_N {
        my ($self) = @_;
        $_[0]{'footnote_flag'} = 1;
        my $fn = ++$_[0]{'footnote_count'};
        $_[0]{'scratch'} .= qq|<sup><a href="#f$fn">[$fn]</a></sup>|;
    }

    sub end_N {
        my ($self) = @_;
        $self->push_footnote();
        $_[0]{'footnote_flag'} = 0;
        $_[0]{'footnote_text'} = '';
    }

    sub handle_text {
        my $scratch = $_[0]{footnote_flag} ? 'footnote_text' : 'scratch';

        $_[0]{$scratch} .= $_[0]{'in_verbatim'} ? encode_entities( $_[1] ) : $_[1]
    }
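To watch the two scratchpads fill up without running a whole parser, I can fire the events by hand. This is only a sketch: Toy::Footnotes is a hypothetical stand-in class that copies the flag logic above onto a plain hash, pushes the footnote text directly in end_N(), and leaves out the entity encoding.

```perl
package Toy::Footnotes;    # hypothetical stand-in, not part of Pod::PseudoPod
use strict;
use warnings;

sub new {
    my $class = shift;
    return bless {
        scratch        => '',    # paragraph text
        footnote_text  => '',    # text of the footnote in progress
        footnote_flag  => 0,     # true while inside N<>
        footnote_count => 0,
        footnotes      => [],    # finished footnotes
    }, $class;
}

sub start_N {
    my ($self) = @_;
    $self->{footnote_flag} = 1;
    my $fn = ++$self->{footnote_count};
    $self->{scratch} .= qq|<sup><a href="#f$fn">[$fn]</a></sup>|;
}

sub end_N {
    my ($self) = @_;
    push @{ $self->{footnotes} }, $self->{footnote_text};
    $self->{footnote_flag} = 0;
    $self->{footnote_text} = '';
}

sub handle_text {
    my ( $self, $text ) = @_;
    # The flag decides which scratchpad receives the text
    my $target = $self->{footnote_flag} ? 'footnote_text' : 'scratch';
    $self->{$target} .= $text;
}

package main;

# Fire the events in the order the parser would for "text N<note> text"
my $p = Toy::Footnotes->new;
$p->handle_text('our list of files');
$p->start_N;
$p->handle_text('You will more likely read it from the filesystem.');
$p->end_N;
$p->handle_text(' to see which are large');

print "body: $p->{scratch}\n";
print "note: $_\n" for @{ $p->{footnotes} };
```

The paragraph scratchpad ends up with the text on both sides of the footnote and a numbered link between them, while the footnote text waits in the accumulator.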
Finally, when I'm at the end of the document and ready to print footnotes, I use format_footnotes() to format the data I saved in the footnote stack. I modify my end_Document() method to call my footnote formatter and wrap some text around it.
    sub format_footnotes {
        $_[0]{'scratch'} .= "<h2>Footnotes</h2>\n\n<ol>\n";

        my $fn = 0;
        foreach my $footnote ( @{ $_[0]{'footnotes'} } ) {
            $fn++;    # matches the #f$fn link that start_N() created
            $_[0]{'scratch'} .= qq|\t<li><a name="f$fn">$footnote</a></li>\n|;
        }

        $_[0]{'scratch'} .= "</ol>\n\n";
    }

    sub end_Document {
        my ($self) = @_;

        $self->{scratch} .= "<hr />\n";

        $self->format_footnotes;

        $self->{scratch} .= "<hr />\n";
        $self->{scratch} .= "Last Modified: " . localtime() . "<br/>\n";
        $self->{scratch} .= "Copyright (c) brian d foy\n";
        $self->{scratch} .= "</body></html>\n";

        $self->emit('nowrap');
    }
Putting all of that together gives me my little Pod::PseudoPod::MyHTML module.
    package Pod::PseudoPod::MyHTML;
    use strict;

    use base 'Pod::PseudoPod::HTML';

    use HTML::Entities qw(encode_entities);

    sub new {
        my $self = shift;
        my $new  = $self->SUPER::new();

        $new->{'footnotes'} = [];

        return $new;
    }

    sub push_footnote {
        my $self = shift;

        push @{ $self->{'footnotes'} }, $self->{'footnote_text'};
    }

    sub format_footnotes {
        $_[0]{'scratch'} .= "<h2>Footnotes</h2>\n\n<ol>\n";

        my $fn = 0;

        require Data::Dumper;
        #$_[0]{'scratch'} .= Data::Dumper::Dumper( $_[0] );

        foreach my $footnote ( @{ $_[0]{'footnotes'} } ) {
            $fn++;
            $_[0]{'scratch'} .= qq|\t<li><a name="f$fn">$footnote</a></li>\n|;
        }

        $_[0]{'scratch'} .= "</ol>\n\n";
    }

    sub start_Document {
        my ($self) = @_;

        $self->{'scratch'} .= <<"HTML";
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Strict//EN">
    <html>
    <head>
    <title>This is a page</title>
    </head>
    <body>
    HTML

        $self->emit('nowrap');
    }

    sub end_Document {
        my ($self) = @_;

        $self->{scratch} .= "<hr />\n";

        $self->format_footnotes;

        $self->{scratch} .= "<hr />\n";
        $self->{scratch} .= "Last Modified: " . localtime() . "<br/>\n";
        $self->{scratch} .= "Copyright (c) brian d foy\n";
        $self->{scratch} .= "</body></html>\n";

        $self->emit('nowrap');
    }

    sub start_head1 { $_[0]{'scratch'} = '<h2 class="main_header">' }
    sub end_head1   { $_[0]{'scratch'} .= '</h2>'; $_[0]->emit() }

    sub start_N {
        my ($self) = @_;
        $_[0]{'footnote_flag'} = 1;
        my $fn = ++$_[0]{'footnote_count'};
        $_[0]{'scratch'} .= qq|<sup><a href="#f$fn">[$fn]</a></sup>|;
    }

    sub end_N {
        my ($self) = @_;
        $self->push_footnote();
        $_[0]{'footnote_flag'} = 0;
        $_[0]{'footnote_text'} = '';
    }

    sub handle_text {
        my $scratch = $_[0]{footnote_flag} ? 'footnote_text' : 'scratch';

        $_[0]{$scratch} .= $_[0]{'in_verbatim'} ? encode_entities( $_[1] ) : $_[1]
    }

    1;
And here's my little script that uses my module. The script is almost identical to the one I showed you before, save for the different parser module name. All of the good stuff is in the module Pod::PseudoPod::MyHTML.
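    #!/usr/bin/perl
    use strict;

    # The explicit ./ finds the file in the current directory even on
    # newer perls, which leave that directory out of @INC
    require "./MyHTML.pm";

    foreach my $file ( @ARGV ) {
        # A fresh parser for each file, so the footnotes from one file
        # don't leak into the next
        my $parser = Pod::PseudoPod::MyHTML->new();

        $parser->no_errata_section(1);
        $parser->complain_stderr(1);

        unless( -e $file ) {
            warn "Unable to open $file: $!\n";
            next;
        }

        $parser->output_fh( *STDOUT );
        $parser->parse_file( $file );
    }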
The rest is the same. For whatever you want to do, simply take the appropriate method and make it happen. It's SMOP (a Simple Matter of Programming). All of the hard work is already done for you by Pod::Simple and some of its subclasses. Now that you've made it this far in the article, you should be able to modify a Pod::Simple subclass to do just about anything you want. Good luck!
PS: In my last article I mentioned a one-argument form of open(). Did anyone bother to find out what it was? Without an explicit filename to open, open() looks in the scalar package variable with the same name as the filehandle (meaning you have to name the filehandle and not use a variable). For instance, if you say open FILE;, perl will look in the scalar variable $FILE for the filename. Remember all that one-liner magic for reading from files? The current file name shows up in $ARGV and perl uses the ARGV filehandle.
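Here is a small sketch of that one-argument open(). The temporary file and its contents are just scaffolding I added so the example can run on its own. The part that matters is that $FILE is a package variable declared with our, since a lexical my variable won't work.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scaffolding: make a small file to read back
my ( $tmp_fh, $path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "first line\n";
close $tmp_fh or die "Could not close $path: $!";

# The one-argument open() takes its filename from the package
# variable with the same name as the filehandle
our $FILE = $path;
open(FILE) or die "Could not open $FILE: $!";

my $line = <FILE>;
close FILE;

print $line;
```

This is a curiosity more than a recommendation. The two- and three-argument forms say where the filename comes from, so they are easier to read later.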
TPJ