brian has been a Perl user since 1994. He is founder of the first Perl Users Group, NY.pm, and Perl Mongers, the Perl advocacy organization. He has been teaching Perl through Stonehenge Consulting for the past five years, and has been a featured speaker at The Perl Conference, Perl University, YAPC, COMDEX, and Builder.com. Contact brian at [email protected].
I was having Sunday brunch with a colleague a couple of weeks ago, and we started talking about tracing the execution of a program. I wanted to figure out which subroutines the program was calling and which subroutines those subroutines were calling, and so on. Most of you have probably been given the Big Ball of Mud that the previous developer left behind right after he burned his hand-written documentation and right before he walked out the door with all of his personal possessions (and maybe a Springline stapler) never to be seen again. Where do you start?
My first requirement is that I don't want to modify the code to see what's going on. Truthfully, I don't want to ever modify the code and I wish I had never seen it, but back here on planet Earth a lot of people have had this same problem and you can find a lot of tools to help you tear apart a program while leaving its source intact. Look in the Devel::* namespace. Some people have even used things like GraphViz to make pictures out of the results.
After using some of those tools, I figure out which subroutines I need to pay attention to. Instead of working over the program flow with a sledge hammer, I want a scalpel. Consider this little program which makes a subroutine call (and pretend its in the middle of a Big Ball of Mud). I want to watch what happens when I pass those arguments to that subroutine, and I want to see what it returns. I don't want to modify the subroutine though.
#!/usr/bin/perl cant_touch_this( qw( Fred Barney Betty Wilma ) ); sub cant_touch_this { ...do stuff... }
The way that we already know involves a lot of code, and it's a lot of code that I have to repeat everywhere that I want to see what's going on (and yes, debuggers do exist, and I could set all sorts of breakpoints, but I'm going to pretend that doesn't exist since there is some other cool stuff coming up). I have one line to store the arguments ahead of the subroutine call so I can print them and then pass them to the subroutine, and then I have to save the result of the subroutine so I can print that afterwards. I could make this a bit shorter with some Perl golf, but things are already getting to look ugly.
my @args = qw( Fred Barney Betty Wilma ); print "Args are [@args]\n"; my $result = cant_touch_this( @args ); print "Result was $result\n";
One possibility is Damian Conway's Hook::LexWrap
. I can define
handlers that execute before and after the actual subroutine call, and
those will work for every call, not just one (more on that coming up).
I load the Hook::LexWrap
module and use its wrap()
function to tell it which subroutine I want to examine. Now my program is only a bit
longer
#!/usr/bin/perl use Hook::LexWrap; wrap 'cant_touch_this', pre => sub { print "The arguments are [@_]\n" }, post => sub { print "Result was [$_[-1]]\n" }; cant_touch_this( qw( Fred Barney Betty Wilma ) ); sub cant_touch_this { return 42; }
The Hook::LexWrap
wrappers get the original argument list in their @_
,
but with and additional value at the end. That extra value is the
return value. In the pre-wrapper the return value will be undefined
(we wouldn't have to call the subroutine if we already knew the
value!), and that element could have some value after the original
subroutine runs. In my post-wrapper, I just want to see the result, so
I only look at $_[-1]
.
However, when I run this, I don't see the a result. Why not? I called
cant_touch_this()
in void context, so there is no return value.
The arguments are [Fred Barney Betty Wilma ] Result was []
I only get the return value in the post wrapper when I would actually save the result. I have to call the routine in either scalar or list context.
#!/usr/bin/perl use Hook::LexWrap; wrap 'cant_touch_this', pre => sub { print "The arguments are [@_]\n" }, post => sub { print "Result was [$_[-1]]\n" }; my $result = cant_touch_this( qw( Fred Barney Betty Wilma ) ); sub cant_touch_this { return 42; }
Now I see the return value.
The arguments are [Fred Barney Betty Wilma ] Result was [42]
There are other modules that do this, such as Hook::PreAndPost
and
Hook::WrapSub
, but if Damian is writing a module, there must be some
cool trick to it. In Hook::LexWrap
, caller()
keeps working.
#!/usr/bin/perl use Hook::LexWrap; wrap 'who_called_me', pre => sub { print "pre caller is @{[caller]}\n" }, post => sub { print "post caller is @{[caller]}\n" }; who_called_me(); sub who_called_me { print "caller is @{[caller]}\n" }
When I run this, I get the same output for all three print statements because all of the subroutines think they are being called from the same place: not only the same file and package, but they also think they come from the same line.
pre caller is main /Users/brian/Desktop/wrap-caller.pl 9 caller is main /Users/brian/Desktop/wrap-caller.pl 9 post caller is main /Users/brian/Desktop/wrap-caller.pl 9
Think about what has to actually happen behind the scenes to make this happen: we have to construct a brand new subroutine that gets the original argument list, calls the pre-subroutine, calls the original subroutine, and then calls the post-subroutine. Not only does this brand new subroutine have to do all that, it has to insert itself into the named slot where the original subroutine lives. Damian uses a lot of typeglob magic to move the original subroutine out of the way (more on this coming up), and more typeglob magic to move the replacement subroutine into its place.
Let me give you just a little taste of what is going on in there.
Here's the bit from Hook::LexWrap
that stores the original subroutine
right before the module replaces it. Simple, right? Grab the
subroutine definition out of the symbol table and store it in
$original
. The $typeglob
is the first argument to wrap()
, which is the
subroutine name. You noticed the lack of strict
at the top of the
module, right? For extra credit, which of the three strict checks does
this violate? It's okay to do this stuff if you know why you need to
do it, but don't try this at home.
my $original = ref $typeglob eq 'CODE' && $typeglob || *$typeglob{CODE} || croak "Can't wrap non-existent subroutine ", $typeglob;
Now that the original subroutine is out of the way, we can make the
new one. I'll spare you the details on the inside. If you have a hour
(or a weekend) free, call up the source of Hook::LexWrap
and take a
look between these two braces.
$imposter = sub { ... };
Once we have the $imposter
subroutine, we shove it into the place where
the original subroutine use to live.
*{$typeglob} = $imposter;
It's actually not that much code if you care to peek under the hood.
Still, with all that magic, the replacement subroutine is virtually
invisible to caller()
. It's like its not there even though it is.
That's not enough though. As I've done it so far in this article,
every instance of the target subroutine gets the new behavior. Check
out the name of the module: it's got that "Lex" in it, and just like
other things in Perl that start with those letters, I can limit the
scope of this effect. In this example, I have to wrap the subroutine
differently. Previously, I called wrap()
in void context (meaning I
did not use the result for anything, so wantarray
returns undef
) so
Hook::LexWrap
made the effect global. If I store the result of wrap()
in a lexical variable, the effect disappears when that lexical
variable goes out of scope. That is, I've effectively unwrapped the
subroutine at the end of the naked block which I used to define the
scope.
#!/usr/bin/perl use Hook::LexWrap; { my $lexical = wrap 'who_called_me', pre => sub { print "pre caller is @{[caller]}\n" }, post => sub { print "post caller is @{[caller]}\n" }; who_called_me(); } who_called_me(); sub who_called_me { print "caller is @{[caller]}\n" }
If Hook::LexWrap
is the scalpel that I mentioned earlier, this lexical
feature is arthroscopic surgery with miniature cameras. This is pretty
close to what my colleague and I were talking about at brunch. This
module can have limited scope by wrapping particular subroutines, and
it can even have a shorter effect by working only within a lexical scope.
I still have to add some code, and that's what I really wanted to avoid.
Why can't I just turn this on from the command line and let it do its
magic? Well, maybe I can, but you'll have to read about that in a future
article.
TPJ