Randal is a coauthor of Programming Perl, Learning Perl, Learning Perl for Win32 Systems, and Effective Perl Programming, as well as a founding board member of the Perl Mongers (perl.org). Randal can be reached at [email protected].
In my recently released File::Finder module, one of the more interesting problems I found myself solving was how to evaluate the Boolean expression resulting from the find-like predicates. The find command evaluates the various tests using a combination of AND, OR, NOT, and parentheses operators with the traditional precedence rules, including short circuiting. For example:
find /tmp -size +30 -atime +3 -print
In this case, find first tries the size test. Only if that succeeds, the implied AND operator then tests the access time. Again, only if that succeeds do we finally evaluate the print operation, which always returns true. Had we included an OR operator in the chain:
find /tmp -size +30 -atime +3 -o -print
then the rules for short circuiting an OR apply: If the file is both big enough and old enough, the left side of the OR is true, so we do not execute the right side, and the printing is skipped. Similarly, if we move the OR:
find /tmp -size +30 -o -atime +3 -print
then the implied AND between the time rule and the printing rule now binds tighter, so we'll print only small old files now, instead of big ones.
In File::Finder, I chose to represent an expression such as this using an array of individual steps, with coderefs for the actual tests and simple strings for the operators (the AND operator always being implied). So, the test would look something like:
my @steps = ( \&code_for_testing_size_greater_than_30, "OR", \&code_for_testing_atime_greater_than_3, \&code_for_printing, );
In practice, these coderefs were generated by closures that held the magic 30 and 3 constants in closure variables, but let's not get distracted here. The code to evaluate a series of coderefs, stopping at the first false coderef, started out looking something like this:
my $state = 1; # 1 = true, 0 = false for (@steps) { if ($state) { # should we execute this? my $result = $_->(); # run the coderef unless ($result) { # did this fail? $state = 0; # result is false } } }
This ensures that we stop running steps when the first step fails. This is our implied AND operator. The reason we keep going rather than exiting the loop is that even if we end up in a false state, we need to notice if there's an OR operator. This complicates things a bit. We'll need to introduce a third state: skipping. Why? Because we now have three states:
true AND ... # we execute this, thus true state false AND ... # we don't execute this, thus false state false OR ... # we do execute this, thus true state true OR .. # we don't execute this, but... true OR ... OR ... # we don't want the third part to execute!
So, after the OR is seen in a true state, we need to enter a sticky false state; hence, skipping.
my $state = 1; # 1 = true, 0 = false, -1 = skipping for (@steps) { if (ref $_) { # is it a coderef? if ($state > 0) { # should we execute this? my $result = $_->(); # run the coderef unless ($result) { # did this fail? $state = 0; # result is false } } } elsif ($_ eq "OR") { if ($state > 0) { # true becomes skipping $state = -1; } elsif ($state == 0) { # false becomes true $state = 1; } } }
So, an OR causes true to become skipping, false to become true, and skipping to stay skipping. If you work through this, you'll see that this three-state decision tree now properly handles the implied AND and the explicit OR operator.
But wait, there's more. Let's add a NOT prefix operator into the mix, represented as a "NOT" string for a step. And let's presume that we can stack them. I found the easiest way to handle that was to introduce a fourth state: true (but invert the next test), encoded as 2 in $state. Now we get something more complicated again:
my $state = 1; # 1 = true, 0 = false, -1 = skipping, 2 = "not" for (@steps) { if (ref $_) { # is it a coderef? if ($state > 0) { # should we execute this? my $result = $_->(); # run the coderef $result = !$result if $state > 1; # flip result if needed $state = ($result ? 1 : 0); } } elsif ($_ eq "OR") { die "NOT before OR!" if $state > 1; if ($state > 0) { # true becomes skipping $state = -1; } elsif ($state == 0) { # false becomes true $state = 1; } } elsif ($_ eq "NOT") { # "true" and "not" swap states if ($state == 2) { $state = 1; } elsif ($state == 1) { $state = 2; } } }
Note that if a NOT was seen before a coderef, we essentially need to act on the reverse of the outcome of the coderef, so we negate the result logically. Also note that it'd be wrong to copy $result into $state there because we don't demand that the coderefs return a simple 1 or 0: They can return any true/false value, so we have to normalize the results with a small question-colon operator.
It's also tempting to flip between the 2 and 1 in the last elsif block by trying something clever like:
$state = 3 - $state;
until you realize that $state could also be 0 or -1, and we're trying to leave those alone. It's always better to do some explicit tests than to try to perform fancy math to get it to work right. The one notable exception I've seen to that is the AM/PM modulator to convert a 24-hour hour into a 12-hour hour:
my $hour_12 = ($hour_24 + 11) % 12 + 1;
although the magical "11" constant there deserves a well-placed comment in the source.
At this point, we have a full expression evaluator for AND, OR, and NOT operators with proper relative precedence. Let's add the COMMA operator now, from GNU find. This has the effect of restoring a true-state in the expression, ignoring any previous values (although their side-effect has already been acted upon). Seems simple enough:
my $state = 1; # 1 = true, 0 = false, -1 = skipping, 2 = "not" for (@steps) { if (ref $_) { # is it a coderef? if ($state > 0) { # should we execute this? my $result = $_->(); # run the coderef $result = !$result if $state > 1; # flip result if needed $state = ($result ? 1 : 0); } } elsif ($_ eq "OR") { die "NOT before OR!" if $state > 1; if ($state > 0) { # true becomes skipping $state = -1; } elsif ($state == 0) { # false becomes true $state = 1; } } elsif ($_ eq "NOT") { # "true" and "not" swap states if ($state == 2) { $state = 1; } elsif ($state == 1) { $state = 2; } } elsif ($_ eq "COMMA") { die "NOT before COMMA!" if $state > 1; $state = 1; # restore true } }
Now we're happy. We're dealing with NOT, AND, OR, and COMMA. At least, we remain happy until we remember that it's time to look at how to handle parentheses!
Shaking loose some old cobwebs in my brain, I remember an expression evaluator I saw some 30 years ago in a textbook that used a stack to handle the nested subexpressions. So, I took the same tactic here. (No doubt Mark Jason-Dominus will write me and tell me that a stack wasn't needed here, provided I applied clever trick N, but as I am a bear-of-very-little-brain, I'll take the straightforward approach.)
So, we'll do that by changing $state into @state, and every place we had $state before, we'll simply use $state[-1] (note that the top of the stack is the highest element). Without changing the functionality (or adding the paren-handling), that gets us this code:
my @state = (1); # start as true for (@steps) { if (ref $_) { # is it a coderef? if ($state[-1] > 0) { # should we execute this? my $result = $_->(); # run the coderef $result = !$result if $state[-1] > 1; # flip result if needed $state[-1] = ($result ? 1 : 0); } } elsif ($_ eq "OR") { die "NOT before OR!" if $state[-1] > 1; if ($state[-1] > 0) { # true becomes skipping $state[-1] = -1; } elsif ($state[-1] == 0) { # false becomes true $state[-1] = 1; } } elsif ($_ eq "NOT") { # "true" and "not" swap states if ($state[-1] == 2) { $state[-1] = 1; } elsif ($state[-1] == 1) { $state[-1] = 2; } } elsif ($_ eq "COMMA") { die "NOT before COMMA!" if $state[-1] > 1; $state[-1] = 1; # restore true } }
A little messier, but no change in functionality. Next, we have to figure out how the state changes as we enter and leave a parenthesized area. First, let's see what happens when we enter a parenthesized area:
not ( ... now true true ( ... now true false ( ... now skipping skipping ( ... now skipping
So, on a left paren, if $state[-1] is greater than 0, we push a 1, otherwise we push a -1. That's easy enough. The harder part is what to do at the end of the subexpression. The false or skipping value in front of the subexpression won't change at the end, so we can write that off right away. But we have six other combinations to consider: all the combinations of not or true before the subexpression crossed with true, false, and skipping at the end of the subexpression. Working through the combinations, we find that the rules are:
not ( ... true ) ... now false not ( ... false ) ... now true not ( ... skipping ) ... now false true ( ... true ) ... now true true ( ... false ) ... now false true ( ... skipping ) ... now true
So, it looks like we can take true and skipping xored whether there's a not-state in front of the subexpression. The one other thing is that comma no longer restores always to true: It restores to whatever was set up at the start of the subexpression. Adding that code back in to the engine gives us the final result:
my @state = (1); for (@steps) { if (ref $_) { if ($state[-1] > 0) { my $result = $_->(); $result = !$result if $state[-1] > 1; $state[-1] = ($result ? 1 : 0); } } elsif ($_ eq "OR") { die "NOT before OR!" if $state[-1] > 1; if ($state[-1] > 0) { $state[-1] = -1; } elsif ($state[-1] == 0) { $state[-1] = 1; } } elsif ($_ eq "NOT") { if ($state[-1] == 2) { $state[-1] = 1; } elsif ($state[-1] == 1) { $state[-1] = 2; } } elsif ($_ eq "LEFT") { push @state, ($state[-1] > 0 ? 1 : -1); } elsif ($_ eq "RIGHT") { die "RIGHT without LEFT!" unless @state > 1; die "NOT before RIGHT!" if $state[-1] > 1; my $child = pop @state; $child = !$child if $state[-1] > 1; # reverse for NOT $state[-1] = ($child ? 1 : 0) if $state[-1] > 0; # inherit state } elsif ($_ eq "COMMA") { die "NOT before COMMA!" if $state[-1] > 1; if (@state > 1) { # in subexpression, restore to initial value $state[-1] = ($state[-2] > 0 ? 1 : -1); } else { $state[-1] = 1; # restore true if at outer level } } }
Whew! I hope you enjoyed walking through this logic as much as I did in creating it. And if not, aren't you happy it's all nicely encapsulated in File::Finder so that you don't have to figure it out? Until next time, enjoy!
TPJ