Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.


Channels ▼
RSS

Web Development

Evaluating Short-Circuited Boolean Expressions


January, 2004: Evaluating Short-Circuited Boolean Expressions

Randal is a coauthor of Programming Perl, Learning Perl, Learning Perl for Win32 Systems, and Effective Perl Programming, as well as a founding board member of the Perl Mongers (perl.org). Randal can be reached at [email protected].


In my recently released File::Finder module, one of the more interesting problems I found myself solving was how to evaluate the Boolean expression resulting from the find-like predicates. The find command evaluates the various tests using a combination of AND, OR, NOT, and parentheses operators with the traditional precedence rules, including short circuiting. For example:

find /tmp -size +30 -atime +3 -print

In this case, find first tries the size test. Only if that succeeds, the implied AND operator then tests the access time. Again, only if that succeeds do we finally evaluate the print operation, which always returns true. Had we included an OR operator in the chain:

find /tmp -size +30 -atime +3 -o -print

then the rules for short circuiting an OR apply: If the file is both big enough and old enough, the left side of the OR is true, so we do not execute the right side, and the printing is skipped. Similarly, if we move the OR:

find /tmp -size +30 -o -atime +3 -print

then the implied AND between the time rule and the printing rule now binds tighter, so we'll print only small old files now, instead of big ones.

In File::Finder, I chose to represent an expression such as this using an array of individual steps, with coderefs for the actual tests and simple strings for the operators (the AND operator always being implied). So, the test would look something like:

my @steps = (
\&code_for_testing_size_greater_than_30,
"OR",
\&code_for_testing_atime_greater_than_3,
\&code_for_printing,
);

In practice, these coderefs were generated by closures that held the magic 30 and 3 constants in closure variables, but let's not get distracted here. The code to evaluate a series of coderefs, stopping at the first false coderef, started out looking something like this:

my $state = 1; # 1 = true, 0 = false
for (@steps) {
if ($state) { # should we execute this?
  my $result = $_->(); # run the coderef
  unless ($result) { # did this fail?
    $state = 0; # result is false
  }
}
}

This ensures that we stop running steps when the first step fails. This is our implied AND operator. The reason we keep going rather than exiting the loop is that even if we end up in a false state, we need to notice if there's an OR operator. This complicates things a bit. We'll need to introduce a third state: skipping. Why? Because we now have three states:

true AND ... # we execute this, thus true state
false AND ... # we don't execute this, thus false state
false OR ... # we do execute this, thus true state
true OR .. # we don't execute this, but...
true OR ... OR ... # we don't want the third part to execute!

So, after the OR is seen in a true state, we need to enter a sticky false state; hence, skipping.

my $state = 1; # 1 = true, 0 = false, -1 = skipping
for (@steps) {
if (ref $_) { # is it a coderef?
  if ($state > 0) { # should we execute this?
    my $result = $_->(); # run the coderef
    unless ($result) { # did this fail?
      $state = 0; # result is false
    }
  }
} elsif ($_ eq "OR") {
  if ($state > 0) { # true becomes skipping
    $state = -1;
  } elsif ($state == 0) { # false becomes true
    $state = 1;
  }
}
}

So, an OR causes true to become skipping, false to become true, and skipping to stay skipping. If you work through this, you'll see that this three-state decision tree now properly handles the implied AND and the explicit OR operator.

But wait, there's more. Let's add a NOT prefix operator into the mix, represented as a "NOT" string for a step. And let's presume that we can stack them. I found the easiest way to handle that was to introduce a fourth state: true (but invert the next test), encoded as 2 in $state. Now we get something more complicated again:

my $state = 1; # 1 = true, 0 = false, -1 = skipping, 2 = "not"
for (@steps) {
if (ref $_) { # is it a coderef?
  if ($state > 0) { # should we execute this?
    my $result = $_->(); # run the coderef
    $result = !$result if $state > 1; # flip result if needed
    $state = ($result ? 1 : 0);
  }
} elsif ($_ eq "OR") {
  die "NOT before OR!" if $state > 1;
  if ($state > 0) { # true becomes skipping
    $state = -1;
  } elsif ($state == 0) { # false becomes true
    $state = 1;
  }
} elsif ($_ eq "NOT") {
  # "true" and "not" swap states
  if ($state == 2) {
    $state = 1;
  } elsif ($state == 1) {
    $state = 2;
  }
}
}

Note that if a NOT was seen before a coderef, we essentially need to act on the reverse of the outcome of the coderef, so we negate the result logically. Also note that it'd be wrong to copy $result into $state there because we don't demand that the coderefs return a simple 1 or 0: They can return any true/false value, so we have to normalize the results with a small question-colon operator.

It's also tempting to flip between the 2 and 1 in the last elsif block by trying something clever like:

$state = 3 - $state;

until you realize that $state could also be 0 or -1, and we're trying to leave those alone. It's always better to do some explicit tests than to try to perform fancy math to get it to work right. The one notable exception I've seen to that is the AM/PM modulator to convert a 24-hour hour into a 12-hour hour:

my $hour_12 = ($hour_24 + 11) % 12 + 1;

although the magical "11" constant there deserves a well-placed comment in the source.

At this point, we have a full expression evaluator for AND, OR, and NOT operators with proper relative precedence. Let's add the COMMA operator now, from GNU find. This has the effect of restoring a true-state in the expression, ignoring any previous values (although their side-effect has already been acted upon). Seems simple enough:

my $state = 1; # 1 = true, 0 = false, -1 = skipping, 2 = "not"
for (@steps) {
if (ref $_) { # is it a coderef?
  if ($state > 0) { # should we execute this?
    my $result = $_->(); # run the coderef
    $result = !$result if $state > 1; # flip result if needed
    $state = ($result ? 1 : 0);
  }
} elsif ($_ eq "OR") {
  die "NOT before OR!" if $state > 1;
  if ($state > 0) { # true becomes skipping
    $state = -1;
  } elsif ($state == 0) { # false becomes true
    $state = 1;
  }
} elsif ($_ eq "NOT") {
  # "true" and "not" swap states
  if ($state == 2) {
    $state = 1;
  } elsif ($state == 1) {
    $state = 2;
  }
} elsif ($_ eq "COMMA") {
  die "NOT before COMMA!" if $state > 1;
  $state = 1; # restore true
}
}

Now we're happy. We're dealing with NOT, AND, OR, and COMMA. At least, we remain happy until we remember that it's time to look at how to handle parentheses!

Shaking loose some old cobwebs in my brain, I remember an expression evaluator I saw some 30 years ago in a textbook that used a stack to handle the nested subexpressions. So, I took the same tactic here. (No doubt Mark Jason-Dominus will write me and tell me that a stack wasn't needed here, provided I applied clever trick N, but as I am a bear-of-very-little-brain, I'll take the straightforward approach.)

So, we'll do that by changing $state into @state, and every place we had $state before, we'll simply use $state[-1] (note that the top of the stack is the highest element). Without changing the functionality (or adding the paren-handling), that gets us this code:

my @state = (1); # start as true
for (@steps) {
if (ref $_) { # is it a coderef?
  if ($state[-1] > 0) { # should we execute this?
    my $result = $_->(); # run the coderef
    $result = !$result if $state[-1] > 1; # flip result if needed
    $state[-1] = ($result ? 1 : 0);
  }
} elsif ($_ eq "OR") {
  die "NOT before OR!" if $state[-1] > 1;
  if ($state[-1] > 0) { # true becomes skipping
    $state[-1] = -1;
  } elsif ($state[-1] == 0) { # false becomes true
    $state[-1] = 1;
  }
} elsif ($_ eq "NOT") {
  # "true" and "not" swap states
  if ($state[-1] == 2) {
    $state[-1] = 1;
  } elsif ($state[-1] == 1) {
    $state[-1] = 2;
  }
} elsif ($_ eq "COMMA") {
  die "NOT before COMMA!" if $state[-1] > 1;
  $state[-1] = 1; # restore true
}
}

A little messier, but no change in functionality. Next, we have to figure out how the state changes as we enter and leave a parenthesized area. First, let's see what happens when we enter a parenthesized area:

not ( ... now true
true ( ... now true
false ( ... now skipping
skipping ( ... now skipping

So, on a left paren, if $state[-1] is greater than 0, we push a 1, otherwise we push a -1. That's easy enough. The harder part is what to do at the end of the subexpression. The false or skipping value in front of the subexpression won't change at the end, so we can write that off right away. But we have six other combinations to consider: all the combinations of not or true before the subexpression crossed with true, false, and skipping at the end of the subexpression. Working through the combinations, we find that the rules are:

not ( ... true ) ... now false
not ( ... false ) ... now true
not ( ... skipping ) ... now false
true ( ... true ) ... now true
true ( ... false ) ... now false
true ( ... skipping ) ... now true

So, it looks like we can take true and skipping xored whether there's a not-state in front of the subexpression. The one other thing is that comma no longer restores always to true: It restores to whatever was set up at the start of the subexpression. Adding that code back in to the engine gives us the final result:

my @state = (1);
for (@steps) {
if (ref $_) {
  if ($state[-1] > 0) {
    my $result = $_->();
    $result = !$result if $state[-1] > 1;
    $state[-1] = ($result ? 1 : 0);
  }
} elsif ($_ eq "OR") {
  die "NOT before OR!" if $state[-1] > 1;
  if ($state[-1] > 0) {
    $state[-1] = -1;
  } elsif ($state[-1] == 0) {
    $state[-1] = 1;
  }
} elsif ($_ eq "NOT") {
  if ($state[-1] == 2) {
    $state[-1] = 1;
  } elsif ($state[-1] == 1) {
    $state[-1] = 2;
  }
} elsif ($_ eq "LEFT") {
  push @state, ($state[-1] > 0 ? 1 : -1);
} elsif ($_ eq "RIGHT") {
  die "RIGHT without LEFT!" unless @state > 1;
  die "NOT before RIGHT!" if $state[-1] > 1;
  my $child = pop @state;
  $child = !$child if $state[-1] > 1; # reverse for NOT
  $state[-1] = ($child ? 1 : 0) if $state[-1] > 0; # inherit state
} elsif ($_ eq "COMMA") {
  die "NOT before COMMA!" if $state[-1] > 1;
  if (@state > 1) { # in subexpression, restore to initial value
    $state[-1] = ($state[-2] > 0 ? 1 : -1);
  } else {
    $state[-1] = 1; # restore true if at outer level
  }
}
}

Whew! I hope you enjoyed walking through this logic as much as I did in creating it. And if not, aren't you happy it's all nicely encapsulated in File::Finder so that you don't have to figure it out? Until next time, enjoy!

TPJ


Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.