brian has been a Perl user since 1994. He is the founder of the first Perl Users Group, NY.pm, and of Perl Mongers, the Perl advocacy organization. He has been teaching Perl through Stonehenge Consulting for the past five years, and has been a featured speaker at The Perl Conference, Perl University, YAPC, COMDEX, and Builder.com. Contact brian at [email protected].
Back in the day, when I was a kid and had to walk to school uphill (both ways!) in the snow, open() took only two arguments, and that was enough for us. Since then, open() has become much richer and more full-featured. Indeed, it is so rich that it gets its own tutorial, perlopentut, and its perlfunc entry is almost 400 lines long (which is a lot longer than this article).
Before starting, I have a bit of a challenge for you, which lets me skip one part of open() and leave it up to you. There is a one-argument form of open(). Do you know what it is, and can you give an example of how you would use it? I'll answer this next month. It does have a useful application, but you have to keep Perl's origins in mind to be in the right mindset.
When I started in Perl, open() took a filehandle identifier and a filename. Filehandle identifiers don't get a special sigil, so they appear as barewords. Since they appear as barewords, Perl's convention was to write them in all capitals. They certainly stood out in the program text.
open( FILE, "README" ) || die "Could not open README! $!";
We ran into problems when we wanted to pass filehandles around. If I wanted to open a filehandle in one place but use it in another, I had to do a lot of ugly typing. For instance, to pass a filehandle to a subroutine, I passed its typeglob, or even a reference to it. In the subroutine, I saved that in a scalar (or in a localized typeglob) and let print() figure it out (even if I or the maintenance programmers had to scratch our heads).
open( LOG, ">> logfile" ) || die "Could not write to logfile: $!";
here_i_am( *LOG );
log_this( \*LOG, "Hey, I'm done!" );
sub here_i_am {
    local $fh = shift;
    print $fh "Here I am!\n";
}
sub log_this {
    local *FILE = shift;
    print FILE @_;
}
In Perl 5.6, Perl got smart enough to skip the typeglob step. I could create a filehandle in a scalar variable directly, as an "indirect filehandle". This looks a lot better. I don't have to explain typeglobs or references to typeglobs, or why some people use one and others use the other. This works when the variable, in this case $fh, is undefined.
open( $fh, ">> logfile" ) || die "Could not write to logfile: $!";
print "$fh\n"; # something like GLOB(...)
If the variable already has a value, this breaks. The following example doesn't do what I want: since $fh is already defined, it doesn't end up with a filehandle reference.
my $fh = "I'm not a filehandle!";
open( $fh, ">> logfile" ) || die "Could not write to logfile: $!";
print "$fh\n"; # prints "I'm not a filehandle!"
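The example above runs without strict. As a rough sketch of the difference (assuming `use strict`, and using the core File::Temp module for a throwaway directory rather than a real logfile), this compares the two cases: an undefined scalar comes back holding a GLOB reference, while a scalar that already holds a string makes open() die under strict refs instead of quietly treating the string as a symbolic filehandle name.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch location for this sketch only; not the article's real logfile.
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/logfile";

# An undefined scalar: open() stores a reference to a new, anonymous glob in it.
my $fh;
open( $fh, '>>', $file ) || die "Could not write to $file: $!";
my $glob_type = ref $fh;
print "undefined scalar now holds a $glob_type reference\n";
close $fh;

# A scalar that already holds a string: under strict refs, open() refuses to
# treat the string as a symbolic filehandle name and dies instead.
my $not_fh = "I'm not a filehandle!";
my $opened = eval { open( $not_fh, '>>', $file ); 1 };
my $error  = $@;
print $opened ? "opened anyway?\n" : "died: $error";
```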
Luckily, the usual Perl idiom gets around this by declaring the variable right inside the open(). All of your other variables are lexicals, right? So why not this one too?
open( my $fh, ">> logfile" ) || die "Could not write to logfile: $!";
That's much nicer. I can now simply pass around scalar variables, and most people already know how to do that. This is one of the telling marks of an intermediate Perl programmer. We don't talk about this in Learning Perl because we want to get people opening files in the fastest way possible (in student time), then introduce better programming idioms once they understand the quick-and-dirty way.
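Here is the typeglob example from earlier redone with a lexical filehandle, as a minimal sketch. It assumes a throwaway log file in a File::Temp directory, uses the three-argument form of open() that comes up later in this article, and has the subroutines take the handle with my rather than local.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# The handle is just a scalar, so subroutines take it like any other argument.
sub here_i_am {
    my $fh = shift;
    print {$fh} "Here I am!\n";
}

sub log_this {
    my ( $fh, @messages ) = @_;
    print {$fh} @messages;
}

my $dir     = tempdir( CLEANUP => 1 );
my $logfile = "$dir/logfile";

open( my $log, '>>', $logfile ) || die "Could not write to $logfile: $!";
here_i_am($log);
log_this( $log, "Hey, I'm done!\n" );
close($log) || die "Could not close $logfile: $!";

# Read the log back to show both subroutines wrote through the same handle.
open( my $in, '<', $logfile ) || die "Could not read $logfile: $!";
my @lines = <$in>;
close $in;
print @lines;
```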
That didn't fix all of the problems, though. There was this thing known as "magic open" that was obscure enough to be a Final Jeopardy question. Perl does some guessing about what we want it to do. For instance, in my previous example, Perl looks at the second argument and pulls it apart to figure out what to do. It sees the >> and guesses that we mean to open something in append mode, and that we don't mean to read from a file named ">> logfile" (and yes, I can create a file with that name). After the >> it keeps guessing, and it figures that the leading whitespace is not part of the filename (and, yes, I can create a filename with leading whitespace). Moving on, it gets the name of the file, "logfile":
open( my $fh, ">> logfile" ) || die "Could not write to logfile: $!";
This magic discards trailing whitespace too, so all of these open the same file. Perl does what is common, and the filenames " logfile", "logfile ", and " logfile " aren't common, or at least they shouldn't be. If you really want to annoy someone, put a file whose name ends in a space in their directory and watch how long it takes them to delete it (especially if you make it so they can't use a glob; don't tell anyone I told you about this):
open( my $fh, ">>logfile" );
open( my $fh, ">> logfile" );
open( my $fh, ">>logfile " );
open( my $fh, ">> logfile " );
Sometimes, however, we don't want this magic open. We can use Perl's three-argument form instead. Rather than lumping a bunch of things together in the second argument, I break apart the open() mode and the filename. When I do that, the filename is exactly what I specify, whitespace and all:
open( my $fh, ">>", "logfile " );
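A quick way to watch the magic at work is to try both forms side by side. This sketch assumes a scratch directory from File::Temp (whose path has no surrounding whitespace) and simply lists what ends up on disk: the two-argument open appends to logfile, while the three-argument open creates a second file whose name ends in a space.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );

# Two-argument "magic" open: Perl strips the whitespace around the name,
# so this appends to "$dir/logfile".
open( my $magic, ">> $dir/logfile " ) || die "Could not write to logfile: $!";
close $magic;

# Three-argument open: the name is used literally, trailing space and all.
open( my $literal, ">>", "$dir/logfile " ) || die "Could not write to 'logfile ': $!";
close $literal;

# List what actually exists, with brackets to make the trailing space visible.
opendir( my $dh, $dir ) || die "Could not read $dir: $!";
my @names = sort grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh;
print map { "[$_]\n" } @names;    # [logfile] then [logfile ]
```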
Okay, three arguments should be enough for anyone, right? Not really. Magic open() often causes problems when we open a pipe, because the command string can be handed to the shell. Just as system() and exec() have list forms that don't let the shell treat special characters as special, open() has a list form for pipes. The example in perlfunc has five arguments:
open(FOO, '-|', "cat", '-n', $file);
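As a sketch of why the list form matters, this assumes a Unix-like system and a Perl with list-form pipe opens (5.8 or later). It uses $^X, the path to the running perl, as the child program so the example doesn't depend on cat or echo being around. The argument is full of shell metacharacters, but since no shell ever sees it, it comes back unchanged.

```perl
use strict;
use warnings;

# Shell metacharacters that a shell would act on. No shell is involved in
# the list form, so the child program receives this string unchanged.
my $arg = 'README; echo gotcha';

# $^X is the path to the perl running this script, so the child is simply
# another perl that prints its first argument back through the pipe.
open( my $pipe, '-|', $^X, '-e', 'print $ARGV[0]', $arg )
    || die "Could not start the child perl: $!";
my $echoed = <$pipe>;
close($pipe) || die "Child exited with status $?";

print "$echoed\n";    # README; echo gotcha
```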
Now that you know that open() is a lot more special than the two-argument form you may be used to, get ready to take it to a whole other level with the new "plumbing" (in the words of perlopentut) for the IO framework.
With PerlIO, I can think about IO in layers. The first layer is the sequence of bytes in the file, the second layer is the stuff that perl reads, and another layer is the stuff that ends up in my program. With PerlIO, I can do different things at each layer. For instance, I can read a gzipped file but get the uncompressed data when I read from it. No fuss, no muss! Well, I do have to install the PerlIO::gzip module, but that's not a big deal.
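To peek at the layers on an actual handle, the built-in PerlIO::get_layers() function lists them from the bottom of the stack (closest to the file) to the top (closest to my program). This sketch assumes a throwaway file from File::Temp. On a typical Unix build the plain handle reports something like unix and perlio, and the second handle has crlf on top, but the exact list depends on how perl was built.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A throwaway file to open; its contents don't matter here.
my ( $tmp, $file ) = tempfile( UNLINK => 1 );
print {$tmp} "hello\n";
close $tmp;

# Default layers for a plain read.
open( my $plain, '<', $file ) || die "Could not read $file: $!";
my @plain = PerlIO::get_layers($plain);
print "plain: @plain\n";

# The same file with an extra :crlf layer requested in the mode.
open( my $crlf, '<:crlf', $file ) || die "Could not read $file: $!";
my @crlf = PerlIO::get_layers($crlf);
print "crlf:  @crlf\n";
```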
Consider CPAN.pm, which uses a couple of files (02packages.details.txt.gz and 03modlist.data.gz) to figure out where distributions are and how to install them. CPAN distributes these as gzipped files to keep them small, since an installer utility needs to download them before it can start its work.
Without PerlIO, I'd have to un-zip them myself, or provide some way to read them as a stream and uncompress them on the fly. That's a huge pain. A plain open() doesn't work the way I want it to. Perl thinks it's opening a text file and reads until the first newline (or whatever is in $/). It reads quite a bit of compressed data and prints a bunch of gook to my screen:
open( my $fh, "/MINICPAN/modules/03modlist.data.gz" );
print scalar <$fh>;
With PerlIO, I just need to stick in another layer. The PerlIO::gzip module can unzip the data on the fly and give it back to me as uncompressed text with almost no work on my part. When Perl sees the ":gzip" in the second argument, it automatically looks for and loads PerlIO::gzip:
open( my $fh, "<:gzip", "/MINICPAN/modules/03modlist.data.gz" );
print scalar <$fh>;
Instead of gobbledygook, I get the first line of uncompressed text:
File: 03modlist.data
There are many other pre-existing layers for PerlIO. Don't like all those DOS CRLF pairs? No problem. You don't have to run dos2unix on the file. Just use the built-in :crlf layer, and PerlIO will automatically convert the line endings:
open( my $fh, "<:crlf", "dosfile1.txt" );
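To see the conversion happen without a real DOS file handy, this sketch writes a small CRLF file (with made-up contents) to a File::Temp temporary and reads it back through :crlf. Each line arrives ending in a plain newline.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small file with DOS (CRLF) line endings. The :raw layer makes sure
# nothing rewrites the bytes on the way out.
my ( $out, $dosfile ) = tempfile( UNLINK => 1 );
binmode( $out, ':raw' );
print {$out} "line one\r\nline two\r\n";
close($out) || die "Could not write $dosfile: $!";

# Read it back through :crlf, which turns each CRLF pair into "\n".
open( my $in, '<:crlf', $dosfile ) || die "Could not read $dosfile: $!";
my @lines = <$in>;
close $in;

my $carriage_returns = grep { /\r/ } @lines;
print scalar(@lines), " lines, $carriage_returns with a carriage return\n";
```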
Do you want the raw data, rather than what some Unicode-aware layer that knows about wide characters would give you? Use the :bytes pseudo-layer:
open( my $fh, "<:bytes", "dosfile1.txt" );
Want to turn on binmode directly in the open()? Use the :raw layer:
open( my $fh, "<:raw", "brian.jpg" );
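The :raw layer in open() and a binmode() call after a plain open() should hand back the same bytes. This sketch checks that with a made-up handful of binary bytes (not a real JPEG) written to a File::Temp file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Bytes a text layer might want to touch: a CRLF pair, a Ctrl-Z, a NUL,
# and bytes above 0x7F.
my $bytes = "\xFF\xD8\xFF\xE0\r\n\x1A\x00";
my ( $out, $file ) = tempfile( UNLINK => 1 );
binmode $out;
print {$out} $bytes;
close($out) || die "Could not write $file: $!";

# Setting the layer in open() ...
open( my $raw, '<:raw', $file ) || die "Could not read $file: $!";
my $via_open = do { local $/; <$raw> };
close $raw;

# ... compared with opening first and calling binmode() afterward.
open( my $fh, '<', $file ) || die "Could not read $file: $!";
binmode $fh;
my $via_binmode = do { local $/; <$fh> };
close $fh;

print $via_open eq $via_binmode ? "same bytes\n" : "different bytes\n";
```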
Although I've shown examples for reading data, these layers work for writing too. Besides the layers that you see in the PerlIO man page, there are many extra ones on CPAN in the PerlIO and PerlIO::via namespaces. If you don't find what you need, you can even crib from one that is already there to create your own.
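As a taste of writing your own, here is a deliberately tiny layer built with the core PerlIO::via module. The package name PerlIO::via::Uppercase is hypothetical, made up for this sketch. PerlIO::via calls its PUSHED method when the layer goes onto the handle and its WRITE method with each chunk of output, passing along a handle to the layer below. Because the package name starts with PerlIO::via::, :via(Uppercase) is enough in the open() mode.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical output layer: upper-cases everything written through it.
package PerlIO::via::Uppercase;

sub PUSHED {
    my ($class) = @_;
    return bless {}, $class;    # the object PerlIO::via will call methods on
}

sub WRITE {
    my ( $self, $buffer, $below ) = @_;
    # Pass the transformed data down; report bytes consumed, or -1 on failure.
    return ( print {$below} uc $buffer ) ? length $buffer : -1;
}

package main;

my ( $tmp, $file ) = tempfile( UNLINK => 1 );
close $tmp;

open( my $out, '>:via(Uppercase)', $file ) || die "Could not write $file: $!";
print {$out} "hey, I'm done!\n";
close($out) || die "Could not close $file: $!";

# Read the file back through a plain handle to see what hit the disk.
open( my $in, '<', $file ) || die "Could not read $file: $!";
my $text = do { local $/; <$in> };
close $in;
print $text;    # HEY, I'M DONE!
```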
So open() has come a long way since I started using Perl, and the kids today have it much easier: not only do you get more arguments, but the arguments can do more. You don't even have to know you are doing complex IO transformations when PerlIO does them for you. Maybe in 10 years you'll look back on these fancy features and complain about how hard it was in your day and how easy kids have it.
TPJ