Dr. Dobb's Journal February 1998
What's New in Python 1.5?
By Guido van Rossum
Guido, Python's creator, works at the Corporation for National Research Initiatives in Reston, Virginia. He can be contacted at [email protected].
Python 1.5 has some powerful improvements over previous versions of the language. I'll briefly describe some of the major modifications here. For more information, see the Python web site at http://www.python.org/.
Packages. Perhaps the most important change is the addition of packages. A Python "package" is a named collection of modules, grouped together in a directory. A similar feature was available in earlier releases through the ni module (named after the Knights Who Say "Ni!"; the name also stands for "new import"), but packages were found to be too important to be optional. Starting with 1.5, they are a standard feature, reimplemented in C, although not exactly compatible with ni.
A package directory must contain a file __init__.py -- this prevents subdirectories that happen to be on the path or in the current directory from accidentally preempting modules with the same name. (The __init__.py file was optional with ni.) When the package is first imported, the __init__.py file is loaded in the package namespace. (This is the other main incompatibility.)
For example, the package named "test" (in the Python 1.5 library) contains the expanded regression test suite. The driver for the regression test is the submodule regrtest, and the tests are run by invoking the function main() in this submodule. There are several ways to invoke it:
import test.regrtest
test.regrtest.main()
If you don't want to use fully qualified names for imported functions and modules, you can write:
from test import regrtest
regrtest.main()
or even:
from test.regrtest import main
main()
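To watch the package machinery at work without relying on the test package, here is a minimal sketch that builds a throwaway package on disk and imports it in all three styles. It is written in modern Python 3 syntax rather than 1.5's, and the names demo_pkg, greet, VERSION and make_package are hypothetical. It shows that the code in __init__.py runs in the package namespace on first import. Note that Python 3.3 and later also accept directories without __init__.py as "namespace packages" (PEP 420), so the strict requirement described above no longer holds there.

```python
import importlib
import os
import sys
import tempfile

def make_package(root):
    """Create hypothetical files root/demo_pkg/__init__.py and root/demo_pkg/greet.py."""
    pkg_dir = os.path.join(root, "demo_pkg")
    os.makedirs(pkg_dir)
    # __init__.py is executed in the package namespace on first import
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("VERSION = '1.0'\n")
    with open(os.path.join(pkg_dir, "greet.py"), "w") as f:
        f.write("def main():\n    return 'hello from demo_pkg.greet'\n")

root = tempfile.mkdtemp()
make_package(root)
sys.path.insert(0, root)          # the package's parent directory must be on the path
importlib.invalidate_caches()     # discard any stale finder caches

# The three import styles described above:
import demo_pkg.greet
result1 = demo_pkg.greet.main()

from demo_pkg import greet
result2 = greet.main()

from demo_pkg.greet import main
result3 = main()

print(demo_pkg.VERSION, result1)
```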
Assertions. There's now an assert statement to ease the coding of input requirements and algorithm invariants. For example,
assert x >= 0
will raise an AssertionError exception when x is negative. The argument can be any Boolean expression. An optional second argument can give a specific error message; for example:
assert L <= x <= R, "x out of range"
Once a program is debugged, the assert statements can be disabled without editing the source code by invoking the Python interpreter with the -O command-line flag. This also removes code like this:
if __debug__:
    statements
This form can be used for coding more complicated requirements, such as a loop asserting that all items in a list have the same type.
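A small sketch of both forms, in modern Python 3 syntax: plain assert statements with messages, and an if __debug__: block holding the same-type loop mentioned above. The function names check_same_type and scale are hypothetical. Running the script with python -O sets __debug__ to False, so both kinds of check are skipped.

```python
def check_same_type(items):
    # A check too involved for a single assert expression. Under "python -O",
    # __debug__ is False and this whole block is skipped.
    if __debug__:
        if items:
            first = type(items[0])
            for item in items:
                assert type(item) is first, "mixed types in list"

def scale(x, L, R):
    # Hypothetical helper: map x from the interval [L, R] onto [0.0, 1.0]
    assert L < R, "empty interval"
    assert L <= x <= R, "x out of range"
    return (x - L) / (R - L)

check_same_type([1, 2, 3])       # passes silently
print(scale(5, 0, 10))           # 0.5

caught = None
try:
    scale(11, 0, 10)
except AssertionError as exc:
    caught = str(exc)
print("caught:", caught)         # caught: x out of range
```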
Perl-style regular expressions. A new module, re, provides a new interface to regular expressions. The regular expression syntax supported by this module is identical to that of Perl 5.0 to the extent that this is feasible, with Python-specific extensions to support named subgroups. The interface has been redesigned to allow sharing of compiled regular expressions between multiple threads. A new form of string literals, dubbed "raw strings" and written as r"...", has been introduced, in which backslash interpretation by the Python parser is turned off. Example 5, for instance, searches for identifiers and integers in its argument string.
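import re, sys

text = sys.argv[1]
# Match an identifier or an unsigned integer as a whole word
prog = re.compile(r"\b([a-z_]\w*|\d+)\b", re.IGNORECASE)
hit = prog.search(text)
while hit:
    print hit.span(1),    # start and end offsets of group 1
    print hit.group(1)    # the matched text itself
    # resume searching just past the end of the previous match
    hit = prog.search(text, hit.end(0))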
Example 5: Using Python 1.5 regular expressions.
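The named subgroups mentioned above are written (?P<name>...) and retrieved by name, and raw strings spare you from doubling every backslash. A short sketch in modern Python 3 syntax; the pattern and the sample strings are made up for illustration.

```python
import re

# In a raw string the parser leaves backslashes alone, so r"\b" reaches
# the re module as backslash-b (a word boundary); a plain "\b" would be
# the ASCII backspace character.
assert r"\b\d+\b" == "\\b\\d+\\b"

# Named subgroups: (?P<name>...) defines a group, m.group("name") retrieves it
pair = re.compile(r"(?P<key>[a-z_]\w*)\s*=\s*(?P<value>\d+)", re.IGNORECASE)

m = pair.search("width = 640")
key, value = m.group("key"), int(m.group("value"))
print(key, value)              # width 640
print(m.span("value"))         # (8, 11)

# Collect every key=value pair in a string
found = {h.group("key"): int(h.group("value")) for h in pair.finditer("x=1, y = 22")}
print(found)                   # {'x': 1, 'y': 22}
```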
Standard exception classes. All standard exceptions are now classes. There's a (shallow) hierarchy of exceptions, with Exception at the root of all exception classes, and its subclass StandardError as the base class of all standard exception classes. Since this is a potential compatibility problem (some code that expects exceptions to be string objects will inevitably break), it can be turned off by invoking the Python interpreter with the -X command-line flag. To minimize the incompatibilities, str() of a class object returns the full class name (prefixed with the module name), and list/tuple assignment now accepts any sequence of the proper length on the right side.
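Because exceptions are classes, an except clause that names a base class also catches its subclasses. A sketch in modern Python 3 syntax: Python 3 dropped StandardError, so the hypothetical ConfigError hierarchy below derives directly from Exception, and the names ConfigError, MissingKeyError and lookup are illustrative only. The last lines show the relaxed unpacking rule, where any sequence of the right length may appear on the right side.

```python
class ConfigError(Exception):
    """Hypothetical base class for configuration problems."""

class MissingKeyError(ConfigError):
    """Hypothetical subclass: a required key is absent."""

def lookup(config, key):
    if key not in config:
        raise MissingKeyError(key)
    return config[key]

# An except clause naming a base class also catches its subclasses
caught = None
try:
    lookup({}, "port")
except ConfigError as exc:
    caught = type(exc).__name__      # 'MissingKeyError'

# Built-in exceptions are classes too, rooted at Exception
assert issubclass(ZeroDivisionError, ArithmeticError)
assert issubclass(ArithmeticError, Exception)

# Sequence unpacking accepts any sequence of the proper length
a, b = "xy"              # a string
c, d = [1, 2]            # a list
(e, f) = range(2)        # a range object
print(caught, a, b, c, d, e, f)
```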
Performance. The 1.5 implementation has been benchmarked as being up to twice as fast as Python 1.4. The standard Python benchmark, pystone, is now included in the test package (import test.pystone; test.pystone.main()).
The biggest speed increase is obtained in the dictionary lookup code. It is aided by a better, more uniformly randomizing hash function for string objects, and automatic "string interning" for all identifiers used in a program (this turns string comparisons into more efficient pointer comparisons). Some new dictionary methods make faster code possible if you don't mind changing your program: d.clear(), d.copy(), d.update(), d.get().
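The four dictionary methods named above still exist with the same meaning in current Python. A short sketch in modern Python 3 syntax; the dictionary contents are invented for illustration, and sys.intern is the modern function for explicitly interning a string, used here only to show that interned equal strings share one object.

```python
import sys

defaults = {"host": "localhost", "port": 8080}
overrides = {"port": 9090}

settings = defaults.copy()              # shallow copy; defaults is left unchanged
settings.update(overrides)              # merge another dictionary, overwriting keys
timeout = settings.get("timeout", 30)   # default value instead of KeyError
port = settings.get("port")

# Interning: equal strings map to one shared object,
# so an identity (pointer) comparison is enough to test equality.
s1 = sys.intern("".join(["local", "host"]))
s2 = sys.intern("localhost")
same_object = s1 is s2                  # True

scratch = dict(settings)
scratch.clear()                         # empties the dictionary in place
print(settings, timeout, same_object, scratch)
```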
Other speed increases include some inlining of common operations and improved flow control in the main loop of the virtual machine.
I/O speed has also been improved. On some platforms (notably Windows) the speed of file.read() (for large files) has improved dramatically by checking the file size and allocating a buffer of that size, instead of extending the buffer a few KB at a time.
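The same idea can be mimicked at the Python level: ask the operating system for the file size, allocate one buffer of that size, and fill it in a single call. This modern Python 3 sketch uses os.fstat and readinto; the function name read_whole_file is hypothetical, and the sketch only illustrates the technique, not the interpreter's actual C code.

```python
import os
import tempfile

def read_whole_file(path):
    """Read a file using one buffer sized from the file's metadata."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size   # current size in bytes
        buf = bytearray(size)                  # allocate once, no repeated growth
        n = f.readinto(buf)                    # may be less than size if the file shrank
        data = bytes(buf[:n])
        # If the file grew after fstat, pick up whatever remains
        return data + f.read()

# Usage: write a temporary file and read it back
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 100000)
    name = tmp.name

content = read_whole_file(name)
os.remove(name)
print(len(content))   # 100000
```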
Miscellaneous. The default module search path is chosen much more intelligently, so that a binary distribution for UNIX no longer requires a fixed installation directory. There are also provisions for site additions to the path without recompilation.
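In current Python, site additions are handled by the standard site module: site.addsitedir() appends a directory to sys.path and then processes any .pth files in it, each non-comment line of which names another directory to add. This modern Python 3 sketch shows that mechanism, which may differ in detail from what 1.5 shipped; the names vendor_libs and local.pth are hypothetical.

```python
import os
import site
import sys
import tempfile

site_dir = tempfile.mkdtemp()                     # stands in for a site directory
vendor = os.path.abspath(os.path.join(site_dir, "vendor_libs"))
os.makedirs(vendor)

# Each non-comment line of a .pth file names a directory to add;
# relative names are resolved against the directory holding the .pth file.
with open(os.path.join(site_dir, "local.pth"), "w") as f:
    f.write("# site additions, no recompilation needed\n")
    f.write("vendor_libs\n")

site.addsitedir(site_dir)     # adds site_dir itself, then processes its .pth files
print(vendor in sys.path)     # True
```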
If you are embedding Python in an application of your own, you will appreciate the vastly simplified linking process -- everything is now in a single library. There's also much improved support for non-Python threads, multiple interpreters, and explicit finalization and reinitialization of the interpreter.
For those of us who like to read the source, the code now uses a uniform naming scheme (the "Great Renaming") wherein all names have a "Py" prefix. For example, the function formerly known as getlistitem() is now called PyList_GetItem().
DDJ
Copyright © 1998, Dr. Dobb's Journal