Learning Python 3rd Edition: Python 2.6 and 3.X Notes
Latest update on this page: July 19, 2009
Please see the main updates page for more details about this page's content, as well as links to the book corrections list, and general book notes.
This page collects notes about recent changes in Python, designed to augment or clarify the published material. These are not errata, and require no patches; they are just supplements for readers of this book. Some of this page's material may also ultimately appear in future editions of this book, so this page is also provided as something of a very early draft preview of upcoming additions; it will change arbitrarily (and radically!) before publication.
Contents
This list is roughly ordered by date of addition, from older to newer, not by page number. Items here:
- Set and dictionary comprehensions in 3.0 (page xxxvii)
- Reserved word changes in 3.0 and 2.6 (pages xxxviii, 226)
- 3.0 print function emulation (pages xxxviii, 234)
- Binary digit strings, int(B, 2), and binary in 2.6 and 3.0 (page 139)
- Class decorators in 2.6 and 3.0 (page 556)
- "Private" and "Public" attributes with class decorators (page 499)
- Validating function arguments with decorators [off-page]
- String format method in 2.6 and 3.0 (page 140)
- Fraction number type in 2.6 and 3.0 (page 107)
- String types model in 3.0 (chapters 4 and 7) [off-page]
- New iterators in 3.0: range, dictionary views, map and zip (pages 265, 81, 160)
- The "nonlocal" statement in 3.0 (pages 318-326)
- Division operator change in 3.0 (pages 102-103)
- Function annotations and keyword-only arguments in 3.0 [to be written]
- Metaclasses in 2.6 and 3.0 [to be written]
- More on relative import syntax for packages in 3.0 [to be written]
- Python 2.6 and 3.0, and this book
- Python 3.0 concerns
- Python 3.X performance issues
Set and dictionary comprehensions in 3.0 (page xxxvii)
This page in the book describes the upcoming set literal and set comprehension syntax to be added in Python 3.0: the new literal syntax {1, 3, 2} is equivalent to the current set([1, 3, 2]), and the new set comprehension syntax {f(x) for x in S if P(x)} is like the current generator expression set(f(x) for x in S if P(x)).
In addition, although not mentioned in the text, Python 3.0 will now also have a dictionary comprehension syntax: {key:val for (key, val) in zip(keys, vals)} works like the current dict(zip(keys, vals)), and {x:x for x in items} is like the current dict((x, x) for x in items). Here's a summary of all the comprehension alternatives in 3.0; the last two are new, and are not available in 2.6:
[x*x for x in range(10)] # list comprehension: builds list ([x, y] is a list)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
(x*x for x in range(10)) # generator expression: produces items (parens often optional)
<generator object at 0x009E7328>
{x*x for x in range(10)} # set comprehension, new in 3.0 ({x, y} is a set)
{0, 1, 4, 81, 64, 9, 16, 49, 25, 36}
{x:x*x for x in range(10)} # dictionary comprehension, new in 3.0 ({x:x, y:y} is a dict)
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
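For completeness, here is the zip-based dictionary comprehension form mentioned above, as it might look in 3.0 (a small illustrative session of my own; the key order of the result may vary, since dictionaries are unordered):

keys = ['spam', 'ham', 'eggs']
vals = [1, 2, 3]
{k: v for (k, v) in zip(keys, vals)}      # dict comprehension, new in 3.0 (key order may vary)
{'spam': 1, 'ham': 2, 'eggs': 3}
dict(zip(keys, vals))                     # the 2.6 equivalent
{'spam': 1, 'ham': 2, 'eggs': 3}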
Reserved word changes in 3.0 and 2.6 (pages xxxviii, 226)
The 5th bullet up from the end of page xxxviii should probably mention that "print" will no longer be a reserved word in 3.0, because the new print() built-in function is replacing the current statement. See the next notes for more on the new print() function.
The full list of 3.0 reserved word changes:
- "print" is no longer reserved, because it is morphing from statement to built-in function.
- "yield" is still reserved; in 2.5 it changed from a statement to an expression, not a built-in function.
- "exec" is no longer a reserved word; it's also moving from statement to function (as it was years ago).
- "nonlocal", "with", and "as", all become new reserved words, to support the new enclosing scope assignment and context manager statements.
- "True", "False", and "None' become full-blown reserved words (alas, you won't be able to assign True = False anymore...).
Note: although the list above describes 3.0, in Python 2.6 "with" and "as" have already become reserved words, because the context manager statement has been officially enabled. This is suggested in the table on page 226. See pages 596-600 for a full discussion of this feature.
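For reference, here is the sort of code that motivates reserving "with" and "as" -- a minimal sketch of the context manager statement (the file name here is made up; any context manager object works the same way):

with open('data.txt') as myfile:      # "with"/"as": reserved in 2.6 and 3.0
    for line in myfile:               # the file is closed automatically on block exit,
        print(line)                   # even if the block raises an exception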
3.0 print function emulation (pages xxxviii, 234)
On pages xxxviii and 234, I mention that the print statement is to become a function call in 3.0, to support more features. To help describe how it will work, the following is a function that emulates much of the working of the 3.0 print function. Also note that "print" will no longer be a reserved word in 3.0; it is today, though, which is why we can't call this function "print":
""" emulate most of the 3.0 print function for use in 2.X call signature: print(*args, sep=' ', end='\n', file=None) """ import sys
def print30(*args, **kargs): sep = kargs.get('sep', ' ') # keyword arg defaults end = kargs.get('end', '\n') file = kargs.get('file', sys.stdout) output = '' first = True for arg in args: output += ('' if first else sep) + str(arg) first = False file.write(output + end)
if name == 'main': print30(1, 2, 3) # "1 2 3\n" print30(1, 2, 3, sep='') # "123\n" print30(4, 5, 6, sep='', end='') # "456" print30() # "\n" print30(1, 2, 3, sep='??', file=sys.stderr) # "1??2??3\n"
Binary digit strings, int(B, 2), and binary in 2.6 and 3.0 (page 139)
On page 139, there is a loop that uses the ord(char) function to convert a binary digit string to integer. It works as shown and serves to demo ord, but note that the same effect can be had in Python 2.5 today by calling the built-in function int(binstr, 2), giving an explicit base of 2. See page 105 for other examples of using a base with this built-in function.
That is the only built-in support for binary representation in Python 2.5, though. Perhaps a more interesting exercise is to also convert the other way, from integer to binary digit string. As mentioned at the top of page xxxviii, Python 2.6 and 3.0 will support this with:
- A new built-in function bin(int) to return the bit string for a number
- A new 0b1010 binary literal syntax for integers (octal literals also become 0oNNN, not 0NNN except in 2.6, and hex remains 0xNNNN)
- A new "b" type code for bit strings in the new string format method (though not in the "%" formatting expression): '{0:b}'.format(99)
- No binary escape code for characters in string literals, though hex and octal remain ('\NNN' for octal, and '\xNN' for hex)
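Assuming a 2.6 or 3.0 interpreter, these tools look like the following at the interactive prompt (only int(B, 2) also works in 2.5):

int('1101', 2)                 # binary digit string to integer (2.5 and later)
13
bin(13)                        # integer to binary digit string (2.6 and 3.0)
'0b1101'
0b1101                         # binary integer literal (2.6 and 3.0)
13
'{0:b}'.format(13)             # "b" type code in the new format method (2.6 and 3.0)
'1101'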
Although not available as a built-in in 2.5, to-binary conversion can be emulated in code today, by using the bitwise operations described on page 103 to extract bits. The following module, binary.py, shows one way to code this (albeit with possible platform size and endianness dependencies that could be tailored by checking sys.byteorder and sys.maxint):
def fromBinary(B):
    """
    convert binary string B to its decimal integer value;
    this can also be done today by the built-in: int(B, 2);
    caveat: this doesn't do any error checking in the string
    """
    I = 0
    while B:
        I = I * 2 + (ord(B[0]) - ord('0'))    # or: I << 1
        B = B[1:]
    return I

def toBinary(I):
    """
    convert 32-bit integer I to a binary digits string;
    there is no built-in for this today, but 2.6 and 3.0 will
    have a new bin(I) call; both also support binary literals:
    0b1010 will equal 10, and bin(10) will return "0b1010";
    caveat: this depends on integer size and bit endian-ness
    """
    B = ''
    while I:
        low = I & 0x00000001                  # extract low bit
        I   = (I >> 1) & 0x7FFFFFFF           # shift off low bit, 0 high
        B   = chr(ord('0') + low) + B         # or: '1' if low else '0'
    return B

if __name__ == '__main__':                    # self-test code
    for binstr in ('1101', '01111111', '10000000', '10000001', '0', '111'):
        print fromBinary(binstr)
    for intobj in (13, 127, 128, 129, 1, -1, -13):
        print toBinary(intobj)
Class decorators in 2.6 and 3.0 (page 556)
The book discusses function decorators, available in Python 2.5, a way to add automatically invoked logic to a function or method call. Python 2.6 and 3.0 extend this concept to add class decorators, a way to augment or manage instances when they are created. In short, class decorators are very similar to function decorators, but add a layer of extra logic to instance creation calls, rather than to particular functions.
Note: the following emphasizes decorators' role as wrapper-insertion hooks, to intercept later function and class-creation calls. Decorators can also be used to manage functions and classes directly, though--by returning the decorated object itself, decorators can be used to do things like augment classes with new methods, register functions and classes to APIs, and so on. As such, class decorators intersect strongly with metaclasses in terms of roles (indeed, both are run at the end of the class creation process). See the web, or the next edition of this book for more details.
Class decorators: the basics
Class decorators' semantics and syntax are very similar to function decorators. To help you understand their meaning, let's begin by reviewing the details behind function decorators more formally, and see how they apply to class decorators.
Function decorators wrap function calls
Recall that function decorators are largely just syntactic sugar, which runs one function through another. In terms of code, function decorators automatically map the following syntax:
@decorator
def F(arg):
    pass

...
F(99)                            # call function
into this equivalent form, where "decorator" is a 1-argument callable object that returns another callable:
def F(arg):
    pass

F = decorator(F)                 # rebind function name to decorator result

...
F(99)                            # really calls decorator(F)(99)
When the function is later called, it's actually calling the object returned by the decorator, which implements the required wrapping logic. This works on any def statement, whether it's a simple function or a method within a class. The decorator returns the object to be called later when the decorated function is invoked:
def decorator(F):
    # save or use function F
    # return a callable: nested def, call, etc.
The decorator itself receives the decorated function, and the callable returned by the decorator receives whatever arguments are passed to the decorated function's name. In skeleton terms, here's one common coding pattern which captures this idea:
def decorator(F):                    # decorator returns wrapper
    def wrapper(*args):              # wrapper gets args passed to func
        ...                          # use F and args
    return wrapper                   # wrapper remembers F in enclosing scope

@decorator                           # like func = decorator(func)
def func(x, y):                      # func is passed to decorator's F
    ...

func(6, 7)                           # 6, 7 are passed to wrapper's *args
Class decorators wrap instance creation calls
Class decorators are strongly related. Rather than wrapping individual functions or methods, though, class decorators are a way to wrap up instance construction calls, with extra logic that manages or augments instances created. In the following, assuming that "decorator" is a 1-argument function that returns a callable, the Python 2.6 and 3.0 class decorator syntax:
@decorator
class C:
    pass

...
x = C(99)                            # make an instance
is now equivalent to the following -- the class is automatically passed to the decorator function, and the decorator's result is assigned back to the class name:
class C:
    pass

C = decorator(C)                     # rebind class name to decorator result

...
x = C(99)                            # really calls decorator(C)(99)
The net effect is that calling the class name later to create an instance winds up triggering the callable returned by the decorator function, instead of calling the original class itself. The decorator's result implements the required instance wrapping logic:
def decorator(C):
    # save or use class C
    # return a callable: nested def, call, etc.
Notice that the decorator is a callable that must return another callable, to be invoked later when the class is called to create an instance. Because of that, decorators are commonly either factory functions that create and return other functions, or classes that use "__call__" methods to intercept call operations. Factory functions typically retain state in enclosing scope references, and classes in attributes.
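As a concrete illustration of the factory-function flavor, here is a minimal, self-contained class decorator that simply counts how many instances of a decorated class are made (this sketch is mine, not from the book; the names countInstances and numMade are invented for the example):

def countInstances(aClass):               # factory function: runs once, at decoration time
    def onCall(*args, **kargs):           # runs on each instance creation call
        onCall.numMade += 1               # state kept on the wrapper function itself
        return aClass(*args, **kargs)     # make and return the real instance
    onCall.numMade = 0
    return onCall

@countInstances                           # Spam = countInstances(Spam)
class Spam:
    def __init__(self, val):
        self.val = val

x = Spam(1)
y = Spam(2)
print(Spam.numMade)                       # prints 2: Spam is really bound to onCall here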
Nesting and arguments
Just as for functions, multiple class decorators result in multiple nested function calls, and hence multiple levels of wrapper logic around instance creation calls; the following are equivalent too:
@foo
@bar
class C:
    pass

same as...

class C:
    pass

C = foo(bar(C))
Both function and class decorators can also seem to take arguments -- really, the arguments are passed to a function that in effect returns the decorator, which in turn returns a callable. The following, for instance:
@decorator(A, B)
def F(arg):
    pass

...
F(99)
is mapped into this equivalent form, where decorator is a callable that returns the actual decorator; the returned decorator returns the callable run later for calls to the original function name:
def F(arg):
    pass

F = decorator(A, B)(F)               # rebind F to result of decorator's return value

...
F(99)                                # really calls decorator(A, B)(F)(99)
The decorator function in this example might take a form like the following:
def decorator(A, B):
    # save or use A, B
    def actualDecorator(F):
        # save or use function F
        # return a callable: nested def, call, etc.
    return actualDecorator
The outer function in this structure generally saves the decorator arguments away as state information, for use in either the actual decorator, the callable it returns, or both. We'll see examples of decorator arguments employed later in this section.
Function decorator example: tracing calls
Let's turn to some examples to demonstrate the abstract ideas of the prior section in action. This book discusses function decorators on pages 556-558, as a way to wrap up specific function or method calls with extra logic that generically augments the call in some fashion. For example, the decorator may add logic that adds call tracing; performs argument validity testing during debugging; automatically acquires and releases thread locks; times calls made to a function for optimization; and so on. To get started, here's a function decorator example taken from the book:
class tracer:
    def __init__(self, func):
        self.calls = 0
        self.func  = func
    def __call__(self, *args):
        self.calls += 1
        print 'call %s to %s' % (self.calls, self.func.__name__)
        self.func(*args)

@tracer
def spam(a, b, c):               # Wrap spam in a decorator object
    print a, b, c                # same as: spam = tracer(spam)

spam(1, 2, 3)                    # Really calls the tracer wrapper object
call 1 to spam
1 2 3

spam('a', 'b', 'c')              # Invokes __call__ in class
call 2 to spam
a b c
In this example, the tracer class saves away the decorated function, and intercepts later calls to it, in order to add a layer of trace logic that counts and prints each call.
For function calls, this "@" syntax can be more convenient than modifying each call to account for the extra logic level, and avoids accidentally calling the original function directly. A non-decorator equivalent, such as the following, can be used on any function without the special "@" syntax, but requires extra syntax when the function is called, may not be as obvious in intent, and does not ensure that the extra layer will be invoked for normal calls.
calls = 0
def tracer(func, *args):
    global calls
    calls += 1
    print 'call %s to %s' % (calls, func.__name__)
    func(*args)

def spam(a, b, c):
    print a, b, c

spam(1, 2, 3)                    # normal non-traced call: accidental?
1 2 3

tracer(spam, 1, 2, 3)            # special traced call without decorators
call 1 to spam
1 2 3
State information retention options
Interestingly, function decorators can use a variety of ways to retain state information provided at decoration time, for use during the actual function call. For example, here is an augmented version of the book's example above, which adds support for keyword arguments, returns the wrapped function's result, and uses print function syntax to work under both Python 2.6 and 3.0:
class tracer:
    def __init__(self, func):                # on @ decorator
        self.calls = 0                       # save func for later call
        self.func  = func
    def __call__(self, *args, **kwargs):     # on call to original function
        self.calls += 1
        print('call %s to %s' % (self.calls, self.func.__name__))
        return self.func(*args, **kwargs)

@tracer
def spam(a, b, c):                           # same as: spam = tracer(spam)
    print(a + b + c)                         # triggers tracer.__init__

@tracer
def eggs(x, y):                              # same as: eggs = tracer(eggs)
    print(x ** y)                            # wraps eggs in a tracer object

spam(1, 2, 3)                                # really calls tracer instance: runs tracer.__call__
spam(a=4, b=5, c=6)                          # spam is an instance attribute

eggs(2, 16)                                  # really calls tracer instance, self.func is eggs
eggs(4, y=4)                                 # self.calls is per-function here (need 3.0 nonlocal)
Like the original, this uses class instance attributes to save state explicitly. The wrapped function, as well as the calls counter, are per-instance information -- each decoration gets its own copy. When run as a script under either 2.6 or 3.0, the output of this version is as follows; notice how the spam and eggs functions each have their own calls counter, because each decoration creates a new class instance:
call 1 to spam
6
call 2 to spam
15
call 1 to eggs
65536
call 2 to eggs
256
Enclosing def scope references and nested defs can often achieve the same effect. In this example, though, we need a counter that changes on each call, and that's not possible in Python 2.6. In 2.6, we can either use classes and attributes as above, or move the state variable out to the global scope, with global declarations:
calls = 0
def tracer(func):                            # state via nested scope and global
    def wrapper(*args, **kwargs):            # instead of class attributes
        global calls                         # calls is global, not per-function
        calls += 1
        print('call %s to %s' % (calls, func.__name__))
        return func(*args, **kwargs)
    return wrapper

@tracer
def spam(a, b, c):                           # same as: spam = tracer(spam)
    print(a + b + c)

@tracer
def eggs(x, y):                              # same as: eggs = tracer(eggs)
    print(x ** y)

spam(1, 2, 3)                                # really calls wrapper, bound to func
spam(a=4, b=5, c=6)                          # wrapper calls spam

eggs(2, 16)                                  # really calls wrapper, bound to eggs
eggs(4, y=4)                                 # global calls is not per-function here!
Unfortunately, moving the counter out to the common global scope to allow it to be changed also means that it will be shared by every wrapped function. Unlike class instance attributes, global counters are cross-program, not per-function -- the counter is incremented for any traced function call. You can tell the difference if you compare this version's output with the prior: the single, shared global call counter is incorrectly updated by calls to every decorated function:
call 1 to spam
6
call 2 to spam
15
call 3 to eggs
65536
call 4 to eggs
256
Shared global state may be what we want in some cases. If we really want a per-function counter, though, we can either use classes, or make use of the new nonlocal statement in Python 3.0, described elsewhere on this page, which allows enclosing function scope variables to be changed:
def tracer(func):                            # state via nested scope and nonlocal
    calls = 0                                # instead of class attrs or global
    def wrapper(*args, **kwargs):            # calls is per-function, not global
        nonlocal calls
        calls += 1
        print('call %s to %s' % (calls, func.__name__))
        return func(*args, **kwargs)
    return wrapper

@tracer
def spam(a, b, c):                           # same as: spam = tracer(spam)
    print(a + b + c)

@tracer
def eggs(x, y):                              # same as: eggs = tracer(eggs)
    print(x ** y)

spam(1, 2, 3)                                # really calls wrapper, bound to func
spam(a=4, b=5, c=6)                          # wrapper calls spam

eggs(2, 16)                                  # really calls wrapper, bound to eggs
eggs(4, y=4)                                 # nonlocal calls is per-function here
Now, because enclosing scope variables are not cross-program globals, each wrapped function gets its own counter again, just as for classes and attributes. Here's the output when run under 3.0:
call 1 to spam
6
call 2 to spam
15
call 1 to eggs
65536
call 2 to eggs
256
See also the following link to file decorator0.py for more on running this example under either 2.6 or 3.0; "nonlocal" is a syntax error in 2.6, so we need to exec the code as a string for 2.6, not as normal code:
- <decorator0.py> -- off page, function decorators
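That file isn't reproduced here, but a hypothetical sketch of the general technique -- keeping the 3.0-only nonlocal code in a string so that 2.6 never has to compile it -- might look like the following (this is an assumption about the approach, not the actual contents of decorator0.py):

import sys

tracer30 = '''
def tracer(func):                            # 3.0-only version: per-function nonlocal counter
    calls = 0
    def wrapper(*args, **kwargs):
        nonlocal calls
        calls += 1
        print('call %s to %s' % (calls, func.__name__))
        return func(*args, **kwargs)
    return wrapper
'''

if sys.version_info[0] >= 3:
    exec(tracer30)                           # compiled only when running under 3.0
else:
    calls = 0
    def tracer(func):                        # 2.6 fallback: global (shared) counter
        def wrapper(*args, **kwargs):
            global calls
            calls += 1
            print('call %s to %s' % (calls, func.__name__))
            return func(*args, **kwargs)
        return wrapper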
Function decorator example: timing calls
To sample the full flavor of what function decorators are capable of, here is another example that times calls made to a decorated function, both for one call, and the total time among all calls. The decorator is applied to two functions, in order to compare the time requirements of list comprehensions and the map built-in call (see pages 366-369 in the book for another non-decorator example that times iteration alternatives like these):
import time

class timer:
    def __init__(self, func):
        self.func    = func
        self.alltime = 0
    def __call__(self, *args, **kargs):
        start   = time.clock()
        result  = self.func(*args, **kargs)
        elapsed = time.clock() - start
        self.alltime += elapsed
        print('%s: %.5f, %.5f' % (self.func.__name__, elapsed, self.alltime))
        return result

@timer
def listcomp(N):
    return [x * 2 for x in range(N)]

@timer
def mapcall(N):
    return map((lambda x: x * 2), range(N))

result = listcomp(5)                         # time for this call, all calls, return value
listcomp(50000)
listcomp(500000)
listcomp(1000000)
print(result)
print('allTime = %s' % listcomp.alltime)     # total time for all comp calls

print('')
result = mapcall(5)
mapcall(50000)
mapcall(500000)
mapcall(1000000)
print(result)
print('allTime = %s' % mapcall.alltime)      # total time for all map calls

print('map/comp = %s' % round(mapcall.alltime / listcomp.alltime, 3))
In this case, a non-decorator approach would allow the subject functions to be used with or without timing, but it would also complicate the call signature when timing is desired (we'd add code at the call instead of at the def), and there would be no direct way to guarantee that all builder calls in a program are routed through timer logic, short of finding and potentially changing them all.
When run in Python 2.6, the output of this file's self-test code is as follows:
listcomp: 0.00002, 0.00002
listcomp: 0.00910, 0.00912
listcomp: 0.09105, 0.10017
listcomp: 0.17605, 0.27622
[0, 2, 4, 6, 8]
allTime = 0.276223304917

mapcall: 0.00003, 0.00003
mapcall: 0.01363, 0.01366
mapcall: 0.13579, 0.14945
mapcall: 0.27648, 0.42593
[0, 2, 4, 6, 8]
allTime = 0.425933533452
map/comp = 1.542
Testing subtlety: as described elsewhere on this page, map() returns an iterator in Python 3.0, instead of an actual list as in 2.6, so it doesn't quite compare directly to a list comprehension's work. If you wish to run this under 3.0 too, use list(map()) to force it to build a list like the list comprehension does, or else you're not really comparing apples to apples. Don't do so in 2.6, though -- otherwise, the map test would be charged for building two lists, not one. The following sort of code picks fairly for 2.6 and 3.0; note that although this makes the comparison between list comprehensions and map() more fair in either 2.6 or 3.0, because range() is also an iterator in 3.0, the results for 2.6 and 3.0 won't compare directly:
...
import sys

@timer
def listcomp(N):
    return [x * 2 for x in range(N)]

if sys.version_info[0] == 2:
    @timer
    def mapcall(N):
        return map((lambda x: x * 2), range(N))
else:
    @timer
    def mapcall(N):
        return list(map((lambda x: x * 2), range(N)))
...
Adding decorator arguments
The timer decorator of the prior section works, but it would be nice if it were more configurable -- providing an output label, and turning trace messages on and off, for instance, might be useful in a general-purpose tool like this. Decorator arguments come in handy here -- when coded properly, we can use them to specify configuration options that can vary for each decorated function:
def timer(label=''):
    def decorator(func):
        ...
        print label, ...                 # label retained in enclosing scope
    return decorator                     # returns the actual decorator

@timer('==>')                            # like listcomp = timer('==>')(listcomp)
def listcomp(N): ...                     # really is rebound to decorator
We can put this structure to use in our timer, to allow a label and a trace control flag to be passed in at decoration time:
def timer(label='', trace=True):
    class Timer:
        def __init__(self, func):
            self.func    = func
            self.alltime = 0
        def __call__(self, *args, **kargs):
            start   = time.clock()
            result  = self.func(*args, **kargs)
            elapsed = time.clock() - start
            self.alltime += elapsed
            if trace:
                format = '%s %s: %.5f, %.5f'
                values = (label, self.func.__name__, elapsed, self.alltime)
                print(format % values)
            return result
    return Timer
if __name__ == '__main__':                        # allow timer to be imported elsewhere too

    @timer(trace=True, label='[CCC]==>')
    def listcomp(N):                              # like listcomp = timer(...)(listcomp)
        return [x * 2 for x in range(N)]          # listcomp(...) triggers Timer.__call__

    @timer(trace=True, label='[MMM]==>')
    def mapcall(N):
        return map((lambda x: x * 2), range(N))

    for func in (listcomp, mapcall):
        print('')
        result = func(5)                          # time for this call, all calls, return value
        func(50000)
        func(500000)
        func(1000000)
        print(result)
        print('allTime = %s' % func.alltime)      # total time for all calls

    print('map/comp = %s' % round(mapcall.alltime / listcomp.alltime, 3))
Notice that all we've really done here is embed the original timer class in an enclosing function, in order to create a scope that retains the decorator arguments. The outer "timer" function is called before decoration occurs, and simply returns the Timer class to serve as the actual decorator; on decoration, an instance of Timer is made which remembers the decorated function itself, but also has access to the decorator arguments in the enclosing function scope. When run, this file prints the following output:
[CCC]==> listcomp: 0.00003, 0.00003
[CCC]==> listcomp: 0.00640, 0.00643
[CCC]==> listcomp: 0.08687, 0.09330
[CCC]==> listcomp: 0.17911, 0.27241
[0, 2, 4, 6, 8]
allTime = 0.272407666337

[MMM]==> mapcall: 0.00004, 0.00004
[MMM]==> mapcall: 0.01340, 0.01343
[MMM]==> mapcall: 0.13907, 0.15250
[MMM]==> mapcall: 0.27907, 0.43157
[0, 2, 4, 6, 8]
allTime = 0.431572169089
map/comp = 1.584
As usual, we can also test this interactively to see how the configuration arguments come into play:
>>> from decorator1 import timer
>>> @timer(trace=False)                  # no tracing, collect total time
... def listcomp(N):
...     return [x * 2 for x in range(N)]
...
>>> x = listcomp(5000)
>>> x = listcomp(5000)
>>> x = listcomp(5000)
>>> listcomp
<decorator1.Timer instance at 0x025C77B0>
>>> listcomp.alltime
0.0051938863738243413

>>> @timer(trace=True, label='\t=>')     # turn tracing on
... def listcomp(N):
...     return [x * 2 for x in range(N)]
...
>>> x = listcomp(5000)
        => listcomp: 0.00155, 0.00155
>>> x = listcomp(5000)
        => listcomp: 0.00156, 0.00311
>>> x = listcomp(5000)
        => listcomp: 0.00174, 0.00486
>>> listcomp.alltime
0.0048562736325408196
We'll see another example of decorator arguments in the privacy class decorators of the next note section. See also the following file, which collects the timing function decorator examples in this section for you to experiment with on your own:
- <decorator1.py> -- off page, function decorators
Class decorator example: managing singletons
Python 2.6 and 3.0 extend decorators to work on classes too. As described earlier, the concept is similar to function decorators, but class decorators augment instance creation calls with extra logic, instead of a particular function or method. Also like function decorators, class decorators are really just optional syntactic sugar, though some view them as a way to make a programmer's intent more obvious and minimize erroneous calls.
Here's a larger example, run under 2.6, to demonstrate -- the classic singleton coding pattern, where at most one instance of a class ever exists; "singleton" defines and returns a function for managing instances, and the "@" syntax automatically wraps up the class in this function:
instances = {}
def getInstance(aClass, *args):                 # manage global table
    if aClass not in instances:                 # add **kargs for keywords
        instances[aClass] = aClass(*args)       # one dict entry per class
    return instances[aClass]

def singleton(aClass):                          # on @ decoration
    def onCall(*args):                          # on instance creation
        return getInstance(aClass, *args)
    return onCall

@singleton                                      # Person = singleton(Person)
class Person:                                   # rebinds Person to onCall
    def __init__(self, name, hours, rate):      # onCall remembers Person
        self.name  = name
        self.hours = hours
        self.rate  = rate
    def pay(self):
        return self.hours * self.rate

@singleton                                      # Spam = singleton(Spam)
class Spam:                                     # rebinds Spam to onCall
    def __init__(self, val):                    # onCall remembers Spam
        self.attr = val

bob = Person('Bob', 40, 10)                     # really calls onCall
print(bob.name, bob.pay())

sue = Person('Sue', 50, 20)                     # same, single object
print(sue.name, sue.pay())

X = Spam(42)                                    # one Person, one Spam
Y = Spam(99)
print(X.attr, Y.attr)
Now, when the Person or Spam classes are later used, the wrapping logic layer provided by the decorator routes instance construction calls to "onCall", which calls "getInstance" to manage and share a single instance, regardless of how many construction calls are made:
Bob 400
Bob 400
42 42
Interestingly, you can code a more self-contained solution here, if you're able to use the nonlocal statement available in Python 3.0 or later, to change enclosing scope names -- the following alternative achieves an identical effect, by using one enclosing scope per class, instead of one global table entry per class:
def singleton(aClass):                          # on @ decoration
    instance = None
    def onCall(*args):                          # on instance creation
        nonlocal instance                       # 3.0 and later nonlocal
        if instance == None:
            instance = aClass(*args)            # one scope per class
        return instance
    return onCall
In either Python 2.6 or 3.0, you can also code a self-contained solution with a class instead -- the following uses one instance per class, rather than an enclosing scope or global table, and works the same as the other two versions:
class singleton:
    def __init__(self, aClass):                 # on @ decoration
        self.aClass   = aClass
        self.instance = None
    def __call__(self, *args):                  # on instance creation
        if self.instance == None:
            self.instance = self.aClass(*args)  # one instance per class
        return self.instance
Class decorator example: wrapping up entire interfaces
Let's look at a larger use-case example. On pages 527-528, the __getattr__ method is shown as a way to wrap up entire object interfaces of embedded instances. Here's the book's original example for reference, working on a built-in list object:
class wrapper:
    def __init__(self, object):
        self.wrapped = object                        # Save object
    def __getattr__(self, attrname):
        print 'Trace:', attrname                     # Trace fetch
        return getattr(self.wrapped, attrname)       # Delegate fetch

x = wrapper([1, 2, 3])               # Wrap a list
x.append(4)                          # Delegate to list method
Trace: append
x.wrapped                            # Print my member
[1, 2, 3, 4]
In this code, the wrapper class intercepts access to any of the wrapped object's attributes, prints a message, and uses getattr to pass off the access to the wrapped object. This differs from function decorators, which wrap up just one specific method. In some sense, class decorators provide an alternative way to code the __getattr__ technique to wrap an entire interface. In 2.6, for example, the class example above can be coded as a class decorator that triggers wrapped instance creation, instead of passing an instance into the wrapper's constructor:
def Tracer(aClass):                                  # on @decorator
    class Wrapper:
        def __init__(self, *args, **kargs):          # on instance creation
            self.wrapped = aClass(*args, **kargs)    # use enclosing scope name
        def __getattr__(self, attrname):
            print 'Trace:', attrname                 # catches all but .wrapped
            return getattr(self.wrapped, attrname)   # delegate to wrapped obj
    return Wrapper
if __name__ == '__main__':

    @Tracer
    class Spam:                                      # like: Spam = Tracer(Spam)
        def display(self):                           # Spam is rebound to Wrapper
            print 'Spam!' * 8

    @Tracer
    class Person:                                    # Person = Tracer(Person)
        def __init__(self, name, hours, rate):       # Wrapper bound to Person
            self.name  = name
            self.hours = hours
            self.rate  = rate                        # in-method access not traced
        def pay(self):
            return self.hours * self.rate

    food = Spam()                                    # triggers Wrapper()
    food.display()                                   # triggers __getattr__
    bob = Person('Bob', 40, 50)                      # bob is really a Wrapper
    print bob.name                                   # Wrapper embeds a Person
    print bob.pay()

    print
    sue = Person('Sue', 60, 100)
    print sue.name
    print sue.pay()
    print bob.name
    print bob.pay()
Here is the output produced on Python 2.6 (it uses 2.6 print statements): attribute fetches on instances of both the Spam and Person classes invoke the __getattr__ logic in the Wrapper class, because "food" and "bob" are really instances of Wrapper, thanks to the decorator's redirection of instance creation calls:
Trace: display
Spam!Spam!Spam!Spam!Spam!Spam!Spam!Spam!
Trace: name
Bob
Trace: pay
2000

Trace: name
Sue
Trace: pay
6000
Trace: name
Bob
Trace: pay
2000
Notice that the preceding applies decoration to a user-defined class. Just like the book's original example, we can also use the decorator to wrap up a built-in type such as a list, as long as we subclass so as to allow decoration of instance creation. In the following, "x" is really a Wrapper again due to the indirection of decoration; notice how directly printing x invokes Wrapper's __getattr__, which in turn dispatches to the __repr__ of the built-in list superclass of the embedded instance:
@Tracer
class MyList(list): pass             # triggers Tracer()

x = MyList([1, 2, 3])                # triggers Wrapper()
x.append(4)                          # triggers __getattr__, append
Trace: append
x.wrapped
[1, 2, 3, 4]
x                                    # triggers __getattr__, __repr__
Trace: __repr__
[1, 2, 3, 4]
When classes are not enough
Interestingly, the decorator function in this example can almost be coded as a class instead of a function, with the proper operator overloading protocol. The following alternative works similarly because its __init__ is triggered when the "@" decorator is applied to the class, and its __call__ is triggered when a subject class instance is created. Our objects are really instances of Tracer this time, and we essentially just trade an enclosing scope reference for an instance attribute here:
class Tracer:
    def __init__(self, aClass):                  # on @decorator
        self.aClass = aClass                     # use instance attribute
    def __call__(self, *args):                   # on instance creation
        self.wrapped = self.aClass(*args)        # ONE (LAST) INSTANCE PER CLASS!
        return self
    def __getattr__(self, attrname):
        print 'Trace:', attrname
        return getattr(self.wrapped, attrname)

@Tracer                                          # triggers __init__
class Spam:                                      # like: Spam = Tracer(Spam)
    def display(self):
        print 'Spam!' * 8

...
food = Spam()                                    # triggers __call__
food.display()                                   # triggers __getattr__
This class-only alternative handles multiple classes as before, but it won't quite work for multiple instances of a given class: each instance construction call triggers __call__, which overwrites the prior instance. The net effect is that Tracer saves just one instance, the last one created. Experiment with this yourself to see how, but here's an example of the problem:
@Tracer
class Person:                        # Person = Tracer(Person)
    def __init__(self, name):        # Wrapper bound to Person
        self.name = name

bob = Person('Bob')                  # bob is really a Wrapper
print bob.name                       # Wrapper embeds a Person
sue = Person('Sue')
print sue.name                       # sue overwrites bob
print bob.name                       # now bob's name is 'Sue'!
This code's output follows -- because this tracer only has a single shared instance, the second overwrites the first:
Trace: name
Bob
Trace: name
Sue
Trace: name
Sue
The earlier function-based Tracer version does work for multiple instances, because each instance construction call makes a new Wrapper instance, instead of overwriting state of a single shared Tracer instance; the original version in the book handles multiple instances correctly for the same reason. Decorators are not only magical, they can also be incredibly subtle!
See also the following file, which collects some of this section's examples:
- <decorator2.py> -- off page, class decorators
Decorators versus manager functions
Of course, the Tracer class decorator example ultimately still relies on __getattr__ to intercept fetches on a wrapped and embedded instance object. In fact, all we've really accomplished in the Tracer class decorator above is to move the instance creation call inside a class, instead of passing in the instance to a manager function. With the book's non-decorator version of this example, we would simply code instance creation differently:
class Spam:                          # non-decorator version
    ...                              # any class will do
food = wrapper(Spam())               # special creation syntax

@Tracer
class Spam:                          # decorator version
    ...                              # requires @ syntax at class
food = Spam()                        # normal creation syntax
Essentially, decorators simply shift special syntax requirements from the instance creation call, to the class statement itself. This is also true for the singleton example above -- rather than decorating, we could simply pass the class and its construction arguments into a manager function:
instances = {}
def getInstance(aClass, *args):
    if aClass not in instances:
        instances[aClass] = aClass(*args)
    return instances[aClass]

bob = getInstance(Person, 'Bob', 40, 10)     # versus: bob = Person('Bob', 40, 10)
Alternatively, we could use Python's introspection facilities to fetch the class from an already-created instance:
instances = {}
def getInstance(object):
    aClass = object.__class__
    if aClass not in instances:
        instances[aClass] = object
    return instances[aClass]

bob = getInstance(Person('Bob', 40, 10))     # versus: bob = Person('Bob', 40, 10)
Why use decorators?
So why did I just show you ways to not use decorators? Decorator syntax and semantics strike some as unusual and implicit at first glance, but decorators do have some advantages that are worth reviewing. All too often, new technology is accepted without first asking the "why" questions; because their tradeoffs are not clear-cut, let's wrap up with a discussion of some of the whys of decorators.
First of all, you should know that decorators suffer from two main drawbacks:
- A decorated function or class does not retain its original type -- its name is rebound to a wrapper object, which might matter in programs that use object names or test object types.
- The wrapping layer added by decoration incurs the extra performance cost of an additional call each time the decorated object is invoked -- calls are a relatively time-expensive operation, so decoration can make a program slower.

The name rebinding issue is unlikely to matter for most programs. Moreover, the speed hit is probably insignificant for most programs; can often be negated by simply removing the decorator when optimal performance is required; and is also incurred by non-decorator solutions that add wrapping logic.
Conversely, decorators have two main advantages:
- They make it less likely that a programmer will forget to use required wrapping logic.
- They make it easier to add or remove decoration logic in the future.

Neither of these benefits strictly requires decorator syntax to be achieved, though, and decorators are ultimately a stylistic choice, but one which many programmers will find preferable. Their two purported benefits merit a closer inspection here.
Enforcing decoration consistency
Although the class decorators shown in this section may seem much more implicit than either of the preceding non-decorator alternatives, the decorator-based codings are also at least arguably less intrusive on the code that creates objects to be wrapped. The special "@" syntax makes the programmer's intent more clear to readers of the class, and retains the normal instance-creation call coding style. Both functions and classes benefit from the fact that decorators appear just once, at the definition, instead of requiring special code at every call.
This last point leads to what seems to be a substantial benefit of decorators: they remove the risk that a programmer may accidentally call an undecorated class or function directly, thereby missing out on the decoration logic. In so doing, decorators ensure that the wrapping layer gets invoked for normal calls, and so enforce some measure of consistency. Although a programmer could accidentally forget to use the "@" decorator syntax too, that seems less likely; the "@" syntax appears in just one place, and is very explicit.
On the other hand, the non-decorator alternatives for both tracing and singleton management shown in the prior section can be used with arbitrary classes, without requiring that the special "@" syntax be added. In fact, the non-decorator options can be used on classes that may have been coded in the past before decoration extensions were even foreseen, without requiring any changes to class statements.
For example, using the non-decorator Wrapper with a built-in list, as shown in the previous section, is more straightforward than decorating; to use a decorator on a built-in list, it must be augmented by coding an artificial subclass. Consider also the timer decorators in this section -- they require that we add an "@" line to every function we wish to time. A general timer function to which we pass functions might seem simpler and less intrusive to many, especially if we wish to time many arbitrary functions or classes (for an example of a non-decorator timer, see the timer functions in the files used to test 3.0 speed later on this page).
Because they can be used with any class, non-decorator alternatives to class decorators might be considered more general. However, they are also less obvious in intent at class statements (though more obvious at instance creation calls), and less forgiving to programmers who might forget to route new instances through the wrapping layer.
Also keep in mind that the utility of both function and class decorators can always be achieved without the "@" syntax; simply use the name rebinding equivalence explicitly. In the first decorator example above, for instance, we could simply code "Person = singleton(Person)", and skip the decorator syntax altogether. Again, decorators are largely just syntactic sugar for a common coding pattern, and as such a style choice. They can, however, help to make the wrapping more obvious, and minimize the chance that clients of a function or class will inadvertently forget to use the wrapping logic.
Simplifying code maintenance
Besides enforcing consistency, the other main advantage of decorators is in code maintenance: because the decoration code appears only at the function or class instead of at every call, it's easier to both add and delete that code in the future. For example, decorator syntax allows decoration to be added to functions and classes more conveniently after they are already in use -- the "@" syntax need only be added at the function or class definition, instead of tracking down and changing every call point in a program. Similarly, if you want to remove decoration logic later to disable debugging logic or optimize calls, decorators require that you remove just the one decorator line, rather than modifying every call to the function or class in your program.
On the other hand again, this maintenance benefit doesn't require decorators -- you can achieve the same effect by either modifying the function or class itself, or coding the assignment equivalent of decorators explicitly instead of using the "@" decorator syntax. So, while decoration is a good idea, the decoration syntax is still largely a stylistic choice. In practice, though, the special "@" syntax will serve to make your intent clearer, and will likely be preferred by most Python programmers.
I recall similar arguments against constructor functions in classes -- prior to introduction of __init__ methods, the same effect was often achieved by running an instance through a method manually when created: X = Class().init(). Over time, though, the __init__ syntax was universally preferred for being more explicit, consistent, and maintainable, despite being fundamentally a stylistic choice. Although you should be the judge, decorators seem to bring many of the same assets.
"Private" and "Public" attributes with class decorators (page 499)
Now that you've read about the new 2.6 and 3.0 class decorators feature in the preceding section, let's put them to work in a more comprehensive example. I wrote the example listed below under Python 2.6: an implementation of a "Private" declaration for class instance attributes (that is, attributes stored on an instance, or inherited from one of its classes). It disallows fetch/change access to such attributes from outside the class, but still allows the class itself to access those names within its methods. It's not exactly C++ or Java, but provides similar access control as an option in Python.
Implementing private attributes
An incomplete first-cut implementation of instance attribute privacy for changes only is described in the book on pages 499-500. Although the example listed below utilizes the new syntactic sugar of class decorators to code attribute privacy, it is ultimately still based upon the __getattr__ and __setattr__ operator overloading methods described in the book, which intercept attribute access. Like the original in the book, when a private attribute access is detected, this version uses the "raise" statement to raise an exception, along with an error message, which may be caught in a "try" or allowed to terminate the script (see the Exceptions part of the book for details).
Here is the code, along with a self-test at the bottom of the file:
""" Privacy for attributes fetched from class instances. See self-test code at end of file for a usage example. Decorator same as: Doubler = Private('data', 'size')(Doubler). Private returns onDecorator, onDecorator returns onInstance, and each onInstance instance embeds a Doubler instance. """
traceMe = False def trace(*args): if traceMe: print '[' + ' '.join(map(str, args)) + ']'
def Private(*privates): # privates in enclosing scope def onDecorator(aClass): # aClass in enclosing scope class onInstance: # wrapped in instance attribute def init(self, *args, **kargs): self.wrapped = aClass(*args, **kargs) def getattr(self, attr): # my attrs don't call getattr trace('get:', attr) # others assumed in wrapped if attr in privates: raise TypeError, 'private attribute fetch: ' + attr else: return getattr(self.wrapped, attr) def setattr(self, attr, value): # outside accesses trace('set:', attr, value) # others via getattr if attr == 'wrapped': # allow my attrs self.dict[attr] = value # avoid looping! elif attr in privates: raise TypeError, 'private attribute change: ' + attr else: setattr(self.wrapped, attr, value) # wrapped obj attrs return onInstance # or use dict return onDecorator
if __name__ == '__main__':
    traceMe = True

    @Private('data', 'size')
    class Doubler:
        def __init__(self, label, start):
            self.label = label                       # accesses inside the subject class
            self.data  = start                       # not intercepted: run normally
        def size(self):
            return len(self.data)                    # methods run with no checking
        def double(self):                            # because privacy not inherited
            for i in range(self.size()):
                self.data[i] = self.data[i] * 2
        def display(self):
            print self.label, '=>', self.data

    X = Doubler('X is', [1, 2, 3])
    Y = Doubler('Y is', [-10, -20, -30])

    # the following all succeed
    print X.label                                    # accesses outside the subject class
    X.display(); X.double(); X.display()             # intercepted: validated, delegated
    print Y.label
    Y.display(); Y.double()
    Y.label = 'Spam =>'
    Y.display()

    # the following all fail properly
    """
    print X.size()
    print X.data
    X.data = [1, 1, 1]
    X.size = lambda S: 0
    print Y.data
    print Y.size()
    """
- <private.py> -- fetch this file from off page
When run with tracing on, the module file's self-test code produces the following output; notice how the decorator catches and validates both attribute fetches and assignments run outside of the wrapped class, but does not catch attribute usage inside the class itself:
[set: wrapped <__main__.Doubler instance at 0x0B17A3C8>]
[set: wrapped <__main__.Doubler instance at 0x09B60F80>]
[get: label]
X is
[get: display]
X is => [1, 2, 3]
[get: double]
[get: display]
X is => [2, 4, 6]
[get: label]
Y is
[get: display]
Y is => [-10, -20, -30]
[get: double]
[set: label Spam =>]
[get: display]
Spam => => [-20, -40, -60]
Discussion
This code is a bit complex, and you're probably best off tracing through it and its examples to see how it works. To help you study, though, here are a few highlights worth mentioning:
- Inheritance versus delegation
The privacy example shown in the book on page 499 uses inheritance to mix in a __setattr__ to catch accesses. Inheritance makes this difficult, however, because it is not straightforward to know the difference between an access from inside or outside the class. Inside accesses should be allowed, and outside access restricted; to work around this, the book example requires inheriting classes to use __dict__ assignments to set attributes -- an incomplete solution at best. The version here uses delegation (embedding an object inside another) instead of inheritance, which makes it much easier to distinguish between accesses inside and outside of the subject class, and so seems to be a better suited pattern. Attribute accesses from outside the subject class are intercepted by the wrapper layer's overloading methods and delegated to the class if valid; accesses inside the class itself (e.g., through "self" in its methods) are not intercepted and are allowed to run normally without checks, because privacy is not inherited here.
- Decorator arguments
The class decorator below accepts any number of arguments, to name private attributes. What really happens, though, is that the arguments are passed to the Private function, and Private returns the decorator function to be applied to the subject class. That is, the arguments are used before decoration ever occurs; Private returns the decorator, which in turn "remembers" the privates list as an enclosing scope reference. You can also pass arguments to simple function decorators in the same way; see the timer decorator in the prior section for an example.
- State retention and enclosing scopes
Speaking of enclosing scopes, there are actually 3 levels of state retention at work in this code:
- The arguments to Private are used before decoration occurs, and are retained as an enclosing scope reference for use in both onDecorator and onInstance calls;
- The class argument to onDecorator is used at decoration time, and is retained as an enclosing scope reference for use at instance construction time;
- And the wrapped instance object is retained as an instance attribute in onInstance, for use when attributes are later accessed from outside the class.
This all works fairly naturally, given Python's scope and namespace rules.
- Using namespace dictionaries
The __setattr__ in this code relies on an instance object's __dict__ attribute, in order to set onInstance's own "wrapped" attribute; it can't assign it directly or __setattr__ would loop (recall that __getattr__ is called for only undefined attributes, but __setattr__ is called for every attribute). However, it uses setattr() instead of __dict__ to set attributes in the wrapped object. Because of that, this will likely work for most classes; recall that new-style classes that have a __slots__ may not store attributes in a __dict__, but we only rely on a __dict__ at the onInstance level here, not in the wrapped instance.
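To make the looping hazard concrete, here is a minimal illustration of my own (not part of private.py) of why __setattr__ must route the wrapper's own attributes through __dict__:

class Broken:
    def __setattr__(self, attr, value):
        self.attr = value                # loops: this assignment triggers __setattr__ again

class Fixed:
    def __setattr__(self, attr, value):
        self.__dict__[attr] = value      # stores the attribute without recursion

x = Fixed()
x.spam = 99                              # routed through __setattr__ exactly once
print(x.spam)                            # prints 99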
Generalizing for Public declarations too
And now that we have a Private implementation, it's straightforward to generalize the code to allow for Public declarations too -- they are essentially the inverse of Private, so we need only negate the inner test. The example listed below allows a class to use decorators to define a set of either Private or Public instance attributes (attributes stored on an instance or inherited from its classes), with the following semantics:
- Private declares attributes of a class's instances which cannot be fetched or assigned, except from within the code of the class's methods. That is, any name declared Private cannot be accessed from outside the class, and any name not declared Private can be freely fetched or assigned from outside the class.
- Public declares attributes of a class's instances which can be fetched or assigned from both outside the class and from within the class's methods. That is, any name declared Public can be freely accessed anywhere and any name not declared Public cannot be accessed from outside the class.
Private and Public declarations are intended to be mutually exclusive: when using Private, undeclared names are considered Public; when using Public, undeclared names are considered Private. They are essentially inverses, though undeclared names not created by class methods behave slightly differently -- they can be assigned and thus created outside the class under Private (all undeclared names are accessible), but cannot under Public (all undeclared names are inaccessible).
The code for this generalization appears below; again, study this on your own for more details. Notice how this adds an additional, 4th level of state retention at the top, beyond that described in the preceding section: the test functions used by the lambdas are saved in an extra enclosing scope.
""" Private and Public declarations for instances of classes, via 2.6+ class decorators. Controls access to attributes stored on an instance, or inherited by it from its classes.
Private() declares attribute names that cannot be fetched or assigned outside the subject class, and Public() declares all the names that can. See private.py for implementation notes and a usage example; this generalizes the code to allow for a Public inverse. """
traceMe = False def trace(*args): if traceMe: print '[' + ' '.join(map(str, args)) + ']'
def accessControl(failIf): def onDecorator(aClass): class onInstance: def init(self, *args, **kargs): self.__wrapped = aClass(*args, **kargs) def getattr(self, attr): trace('get:', attr) if failIf(attr): raise TypeError, 'private attribute fetch: ' + attr else: return getattr(self.__wrapped, attr) def setattr(self, attr, value): trace('set:', attr, value) if attr == '_onInstance__wrapped': self.dict[attr] = value elif failIf(attr): raise TypeError, 'private attribute change: ' + attr else: setattr(self.__wrapped, attr, value) return onInstance return onDecorator
def Private(*attributes): return accessControl(failIf=(lambda attr: attr in attributes))
def Public(*attributes): return accessControl(failIf=(lambda attr: attr not in attributes))
Here's a quick look at these class decorators in action at the interactive prompt; as advertised, non-Private or Public names can be fetched and changed from outside the subject class, but Private or non-Public names cannot:
>>> from access import Private, Public

>>> @Private('age')
... class Person:
...     def __init__(self, name, age):
...         self.name = name
...         self.age  = age                  # inside accesses run normally
...
>>> X = Person('Bob', 40)
>>> X.name                                   # outside accesses validated
'Bob'
>>> X.name = 'Sue'
>>> X.name
'Sue'
>>> X.age
TypeError: private attribute fetch: age
>>> X.age = 'Tom'
TypeError: private attribute change: age

>>> @Public('name')
... class Person:
...     def __init__(self, name, age):
...         self.name = name
...         self.age  = age
...
>>> X = Person('bob', 40)
>>> X.name
'bob'
>>> X.name = 'Sue'
>>> X.name
'Sue'
>>> X.age
TypeError: private attribute fetch: age
>>> X.age = 'Tom'
TypeError: private attribute change: age
For more details, see the following linked files (off page to save space here); the first is the decorators shown above, and the rest give two large client examples:
- <access.py> -- off page: the main code
- <person.py> -- off page: client example
- <vector.py> -- off page: client example
- <formats.py> -- off page: utility used by person.py
Discussion
To help you study the code again, here are a few final notes on the generalized version above:
- Using "__X" names
Besides generalizing, this version also makes use of Python's "__X" pseudo-private name mangling feature, to localize the "wrapped" attribute to the control class, by automatically prefixing it with the class name. This avoids the prior version's risk of collisions with a "wrapped" attribute that may be used by the real, wrapped class, and is useful in a general tool like this. It's not quite privacy, though, because the mangled name can be used freely outside the class. Notice how we also have to use the fully expanded name string, '_onInstance__wrapped', in __setattr__, because that's what Python changes it to, and __setattr__ is called for every attribute (__getattr__ is called only for undefined names). See pages 543-545 for more on this feature.
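Here is a small interactive illustration of the mangling at work (the class name C is hypothetical, not from access.py):

>>> class C:
...     def __init__(self):
...         self.__wrapped = 99              # expands to self._C__wrapped
...
>>> X = C()
>>> X.__dict__
{'_C__wrapped': 99}
>>> X._C__wrapped                            # still reachable under the expanded name
99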
- Breaking privacy
Although this example implements true access controls for attributes of an instance and its classes, it is possible to subvert these controls in various ways -- for instance, by going through the expanded version of the "wrapped" attribute explicitly ("bob.pay" might not work, but the fully mangled "bob._onInstance__wrapped.pay" could). If you have to explicitly try to do so, though, these controls are probably sufficient for normal intended use. Of course, privacy controls can generally be subverted in any language if you try hard enough ("#define private public" may work in some C++ implementations, too). Although access controls can reduce accidental changes, much of this is up to programmers in any language. Wherever programmers are allowed to change source code, access control will always be a bit of a pipe dream.
- Using decorators
We could again do this without decorators, and simply code the equivalent syntax sugar explicitly (class = decorator(class)); the decorator syntax, however makes this consistent and a bit more obvious in the code. The chief potential downside with this approach is that instances of decorated classes are not really instances of the original decorated class -- if you test their type with X.__class__ or isinstance(X, C), for example, they are instances of the wrapper class. Unless you plan to do introspection on the objects, though, this is probably irrelevant. - Intercepting operator-overloading methods
Caveat: the private decorators shown here work for normally named attributes in both 2.6 and 3.0, but fail to intercept and delegate implicit attribute fetches of built-in operations in 3.0 (and in 2.6 new-style classes). Recall that built-in operations run operator-overloading methods of classes if defined; print, for example, triggers method __str__. Unfortunately, built-in operations no longer run their attributes through __getattr__ (or its cousin __getattribute__), even though they do for normal classes in 2.6 and earlier. This means that in 3.0 delegation-based classes like these must redefine operator-overloading methods in the wrapper class, to intercept and delegate built-in operations correctly. There are more complex alternatives (e.g., inspecting the call stack to know where an access arose), but redefinition seems the simplest solution for making wrappers in 3.0 work as they do in 2.6.
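For instance, here is a minimal, self-contained sketch of the redefinition idea; the Wrapper class and values are illustrative stand-ins, not the book's access.py code, and failIf validation is omitted for brevity:

W = Wrapper(40)
class Wrapper:                                   # hypothetical stand-in for onInstance
    def __init__(self, wrapped):
        self.__wrapped = wrapped
    def __getattr__(self, attr):
        return getattr(self.__wrapped, attr)     # catches normally named attributes only
    def __str__(self):                           # 3.0: built-ins bypass __getattr__,
        return str(self.__wrapped)               # so delegate them explicitly
    def __add__(self, other):
        return self.__wrapped + other

X = Wrapper(40)
print(X)                                         # runs Wrapper.__str__: prints 40
print(X + 2)                                     # runs Wrapper.__add__: prints 42

Any other built-in operation that clients rely on (len, indexing, and so on) would need a similar explicitly delegating method in the wrapper under 3.0.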
And now that I've gone to such great lengths to add Private and Public for Python code, I also need to remind you again that it is not entirely Pythonic to add access controls to your classes like this. In fact, most Python programmers will probably find this example to be largely or totally irrelevant, apart from a demonstration of decorators in action. Most large Python programs get by successfully without any such controls at all. If you do wish to regulate attribute access in order to eliminate coding mistakes, though, or happen to be a soon-to-be-ex-C++-or-Java programmer, most things are possible with Python's operator overloading and introspection tools.
Validating function arguments with decorators
This section contains a somewhat advanced case study which explores various coding alternatives for validating function and method arguments with a function decorator. It has been moved off page, in the interest of space. This section is now available on the following page:
String format method in 2.6 and 3.0 (page 140)
The book discusses the "%" formatting expression for strings, primarily on pages 140-143. Python 2.6 and 3.0 add a new, alternative way to format strings -- the string object's new "format" method. Depending on which source you cite, this new method is either simpler or more advanced that the traditional "%" expression. In any case, it's a reasonable alternative, which may or may not become as widespread as "%" over time.
The basics
In short, the new format() method uses the subject string as a template, and takes any number of arguments that represent values to be substituted according to the template. Within the subject string, curly braces designate substitution targets, and name arguments to be inserted either by position ("{1}") or keyword ("{food}"). Python's ability to collect arbitrary positional and keyword arguments ("*args" and "**kargs" in Python code) allows for such general method call patterns. In Python 2.6, for example:
template = '{0}, {1} and {2}' # by position template.format('spam', 'ham', 'eggs') 'spam, ham and eggs'
template = '{motto}, {pork} and {food}' # by keyword template.format(motto='spam', pork='ham', food='eggs') 'spam, ham and eggs'
template = '{motto}, {0} and {food}' # by both template.format('ham', motto='spam', food='eggs') 'spam, ham and eggs'
Naturally, the subject string can also be a literal that creates a temporary string, and arbitrary object types can be substituted:
'{motto}, {0} and {food}'.format(42, motto=3.14, food=[1, 2]) '3.14, 42 and [1, 2]'
And just as for the "%" expression and other string methods, format() creates and returns a new string object, which can be printed immediately or saved for further work (recall that string are immutable, so format() really must make a new object). String formatting is not just for display:
X = '{motto}, {0} and {food}'.format(42, motto=3.14, food=[1, 2]) X '3.14, 42 and [1, 2]'
X.split(' and ') ['3.14, 42', '[1, 2]']
Y = X.replace('and', 'but under no circumstances') Y '3.14, 42 but under no circumstances [1, 2]'
Adding keys, attributes, and offsets
Beyond this, format calls can become more complex, to support more advanced usage. For instance, format strings can name object attributes and dictionary keys -- as in normal Python syntax, square brackets name dictionary keys and dots denote object attributes, of an item referenced by position or keyword. The first of the following indexes a dictionary on key 'spam', and then fetches attribute 'platform' from the already-imported sys module object; the second does the same, but names the objects by keyword instead of position:
import sys
'My {1[spam]} runs {0.platform}'.format(sys, {'spam': 'laptop'}) 'My laptop runs win32'
'My {config[spam]} runs {sys.platform}'.format(sys=sys, config={'spam': 'laptop'}) 'My laptop runs win32'
Square brackets in format strings can name list (and other sequence) offsets to perform indexing too, but only single positive offsets work syntactically within format strings, so this feature is not as general as you might think. Just as for "%" expressions, to name negative offsets or slices, or to use arbitrary expression results in general, you must run expressions outside the format string itself:
somelist = list('SPAM') somelist ['S', 'P', 'A', 'M']
'first={0[0]}, third={0[2]}'.format(somelist) 'first=S, third=A'
'first={0}, last={1}'.format(somelist[0], somelist[-1]) # [-1] fails in string 'first=S, last=M'
parts = somelist[0], somelist[-1], somelist[1:3] # [1:3] fails in string 'first={0}, last={1}, middle={2}'.format(*parts) "first=S, last=M, middle=['P', 'A']"
Adding specific formatting
More specific layouts can be achieved by adding a colon after the substitution target's identification, followed by a format specifier which can name field size, justification, and a specific type code. In the following, "{0:10}" means the first positional argument in a field 10 wide; "{1:<10}" means the second positional argument left justified in a 10-wide field; "{0.platform:>10}" means the 'platform' attribute of the first argument right-justified in a 10-wide field; and "{2:g}" means the third argument formatted by default according to the "g" floating-point representation:
'{0:10} = {1:10}'.format('spam', 123.4567) 'spam = 123.457'
'{0:>10} = {1:<10}'.format('spam', 123.4567) ' spam = 123.457 '
'{0.platform:>10} = {1[item]:<10}'.format(sys, dict(item='laptop')) ' win32 = laptop '
'{0:e}, {1:.3e}, {2:g}'.format(3.14159, 3.14159, 3.14159) '3.141590e+00, 3.142e+00, 3.14159'
Hex, octal, and binary formats are supported as well (as described earlier on this page, binary representation gets a full set of new support in 2.6 and 3.0):
'{0:X}, {1:o}, {2:b}'.format(255, 255, 255) # hex, octal, binary 'FF, 377, 11111111'
bin(255), int('11111111', 2), 0b11111111 # other to/from binary ('0b11111111', 255, 255)
hex(255), int('FF', 16), 0xFF # other to/from hex ('0xff', 255, 255)
oct(255), int('377', 8), 0o377, 0377 # other to/from octal ('0377', 255, 255, 255)
Comparison to current "%" expression
At least for positional references and dictionary keys, this begins to look very much like the current "%" formatting expression, especially in advanced use with type codes and extra formatting syntax. The current "%" expression can't handle keywords, attribute references, and binary type codes, though dictionary key references can often achieve some similar goals (see pages 142-143 in the book for more on dictionary key references with "%"). Compare the following to the equivalent format method calls above to see how the two techniques overlap:
BASICS
template = '%s, %s, %s' template % ('spam', 'ham', 'eggs') # by position 'spam, ham, eggs'
template = '%(motto)s, %(pork)s and %(food)s' template % dict(motto='spam', pork='ham', food='eggs') # by key 'spam, ham and eggs'
'%s, %s and %s' % (3.14, 42, [1, 2]) # arbitrary types '3.14, 42 and [1, 2]'
KEYS, ATTRIBUTES, AND OFFSETS
'My %(spam)s runs %(platform)s' % {'spam': 'laptop', 'platform': sys.platform} 'My laptop runs win32'
'My %(spam)s runs %(platform)s' % dict(spam='laptop', platform=sys.platform) 'My laptop runs win32'
somelist = list('SPAM') parts = somelist[0], somelist[-1], somelist[1:3] 'first=%s, last=%s, middle=%s' % parts "first=S, last=M, middle=['P', 'A']"
SPECIFIC FORMATTING
'%-10s = %10s' % ('spam', 123.4567) 'spam = 123.4567'
'%10s = %-10s' % ('spam', 123.4567) ' spam = 123.4567 '
'%(plat)10s = %(item)-10s' % dict(plat=sys.platform, item='laptop') ' win32 = laptop '
'%e, %.3e, %g' % (3.14159, 3.14159, 3.14159) '3.141590e+00, 3.142e+00, 3.14159'
HEX AND OCTAL, BUT NOT BINARY
'%x, %o' % (255, 255) 'ff, 377'
Formatting that is even more complex seems to be essentially a draw in terms of complexity. For instance, the following shows the same result generated with both techniques, with field sizes and justifications, and various argument reference methods:
hard-coded references in both
'My {1[spam]:<8} runs {0.platform:>8}'.format(sys, {'spam': 'laptop'}) 'My laptop runs win32'
'My %(spam)-8s runs %(platform)8s' % dict(spam='laptop', platform=sys.platform) 'My laptop runs win32'
In practice, programs are less likely to hard-code references like this than to execute code that builds up a set of substitution data ahead of time (to collect data to substitute into an HTML template all at once, for instance). When we account for common practice in examples like this, the comparison between the format() method and "%" expression is even more direct:
build data ahead of time
data = dict(platform=sys.platform, spam='laptop')
'My {spam:<8} runs {platform:>8}'.format(**data) 'My laptop runs win32'
'My %(spam)-8s runs %(platform)8s' % data 'My laptop runs win32'
As usual, the Python community will have to decide which technique proves itself better over time. Experiment with some of these on your own to get a feel for what is available, and be sure to see Python 2.6 and 3.0 documentation for more details. Also see the discussion of 3.0 feature changes below; the "%" expression may become deprecated in a future 3.X release, though this seems a bit too controversial to call today.
Fraction number type in 2.6 and 3.0 (page 107)
Python 2.6 introduces a new numeric type, Fraction, which implements a rational number object. It essentially keeps both numerator and denominator explicitly, so as to avoid some of the inaccuracies and limitations of floating point math hardware.
The basics
Fraction is something of a cousin to the existing Decimal fixed-precision type described on pages 107-108, which also can be used to control numerical accuracy, by fixing decimal digits and specifying rounding or truncation policies. It's also used in similar ways -- like Decimal, this new type resides in a module; import its constructor, and pass in numerator and denominator to make one. The following interaction in Python 2.6 shows how:
from fractions import Fraction x = Fraction(1, 3) y = Fraction(4, 6)
x Fraction(1, 3)
y Fraction(2, 3)
print y 2/3
Once created, Fractions can be used in mathematical expressions as usual:
x + y Fraction(1, 1)
x - y Fraction(-1, 3)
x * y Fraction(2, 9)
Numeric accuracy
Notice that this is different from floating-point type math, which is dependent on the underlying limitations of floating-point hardware:
a = 1 / 3. b = 4 / 6. a 0.33333333333333331 b 0.66666666666666663
a + b 1.0 a - b -0.33333333333333331 a * b 0.22222222222222221
This is especially true for floating-point values that cannot be represented accurately given their limited number of bits; both Fraction and Decimal provide ways to get exact results:
0.1 + 0.1 + 0.1 - 0.3 5.5511151231257827e-17
from fractions import Fraction Fraction(1, 10) + Fraction(1, 10) + Fraction(1, 10) - Fraction(3, 10) Fraction(0, 1)
from decimal import Decimal Decimal('0.1') + Decimal('0.1') + Decimal('0.1') - Decimal('0.3') Decimal('0.0')
Fraction(1000, 1234567890) Fraction(100, 123456789)
1000./1234567890 8.1000000737100011e-07
Conversions and mixed types
To support conversions, floating-point objects also now have a method that yields their numerator and denominator ratio, and float() accepts a Fraction as an argument. Trace through the following interaction to see how this pans out:
(2.5).as_integer_ratio() (5, 2)
f = 2.5 z = Fraction(*f.as_integer_ratio()) z Fraction(5, 2)
x Fraction(1, 3)
x + z Fraction(17, 6)
float(x) 0.33333333333333331
float(z) 2.5
float(x + z) 2.8333333333333335
Finally, some type-mixing is allowed in expressions, though Fraction must sometimes be manually propagated to retain accuracy. Study the following interaction to see how this works:
x Fraction(1, 3)
x + 2 Fraction(7, 3)
x + 2.0 2.3333333333333335
x + (1./3) 0.66666666666666663
x + (4./3) 1.6666666666666665
x + Fraction(*(4./3).as_integer_ratio()) Fraction(22517998136852479, 13510798882111488)
22517998136852479 / 13510798882111488. 1.6666666666666667
x + Fraction(4, 3) Fraction(5, 3)
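One related tool worth noting, as a brief sketch (check the 2.6 and 3.0 manuals for full details): Fraction objects also have a limit_denominator method, which returns the closest fraction whose denominator is no larger than the limit you pass -- handy for simplifying the huge exact ratios that float conversions produce:

from fractions import Fraction
f = Fraction(*(4. / 3).as_integer_ratio())   # exact, but huge, ratio from the float
f
Fraction(6004799503160661, 4503599627370496)
f.limit_denominator(10)                      # closest fraction with denominator <= 10
Fraction(4, 3)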
For more details on the Fraction type, see Python 2.6 and 3.0 documentation.
String types model in 3.0 (chapters 4 and 7)
This section contains an in-depth survey of Python 3.0's new str/bytes string types, and their support for Unicode and binary data. Although simple ASCII strings still work as they did in earlier Pythons, this is one of the most significant changes in 3.0, as it impacts prior code written to process Unicode data or binary files. In the interest of space, this material has been moved off page. This section is now available on this page:
New iterators in 3.0: range, dictionary views, map and zip (pages 265, 81, 160)
In addition to the new set and dictionary generator forms described in a previous note, 3.0 also emphasizes iterators more widely than 2.X: range(), some dictionary method results, and other built-ins such as map(), filter(), and zip(), all are iterators that produce results on demand in 3.0, instead of constructing result lists as in 2.6. Although this saves memory space, it can impact your coding styles in some contexts.
Range iterator
The range() built-in returns an iterable range object in 3.0 that generates the numbers in the range on demand, instead of actually building the result list in memory. This subsumes the functionality of the old xrange() (which is no longer available), and you must use list(range(...)) to force an actual range list if one is needed. Unlike the list returned by range in 2.X, range objects in 3.0 support only iteration, indexing, and the len() function. Also note that the 2.X I.next() iterator call becomes the next(I) built-in in 3.0; iterators have a __next__() method, but no longer a next():
C:\misc>c:\python30\python
R = range(10) # range is an iterator, not a list R range(0, 10)
I = iter(R) # make an iterator from the range next(I) # advance to next result 0 # this is what happens in for loops, comprehensions, etc. next(I) 1 next(I) 2
R range(0, 10) len(R) # range also does len and indexing, but no other sequence ops 10 R[0] 0 R[-1] 9 R[-2] 8
next(I) # continue taking from the iterator, where it left off 3 I.__next__() # .next() becomes .__next__(), but use the new next() built-in 4
list(range(10)) # to force a list if required [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Dictionary views
Similarly, dictionary keys(), values(), and items() methods return a "view" object in 3.0 which has an iterator that generates result items one at a time, instead of producing the result list all at once in memory. View items maintain the same physical ordering as that of the dictionary:
D = dict(a=1, b=2, c=3) D {'a': 1, 'c': 3, 'b': 2}
K = D.keys() # makes view object, not a list K <dict_keys object at 0x026D83C0>
next(K) # views are not iterators themselves TypeError: dict_keys object is not an iterator
I = iter(K) # views have an iterator next(I) # which can be used manually 'a' # but do not support len() or indexing next(I) 'c'
for k in D.keys(): print(k) # used automatically in iteration contexts ... a c b
As for iterators in 2.X, you can always force a 3.0 dictionary view to build a real list by passing it to the list() built-in if needed. In addition, 3.0 dictionaries still have iterators themselves, which return successive keys -- it's still often not necessary to call keys() directly, as it is in 2.6:
list(K) # you can still force a real list if needed ['a', 'c', 'b']
V = D.values() # ditto for values() and items() views V <dict_values object at 0x026D8260>
list(V) [1, 3, 2]
list(D.items()) [('a', 1), ('c', 3), ('b', 2)]
D # dictionaries still have their own iterator {'a': 1, 'c': 3, 'b': 2} # which returns the next key on each iteration I = iter(D) next(I) 'a' next(I) 'c'
for key in D: print(key) # still no need to call keys() to iterate ... a c b
Also unlike 2.X's list results, 3.0's view objects for the keys() method are set-like, and support common set operations such as intersection and union; values() views are not, since they aren't unique, but items() results are if their (key, value) pairs are unique and hashable. Moreover, views are not carved in stone when created -- they dynamically reflect future changes made to the dictionary after the view object has been created:
D = {'a':1, 'b':2, 'c':3} D {'a': 1, 'c': 3, 'b': 2}
K = D.keys() V = D.values()
list(K) # maintain same order as dictionary ['a', 'c', 'b'] list(V) [1, 3, 2]
del D['b'] # change the dictionary in-place D {'a': 1, 'c': 3}
list(K) # reflected in any current view objects ['a', 'c'] list(V) [1, 3]
K | {'x': 4} # keys(), and some items() views, are set-like {'a', 'x', 'c'}
V & {'x': 4} TypeError: unsupported operand type(s) for &: 'dict_values' and 'dict'
V & {'x': 4}.values() TypeError: unsupported operand type(s) for &: 'dict_values' and 'dict_values'
Two other code notes for 3.0 dictionaries: First of all, because keys() is not a list, the traditional coding pattern for scanning a dictionary by sorted keys won't work in 3.0 -- use the sorted() call instead, on either a keys view or the dictionary itself (see the book for more on this alternative):
D = dict(zip(['a', 'b', 'c'], [1, 2, 3])) # two ways to zip keys/values D = { k:v for (k, v) in zip(['a', 'b', 'c'], [1, 2, 3]) } # 3.0 dict comprehension D {'a': 1, 'c': 3, 'b': 2}
Ks = D.keys() # sorting a view object doesn't work! Ks.sort() AttributeError: 'dict_keys' object has no attribute 'sort'
Ks = list(Ks) # you can force it to be a list and then sort Ks.sort() for k in Ks: print(k, D[k]) ... a 1 b 2 c 3
D {'a': 1, 'c': 3, 'b': 2} Ks = D.keys() # or you can use sorted() on the keys for k in sorted(Ks): print(k, D[k]) # sorted() accepts any iterable, not just lists ... # sorted() returns its result a 1 b 2 c 3
D {'a': 1, 'c': 3, 'b': 2} # better yet, sort the dict directly as in the book for k in sorted(D): print(k, D[k]) # dict iterators return one key per iteration ... a 1 b 2 c 3
Secondly, the dictionary has_key() method is gone in 3.0 -- use the "in" expression instead (see the book for more on this alternative):
D {'a': 1, 'c': 3, 'b': 2}
D.has_key('c') AttributeError: 'dict' object has no attribute 'has_key'
'c' in D True 'x' in D False
if 'c' in D: print('present', D['c']) ... present 3
map(), filter(), and zip() iterators
Finally, the map(), filter(), and zip() built-ins also become iterators in 3.0 to conserve space, rather than producing a result list all at once in memory. Unlike range(), though, they return iterators directly -- after you step through their results once, they are exhausted. In other words, you can't have multiple iterators on their results. Here is the case for map() and zip(); as with other iterators you can force a list with list() if you really need one, but their default behavior can save substantial space in memory for large result sets:
M = map(abs, (-1, 0, 1)) # map returns an iterator, not a list M <map object at 0x0276B890> next(M) # use iterator manually: exhausts results 1 # these do not support len() or indexing next(M) 0 next(M) 1 next(M) StopIteration
for x in M: print(x) # map iterator is now empty: one pass only ...
M = map(abs, (-1, 0, 1)) # make a new iterator to scan again for x in M: print(x) # iteration contexts automatically call next() ... 1 0 1 list(map(abs, (-1, 0, 1))) # can force a real list if needed [1, 0, 1]
Z = zip((1, 2, 3), (10, 20, 30)) # zip is the same: a one-pass iterator Z <zip object at 0x02770EE0>
list(Z) [(1, 10), (2, 20), (3, 30)]
for pair in Z: print(pair) # exhausted after one pass ...
Z = zip((1, 2, 3), (10, 20, 30)) for pair in Z: print(pair) # use iterator automatically or manually ... (1, 10) (2, 20) (3, 30)
Z = zip((1, 2, 3), (10, 20, 30)) next(Z) (1, 10) next(Z) (2, 20)
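The filter() built-in, not shown above, follows the same pattern -- it too returns a one-shot iterator in 3.0; a quick sketch:

F = filter(bool, [0, 1, 2, ''])          # filter returns an iterator in 3.0, not a list
next(F)                                  # produces only items for which the function is true
1
next(F)
2
list(filter(bool, [0, 1, 2, '']))        # force a real list if one is required
[1, 2]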
The range() object differs slightly, though: it supports len() and indexing, it is not its own iterator (you make one with iter() when iterating manually), and it can produce multiple iterators over its result which remember their positions independently:
R = range(3) # range allows multiple iterators next(R) TypeError: range object is not an iterator
I1 = iter(R) next(I1) 0 next(I1) 1
I2 = iter(R) next(I2) 0 next(I2) 1
The "nonlocal" statement in 3.0 (pages 318-326)
Python 3.0 sports a new "nonlocal" statement, which allows assignment to names in enclosing function scopes, and limits scope lookups to enclosing defs. The net effect is a more direct and reliable implementation of changeable state information, for programs that do not desire or need classes with attributes.
The Basics
As predicted by the note on page 218, Python 3.0 introduces a new statement, "nonlocal name1, name2, ...", which allows a nested function to change one or more names defined in a lexically enclosing function's scope. In 2.X Python, when one function def is nested in another, the nested function can reference any of the names defined (assigned) in the enclosing def's scope, but it cannot change them. In 3.0, by declaring the enclosing scope's names in a "nonlocal" statement, it can assign and so change them as well.
This gives enclosing functions a way to provide state information that is remembered when the nested function is later called. By allowing the state to change, it becomes more useful to the nested function (imagine a counter in the enclosing scope, for instance). In 2.X, the prescription for achieving this effect is to declare the state "global" in both functions, in order to force it out to the enclosing module's scope; or better yet, use classes with attributes to make the state more explicit than nested scope references allow. Because nested functions have become a more common coding pattern for state retention, though, the nonlocal statement makes that pattern more generally applicable.
Besides allowing names in enclosing defs to be changed, the "nonlocal" statement also forces the issue for references -- just like the "global" statement, "nonlocal" causes searches for the names listed in the statement to begin in the enclosing defs' scopes, not in the local scope of the declaring function. That is, "nonlocal" also means "skip my local scope entirely".
In fact, the names listed in a "nonlocal" must be previously defined in an enclosing def when the "nonlocal" is reached, or an error is raised. The net effect is much like "global": "global" means the names reside in the enclosing module, and "nonlocal" means they reside in an enclosing def. "nonlocal" is even more strict, though: scope search is restricted to enclosing defs only. That is, nonlocal names must appear in an enclosing def, not in the module's global or built-in scopes outside the defs.
Note that name reference scope rules are still generally as before -- the "LEGB" rule, for Local (names assigned in a def), then Enclosing (names in an enclosing def), then Global (names at the top-level of a module), and finally Built-in (names predefined by Python in the built-in module). The nonlocal statement allows names in enclosing scopes to be changed, not just referenced. In addition, though, "global" and "nonlocal" restrict the look-up rules:
- "global" makes scope lookup begin in the enclosing module's scope, and allows names there to be assigned. Scope lookup continues on to the built-in scope if the name does not exist in the module, but assignments to "global" names always create or change them in the module's scope.
- "nonlocal" restricts scope lookup to just enclosing defs and requires that they already exist, and allows names there to be assigned. Scope lookup does not continue on to the global or built-in scopes. Reference to enclosing def scope names is allowed in 2.6, but not assignment; in 2.6, you can still use classes with explicit attributes to achieve the same changeable state information effect as nonlocals (and may be better off in some contexts -- more on this in a moment).
nonlocal in action
On to some examples, all run in 3.0. References to enclosing def scopes work as they do in 2.6; in the following, "tester" builds and returns function "nested", to be called later; the "state" reference in "nested" maps to the local scope of "tester" by the normal scope look-up rules:
C:\misc>c:\python30\python
def tester(start): ... state = start ... def nested(label): ... print(label, state) ... return nested ... F = tester(0) F('spam') spam 0 F('ham') ham 0
Changing a name in an enclosing def's scope is not allowed by default, though; this is the normal case in 2.6 as well:
def tester(start): ... state = start ... def nested(label): ... print(label, state) ... state += 1 ... return nested ... F = tester(0) F('spam') UnboundLocalError: local variable 'state' referenced before assignment
Now, under 3.0 if we declare "state" in the "tester" scope as nonlocal within "nested", we get to change it inside the nested function. This works even though "tester" has returned and exited by the time we call the returned "nested" function through name "F".
def tester(start): ... state = start ... def nested(label): ... nonlocal state ... print(label, state) ... state += 1 ... return nested ... F = tester(0) F('spam') spam 0 F('ham') ham 1 F('eggs') eggs 2
As usual with enclosing scope references, we can call the "tester" factory function multiple times, to get multiple copies of its state in memory. The "state" object in the enclosing scope is essentially attached to the "nested" function object returned; each call makes a new, distinct "state" object, such that updating one function's state won't impact the others. The following continues the prior listing's interaction:
G = tester(42) # make a new tester that starts at 42 G('spam') spam 42
G('eggs') # my state information updated to 43 eggs 43
F('bacon') # but F's state is where it left off: at 3 bacon 3 # each call has different state information
Finally, some boundary cases: Unlike the "global" statement, "nonlocal" names really must be previously assigned in an enclosing def's scope, or else you'll get an error -- you cannot create them dynamically by assigning them anew in the enclosing scope:
def tester(start): ... def nested(label): ... nonlocal state # nonlocals must already exist in enclosing def! ... state = 0 ... print(label, state) ... return nested ... SyntaxError: no binding for nonlocal 'state' found
def tester(start): ... def nested(label): ... global state # globals don't have to exist yet when declared ... state = 0 # this creates the name in the module now ... print(label, state) ... return nested ... F = tester(0) F('abc') abc 0 state 0
Moreover, "nonlocal" restricts the scope lookup to just enclosing defs; nonlocals are not looked up in the enclosing module's global scope or the built0in scope outside all defs, even if they are already there:
spam = 99 def tester(): ... def nested(): ... nonlocal spam # must be in a def, not the module! ... print('Current=', spam) ... spam += 1 ... return nested ... SyntaxError: no binding for nonlocal 'spam' found
Why use nonlocal?
There are a variety of ways to "remember" information across function and method calls in Python; while there are tradeoffs for all, "nonlocal" does improve this story for enclosing scope references. The "nonlocal" statement allows multiple copies of state, and addresses simple state retention needs where classes may not be warranted.
For example, one usual prescription for achieving the nonlocal effect in 2.6 and earlier is to simply move the state out to the global (module) scope:
def tester(start): ... global state # move it out to the module to change it ... state = start # global allows changes in the module scope ... def nested(label): ... global state ... print(label, state) ... state += 1 ... return nested ... F = tester(0) F('spam') spam 0 F('eggs') eggs 1
This works, but requires "global" declarations in both functions, and is prone to name collisions in the global scope. Worse, and more subtle, it only allows for a single shared copy of the state information in the module scope -- if we call "tester" again, we'll wind up resetting the module's "state" variable, such that prior calls will see their state overwritten. With "nonlocal" instead of "global", each call to "tester" remembers its own unique copy of the "state" object.
G = tester(42) # resets state's single copy in the global scope G('toast')
toast 42
G('bacon') bacon 43
F('ham') # oops -- my counter has been overwritten! ham 44
The other prescription for changeable state information in 2.6 and earlier is to use classes with attributes to make state information access more explicit than the implicit magic of scope lookup rules. As an added benefit, each instance of a class gets a fresh copy of state information, as a natural byproduct of Python's object model. Here is a reformulation of the tester/nested functions above as a class; notice how the second version renames "nested" to "__call__" to make the equivalence even more direct (__call__ intercepts direct calls on an instance, so we don't need to call a named method):
class tester: # class-based alternative ... def __init__(self, start): ... self.state = start # save state explicitly ... def nested(self, label): ... print(label, self.state) # reference state explicitly ... self.state += 1 # changes are always allowed ... F = tester(0) F.nested('spam') spam 0 F.nested('ham') ham 1
G = tester(42) # each instance gets new copy of state G.nested('toast') # changing one does not impact others toast 42 G.nested('bacon') bacon 43
F.nested('eggs') # F's state is where it left off eggs 2
same, but use __call__ instead of a named method
class tester: ... def __init__(self, start): ... self.state = start ... def __call__(self, label): # intercept direct instance calls ... print(label, self.state) # so .nested() not required ... self.state += 1 ... H = tester(99) H('juice') # invokes __call__ juice 99 H('pancakes') pancakes 100
While using classes for state information is a generally good rule of thumb to follow, they might be overkill in cases like this, where state is a single counter. Such trivial state cases are more common than you might think; see the section on decorators elsewhere on this page for a prime example of nonlocal at work -- a function decorator that requires a per-function call counter (a minimal sketch of the idea appears below). In such contexts, nested defs are sometimes more lightweight than coding classes, especially if you're not familiar with OOP yet.
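To make that connection concrete, here is a minimal sketch of such a decorator (the names and printed text are illustrative, not the book's actual example); each decorated function gets its own "calls" counter, changed with nonlocal on every call:

def tracer(func):                          # hypothetical call-counting decorator
    calls = 0                              # per-function state in the enclosing scope
    def onCall(*args, **kargs):
        nonlocal calls                     # 3.0 only: change the enclosing-scope counter
        calls += 1
        print('call %s to %s' % (calls, func.__name__))
        return func(*args, **kargs)
    return onCall

@tracer
def spam(a, b):                            # spam = tracer(spam)
    return a + b

spam(1, 2)                                 # prints "call 1 to spam", returns 3
spam(3, 4)                                 # prints "call 2 to spam", returns 7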
To me, in the example we've been using, functions and nonlocals seem arguably more complex than classes when state must be changed, but simpler when state is just referenced. In either case, classes can make your intentions more apparent to the next person who may have to read your code. This is a subjective call, though, especially if the nesting is deeper than in the equivalences that follow; you should always be the judge:
changeable state with defs and classes
def tester(start): # original function-based version ... state = start ... def nested(label): ... nonlocal state # change value retained in enclosing scope ... print(label, state) ... state += 1 ... return nested ... F = tester(0) F('spam')
class tester: # class-based alternative ... def __init__(self, start): ... self.state = start # save state explicitly ... def __call__(self, label): ... print(label, self.state) # reference state explicitly ... self.state += 1 # changes are always allowed ... F = tester(0) F('spam') # invokes __call__
reference-only state with defs and classes
def tester(state): # retain state in enclosing def scope ... def nested(label): ... print(label, state) # references value in enclosing scope ... return nested ... F = tester(0) F('spam')
class tester: # retain state in explicit attributes ... def __init__(self, state): ... self.state = state # state saved and referenced explicitly ... def __call__(self, label): ... print(label, self.state) ... F = tester(0) F('spam')
Division operator change in 3.0 (pages 102-103)
As promised on pages 102-103 of the book, the division operator has been changed to perform "true" division in 3.0. This means that "/" and "//" behave differently in 2.6 and 3.0:
- In 3.0, "/" now always returns a float result which includes any remainder, regardless of operand types. The "//" performs "floor" division, which truncates the remainder, and returns an int for integer operands, or a float if any operand is float.
- In 2.6, "/" performs truncating integer division if both operands are integers, and does float division keeping remainders otherwise. The "//" works as it does in 3.0, performing truncating division for integers, and floor division for floats. Here are the two operators at work in 3.0 and 2.6:
C:\misc>C:\Python30\python
10 / 4 # differs: keeps remainder 2.5 10 // 4 # same: truncates remainder 2 10 / 4.0 # same 2.5 10 // 4.0 # same 2.0
C:\misc>C:\Python26\python
10 / 4
2 10 // 4 2 10 / 4.0 2.5 10 // 4.0 2.0
Notice how the data type of the result for "//" is still dependent on the operand types in 3.0 -- if either is a float the result is float, else it is int. Although this seems to inherit the same type-dependent behavior of "/" in 2.X which motivated its change in 3.0, the type of the return value is much less critical than differences in the return value itself. Moreover, because "//" was provided in part as a backward-compatibility tool for programs that rely on truncating integer division (and there are many), it must return int for ints.
Supporting either Python
Although "/" behavior differs in 2.6 and 3.0, you can still support both versions in your code. If your programs depend on truncating integer division, use "//" in both 2.6 and 3.0. If your programs require floating point results with remainders for integers, use float() to guarantee that one operand is a float around a "/" when run in 2.6:
X = Y // Z # always truncates, always an int result if Y and Z are int
X = Y / float(Z) # guarantees float division with remainder in either 2.6 or 3.0
Alternatively, to use 3.0 "/" division in 2.6, you can enable it with a __future__ import, rather than forcing it with float() conversions:
C:\misc>C:\Python26\python
from __future__ import division # enable 3.0 "/" behavior 10 / 4 2.5 10 // 4 2
Floor versus truncation
One subtlety: the "//" operator is generally referred to as truncating division, but it's more accurate to refer to it as floor division -- it truncates the result down to its floor, which means the closest whole number below the true result. The net effect is to round down, not strictly truncate, and this matters for negatives. You can see the difference for yourself with the Python math module:
import math math.floor(2.5) 2 math.floor(-2.5) -3 math.trunc(2.5) 2 math.trunc(-2.5) -2
When running division operators, the result is only truly truncated for positive numbers, since truncation and floor are the same there; for negatives, the result is floored -- rounded down to the next lower integer (really, both cases perform floor division, which simply happens to equal truncation for positives):
C:\Users\veramark\Mark\misc>c:\python30\python.exe
5 / 2, 5 / -2 (2.5, -2.5)
5 // 2, 5 // -2 # truncates to floor: rounds to first lower integer (2, -3) # 2.5 becomes 2, -2.5 becomes -3
5 / 2.0, 5 / -2.0 (2.5, -2.5)
5 // 2.0, 5 // -2.0 # ditto for floats, though result is float too (2.0, -3.0)
C:\Users\veramark\Mark\misc>c:\python26\python.exe
5 / 2, 5 / -2 # differs in 3.0 (2, -3)
5 // 2, 5 // -2 # this and the rest are the same in 2.6 and 3.0 (2, -3)
5 / 2.0, 5 / -2.0 (2.5, -2.5)
5 // 2.0, 5 // -2.0 (2.0, -3.0)
If you really want truncation regardless of sign, you can always run a float division result through math.trunc(), regardless of Python version (also see the round() built-in for related functionality):
C:\Users\veramark\Mark\misc>c:\python30\python.exe
import math 5 / -2 # keep remainder -2.5 5 // -2 # floor below result -3 math.trunc(5 / -2) # truncate instead of floor -2
C:\Users\veramark\Mark\misc>c:\python26\python.exe
import math 5 / -2 # floor -3 5 // -2 # floor -3 5 / float(-2) # remainder in 2.6 -2.5 math.trunc(5 / float(-2)) -2
Python 2.6 and 3.0, and this book
[Update, January 2009]: Please also read the next section about 3.0 performance (and other) issues. They are critical enough that most programmers may be best served by the 2.X line for now, unless and until a new, optimized 3.X release is available. (As also described below, Python 3.1 addresses 3.0 performance issues as of July 2009.)
[November 2008] I've begun receiving emails from readers wondering if a Python 3.0 version of this book is in the works. While we will publish a 3.0 edition eventually, it won't happen in the near term, because virtually every Python programmer will be using the 2.X line for some time to come. The 3.0 user base is almost non-existent today, and isn't expected to become dominant for at least 1-2 years. Although a 3.0 book might appeal to some early adopters, it would alienate the vast majority of people using Python today.
As a comparison (and to help you guess when a 3.0 edition might happen), I don't expect any of the students in my live classes to have to know 3.0 specifically for at least one year. They have 2.X dependencies in their work that preclude 3.0 adoption today. Because of that, for all of 2009, I will be teaching with Python 2.6 in classes, and pointing out upcoming 3.0 changes along the way.
This is essentially what the current edition of this book does too. The 3rd Edition of this book is based upon Python 2.5, with discussion of upcoming Python 3.0 features in notes and a Preface section. Because of this approach, this book applies to 2.6 directly, and can be used by 3.0 adopters in conjunction with the 3.0 notes it includes. For reasons described in more detail below, I recommend readers use this book to start out with 2.X Python today, and explore differences in 3.0 later as it becomes more widely used.
Parallel Pythons: 2.6 and 3.0
Python 2.6 was released in October 2008, one year after this book was published. Python 2.6 is fully backward-compatible with 2.5, and is a continuation of the 2.X Python line which simply adds a handful of minor and optional extensions. Because of that, this book applies completely and directly to 2.6, as well as earlier 2.X versions.
For instance, 2.6 introduces the new string format method, class decorators, and fractional numbers, described earlier on this page. Other 2.6 features such as "with" context managers (now enabled in 2.6), and absolute/relative imports (still partially enabled in 2.6) are already covered in the current edition. With the exception of "with" and "as" becoming official reserved words in 2.6, these are all non-intrusive extensions.
On the other hand, Python 3.0, currently due to be released in December 2008, will not be backward compatible with the 2.X Python line. That is, most 2.X code will not run under 3.0 unchanged. Fundamental changes, such as the new print function, the strings/bytes distinction, and dictionary method changes, guarantee some code breakage. A script to be shipped with 3.0, "2to3", will automatically convert much 2.X code to run on 3.0, but this addresses existing code, not new development. Although most of the language is the same in 3.0, and many applications programmers will not notice a major difference, 3.0 does introduce new tools and techniques such as the new string format method that are not fully explored in 2.X-based books.
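For reference, the converter is normally run from a system command line; a minimal sketch, assuming the 2to3 script is on your path and "script.py" is a hypothetical file to convert (the -w flag writes the converted code back to the file):

2to3 -w script.py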
Most observers do not expect Python 3.0 to be the most widely used version for perhaps two years or more, for a variety of reasons. Many popular 3rd-party extensions for Python are not expected to become available in 3.0 form for up to one year after 3.0 release. Moreover, most Python programmers today must use systems and code based upon Python 2.X, and so may be unable to migrate to 3.0 for years to come. In fact, because the existing code and user bases are so large today, the 2.X line will be developed and fully supported in parallel with the 3.X line for perhaps 3 to 5 more years, with 2.7 and 2.8 releases already planned.
Which Python to learn?
The net effect of this dual-version strategy is that the Python user base may be split for the next few years, between the 2.X and 3.X lines. The 2.X camp will dominate in the near term, but will likely be overtaken by 3.0 over time.
This can make it difficult for newcomers to decide which version to get started with: does one jump up to 3.0 immediately, or start with the more widely used 2.X line? Most programmers have no choice today: 2.X is required by nearly all existing Python-based systems. If you have the luxury of truly starting from scratch, though, the choice is less clear.
Because almost all programmers need to learn and use 2.X code today, unless you have more specific needs, I recommend starting out with the 2.X coverage in this book, and studying 3.0 changes slowly, using the resources in this book as well as those available on-line. The core ideas stressed in this book are the same, regardless of which version of Python you use. The differences are largely in smaller details. Most of what you learn for 2.X today will apply to Python 3.0 in the future, if and when you are able to migrate.
For more details on the 2.6/3.0 fork, see the release pages at www.python.org. Also note that Python core developers themselves suggest a similar approach: writing 2.X code now, and using the auto-conversion script to move to 3.0 when the time comes. A book is more focused on teaching, of course, but the same general recommendation applies.
Python 3.0 concerns
[Update, July '09]: Python 3.1 addresses most (but not all) of Python 3.0's I/O speed issues described in this section. Because of this, for most programmers the choice to use Python 3.X over 2.X is now largely about aesthetic language design issues, and third-party software dependencies. Please see the section below for up-to-date Python 3.X performance details.
[January 2009] If you are trying to decide between Python 2.6 and 3.0, you should know that both will be supported equally for the foreseeable future, and there is no clear bias towards either in the Python world today (apart from the usual smattering of overly strong opinions). You should also be aware of some common emerging concerns that may impact 3.0 adoption, by both you and the Python world at large. Among these:
- File I/O performance can be much (MUCH!) slower in Python 3.0 -- so much so that 3.0 will be impractical for many types of programs. Although an optimized future 3.X release may address this to some extent, 3.0 design decisions make this uncertain. As described later on this page, 3.0.1 improves the speed of one input mode, but nothing else; 3.1 has done much better, resolving the issue for most programmers.
- Many popular third-party libraries will not be available for 3.0 for a substantial amount of time (some won't be available until 3.1 at best). Since extensions are where much of the action happens in real Python work, this is a deal-breaker for many.
- Some new language features of 3.0, such as its new string formatting method, seem to many to be cases of trading one complicated thing for another complicated thing, with no net reduction in complexity.
- Unless you have to deal with non-ASCII Unicode on a regular basis, the merging of unicode and string into a single string type in 3.0 will probably not seem like a net win either. We've simply traded 2 different string types in 2.X (str/unicode) for 2 different string types in 3.0 (str/bytes, plus the mutable bytearray). In the process, we have made text input much slower -- reading line-by-line is typically 40X slower, likely due in large part to Unicode support. Equally disturbing, we seem to have also pushed the complexities of Unicode down to many more programmers than in the past.
These are common consensus issues, which I'll expand on in this section. Some may be temporary, of course, and others are subjective. But they are significant enough to raise doubt over 3.0 adoption. It's not completely impossible that 3.0 will be treated as beta-quality software, or worse, go down in flames as a failed attempt to fix something that was not broken.
I hope not; 3.0 has much to offer, and I suspect most of its issues will be resolved in the span of a year. At the least, though, you should understand its tradeoffs before migrating to it yourself. Python 3.0's future is, of course, up to the Python user-base to decide.
Performance problems in 3.0
Due to its growth in size, I have moved this material to its own top-level section immediately following this one -- please click the following link to read it:
The short story is that Python 3.0 has serious performance issues when reading or writing files. For example, in 3.0, writing text files can reportedly run 5 to 8 times slower; reading text files line-by-line can run 40 to 70 times slower; and reading large files all-at-once can run hundreds of times slower, or worse.
Although feature issues matter too, performance is probably the issue most likely to impede 3.0 adoption. As described in this topic's separate section, I/O speed is so bad as to make 3.0 impractical for many programmers today. As also described ahead, the more recent Python 3.0.1 speeds up one mode of input substantially to remove pathologically bad cases, but does not address other modes. A broader solution will likely have to await 3.1, which will include a complete rewrite of the I/O system in the C language for speed.
[Update, July '09]: Again, Python 3.1 addresses most (but not all) of Python 3.0's I/O speed issues described in this section. Please see the section below for up-to-date Python 3.X performance details.
Third-party library concerns in 3.0
Although more temporary a concern than the others, it should also be noted that it will be some time before most third-party libraries and frameworks become available for the 3.X line. This includes popular open-source extensions for Python such as Web frameworks, GUI toolkits, numeric programming packages, and the majority of extensions that real-world Python developers rely upon.
Third-party extensions are where much of the real action occurs in Python work. In the web domain, for instance, Python's standard library tools for sockets, servers, and various protocols will be present in 3.0 as they are in 2.6. However, few developers do any realistic work with these alone. They also depend upon extensions such as Google's App Engine, Django, TurboGears, and other popular frameworks. Until these packages are available for 3.0, 2.6 will still be the version of choice for most such developers.
The GUI domain is largely the same story: Python's Tkinter toolkit is available in the 3.0 standard library, but extensions such as the wxPython toolkit and PIL image processing library are only available for 2.6, and won't be ported to 3.X for some time (perhaps when 3.1 is available in mid-2009, per recent reports). A 3.0 version of Python's interface to the MySQL database system is similarly on hold until Python 3.1, at the earliest.
Again, this is a temporary argument against migrating to 3.0 today; while this will vary from package to package and is impossible to predict, it seems likely that most popular extensions will be ported to 3.0 in roughly one year's time (assuming 3.0 itself grows in popularity, of course). Still, this is an important, and perhaps a show-stopper, concern to many today. If you rely on an extension not available in 3.0 form, migrating is not yet an option.
Feature concerns in 3.0: string formatting
The performance problem of the prior section is in addition to the fact that, at least to some observers, 3.0 solved some non-problems. Many people, for instance, argue that the new string formatting method described above is redundant with, and sometimes more complex than, the existing "%" formatting operator. Compare, for example, a very simple case of 2.X formatting versus 3.0 format methods:
print '%s=%s' % ('spam', 42) # 2.X format expression
print('{0}={1}'.format('spam', 42)) # 3.0 (and 2.6) format method
Whether this is improvement or regression, it certainly does add redundancy to the language for questionable, or at the least legitimately debatable, cause. See earlier on this page for more comparisons of the two overlapping techniques. As shown in more detail in that earlier section, even more complex formatting is essentially a draw in terms of complexity:
'My {1[spam]:<8} runs {0.platform:>8}'.format(sys, {'spam': 'laptop'}) 'My laptop runs win32'
'My %(spam)-8s runs %(platform)8s' % dict(spam='laptop', platform=sys.platform) 'My laptop runs win32'
To be fair, 3.0 does clean up some old problems in the language; things like the xrange function and xreadlines file method have been overdue for attention. But some of 3.0's new ideas seem to be a matter of trading one complicated thing for another complicated thing, for no obvious reason. That may be a classic mistake in software engineering at large, but it's certainly not in the spirit of the Python language.
In the string formatting case, for instance, there is close overlap between the new method, and the current and widely used expression; they are largely variations on a theme. Moreover, the current formatting expression is actually scheduled for eventual deprecation (in other words, outright removal) in a future Python release. Why? It's difficult to not see this as incredibly arbitrary, if not indifferent to the current large user base.
Other 3.0 changes seem either aimed at a minority of Python users, or difficult to justify as a net win. The very small fraction of Python programmers who must care about Unicode, for instance, does not seem to justify morphing all strings to support Unicode in 3.0, at the expense of breaking most code that processes binary files -- a task that is much more common than processing Unicode, in the Python world I have had the opportunity to see (more on this in the next section). The mutation of print from statement to function, though supportive of extra features, is also perceived by many to be controversial; it's not clear that this change's minor benefits warranted breaking nearly every Python program written to date.
Feature concerns in 3.0: string types model
The new string model's impact on language complexity is worth elaborating on here. After studying the new str/bytes types distinction in 3.0, it's becoming clear that one could legitimately argue that this isn't much of a net win either. For the sake of those who care about Unicode, 3.0's string changes make life more complicated for many programmers, and are probably one of the root causes of the I/O slowdown described earlier on this page.
As described elsewhere on this page, Python's string model has changed substantially in 3.0:
- In Python 2.X, there are two string types: str (the primary type, for both normal text and binary data), and unicode (an optional type, for wide-character strings).
- In Python 3.0, developers sought to address this dichotomy, with a single str type (that supports all flavors of text, including wide-character Unicode), along with a bytes type (for binary data). An additional bytearray type was added in 3.0 as a mutable version of bytes.
Although 3.0's model may simplify some things for the relatively small set of people who need to care about processing non-ASCII Unicode text, the net effect, which is impossible not to notice, is that we've traded 2 different string types in 2.X for 2 different string types in 3.0! In other words, we've gone from str+unicode to str+bytes, with no obvious reduction in complexity. Again, we seem to have traded one complicated situation for another roughly equivalent complicated situation, and broken many programs in the process.
In fact, the string situation is even more complex in 3.0. The str and bytes types are very incompatible: str and bytes objects' functionality does not fully overlap (bytes don't do either flavor of string formatting, for example), and they cannot be intermixed at all without explicit conversions (2.X str and unicode often could). Although users are encouraged to choose str for text data and bytes for binary data, the choice is more binding than it was in 2.X. Moreover, 3.0 even adds a 3rd string type -- bytearray, which is like bytes, but is mutable, supporting the same in-place change operations as lists.
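To illustrate, here is a quick sketch run under 3.0 (the exact error message text may vary by release):

'spam' + b'eggs'                       # str and bytes never mix implicitly in 3.0
TypeError: Can't convert 'bytes' object to str implicitly

'spam' + b'eggs'.decode()              # explicit conversions are required instead
'spameggs'

'spam'.encode() + b'eggs'
b'spameggs'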
Although the new string functionality model may be useful in some contexts, it's not possible to escape the fact that by trying to simplify the 2.X string story, 3.0 has in fact made it substantially more complex for many. This is an especially curious evolution for a language that's been historically seen as having a low intellectual requirements bar for beginners. Perhaps worse, by pushing Unicode down into the general string type, it is no longer optional knowledge -- we've made understanding its complexity an entry-level requirement for almost every Python programmer, not just the few who actually must deal with Unicode, as is the case in 2.X.
One can argue that those who don't care about Unicode or binary data can stick with "str" strings and default encodings, and get by much the same as they did in 2.X. While that's true for beginners up to a point, the new bytes and bytearray types, the text/binary files distinction, and the basics of Unicode are effectively elevated to required learning in 3.0 for anyone who wishes to truly understand Python, as well as anyone who deals with code written by others (as soon as anyone in an organization uses a feature, it's no longer optional for everyone else). At the least, 3.0 simplifies some things for people who must process Unicode, but makes life more complex for everyone else.
And although Unicode does matter in some domains, it's irrelevant to probably 90% of the thousands of Python programmers I've met personally in classes over the years. This isn't a scientific sample, of course; my students come from specific organizations that may or may not be representative. Moreover, I teach largely in the ASCII-biased US, and have taught in Europe only 7 times and never in Asia. Still, when I ask who wants to discuss Unicode in Python classes, it's rare to see more than 1 or 2 students out of 15 respond at all; most understandably don't even know what Unicode is.
Given this interest level, it seems odd to require programmers to come to grips with the basics of Unicode, when most do not need to care. And assuming that the new Unicode support is at the root of 3.0's I/O slowdown too, it seems hardly fair to make Python slower for everyone, in order to address a domain in which only a few work.
Of course, language features are a very subjective topic, and your mileage may vary. I recall similar complaints about the seemingly arbitrary nature of some changes in 2.0, which later became standard and best practice. On the other hand, Python had much smaller user and existing-code bases back then than it enjoys today, so such comparisons might not be valid. It remains to be seen whether the much larger and more heavily vested community today will be as flexible and forgiving as the community that existed when 2.0 was released.
For more on the debate over 3.0 language changes, search the web for Python forums. One recent thread on comp.lang.python, for example, dealt with the formatting change specifically. For more on changes in 3.0 in general, see its What's New In Python 3.0 document at python.org.
Recommendation: stick with 2.X until 3.X is ready for "prime time"
I want to underscore that some 3.0 concerns discussed above are subjective, and others are likely to be temporary (in fact they may be resolved by the time you read this). Python 3.0 feature changes, for instance, are a controversial subject I won't debate further here, and the shortage of third-party libraries is likely to diminish as 3.0 gains acceptance over time.
Because of the more serious performance issues described above, though, I regrettably must recommend that most programmers stick with the 2.X line in the near term, unless and until an optimized 3.X release becomes available. This is especially true for programmers writing production-level code.
Python 3.0 probably should not have been pushed out the door at all with such poor I/O performance. As is, it's not ready for prime time, and it's certainly not ready to be "the" Python, especially for programs that process high volumes of file data. Since this is such a crucial and core operation, anything short of performance that matches 2.X speed will disqualify 3.X for many products and systems, and will substantially impede its adoption in general. Although it's reasonable to experiment with 3.0 today, don't expect it to supersede 2.X in most domains until this has been ironed out.
Unfortunately (and to be frank, surprisingly), the I/O performance problem has been downgraded to "critical" by Python core developers as I write this, which means it is not considered important enough to block a future release, let alone the next one. Until they do fix this, most programmers will probably treat 3.0 as beta-quality software at best, and as academic curiosity at worst.
Personally, I'm optimistic that the performance problem will be fixed in the near future (probably within a year), given the Python community's track record. Indeed, it's difficult not to see 3.0 as an experimental system, released early in order to attract third-party developers while its performance issues are fixed. Until they are, though, most Python programmers will be better served by the 2.X line.
[Update, July '09]: Again, Python 3.1 addresses most (but not all) of Python 3.0's I/O speed issues described in this section. Please see the section below for up-to-date Python 3.X performance details.
Python 3.X performance issues
Over time, I have posted speed test results for Python 3.0, 3.0.1, and 3.1. Due to their sizes, these test reports have been moved off-page. Please select from the following version-specific links. The first of these links is the most current news, but the 3.0 page provides some test background details that pertain to the later pages, so these may be best read from oldest to newest.
- Python 3.1 Addresses the Issue (Mostly) [July 2009]
- Python 3.0.1 Shows Minor Improvement [February 2009]
- Python 3.0 I/O is Radically Slower [January 2009]
The short story is that Python 3.0 had a major I/O speed regression, which has been largely addressed in Python 3.1. That is, Python 3.1 has optimized away most of the I/O speed regressions of Python 3.0, making this a non-issue for most programmers. Prior to 3.1, though, 3.0.1 speeds up one reading mode, but does not address other I/O speed concerns in 3.0; 3.0 itself is so slow as to be impractical for many I/O-focused programs. With the release of 3.1, the Python 3.X choice has become more about language features and third-party software dependencies than about performance.
Back to this book's main updates page