Learning Python 4th Edition: Recent Clarifications

Below are recent book clarifications: notes which provide additional coverage of language topics, and are intended as supplements to the book. The items on this page were posted after the first reprint of this book, which was dated January 2010. Any book changes they propose do not appear in either the first printing or the first reprint.

To make this page's content easier to apply in reprints, I've divided its notes into two sections--those that merit book changes, and those that do not. Neither list is strictly ordered, though items mostly appear by page number and/or date of addition.

Also note that much of what follows was later incorporated into 2013's newer 5th Edition of this book.

See also the older clarifications page for items already patched in reprints, and the corrections pages for genuine book errata. And for more book-related topics, see also the book's notes pages; in general, I put less technically in-depth notes on that page.

Update, 10/30/2011: I've stopped adding trivial clarifications made in reprints to the list below, because the redundancy proved too complex to manage. For all items recently patched in reprints, including a few additional clarifications not listed here, please see the confirmed errata list at O'Reilly's site, sorted by submission date.


radd.py

from __future__ import print_function          # for 2.7

class C:
    def __init__(self, value):
        self.data = value
    def __add__(self, other):
        print(self, '+', other, '=', end=' ')
        return self.data + other
    __radd__ = __add__
    def __str__(self):
        return '[%s]' % self.data

x = C(1)
y = C(2)
print(x + 3)        # [1] + 3 => left: __add__
print(3 + y)        # 3 + [2] => right: __radd__==__add__
print(x + y)        # [1] + [2] => both: __add__, then __radd__==__add__
When run, this code's print calls trace how every call is routed into the single __add__ method, with operands swapped for right-side appearances:
...>C:\Python32\python radd.py
[1] + 3 = 4
[2] + 3 = 5
[1] + [2] = [2] + 1 = 3
...>C:\Python27\python radd.py
[1] + 3 = 4
[2] + 3 = 5
[1] + [2] = [2] + 1 = 3
There's no reason to define __radd__ separately as shown in the book's brief call-tracing examples, unless right-side appearances require special-case processing. For instance, consider the book's second Commuter class example:
class Commuter:                                    # Propagate class type in results
    def __init__(self, val):
        self.val = val
    def __add__(self, other):
        if isinstance(other, Commuter): other = other.val
        return Commuter(self.val + other)
    def __radd__(self, other):
        return Commuter(other + self.val)
    def __str__(self):
        return '<Commuter: %s>' % self.val
This class works the same if it simply assigns __radd__ to __add__, though it must still do some type testing to avoid nesting Commuter objects in expression results (comment out the "if" to see why; a short demonstration follows the next listing):
class Commuter:                                    # Propagate class type in results
    def __init__(self, val):
        self.val = val
    def __add__(self, other):
        if isinstance(other, Commuter): other = other.val
        return Commuter(self.val + other)
    __radd__ = __add__
    def __str__(self):
        return '<Commuter: %s>' % self.val
Trace this to see why the equivalence works. The book's examples are designed to trace calls or illustrate concepts, of course, but they could use simpler patterns in real code.
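As promised, here is what happens without the type test. This is a minimal sketch of my own (the class name is not from the book), but it shows how an expression with two instances winds up wrapping one result object inside another:

class Nester:                                      # Commuter minus the type test
    def __init__(self, val):
        self.val = val
    def __add__(self, other):
        return Nester(self.val + other)            # other may itself be a Nester
    __radd__ = __add__
    def __str__(self):
        return '<Nester: %s>' % self.val

x = Nester(1) + Nester(2)                          # __add__ runs, then __radd__==__add__
print(x)                                           # <Nester: <Nester: 3>> -- nested!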
Also notice that it's possible to achieve a similar effect by adding in reverse -- the following works the same as the former -- but name aliasing by simple assignment is more direct and does not incur an extra call and operation:
class C:
    def __add__(self, other):
        ...
    def __radd__(self, other):                     # other + self (radd) => self + other (add)
        return self + other                        # but __radd__ = __add__ is more direct and quick





  1. Page 896, end of second last paragraph: Unicode -- clarify impacts (new sentence)
    At the very end of the paragraph which begins "Even if you fall into", add a new last sentence which reads: "Though applications are beyond our scope here, especially if you work with the Internet, files, directories, network interfaces, databases, pipes, and even GUIs, Unicode may no longer be an optional topic for you in Python 3.X."
    I'm adding this because the existing text seems a bit misleading, after seeing firsthand how much Unicode permeates applications work in 3.X. See this note for related discussion. Reprints: delete the first clause of this new sentence if it won't fit as is; it looks like there is plenty of room.
    (This and 6 other Unicode items on this page arose from a recent reread of the Unicode chapter a year after writing it; it's fine as is, but a few key concepts could be polished with simple inserts in the next printing.)
  2. Page 901, start of 1st paragraph on page: Unicode -- same policy for read, write (reword)
    The start of this paragraph seems potentially misleading in retrospect--it's not clear if writes work the same as reads. This is clarified later on (see page 920 and later), but it may be worth tightening up here.
    Change: "When a file is opened in text mode, reading its data automatically decodes its content (per a platform default or a provided encoding name) and returns it as a str; writing takes a str and automatically encodes it before transferring it to the file."
    to read as this (the parenthesized part has been pulled out): "When a file is opened in text mode, reading its data automatically decodes its content and returns it as a str; writing takes a str and automatically encodes it before transferring it to the file. Both reads and writes translate per a platform default or a provided encoding name."
  3. Page 936, last sentence of page: Unicode -- mention filename tools too (new text)
    Change the last part of the text: "For more details on re, struct, pickle, and XML tools in general, consult" to read: "For more details on re, struct, pickle, and XML, as well as the impacts of Unicode on other library tools such as filename expansion and directory walkers, consult".
    The section here dealing with tools impacted by Unicode could also have mentioned that os.listdir returns decoded Unicode str for str arguments, and encoded raw binary bytes for bytes arguments, in order to handle undecodable filenames. In short, pass in the directory name as a bytes object to suppress Unicode decoding of filenames per the platform default; otherwise, an exception may be raised if any filenames fail to decode. Passing in a str invokes Unicode filename decoding on platforms where this matters.
    By proxy, os.walk and glob.glob work the same way, because they use os.listdir internally to generate filenames in directories. This was omitted here because the section already encroaches on the language/applications line. Instead, the impacts of Unicode on these and other tools are covered in depth in the new 4th Edition of Programming Python, where application topics are collected in general. A brief demonstration of the str/bytes distinction follows this list.
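To demonstrate the os.listdir behavior described in the preceding item, here is a minimal sketch for Python 3.X (the exact results naturally vary per platform and directory content):

import os, glob

print(os.listdir('.'))         # str argument: decoded str filenames
print(os.listdir(b'.'))        # bytes argument: raw encoded bytes filenames

print(glob.glob('*'))          # same distinction, by proxy via os.listdir
print(glob.glob(b'*'))         # bytes in, bytes out: no filename decoding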

  1. Page 898: Unicode -- mention UTF-16 and UTF-32 in intro (new text)
    Near the end of the second last paragraph on this page, expand the start of the second last line by adding the parenthesized text in the following, to read: "sets in similar ways (e.g., UTF-16 and UTF-32 format strings with 2 and 4 bytes per character, respectively), but all of these". This is implied by later UTF-16 examples, but UTF-16 is so common on Windows now that it merits a word here.
  2. Page 899: Unicode -- bytes is for encoded str too (new text)
    At the second bullet item in the second bullet list on this page, add the following text in parentheses at the end, so that the bullet item reads: "* bytes for representing binary data (including encoded text)". This is shown and implied in later examples, but this seems like a key link concept.
  3. Page 900: Unicode -- internal str format (new footnote)
    I avoided internals discussion in this chapter on purpose, using terms such as "character" instead, but in retrospect some readers might find a more tangible model useful too. Add a footnote at the bottom of page 900, with its star at the very end of the last paragraph before header "Text and Binary Files", which reads:
    "It may help to know that Python internally stores decoded strings in UTF-16 (roughly, UCS-2) format, with 2 bytes per character (a.k.a. Unicode "code pont"), unless compiled for 4 bytes/character. Encoded text is always translated to and from this internal string form, in which text processing occurs.".
    Reprints: if this doesn't fit at the bottom of this page as is, please ask me how it could be shortened.
  4. Page 909: Unicode -- "conversion" means encoding differently (new sentence)
    At the very end of the last paragraph on this page, add the following new sentence: "Either way, note that "conversion" here really just means encoding a text string to raw bytes per a different encoding scheme; decoded text has no encoding type, and is simply a string of Unicode code points (a.k.a. characters) in memory."
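To make the last two items concrete, here is a short sketch of my own: a decoded str has no encoding of its own, and "converting" it just encodes its code points per a different scheme (the byte counts reflect per-character sizes plus BOMs):

S = 'spam'                          # decoded text: just code points in memory
print(S.encode('utf-8'))            # b'spam': 1 byte per ASCII character
print(S.encode('utf-16'))           # 2 bytes per character, plus a 2-byte BOM
print(S.encode('utf-32'))           # 4 bytes per character, plus a 4-byte BOM
print(len(S.encode('utf-8')),       # 4 10 20
      len(S.encode('utf-16')),
      len(S.encode('utf-32')))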



  1. Does not search a package's own directory when it's used in package mode unless "from ." package-relative syntax is used, and
  2. Does not allow "from ." syntax to be used unless the importer is being used as part of a package,

you can no longer directly create directories that serve as both standalone programs and importable packages--because import syntax can vary per usage mode, importers in such directories may need to choose between package-relative import syntax (and assume use as a package only) and normal import syntax (and assume non-package usage only). The workarounds are as follows:
1. Always use fully specified "dir.dir.mod" absolute package imports instead of "from ." package relative imports,
2. Specialize your import statements according to their usage context (package or program) by testing __name__,
3. Add the package's directory to the sys.path module search path directly (see the sketch after this list), or
4. Move all files meant to be visible outside a directory into a nested subdirectory package so they are always used in package mode
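For instance, here is a minimal sketch of the third workaround; the module name is illustrative, and this assumes the importing file lives in the package directory itself:

# make this file's own directory searchable for plain imports,
# so the same code works in both program and package usage modes
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))

import textConfig                   # found via the directory appended above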

The fourth workaround may be the ultimate solution, but it implies substantial program restructuring for existing code meant to be used as both program and importable library. This cropped up in multiple cases in the PP4E book; as a simple case, the PyEdit text editor is meant both to run standalone and to be imported as attachable component classes. Since this system is nested in the PP4E package, it is referenced with absolute package import syntax by clients outside the package:
from PP4E.Gui.TextEditor import textEditor # component and pop up
In Python 2.X, PyEdit's own files imported files in its own directory with simple imports, relying on 2.X's implicit package-directory relative import model:
import textConfig # startup font and colors
This worked in 2.X for both package and top-level program usage modes. However, unless this module is also located elsewhere on the import search path, it fails in package mode in 3.X because the package directory itself is no longer searched. Simply using package-relative imports:
from . import textConfig
suffices when PyEdit is imported externally, but then fails when it is run standalone, because "from ." is allowed only for code being used as a package. To work around cases where the text config file had to be imported from the package directory, I specialized the imports per usage mode:
if __name__ == '__main__':
    from textConfig import (               # my dir is on the path
        opensAskUser, opensEncoding,
        savesUseKnownEncoding, savesAskUser, savesEncoding)
else:
    from .textConfig import (              # always from this package
        opensAskUser, opensEncoding,
        savesUseKnownEncoding, savesAskUser, savesEncoding)
Other cases instead run a top-level script one level up from the package subdirectory to avoid the conflict. Restructuring PyEdit as a top-level script plus a package subdirectory may be arguably better, but seems like too much of a change to existing code just to accommodate the new model. Moreover, using full absolute paths from the PP4E root in every import seems overkill in the cases I observed, and is prone to requiring updates if directories are moved.
I'm not sure if such a dual program/library role was taken into account in the 3.X inter-package import model change (indeed, package-relative import semantics is being discussed anew on the Python developers list as I write this note), but it seems to be a primary casualty.


>>> from test import C
>>> X = C(99)
>>> X.spam()
99

>>> X
<test.C object at 0x02695310>

>>> import pickle
>>> pickle.dump(X, open('test.pkl', 'wb'))
>>> pickle.load(open('test.pkl', 'rb'))
<test.C object at 0x02695350>

>>> Y = pickle.load(open('test.pkl', 'rb'))
>>> Y.spam()
99
As described in the book, bound methods allow us to treat an instance's methods as though they were simple callable functions -- especially useful in callback-based code such as GUIs to implement functions with state to be used while processing an event (see Pages 729-730 and the sidebar on Page 758 for more on this bound method role, as well as its __call__ alternative coding):
>>> X
<test.C object at 0x02695310>
>>> X.spam()
99

>>> X.spam
<bound method C.spam of <test.C object at 0x02695310>>

>>> T = X.spam
>>> T()
99
You won't be able to pickle bound (or unbound) methods directly, though, which precludes using them in roles such as persistently saved or transferred callback handlers without extra steps on unpickles:
>>> pickle.dump(X.spam, open('test.pkl', 'wb'))
Traceback (most recent call last):
  ...more...
_pickle.PicklingError: Can't pickle <class 'method'>: attribute lookup builtins.method failed

>>> pickle.dump(C.spam, open('test1.pkl', 'wb'))
Traceback (most recent call last):
  ...more...
_pickle.PicklingError: Can't pickle <class 'function'>: attribute lookup builtins.function failed
Of course, pickling things like bound method callback handlers may not work in some cases anyhow, because the instance may contain state information that is valid in the pickling process only; references to GUI objects in callback handlers, for example, are likely invalid in an unpickling program. Unpickled state information might be less transient in other applications.
I'm not marking this as a book update because this book doesn't go into this level of detail on pickling. See Programming Python and Python's Library Manual for more on pickle, as well as the related shelve module which adds access to objects by key. As described elsewhere, there is additional pickler protocol for providing and restoring object state which may prove useful in this case. For instance, an object's __getstate__ and __setstate__ methods can be used for purposes such as reopening files on unpickling, and might be used to recreate a bound method when loading a pickled instance of a suitable wrapper class.
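To illustrate the wrapper-class idea, here is a minimal sketch of my own; the Callback class and its names are hypothetical, not from the book, but its state-tuple scheme sidesteps the PicklingError shown above:

import pickle

class C:                                      # like the test.C used above
    def __init__(self, value): self.data = value
    def spam(self): print(self.data)

class Callback:                               # hypothetical wrapper class
    def __init__(self, boundmethod):
        self.method = boundmethod
    def __call__(self, *args):
        return self.method(*args)
    def __getstate__(self):                   # run on pickles: picklable state only
        return (self.method.__self__, self.method.__func__.__name__)
    def __setstate__(self, state):            # run on unpickles: rebind the method
        obj, name = state
        self.method = getattr(obj, name)

cb = Callback(C(99).spam)
buf = pickle.dumps(cb)                        # no PicklingError: pickles instance + name
pickle.loads(buf)()                           # prints 99: method rebuilt on load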


or return a different object, to support multiple active iterations:
class C:
    def __iter__(self, ...):
        return Citer(state)

class Citer:
    def __init__(self, ...):
        ...configure state
    def __next__(self):
        ...use state
        ...return next or raise StopIteration
This part of the book also compares such classes to generator functions and expressions, as well as simple list comprehensions, to show how the classes better support state retention and minimize memory requirements. Though not shown explicitly in the book, as implied by its coverage of generator functions on Pages 492-505, it's also possible to achieve similar effects by yielding values from the __iter__ method itself:
class C:
    def __iter__(self, ...):            # __iter__ returns obj with __next__
        ...configure state              # yield makes this a generator
        for loop...:                    # generators make objs with __next__
            yield next                  # return raises StopIteration
This technique works too, but seems like too deep magic to me. To understand this at all, you need to know two very implicit things:
* First, that __iter__ is invoked as a first step in iteration, and must return an object with a __next__ method (next in 2.X) to be called on each iteration. This is the iteration protocol in general, discussed in multiple places in the book; see the two iteration chapters especially.
* Second, that this coding scheme only works because calling a generator function (a def statement containing a yield statement) automatically creates and returns an iterable object which has an internally created __next__ method, which automatically raises StopIteration on returns. This is the definition of generator functions, discussed in detail on Pages 492-505.

In other words, this sort of __iter__ does return an object with a __next__ to be run later too, but only because that's what generator functions do automatically when they are first called. The combined effect is therefore the same as explicitly returning an object with an explicit __next__ method as in the book's examples, but there seems a magic multiplier factor at work here which makes the yield-based scheme substantially more obscure.
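To see the equivalence concretely, here is a minimal sketch of my own (the class names are not the book's) that codes the same iterable both ways; both support multiple active iterations, because each __iter__ call produces a fresh iterator object:

class Squares:                                # explicit scheme: separate iterator class
    def __init__(self, start, stop):
        self.start, self.stop = start, stop
    def __iter__(self):
        return SquaresIter(self.start, self.stop)

class SquaresIter:
    def __init__(self, start, stop):
        self.value, self.stop = start - 1, stop
    def __next__(self):
        if self.value == self.stop:
            raise StopIteration
        self.value += 1
        return self.value ** 2

class SquaresYield:                           # implicit scheme: __iter__ is a generator
    def __init__(self, start, stop):
        self.start, self.stop = start, stop
    def __iter__(self):
        for value in range(self.start, self.stop + 1):
            yield value ** 2

print(list(Squares(1, 5)))                    # [1, 4, 9, 16, 25]
print(list(SquaresYield(1, 5)))               # same results, more implicit machinery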
I would even suggest that this qualifies the __iter__/yield scheme as non-Pythonic, at least by that term's original conception. Among other things, it soundly violates Python's longstanding EIBTI motto -- for Explicit is better than implicit, the second rule listed by the "import this" statement of Python's underlying philosophies. (Run this command yourself at an interactive Python prompt to see what I mean; it's as formal a collection of goals and values as Python has.)
Of course, the Python world and time are the final judges on such matters. Moreover, one could credibly argue that the very meaning of the term Pythonic has been modified in recent years to incorporate much more feature redundancy and implicit magic than it originally did. Consider the growing prominence of scope closure state retention in recent Python code, instead of traditional and explicit object attributes. The __iter__/yield iterator coding scheme is ultimately based on the former and more implicit of these, and reflects a growing shift in the language from object-oriented towards functional programming patterns.
All of which is to me really just another instance of a general property I've observed often in the last two decades: Despite their many advantages, open source projects like Python sometimes seem to stand for no more than what their current crop of developers finds interesting. Naturally, whether you find that asset, liability, or both is up to you to decide.
As a rule, though, and as underscored often in the book, code like this that requires the next programmer to experience "moments of great clarity" is probably less than ideal from a typical software lifecycle perspective. Academically interesting though such examples may be, magic and engineering do not generally mix very well in practice.

def reload_all(*args):
    transitive_reload(args, {})

if __name__ == '__main__':
    import reloadall                    # Test code: reload myself
    reload_all(reloadall)               # Should reload this, types
Also keep in mind that both this and the original version reload only modules that were loaded with "import" statements; because names copied with "from" statements do not cause a module object to be nested in the importer's namespace, their containing module is not reloaded. Handling "from" importers may require either source code analysis, or customization of the __import__ operation.
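To illustrate the "from" limitation, here is a short sketch; it assumes a hypothetical module file mod.py that defines a function func:

import mod                          # hypothetical module with a function func
from mod import func                # copies func into this namespace
from imp import reload              # required in 3.X; reload is built-in in 2.X

reload(mod)                         # mod.func is now the newly loaded version...
mod.func()                          # ...so this runs the new code
func()                              # but this still runs the old code object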
If the recursion used in this example is confusing, see the discussion of recursive functions in the advanced function topics of Chapter 19; here is a simple example which demonstrates the technique:
>>> def countdown(N):
        if N == 0:
            print('stop')               # 2.X: print 'stop'
        else:
            print(N, end=' ')           # 2.X: print N,
            countdown(N-1)

>>> countdown(20)
20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 stop
For more on Python recursion, see also the recursive stack limit tools in the sys module (Python has a fixed depth limit on function calls, which you can increase for pathologically deep recursive use cases):
>>> import sys
>>> help(sys.setrecursionlimit)
Help on built-in function setrecursionlimit in module sys:

setrecursionlimit(...)
    setrecursionlimit(n)

    Set the maximum depth of the Python interpreter stack to n.  This
    limit prevents infinite recursion from causing an overflow of the C
    stack and crashing Python.  The highest possible limit is platform-
    dependent.

>>> sys.getrecursionlimit()
1000






else:
    print('Not found')
print('Ni' if [item for item in x if match(item)] else 'Not found')
print('Ni' if list(filter(match, x)) else 'Not found')
print('Ni' if any(item for item in x if match(item)) else 'Not found')
print('Ni' if any(filter(match, x)) else 'Not found')
Try running these on your own to see what I mean. Despite their conciseness, the downside of some of these alternatives is that they may wind up calling match() more times than required (for items after a match)--possibly costly if match() is expensive to run. The sketch below makes the difference concrete.
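Here is a minimal sketch with an instrumented (and hypothetical) match that counts its calls; the list comprehension scans the entire list, while any() stops at the first true result:

calls = 0
def match(item):                    # hypothetical test with a call counter
    global calls
    calls += 1
    return item == 'Ni'

x = ['Ni', 'spam', 'eggs', 'ham']

calls = 0
print('Ni' if [item for item in x if match(item)] else 'Not found')
print(calls)                        # 4: the list comprehension calls match for every item

calls = 0
print('Ni' if any(item for item in x if match(item)) else 'Not found')
print(calls)                        # 1: any() short-circuits after the first match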




  1. (self.name) fetches the value of a "name" attribute from whatever object variable "self" happens to reference.
  2. ((self.name).upper) then fetches the value of an "upper" attribute from whatever object was returned by step 1.
  3. ((self.name).upper)() finally calls the function object that "upper" is assumed to reference, with no arguments.

The net effect is that "self" references an object, whose "name" attribute references a (string) object, whose "upper" attribute references a (callable) object. It's object nesting; in general, that's what class instance state information always is--nested objects assigned to instance attributes.
And that's why it works to pass "x" to this function directly: "x" is a class instance object, whose "name" attribute references a string object with an "upper"; "x" has no "upper" attribute itself. The "self" function argument is just a reference to the same object referenced by variable "x", whether the function is attached to a class or not.
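Here is a minimal standalone sketch of the same idea (the Person class is illustrative only):

class Person:
    def __init__(self, name):
        self.name = name                       # instance state: a nested string object
    def shout(self):
        return self.name.upper()               # object, to attribute, to call

x = Person('bob')
print(x.shout())                               # 'BOB': self references x in the method
print(Person.shout(x))                         # same: pass x to the function directly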

>>> CardHolder('11111111', '25', 3, '44')
in setName
<test.CardHolder object at 0x01410830>
The setter is called from __init__ when the instance is first created and the attribute is assigned, under both Python 3.X and 2.X. Also make sure that you derive the class from "object" under 2.X to make it a new-style class. As explained earlier in this chapter (and in Chapter 31), property setters don't quite work under 2.X without including "object" in the superclass list; once an attribute name is mistakenly assigned directly on an instance, it hides the property getter in the class too (perhaps this was the entire issue here?):
class CardHolder(object): # required in 2.X
With this change results under 2.6 and 3.1 are identical. You'll also need to use 2.X-style print statements or a from __future__ for 3.X-style print calls, of course; see earlier in the book for print() in 2.X:
from __future__ import print_function
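To see the setter-at-construction behavior in isolation, here is a minimal sketch of a property with a setter; this is not the book's full CardHolder class, just the relevant pattern:

class CardHolder(object):                      # "object" required for 2.X setters
    def __init__(self, name):
        self.name = name                       # routes through the property setter
    @property
    def name(self):
        return self.__name                     # real state lives in the mangled name
    @name.setter
    def name(self, value):
        print('in setName')
        self.__name = value.strip()

bob = CardHolder('  bob  ')                    # prints "in setName" at construction
print(bob.name)                                # 'bob'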
The other oddness in this example (which is covered earlier in the book but perhaps not explained as explicitly for this example itself as it could have been) is that names beginning with 2 underscores like "__name" are pseudo-private attributes: Python expands them to include the enclosing class's name, in order to localize them to the creating class. They are used intentionally here to avoid clashing with the real attribute names such as "name" that are part of the class's external client API. Python mangles each in-class appearance of the attribute like this:
__name ...becomes... _CardHolder__name
The single underscore naming pattern "_name" used elsewhere in this chapter is a weaker convention that informally attempts to avoid name collisions, but "__name" truly forces the issue, and is especially useful for classes like this one which manage attribute access but also need to record real state information in the instance. Clients use "name" (the property); the expanded version of "__name" (the data), where state is actually stored, is more or less hidden from them. Moreover, unlike "_name", it won't clash with other normal instance attributes if this class is later extended by a subclass, as the following sketch shows.
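Here is a short sketch of the subclass benefit (class names are illustrative):

class C(object):
    def __init__(self):
        self.__name = 'C data'                 # stored as _C__name

class D(C):
    def __init__(self):
        C.__init__(self)
        self.__name = 'D data'                 # stored as _D__name: no clash

X = D()
print(X.__dict__)                              # both _C__name and _D__name present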