Issue 905389: str.join() intercepts TypeError raised by iterator

If str.join() is passed an iterator and that iterator raises a TypeError, the exception is caught by the join method and replaced by its own TypeError. SyntaxError and IndexError exceptions are unaffected.

Example:

Python 2.3.3 (#51, Dec 18 2003, 20:22:39) [MSC v.1200 32 bit (Intel)] on win32 ... IDLE 1.0.2

>>> def gen(n):
...     if not isinstance(n, int):
...         raise TypeError, "gen() TypeError"
...     if n<0:
...         raise IndexError, "gen() IndexError"
...     for i in range(n):
...         yield str(i)
...
>>> ''.join(gen(5))
'01234'
>>> ''.join(gen(-1))
Traceback (most recent call last):
  File "<pyshell#9>", line 1, in -toplevel-
    ''.join(gen(-1))
  File "<pyshell#7>", line 5, in gen
    raise IndexError, "gen() IndexError"
IndexError: gen() IndexError
>>> ''.join(gen(None))
Traceback (most recent call last):
  File "<pyshell#10>", line 1, in -toplevel-
    ''.join(gen(None))
TypeError: sequence expected, generator found

Logged In: YES user_id=113328

Unicode objects do not have this behaviour. For example:

>>> u''.join(gen(None))
Traceback (most recent call last):
  File "", line 1, in ?
  File "", line 3, in gen
TypeError: gen() TypeError

The offending code is at line 1610 or so of stringobject.c. The equivalent Unicode code starts at line 3955 of unicodeobject.c.

The string code uses a two-pass approach: it calculates the size of the result, allocates space, and then builds the value. The Unicode version resizes as it goes along. The two-pass approach may be a significant speed optimisation (on the assumption that strings are more commonly used than Unicode objects), but I can't test this (no MSVC7 to build with).
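As a rough illustration of the two strategies described above, here is a pure-Python model, written in Python 3 syntax. It is only a sketch: the function names join_two_pass and join_incremental are hypothetical, bytes stands in for the 8-bit str type, and the real C code in stringobject.c and unicodeobject.c differs in detail.

```python
def join_two_pass(sep, iterable):
    """Model of the two-pass strategy: materialise, size, allocate, copy."""
    # The whole input has to be held in memory before sizing can start.
    items = tuple(iterable)
    if not items:
        return b""
    # Pass 1: compute the exact size of the result.
    total = sum(len(item) for item in items) + len(sep) * (len(items) - 1)
    # Allocate the result buffer once, at its final size.
    buf = bytearray(total)
    # Pass 2: copy each piece into place.
    pos = 0
    for index, item in enumerate(items):
        if index:
            buf[pos:pos + len(sep)] = sep
            pos += len(sep)
        buf[pos:pos + len(item)] = item
        pos += len(item)
    return bytes(buf)


def join_incremental(sep, iterable):
    """Model of the resize-as-you-go strategy: consume items one at a time."""
    buf = bytearray()
    first = True
    for item in iterable:
        if not first:
            buf += sep
        buf += item  # the buffer grows as needed
        first = False
    return bytes(buf)
```

Both functions return the same result, for example b"a,b,c" for join_two_pass(b",", [b"a", b"b", b"c"]). The difference is that the two-pass version must hold every item in memory before it can allocate, while the incremental version can consume a generator lazily at the cost of repeated resizing.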

If the speed issue is not significant, I'd recommend rewriting the string code to use the same approach the Unicode code uses. Otherwise, the documentation for str.join should clarify these points:

  1. The sequence being joined is materialised as a tuple (PySequence_Fast) - this may have an impact on generators which use a lot of memory.
  2. TypeErrors produced by materialising the sequence being joined will be caught and re-raised with a different message.
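The two points above can be modelled in a few lines of Python. This is a sketch in Python 3 syntax that emulates the reported behaviour rather than relying on the interpreter's own join; masking_join is a hypothetical name, and gen() is the generator from the original report rewritten with Python 3 raise syntax.

```python
def gen(n):
    # Same generator as in the report, using Python 3 raise syntax.
    if not isinstance(n, int):
        raise TypeError("gen() TypeError")
    if n < 0:
        raise IndexError("gen() IndexError")
    for i in range(n):
        yield str(i)


def masking_join(sep, iterable):
    """Hypothetical model of the reported str.join() behaviour."""
    try:
        # Point 1: the input is fully materialised before joining.
        items = tuple(iterable)
    except TypeError:
        # Point 2: any TypeError raised while materialising is replaced,
        # so the iterator's own message is lost.
        raise TypeError(
            "sequence expected, %s found" % type(iterable).__name__
        ) from None
    return sep.join(items)


def report(call):
    """Run a zero-argument callable and describe its result or exception."""
    try:
        return repr(call())
    except Exception as exc:
        return "%s: %s" % (type(exc).__name__, exc)


print(report(lambda: masking_join("", gen(5))))
print(report(lambda: masking_join("", gen(-1))))
print(report(lambda: masking_join("", gen(None))))
# Possible workaround: materialise explicitly, so the generator's own
# TypeError is raised by list() before join is ever called.
print(report(lambda: masking_join("", list(gen(None)))))
```

In this model the IndexError passes through untouched, the generator's TypeError is reported as "sequence expected, generator found", and materialising the generator with list() first lets the original "gen() TypeError" message reach the caller.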