What else you need to know: Python-Future documentation

The following points are important to know about when writing Python 2/3 compatible code.

bytes

Handling bytes consistently and correctly has traditionally been one of the most difficult tasks in writing a Py2/3 compatible codebase. This is because the Python 2 bytes object is simply an alias for Python 2’s str, rather than a true implementation of the Python 3 bytes object, which is substantially different.

future contains a backport of the bytes object from Python 3 which passes most of the Python 3 tests for bytes. (See tests/test_future/test_bytes.py in the source tree.) You can use it as follows:

from builtins import bytes
b = bytes(b'ABCD')

On Py3, this is simply the builtin bytes object. On Py2, this object is a subclass of Python 2’s str that enforces the same strict separation of unicode strings and byte strings as Python 3’s bytes object:

>>> b + u'EFGH'      # TypeError
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument can't be unicode string

>>> bytes(b',').join([u'Fred', u'Bill'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected bytes, found unicode string

>>> b == u'ABCD'
False

>>> b < u'abc'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: bytes() and <type 'unicode'>

In most other ways, these bytes objects have identical behaviours to Python 3’s bytes:

b = bytes(b'ABCD')
assert list(b) == [65, 66, 67, 68]
assert repr(b) == "b'ABCD'"
assert b.split(b'B') == [b'A', b'CD']

Currently the easiest way to ensure identical behaviour of byte-strings in a Py2/3 codebase is to wrap all byte-string literals b'...' in a bytes() call as follows:

from builtins import bytes

...

b = bytes(b'This is my bytestring')

...

This is not perfect, but it is superior to manually debugging and fixing code incompatibilities caused by the many differences between Py3 bytes and Py2 strings.

The bytes type from builtins also provides support for the surrogateescape error handler on Python 2.x. Here is an example that works identically on Python 2.x and 3.x:

>>> from builtins import bytes
>>> b = bytes(b'\xff')
>>> b.decode('utf-8', 'surrogateescape')
'\udcff'

This feature is in alpha. Please leave feedback here about whether this works for you.
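As a further illustration, here is a minimal sketch of a full round trip through the surrogateescape handler. The sample bytes and variable names are made up for this example. On Python 3 the import is a no-op; on Python 2 the sketch assumes future is installed and relies on the alpha backport described above.

```python
from builtins import bytes

# 0xff can never occur in valid UTF-8, so a strict decode would fail here.
raw = bytes(b'caf\xff')

# Each undecodable byte 0xNN is mapped to the lone surrogate U+DCNN.
text = raw.decode('utf-8', 'surrogateescape')
assert text == u'caf\udcff'

# Encoding with the same handler restores the original bytes exactly.
assert text.encode('utf-8', 'surrogateescape') == raw
```

This round-trip property is what makes the handler useful for data of unknown encoding, such as file names.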

str

The str object in Python 3 is quite similar but not identical to the Python 2 unicode object.

The major difference is the stricter type-checking of Py3’s str that enforces a distinction between unicode strings and byte-strings, such as when comparing, concatenating, joining, or replacing parts of strings.

There are also other differences, such as the repr of unicode strings in Py2 having a u'...' prefix, versus simply '...' in Py3, and the removal of the str.decode() method in Py3.

future contains a newstr type that is a backport of the str object from Python 3. This inherits from the Python 2 unicode class but has customizations to improve compatibility with Python 3’s str object. You can use it as follows:

from __future__ import unicode_literals
from builtins import str

On Py2, this gives us:

>>> str
future.types.newstr.newstr

(On Py3, it is simply the usual builtin str object.)

Then, for example, the following code has the same effect on Py2 as on Py3:

from builtins import bytes

s = str(u'ABCD')
b = bytes(b'ABCD')
assert s != b'ABCD'
assert isinstance(s.encode('utf-8'), bytes)
assert isinstance(b.decode('utf-8'), str)

These raise TypeErrors:

>>> bytes(b'B') in s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'in <string>' requires string as left operand, not <type 'str'>

>>> s.find(bytes(b'A'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument can't be <type 'str'>
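One way to avoid these errors is to decode byte-strings explicitly before mixing them with text. The sketch below uses a hypothetical helper, to_text, which is not part of future; on Python 2 it assumes future is installed so that the builtins imports provide the backported types.

```python
from builtins import bytes, str

def to_text(value, encoding='utf-8'):
    """Return value as text, decoding it first if it is a byte-string."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)

s = str(u'ABCD')
needle = bytes(b'A')

# Decode the byte-string first, then search for text within text.
assert s.find(to_text(needle)) == 0
assert to_text(bytes(b'B')) in s
```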

Various other operations that mix strings and bytes or other types are permitted on Py2 with the newstr class even though they are illegal with Python 3. For example:

>>> s2 = b'/' + str('ABCD')
>>> s2
'/ABCD'
>>> type(s2)
future.types.newstr.newstr

This is allowed for compatibility with parts of the Python 2 standard library and various third-party libraries that mix byte-strings and unicode strings loosely. One example is os.path.join on Python 2, which attempts to add the byte-string b'/' to its arguments, whether or not they are unicode. (See posixpath.py.) Another example is the escape() function in Django 1.4’s django.utils.html.

In most other ways, these builtins.str objects on Py2 have the same behaviours as Python 3’s str:

s = str('ABCD')
assert repr(s) == "'ABCD'"      # consistent repr with Py3 (no u prefix)
assert list(s) == ['A', 'B', 'C', 'D']
assert s.split('B') == ['A', 'CD']

The str type from builtins also provides support for the surrogateescape error handler on Python 2.x. Here is an example that works identically on Python 2.x and 3.x:

>>> from builtins import str
>>> s = str(u'\udcff')
>>> s.encode('utf-8', 'surrogateescape')
b'\xff'

This feature is in alpha. Please leave feedback here about whether this works for you.

dict

Python 3 dictionaries have .keys(), .values(), and .items() methods which return memory-efficient, set-like view objects, not lists. (See PEP 3106.)

If your dictionaries are small, performance is not critical, and you don’t need the set-like behaviour of iterator objects from Python 3, you can of course stick with standard Python 3 code in your Py2/3 compatible codebase:

# Assuming d is a native dict ...

for key in d:
    pass  # code here

for item in d.items():
    pass  # code here

for value in d.values():
    pass  # code here

In this case there will be memory overhead of list creation on Py2 for each call to items, values or keys.

For improved efficiency, future.builtins (aliased to builtins) provides a Python 2 dict subclass whose keys(), values(), and items() methods return iterators on all versions of Python >= 2.7. On Python 2.7, these iterators also have the same set-like view behaviour as dictionaries in Python 3. This can streamline code that iterates over large dictionaries. For example:

from __future__ import print_function
from builtins import dict, range

# Memory-efficient construction:
d = dict((i, i**2) for i in range(10**7))

assert not isinstance(d.items(), list)

# Because items() is memory-efficient, so is this:
d2 = dict((v, k) for (k, v) in d.items())

As usual, on Python 3 dict imported from either builtins or future.builtins is just the built-in dict class.
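The set-like behaviour mentioned above can be sketched as follows. The dictionary contents and names are invented for illustration; on Python 2 the example assumes future is installed and Python 2.7 is in use.

```python
from builtins import dict

stock = dict([('apples', 3), ('pears', 0), ('plums', 7)])
wanted = set(['plums', 'cherries', 'apples'])

# keys() is set-like, so it can be intersected with a set directly.
available = stock.keys() & wanted
assert available == set(['apples', 'plums'])

# items() can be iterated without building an intermediate list.
in_stock = sorted(name for (name, count) in stock.items() if count > 0)
assert in_stock == ['apples', 'plums']
```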

Memory-efficiency and alternatives

If you already have large native dictionaries, the downside to wrapping them in a dict call is that memory is copied (on both Py3 and on Py2). For example:

# This allocates and then frees a large amount of temporary memory:
d = dict({i: i**2 for i in range(10**7)})

If dictionary methods like values and items are called only once, this copy negates the memory saved by the overridden methods, which avoid creating temporary lists.

The memory-efficient (and CPU-efficient) alternatives are:

- to construct a dictionary from an iterator. The above line could use a generator like this:

  d = dict((i, i**2) for i in range(10**7))

- to construct an empty dictionary with a dict() call using builtins.dict (rather than {}) and then update it;

- to use the viewitems etc. functions from future.utils, passing in regular dictionaries:

  from future.utils import viewkeys, viewvalues, viewitems

  for (key, value) in viewitems(hugedictionary):
      pass  # some code here

  # Set intersection:
  d = {i**2: i for i in range(1000)}
  both = viewkeys(d) & set(range(0, 1000, 7))

  # Set union of the values of two dictionaries d1 and d2 (values views
  # are not set-like, so convert them to sets first):
  both = set(viewvalues(d1)) | set(viewvalues(d2))

For compatibility, the functions iteritems etc. are also available in future.utils. These are equivalent to the functions of the same names in six, which is equivalent to calling the iteritems etc. methods on Python 2, or to calling items etc. on Python 3.
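The dispatch that such a helper performs can be sketched in a few lines. This is an illustrative stand-in under the hypothetical name iteritems_sketch, not the implementation used by future or six.

```python
def iteritems_sketch(d):
    """Iterate over (key, value) pairs without building a list on Py2."""
    method = getattr(d, 'iteritems', None)   # present on Py2 dicts only
    if method is None:
        method = d.items                     # Py3: already a lazy view
    return iter(method())

squares = {1: 1, 2: 4, 3: 9}
assert sorted(iteritems_sketch(squares)) == [(1, 1), (2, 4), (3, 9)]
assert not isinstance(iteritems_sketch(squares), list)
```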

int

Python 3’s int type is very similar to Python 2’s long, except for the representation, which omits the L suffix that Python 2 appends. Python 2’s usual (short) integers have been removed from Python 3, as has the long builtin name.

Python 3:

>>> 2**64
18446744073709551616

Python 2:

>>> 2**64
18446744073709551616L

future includes a backport of Python 3’s int that is a subclass of Python 2’s long with the same representation behaviour as Python 3’s int. To ensure an integer is long compatibly with both Py3 and Py2, cast it like this:

from builtins import int
must_be_a_long_integer = int(1234)
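A small sketch of the representation behaviour, with an invented variable name: on Python 3 this is plain int, and on Python 2 it assumes future is installed so that int is the backported type.

```python
from builtins import int

big = int(2**64)

# The repr carries no trailing "L", so the same doctest output can be
# expected on both Py2 (with future) and Py3.
assert repr(big) == '18446744073709551616'
assert isinstance(big + 1, int)
```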

The backported int object helps with writing doctests and simplifies code that deals with long and int as special cases on Py2. An example is the following code from xlwt-future (called by the xlwt.antlr.BitSet class) for writing out Excel .xls spreadsheets. With future, the code is:

from builtins import int

def longify(data):
    """
    Turns data (an int or long, or a list of ints or longs) into a
    list of longs.
    """
    if not data:
        return [int(0)]
    if not isinstance(data, list):
        return [int(data)]
    return list(map(int, data))

Without future (or with future < 0.7), this might be:

import sys
PY3 = sys.version_info[0] >= 3

def longify(data):
    """
    Turns data (an int or long, or a list of ints or longs) into a
    list of longs.
    """
    if not data:
        if PY3:
            return [0]
        else:
            return [long(0)]
    if not isinstance(data, list):
        if PY3:
            return [int(data)]
        else:
            return [long(data)]
    if PY3:
        return list(map(int, data))   # same as returning data, but with up-front typechecking
    else:
        return list(map(long, data))

isinstance

The following tests all pass on Python 3:

assert isinstance(2**62, int)
assert isinstance(2**63, int)
assert isinstance(b'my byte-string', bytes)
assert isinstance(u'unicode string 1', str)
assert isinstance('unicode string 2', str)

However, two of these normally fail on Python 2:

>>> assert isinstance(2**63, int)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

>>> assert isinstance(u'my unicode string', str)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

And if this import is in effect on Python 2:

from __future__ import unicode_literals

then the fifth test fails too:

>>> assert isinstance('unicode string 2', str)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError

After importing the builtins from future, all these tests pass on Python 2 as on Python 3:

from builtins import bytes, int, str

assert isinstance(10, int)
assert isinstance(10**100, int)
assert isinstance(b'my byte-string', bytes)
assert isinstance(u'unicode string 1', str)

However, note that the last test requires unicode_literals to be imported in order to succeed:

from __future__ import unicode_literals
assert isinstance('unicode string 2', str)

This works because the backported types int, bytes and str (and others) have metaclasses that override __instancecheck__. See PEP 3119 for details.
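The mechanism can be sketched with a toy metaclass. The names AnyIntMeta, AnyInt and NATIVE_INTS are hypothetical and chosen for this example only; future's real types are more involved.

```python
import sys

# Native integer types: (int, long) on Py2, just (int,) on Py3.
if sys.version_info[0] >= 3:
    NATIVE_INTS = (int,)
else:
    NATIVE_INTS = (int, long)  # noqa: F821 (this name exists only on Py2)

class AnyIntMeta(type):
    def __instancecheck__(cls, instance):
        # Consulted by isinstance(instance, AnyInt); see PEP 3119.
        return isinstance(instance, NATIVE_INTS)

# Calling the metaclass directly sidesteps the Py2/Py3 syntax difference.
AnyInt = AnyIntMeta('AnyInt', (object,), {})

assert isinstance(10, AnyInt)
assert isinstance(10**100, AnyInt)
assert not isinstance(u'10', AnyInt)
```

Because the check lives on the metaclass, native values pass isinstance() without being instances of the new class in the usual sense.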

Passing data to/from Python 2 libraries

If you are passing any of the backported types (bytes, int, dict, str) into brittle library code that performs type-checks using type(), rather than isinstance(), or requires that you pass Python 2’s native types (rather than subclasses) for some other reason, it may be necessary to upcast the types from future to their native superclasses on Py2.

The native function in future.utils is provided for this. Here is how to use it. (The output shown is from Py2):

>>> from builtins import int, bytes, str
>>> from future.utils import native

>>> a = int(10**20)     # Py3-like long int
>>> a
100000000000000000000
>>> type(a)
future.types.newint.newint
>>> native(a)
100000000000000000000L
>>> type(native(a))
long

>>> b = bytes(b'ABC')
>>> type(b)
future.types.newbytes.newbytes
>>> native(b)
'ABC'
>>> type(native(b))
str

>>> s = str(u'ABC')
>>> type(s)
future.types.newstr.newstr
>>> native(s)
u'ABC'
>>> type(native(s))
unicode

On Py3, the native() function is a no-op.

Native string type

Some library code, including standard library code like the array.array() constructor, requires native strings on both Python 2 and Python 3. This means that there is no simple way to pass the appropriate string type when the unicode_literals import from __future__ is in effect.

The objects native_str and native_bytes are available in future.utils for this case. These are equivalent to the str and bytes objects in __builtin__ on Python 2 or in builtins on Python 3.

The functions native_str_to_bytes and bytes_to_native_str are also available for more explicit conversions.
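The idea behind these conversions can be sketched with the standard library alone. The two functions below carry a _sketch suffix to mark them as illustrative stand-ins; their signatures and the default encoding are assumptions, not future's API.

```python
import array
import sys

PY3 = sys.version_info[0] >= 3

def bytes_to_native_str_sketch(b, encoding='utf-8'):
    """Return the native str type: text on Py3, a byte-string on Py2."""
    return b.decode(encoding) if PY3 else b

def native_str_to_bytes_sketch(s, encoding='utf-8'):
    """Inverse of the above: native str back to a byte-string."""
    return s.encode(encoding) if PY3 else s

# A byte-string literal is unaffected by unicode_literals, so converting
# it yields a type code that array.array() accepts on either version.
typecode = bytes_to_native_str_sketch(b'i')
values = array.array(typecode, [1, 2, 3])

assert values.tolist() == [1, 2, 3]
assert native_str_to_bytes_sketch(typecode) == b'i'
```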

open()

The Python 3 builtin open() function for opening files returns file contents as (unicode) strings unless the binary (b) flag is passed, as in open(filename, 'rb'), in which case its methods like read() return Py3 bytes objects.

On Py2 with future installed, the builtins module provides an open function that is mostly compatible with that on Python 3 (e.g. it offers keyword arguments like encoding). This maps to the open backport available in the standard library io module on Py2.7.

One difference to be aware of between the Python 3 open and future.builtins.open on Python 2 is that the return types of methods such as read() from the file object that open returns are not automatically cast from native bytes or unicode strings on Python 2 to the corresponding future.builtins.bytes or future.builtins.str types. If you need the returned data to behave exactly the same way on Py2 as on Py3, you can cast it explicitly as follows:

from __future__ import unicode_literals
from builtins import open, bytes

data = open('image.png', 'rb').read()
# On Py2, data is a standard 8-bit str with loose Unicode coercion.
# data + u'' would likely raise a UnicodeDecodeError

data = bytes(data)
# Now it behaves like a Py3 bytes object...

assert data[:4] == b'\x89PNG'
assert data[4] == 13     # integer

# Raises TypeError:
# data + u''
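A self-contained variant that does not depend on an existing image file might look like the sketch below. The temporary file name and its contents are invented for the example; on Python 2 it assumes future is installed.

```python
from builtins import bytes, open
import os
import tempfile

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'sample.txt')

# Text mode with an explicit encoding, as on Python 3.
with open(path, 'w', encoding='utf-8') as f:
    f.write(u'caf\xe9')

# Binary mode: wrap the result so that indexing yields integers on Py2 too.
with open(path, 'rb') as f:
    data = bytes(f.read())

assert data[0] == ord('c')
assert data.decode('utf-8') == u'caf\xe9'

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(tmpdir)
```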

Custom __str__ methods

If you define a custom __str__ method for any of your classes, functions like print() expect __str__ on Py2 to return a byte string, whereas on Py3 they expect a (unicode) string.

Use the following decorator to map the __str__ to __unicode__ on Py2 and define __str__ to encode it as utf-8:

from future.utils import python_2_unicode_compatible

@python_2_unicode_compatible
class MyClass(object):
    def __str__(self):
        return u'Unicode string: \u5b54\u5b50'

a = MyClass()

This then prints the name of a Chinese philosopher:

print(a)

This decorator is identical to the decorator of the same name in django.utils.encoding.

This decorator is a no-op on Python 3.
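What such a decorator does can be sketched as follows. The name unicode_compatible_sketch and the Philosopher class are hypothetical; this is a simplified illustration of the technique, not the code shipped in future or Django.

```python
import sys

def unicode_compatible_sketch(cls):
    """On Py2, move the text-returning __str__ to __unicode__."""
    if sys.version_info[0] < 3:
        cls.__unicode__ = cls.__str__
        # The new __str__ returns UTF-8 encoded bytes, as Py2 expects.
        cls.__str__ = lambda self: self.__unicode__().encode('utf-8')
    return cls

@unicode_compatible_sketch
class Philosopher(object):
    def __str__(self):
        return u'\u5b54\u5b50'

# Text formatting uses __unicode__ on Py2 and __str__ on Py3.
assert u'%s' % Philosopher() == u'\u5b54\u5b50'
```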

Custom iterators

If you define your own iterators, there is an incompatibility in the method name to retrieve the next item across Py3 and Py2. On Python 3 it is __next__, whereas on Python 2 it is next.

The most elegant solution to this is to derive your custom iterator class from builtins.object and define a __next__ method as you normally would on Python 3. On Python 2, object then refers to the future.types.newobject base class, which provides a fallback next method that calls your __next__. Use it as follows:

from builtins import object

class Upper(object):
    def __init__(self, iterable):
        self._iter = iter(iterable)
    def __next__(self):                 # Py3-style iterator interface
        return next(self._iter).upper()
    def __iter__(self):
        return self

itr = Upper('hello')
assert next(itr) == 'H'
assert next(itr) == 'E'
assert list(itr) == list('LLO')

You can use this approach unless you are defining a custom iterator as a subclass of a base class defined elsewhere that does not derive from newobject. In that case, you can provide compatibility across Python 2 and Python 3 using the next function from future.builtins:

from builtins import next

from some_module import some_base_class

class Upper2(some_base_class):
    def __init__(self, iterable):
        self._iter = iter(iterable)
    def __next__(self):                 # Py3-style iterator interface
        return next(self._iter).upper()
    def __iter__(self):
        return self

itr2 = Upper2('hello')
assert next(itr2) == 'H'
assert next(itr2) == 'E'

next() also works with regular Python 2 iterators with a .next method:

itr3 = iter(['one', 'three', 'five'])
assert 'next' in dir(itr3)
assert next(itr3) == 'one'

This approach is feasible whenever your code calls the next() function explicitly. If you consume the iterator implicitly in a for loop or list() call or by some other means, the future.builtins.next function will not help; the third assertion below would fail on Python 2:

itr2 = Upper2('hello')

assert next(itr2) == 'H'
assert next(itr2) == 'E'
assert list(itr2) == list('LLO')      # fails because Py2 implicitly looks
                                      # for a next method.

Instead, you can use a decorator called implements_iterator from future.utils to allow Py3-style iterators to work identically on Py2, even if they don’t inherit from future.builtins.object. Use it as follows:

from future.utils import implements_iterator

Upper2 = implements_iterator(Upper2)

print(list(Upper2('hello')))
# prints ['H', 'E', 'L', 'L', 'O']

This can of course also be used with the @ decorator syntax when defining the iterator as follows:

@implements_iterator
class Upper2(some_base_class):
    def __init__(self, iterable):
        self._iter = iter(iterable)
    def __next__(self):                 # note the Py3 interface
        return next(self._iter).upper()
    def __iter__(self):
        return self

On Python 3, as usual, this decorator does nothing.
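The renaming that such a decorator performs on Python 2 can be sketched like this. The decorator name implements_iterator_sketch and the Countdown class are made up for illustration; treat it as a simplified model of the technique rather than future's exact code.

```python
import sys

def implements_iterator_sketch(cls):
    """On Py2, expose the Py3-style __next__ under the name next."""
    if sys.version_info[0] < 3:
        cls.next = cls.__next__
        del cls.__next__
    return cls

@implements_iterator_sketch
class Countdown(object):
    def __init__(self, start):
        self._n = start
    def __iter__(self):
        return self
    def __next__(self):
        if self._n <= 0:
            raise StopIteration
        self._n -= 1
        return self._n + 1

# Implicit consumption works because the interpreter finds the method
# name it expects on either version.
assert list(Countdown(3)) == [3, 2, 1]
```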

Binding a method to a class

Python 2 draws a distinction between bound and unbound methods, whereas in Python 3 this distinction is gone: unbound methods have been removed from the language. To bind a method to a class compatibly across Python 3 and Python 2, you can use the bind_method() helper function:

from future.utils import bind_method

class Greeter(object):
    pass

def greet(self, message):
    print(message)

bind_method(Greeter, 'greet', greet)

g = Greeter()
g.greet('Hi!')

On Python 3, calling bind_method(cls, name, func) is equivalent to calling setattr(cls, name, func). On Python 2 it is equivalent to:

import types
setattr(cls, name, types.MethodType(func, None, cls))
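Putting the two branches together gives the following sketch. The function name bind_method_sketch and the Counter class are hypothetical; the two setattr calls are the ones quoted above.

```python
import sys
import types

def bind_method_sketch(cls, name, func):
    """Attach func to cls as a method on either Python version."""
    if sys.version_info[0] >= 3:
        setattr(cls, name, func)
    else:
        # Py2 only: wrap the function in an unbound method first.
        setattr(cls, name, types.MethodType(func, None, cls))

class Counter(object):
    def __init__(self):
        self.total = 0

def add(self, amount):
    self.total += amount
    return self.total

bind_method_sketch(Counter, 'add', add)

c = Counter()
assert c.add(2) == 2
assert c.add(3) == 5
```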

Metaclasses

Python 3 and Python 2 syntax for metaclasses are incompatible. future provides a function (from jinja2/_compat.py) called with_metaclass() that can assist with specifying metaclasses portably across Py3 and Py2. Use it like this:

from future.utils import with_metaclass

class BaseForm(object): pass

class FormType(type): pass

class Form(with_metaclass(FormType, BaseForm)): pass
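To show why this works, here is a deliberately simplified sketch of the idea: build a throwaway base class by calling the metaclass directly, then inherit from it. The name with_metaclass_sketch and the registered attribute are assumptions for this example, and the helper in future is more refined (this version leaves the temporary class in the MRO).

```python
def with_metaclass_sketch(meta, *bases):
    """Create a throwaway base class whose metaclass is meta."""
    return meta('TemporaryBase', bases, {})

class BaseForm(object):
    pass

class FormType(type):
    def __new__(mcs, name, bases, namespace):
        cls = super(FormType, mcs).__new__(mcs, name, bases, namespace)
        cls.registered = True   # marker set by the metaclass
        return cls

class Form(with_metaclass_sketch(FormType, BaseForm)):
    pass

# Subclasses inherit the metaclass of their base on both Py2 and Py3.
assert type(Form) is FormType
assert issubclass(Form, BaseForm)
assert Form.registered
```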