What else you need to know: Python-Future documentation
The following points are important to know about when writing Python 2/3 compatible code.
bytes
Handling bytes consistently and correctly has traditionally been one of the most difficult tasks in writing a Py2/3 compatible codebase. This is because the Python 2 bytes object is simply an alias for Python 2’s str, rather than a true implementation of the Python 3 bytes object, which is substantially different.
future contains a backport of the bytes object from Python 3 which passes most of the Python 3 tests for bytes. (See tests/test_future/test_bytes.py in the source tree.) You can use it as follows:

    >>> from builtins import bytes
    >>> b = bytes(b'ABCD')

On Py3, this is simply the builtin bytes object. On Py2, this object is a subclass of Python 2’s str that enforces the same strict separation of unicode strings and byte strings as Python 3’s bytes object:

    >>> b + u'EFGH'      # TypeError
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: argument can't be unicode string

    >>> bytes(b',').join([u'Fred', u'Bill'])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: sequence item 0: expected bytes, found unicode string

    >>> b == u'ABCD'
    False

    >>> b < u'abc'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: bytes() and <type 'unicode'>

In most other ways, these bytes objects have identical behaviours to Python 3’s bytes:

    >>> b = bytes(b'ABCD')
    >>> assert list(b) == [65, 66, 67, 68]
    >>> assert repr(b) == "b'ABCD'"
    >>> assert b.split(b'B') == [b'A', b'CD']

Currently the easiest way to ensure identical behaviour of byte-strings in a Py2/3 codebase is to wrap all byte-string literals b'...' in a bytes() call as follows:

    from builtins import bytes

    # ...
    b = bytes(b'This is my bytestring')
    # ...

This is not perfect, but it is superior to manually debugging and fixing code incompatibilities caused by the many differences between Py3 bytes and Py2 strings.
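As a small illustration of what the wrapping buys you, the sketch below uses a hypothetical helper, hex_dump (not part of future), that relies on Py3-style iteration and indexing. It assumes future is installed when run on Py2; on Py3 the import resolves to the standard builtins module.

```python
from builtins import bytes  # standard library on Py3; provided by future on Py2


def hex_dump(raw):
    """Hypothetical helper: return the bytes of ``raw`` as a hex string."""
    data = bytes(raw)  # on Py2 this wraps the native str in the backported type
    # Iterating over a Py3-style bytes object yields integers, not 1-char strings.
    return ' '.join('{0:02x}'.format(byte) for byte in data)


dump = hex_dump(b'ABCD')
first = bytes(b'ABCD')[0]  # indexing also yields an integer
```

Without the bytes() wrapper, the same loop on Py2 would iterate over one-character strings and the format call would fail.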
The bytes type from builtins also provides support for the surrogateescape error handler on Python 2.x. Here is an example that works identically on Python 2.x and 3.x:

    >>> from builtins import bytes
    >>> b = bytes(b'\xff')
    >>> b.decode('utf-8', 'surrogateescape')
    '\udcff'

This feature is in alpha. Please leave feedback about whether this works for you.
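The main use of surrogateescape is a lossless round trip for bytes that are not valid in the chosen encoding. The following is a minimal sketch of that round trip; the sample value is made up, and on Py2 it assumes the alpha-quality handler from future behaves as described above.

```python
from builtins import bytes  # standard library on Py3; provided by future on Py2

raw = bytes(b'caf\xe9')  # Latin-1 encoded, so not valid UTF-8

# The undecodable byte 0xE9 is carried through as the lone surrogate U+DCE9.
text = raw.decode('utf-8', 'surrogateescape')

# Encoding with the same handler turns the surrogate back into the original byte.
restored = text.encode('utf-8', 'surrogateescape')
```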
str
The str object in Python 3 is quite similar but not identical to the Python 2 unicode object.
The major difference is the stricter type-checking of Py3’s str that enforces a distinction between unicode strings and byte-strings, such as when comparing, concatenating, joining, or replacing parts of strings.
There are also other differences, such as the repr of unicode strings in Py2 having a u'...' prefix, versus simply '...' in Py3, and the removal of the str.decode() method in Py3.
future contains a newstr type that is a backport of the str object from Python 3. This inherits from the Python 2 unicode class but has customizations to improve compatibility with Python 3’s str object. You can use it as follows:

    >>> from __future__ import unicode_literals
    >>> from builtins import str

On Py2, this gives us:

    >>> str
    future.types.newstr.newstr

(On Py3, it is simply the usual builtin str object.)
Then, for example, the following code has the same effect on Py2 as on Py3:

    >>> s = str(u'ABCD')
    >>> assert s != b'ABCD'
    >>> assert isinstance(s.encode('utf-8'), bytes)
    >>> assert isinstance(b.decode('utf-8'), str)

These raise TypeErrors:

    >>> bytes(b'B') in s
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'in <string>' requires string as left operand, not <type 'str'>

    >>> s.find(bytes(b'A'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: argument can't be <type 'str'>

Various other operations that mix strings and bytes or other types are permitted on Py2 with the newstr class even though they are illegal with Python 3. For example:

    >>> s2 = b'/' + str('ABCD')
    >>> s2
    '/ABCD'
    >>> type(s2)
    future.types.newstr.newstr

This is allowed for compatibility with parts of the Python 2 standard library and various third-party libraries that mix byte-strings and unicode strings loosely. One example is os.path.join on Python 2, which attempts to add the byte-string b'/' to its arguments, whether or not they are unicode. (See posixpath.py.) Another example is the escape() function in Django 1.4’s django.utils.html.
In most other ways, these builtins.str objects on Py2 have the same behaviours as Python 3’s str:

    >>> s = str('ABCD')
    >>> assert repr(s) == "'ABCD'"      # consistent repr with Py3 (no u prefix)
    >>> assert list(s) == ['A', 'B', 'C', 'D']
    >>> assert s.split('B') == ['A', 'CD']

The str type from builtins also provides support for the surrogateescape error handler on Python 2.x. Here is an example that works identically on Python 2.x and 3.x:

    >>> from builtins import str
    >>> s = str(u'\udcff')
    >>> s.encode('utf-8', 'surrogateescape')
    b'\xff'

This feature is in alpha. Please leave feedback about whether this works for you.
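A common pattern built on the backported str and bytes types is normalising input at an API boundary. The helper below, ensure_text, is a hypothetical name used only for illustration; the sketch assumes future is installed on Py2 so that the isinstance checks recognise native Py2 strings.

```python
from builtins import bytes, str  # standard library on Py3; provided by future on Py2


def ensure_text(value, encoding='utf-8'):
    """Hypothetical helper: return ``value`` as a (unicode) text string."""
    if isinstance(value, bytes):
        # Byte strings are decoded explicitly rather than coerced implicitly.
        return str(value.decode(encoding))
    return str(value)


greeting = ensure_text(b'hello') + ensure_text(u' world')
```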
dict
Python 3 dictionaries have .keys(), .values(), and .items() methods which return memory-efficient view objects, not lists; the key and item views are set-like. (See PEP 3106.)
If your dictionaries are small, performance is not critical, and you don’t need the set-like behaviour of the view objects from Python 3, you can of course stick with standard Python 3 code in your Py2/3 compatible codebase:

    # Assuming d is a native dict ...

    for key in d:
        # code here

    for item in d.items():
        # code here

    for value in d.values():
        # code here

In this case there will be memory overhead of list creation on Py2 for each call to items, values or keys.
For improved efficiency, future.builtins (aliased to builtins) provides a Python 2 dict subclass whose keys(), values(), and items() methods return iterators on all versions of Python >= 2.7. On Python 2.7, these iterators also have the same set-like view behaviour as dictionaries in Python 3. This can streamline code that iterates over large dictionaries. For example:

    from __future__ import print_function
    from builtins import dict, range

    # Memory-efficient construction:
    d = dict((i, i**2) for i in range(10**7))

    assert not isinstance(d.items(), list)

    # Because items() is memory-efficient, so is this:
    d2 = dict((v, k) for (k, v) in d.items())

As usual, on Python 3 dict imported from either builtins or future.builtins is just the built-in dict class.
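To see the set-like behaviour on a small scale, here is a minimal sketch with made-up data. It assumes Python 2.7 with future installed, or any Python 3, where the import is simply the built-in dict.

```python
from builtins import dict, range  # standard library on Py3; provided by future on Py2

squares = dict((i, i ** 2) for i in range(10))

# items() does not build a list, so inverting the mapping makes no temporary copy.
inverse = dict((v, k) for (k, v) in squares.items())

# The keys view supports set operations directly.
multiples_of_three = squares.keys() & set(range(0, 10, 3))
```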
Memory-efficiency and alternatives
If you already have large native dictionaries, the downside to wrapping them in a dict call is that memory is copied (on both Py3 and on Py2). For example:

    # This allocates and then frees a large amount of temporary memory:
    d = dict({i: i**2 for i in range(10**7)})

If dictionary methods like values and items are called only once, this copy obviously negates the memory benefit that the overridden methods provide by not creating temporary lists.
The memory-efficient (and CPU-efficient) alternatives are:
- to construct a dictionary from an iterator. The above line could use a generator like this:

      d = dict((i, i**2) for i in range(10**7))

- to construct an empty dictionary with a dict() call using builtins.dict (rather than {}) and then update it;
- to use the viewitems etc. functions from future.utils, passing in regular dictionaries:

      from future.utils import viewkeys, viewvalues, viewitems

      for (key, value) in viewitems(hugedictionary):
          # some code here

      # Set intersection:
      d = {i**2: i for i in range(1000)}
      both = viewkeys(d) & set(range(0, 1000, 7))

      # Set union (key and item views are set-like; value views are not):
      both = viewkeys(d1) | viewkeys(d2)

For compatibility, the functions iteritems etc. are also available in future.utils. These are equivalent to the functions of the same names in six: each one is equivalent to calling the iteritems etc. methods on Python 2, or to calling items etc. on Python 3.
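The view helpers can be combined into small utilities that work on native dictionaries without copying them. The sketch below defines a hypothetical diff_dicts function; the fallback definitions are an assumption so that the example also runs on a plain Python 3 without future installed.

```python
try:
    from future.utils import viewitems, viewkeys
except ImportError:
    # Assumption: plain Python 3 without future, where the native
    # methods already return views.
    def viewkeys(d):
        return d.keys()

    def viewitems(d):
        return d.items()


def diff_dicts(old, new):
    """Hypothetical helper: return (added, removed, changed) key sets."""
    added = viewkeys(new) - viewkeys(old)      # set difference on key views
    removed = viewkeys(old) - viewkeys(new)
    changed = set(key for (key, value) in viewitems(new)
                  if key in old and old[key] != value)
    return added, removed, changed


added, removed, changed = diff_dicts({'a': 1, 'b': 2}, {'b': 3, 'c': 4})
```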
int
Python 3’s int type is very similar to Python 2’s long, except for the representation (Python 3 omits the L suffix that Python 2 appends). Python 2’s usual (short) integers have been removed from Python 3, as has the long builtin name.
Python 3:

    >>> 2**64
    18446744073709551616

Python 2:

    >>> 2**64
    18446744073709551616L

future includes a backport of Python 3’s int that is a subclass of Python 2’s long with the same representation behaviour as Python 3’s int. To ensure an integer is long compatibly with both Py3 and Py2, cast it like this:

    >>> from builtins import int
    >>> must_be_a_long_integer = int(1234)

The backported int object helps with writing doctests and simplifies code that deals with long and int as special cases on Py2. An example is the following code from xlwt-future (called by the xlwt.antlr.BitSet class) for writing out Excel .xls spreadsheets. With future, the code is:

    from builtins import int

    def longify(data):
        """
        Turns data (an int or long, or a list of ints or longs) into a
        list of longs.
        """
        if not data:
            return [int(0)]
        if not isinstance(data, list):
            return [int(data)]
        return list(map(int, data))

Without future (or with future < 0.7), this might be:

    def longify(data):
        """
        Turns data (an int or long, or a list of ints or longs) into a
        list of longs.
        """
        if not data:
            if PY3:
                return [0]
            else:
                return [long(0)]
        if not isinstance(data, list):
            if PY3:
                return [int(data)]
            else:
                return [long(data)]
        if PY3:
            return list(map(int, data))   # same as returning data, but with up-front typechecking
        else:
            return list(map(long, data))

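The doctest benefit mentioned above comes from the consistent repr. A minimal sketch, using a made-up factorial function: because the result is built from the backported int, its repr should carry no L suffix on either version (on Py2 this assumes future is installed).

```python
from builtins import int, range  # standard library on Py3; provided by future on Py2


def factorial(n):
    """Return n! as a Py3-style integer."""
    result = int(1)  # start from the backported type so the result keeps it
    for i in range(2, n + 1):
        result *= i
    return result


value = factorial(25)
```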
isinstance
The following tests all pass on Python 3:

    >>> assert isinstance(2**62, int)
    >>> assert isinstance(2**63, int)
    >>> assert isinstance(b'my byte-string', bytes)
    >>> assert isinstance(u'unicode string 1', str)
    >>> assert isinstance('unicode string 2', str)

However, two of these normally fail on Python 2:

    >>> assert isinstance(2**63, int)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AssertionError

    >>> assert isinstance(u'my unicode string', str)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AssertionError

And if this import is in effect on Python 2:

    >>> from __future__ import unicode_literals

then the fifth test fails too:

    >>> assert isinstance('unicode string 2', str)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AssertionError

After importing the builtins from future, all these tests pass on Python 2 as on Python 3:

    >>> from builtins import bytes, int, str

    >>> assert isinstance(10, int)
    >>> assert isinstance(10**100, int)
    >>> assert isinstance(b'my byte-string', bytes)
    >>> assert isinstance(u'unicode string 1', str)

However, note that the last test requires that unicode_literals be imported to succeed:

    >>> from __future__ import unicode_literals
    >>> assert isinstance('unicode string 2', str)

This works because the backported types int, bytes and str (and others) have metaclasses that override __instancecheck__. See PEP 3119 for details.
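Because the isinstance checks also accept native Py2 objects, type dispatch can be written once in the Py3 style. The classify function below is a hypothetical example; it uses explicit u'' literals so that it does not depend on unicode_literals being in effect.

```python
from builtins import bytes, int, str  # standard library on Py3; provided by future on Py2


def classify(value):
    """Hypothetical helper: name the Py3-style category of ``value``."""
    if isinstance(value, bytes):
        return 'bytes'
    if isinstance(value, str):
        return 'text'
    if isinstance(value, int):  # covers both short and long integers on Py2
        return 'integer'
    return 'other'


kinds = [classify(v) for v in (b'raw', u'text', 2 ** 70, 1.5)]
```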
Passing data to/from Python 2 libraries
If you are passing any of the backported types (bytes, int, dict, str) into brittle library code that performs type-checks using type(), rather than isinstance(), or requires that you pass Python 2’s native types (rather than subclasses) for some other reason, it may be necessary to upcast the types from future to their native superclasses on Py2.
The native function in future.utils is provided for this. Here is how to use it. (The output shown is from Py2):

    >>> from builtins import int, bytes, str
    >>> from future.utils import native

    >>> a = int(10**20)     # Py3-like long int
    >>> a
    100000000000000000000
    >>> type(a)
    future.types.newint.newint
    >>> native(a)
    100000000000000000000L
    >>> type(native(a))
    long

    >>> b = bytes(b'ABC')
    >>> type(b)
    future.types.newbytes.newbytes
    >>> native(b)
    'ABC'
    >>> type(native(b))
    str

    >>> s = str(u'ABC')
    >>> type(s)
    future.types.newstr.newstr
    >>> native(s)
    u'ABC'
    >>> type(native(s))
    unicode

On Py3, the native() function is a no-op.
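The following sketch shows the kind of exact type check that native() is meant to get past. brittle_length is a hypothetical stand-in for third-party code, and the fallback definition of native is an assumption so that the example also runs on a plain Python 3 without future installed.

```python
from builtins import bytes  # standard library on Py3; provided by future on Py2

try:
    from future.utils import native
except ImportError:
    # Assumption: plain Python 3 without future, where native() is a no-op.
    def native(obj):
        return obj


def brittle_length(data):
    """Hypothetical stand-in for library code that checks type() exactly."""
    if type(data) is not type(b''):
        raise TypeError('expected a native byte string')
    return len(data)


payload = bytes(b'ABC')
# On Py2, type(payload) is the backported class, so it is unwrapped first.
length = brittle_length(native(payload))
```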
Native string type
Some library code, including standard library code like the array.array() constructor, requires native strings on Python 2 and Python 3. This means that there is no simple way to pass the appropriate string type when the unicode_literals import from __future__ is in effect.
The objects native_str and native_bytes are available in future.utils for this case. These are equivalent to the str and bytes objects in __builtin__ on Python 2 or in builtins on Python 3.
The functions native_str_to_bytes and bytes_to_native_str are also available for more explicit conversions.
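As a rough sketch of how these names might be used with array.array(), consider the following. The fallback definitions are an assumption that mirrors the Python 3 behaviour, included only so that the example runs without future installed; the sample values are made up.

```python
import array

try:
    from future.utils import (native_str, native_str_to_bytes,
                              bytes_to_native_str)
except ImportError:
    # Assumption: plain Python 3 without future, where the native string
    # type is simply str.
    native_str = str

    def native_str_to_bytes(s, encoding='utf-8'):
        return s.encode(encoding)

    def bytes_to_native_str(b, encoding='utf-8'):
        return b.decode(encoding)


# array.array() wants a native string for its typecode on both versions.
typecode = native_str(u'i')
numbers = array.array(typecode, [1, 2, 3])

# Explicit conversions between bytes and the native string type.
name = bytes_to_native_str(b'spam')
raw_name = native_str_to_bytes(name)
```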
open()
The Python 3 builtin open() function for opening files returns file contents as (unicode) strings unless the binary (b) flag is passed (for example, mode 'rb'), in which case its methods like read() return Py3 bytes objects.
On Py2 with future installed, the builtins module provides an open function that is mostly compatible with that on Python 3 (e.g. it offers keyword arguments like encoding). This maps to the open backport available in the standard library io module on Py2.7.
One difference to be aware of between the Python 3 open and future.builtins.open on Python 2 is that the return types of methods such as read() from the file object that open returns are not automatically cast from native bytes or unicode strings on Python 2 to the corresponding future.builtins.bytes or future.builtins.str types. If you need the returned data to behave exactly the same way on Py2 as on Py3, you can cast it explicitly as follows:

    from __future__ import unicode_literals
    from builtins import open, bytes

    data = open('image.png', 'rb').read()
    # On Py2, data is a standard 8-bit str with loose Unicode coercion.
    # data + u'' would likely raise a UnicodeDecodeError

    data = bytes(data)
    # Now it behaves like a Py3 bytes object...

    assert data[:4] == b'\x89PNG'
    assert data[4] == 13     # integer

    # Raises TypeError:
    # data + u''

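The same explicit cast applies to text mode. Here is a self-contained sketch that writes a temporary file and reads it back in both modes; the file name and contents are made up, and on Py2 it assumes future is installed.

```python
import os
import shutil
import tempfile

from builtins import bytes, open, str  # standard library on Py3; provided by future on Py2

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'example.txt')
try:
    # The encoding keyword is available on both versions with this open().
    with open(path, 'w', encoding='utf-8') as handle:
        handle.write(u'caf\xe9')

    with open(path, 'rb') as handle:
        raw = bytes(handle.read())   # cast so that indexing yields integers on Py2

    with open(path, 'r', encoding='utf-8') as handle:
        text = str(handle.read())    # cast so that the Py3 str semantics apply on Py2
finally:
    shutil.rmtree(workdir)
```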
Custom __str__ methods
If you define a custom __str__ method for any of your classes, functions like print() expect __str__ on Py2 to return a byte string, whereas on Py3 they expect a (unicode) string.
Use the following decorator to map the __str__ to __unicode__ on Py2 and define __str__ to encode it as utf-8:

    from future.utils import python_2_unicode_compatible

    @python_2_unicode_compatible
    class MyClass(object):
        def __str__(self):
            return u'Unicode string: \u5b54\u5b50'

    a = MyClass()

This then prints the name of a Chinese philosopher:

    print(a)

This decorator is identical to the decorator of the same name in django.utils.encoding.
This decorator is a no-op on Python 3.
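A second, minimal sketch of the decorator on a class that formats its own state. The Temperature class is made up, and the fallback decorator is an assumption that mirrors the no-op behaviour on Python 3 so that the example runs without future installed.

```python
try:
    from future.utils import python_2_unicode_compatible
except ImportError:
    # Assumption: plain Python 3 without future, where the decorator is a no-op.
    def python_2_unicode_compatible(cls):
        return cls


@python_2_unicode_compatible
class Temperature(object):
    def __init__(self, degrees):
        self.degrees = degrees

    def __str__(self):
        # Always return a unicode string; the decorator handles Py2.
        return u'{0}\u00b0C'.format(self.degrees)


label = str(Temperature(21))
```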
Custom iterators
If you define your own iterators, there is an incompatibility in the method name to retrieve the next item across Py3 and Py2. On Python 3 it is __next__, whereas on Python 2 it is next.
The most elegant solution to this is to derive your custom iterator class from builtins.object and define a __next__ method as you normally would on Python 3. On Python 2, object then refers to the future.types.newobject base class, which provides a fallback next method that calls your __next__. Use it as follows:

    from builtins import object

    class Upper(object):
        def __init__(self, iterable):
            self._iter = iter(iterable)
        def __next__(self):                 # Py3-style iterator interface
            return next(self._iter).upper()
        def __iter__(self):
            return self

    itr = Upper('hello')
    assert next(itr) == 'H'
    assert next(itr) == 'E'
    assert list(itr) == list('LLO')

You can use this approach unless you are defining a custom iterator as a subclass of a base class defined elsewhere that does not derive from newobject. In that case, you can provide compatibility across Python 2 and Python 3 using the next function from future.builtins:

    from builtins import next

    from some_module import some_base_class

    class Upper2(some_base_class):
        def __init__(self, iterable):
            self._iter = iter(iterable)
        def __next__(self):                 # Py3-style iterator interface
            return next(self._iter).upper()
        def __iter__(self):
            return self

    itr2 = Upper2('hello')
    assert next(itr2) == 'H'
    assert next(itr2) == 'E'

next() also works with regular Python 2 iterators with a .next method:

    itr3 = iter(['one', 'three', 'five'])
    assert 'next' in dir(itr3)
    assert next(itr3) == 'one'

This approach is feasible whenever your code calls the next() function explicitly. If you consume the iterator implicitly in a for loop or list() call or by some other means, the future.builtins.next function will not help; the third assertion below would fail on Python 2:

    itr2 = Upper2('hello')

    assert next(itr2) == 'H'
    assert next(itr2) == 'E'
    assert list(itr2) == list('LLO')      # fails because Py2 implicitly looks
                                          # for a next method.

Instead, you can use a decorator called implements_iterator from future.utils to allow Py3-style iterators to work identically on Py2, even if they don’t inherit from future.builtins.object. Use it as follows:

    from future.utils import implements_iterator

    Upper2 = implements_iterator(Upper2)

    print(list(Upper2('hello')))
    # prints ['H', 'E', 'L', 'L', 'O']

This can of course also be used with the @ decorator syntax when defining the iterator as follows:

    @implements_iterator
    class Upper2(some_base_class):
        def __init__(self, iterable):
            self._iter = iter(iterable)
        def __next__(self):                 # note the Py3 interface
            return next(self._iter).upper()
        def __iter__(self):
            return self

On Python 3, as usual, this decorator does nothing.
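For a version that can be run without some_module, here is a self-contained sketch using a made-up Countdown iterator. The fallback decorator is an assumption that mirrors the no-op behaviour on Python 3 so that the example runs without future installed.

```python
try:
    from future.utils import implements_iterator
except ImportError:
    # Assumption: plain Python 3 without future, where the decorator does nothing.
    def implements_iterator(cls):
        return cls


@implements_iterator
class Countdown(object):
    """Iterate from ``start`` down to 1 using the Py3 iterator interface."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


# Implicit consumption (list(), for loops) is the case the decorator is for.
values = list(Countdown(3))
```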
Binding a method to a class
Python 2 draws a distinction between bound and unbound methods, whereas in Python 3 this distinction is gone: unbound methods have been removed from the language. To bind a method to a class compatibly across Python 3 and Python 2, you can use the bind_method() helper function:

    from future.utils import bind_method

    class Greeter(object):
        pass

    def greet(self, message):
        print(message)

    bind_method(Greeter, 'greet', greet)

    g = Greeter()
    g.greet('Hi!')

On Python 3, calling bind_method(cls, name, func) is equivalent to calling setattr(cls, name, func). On Python 2 it is equivalent to:

    import types
    setattr(cls, name, types.MethodType(func, None, cls))

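One place this is handy is attaching several plain functions to a class in a loop. The sketch below uses a made-up Shape class; the fallback definition of bind_method is an assumption based on the setattr equivalence stated above, so that the example runs on Python 3 without future installed.

```python
try:
    from future.utils import bind_method
except ImportError:
    # Assumption: plain Python 3 without future, using the documented equivalence.
    def bind_method(cls, name, func):
        setattr(cls, name, func)


class Shape(object):
    def __init__(self, width, height):
        self.width = width
        self.height = height


def area(self):
    return self.width * self.height


def perimeter(self):
    return 2 * (self.width + self.height)


# Attach each function to the class under its own name.
for func in (area, perimeter):
    bind_method(Shape, func.__name__, func)

rectangle = Shape(3, 4)
```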
Metaclasses
Python 3 and Python 2 syntax for metaclasses are incompatible. future provides a function (from jinja2/_compat.py) called with_metaclass() that can assist with specifying metaclasses portably across Py3 and Py2. Use it like this:

    from future.utils import with_metaclass

    class BaseForm(object):
        pass

    class FormType(type):
        pass

    class Form(with_metaclass(FormType, BaseForm)):
        pass

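To show a metaclass that actually does something, here is a sketch of a class registry. PluginRegistry, Plugin and CsvPlugin are made-up names. The fallback with_metaclass is a simplified stand-in written for this example (it is not the implementation shipped with future) and is only there so that the code runs on Python 3 without future installed.

```python
try:
    from future.utils import with_metaclass
except ImportError:
    # Assumption: simplified Py3-only stand-in, not future's own implementation.
    def with_metaclass(meta, *bases):
        class _Temporary(type):
            def __new__(cls, name, this_bases, namespace):
                # Replace the temporary base with the real bases and metaclass.
                return meta(name, bases, namespace)
        return type.__new__(_Temporary, 'temporary_class', (), {})


class PluginRegistry(type):
    """Metaclass that records every class created with it."""
    plugins = {}

    def __init__(cls, name, bases, namespace):
        super(PluginRegistry, cls).__init__(name, bases, namespace)
        PluginRegistry.plugins[name] = cls


class Plugin(with_metaclass(PluginRegistry, object)):
    pass


class CsvPlugin(Plugin):   # subclasses inherit the metaclass
    pass
```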