Differences between PyPy and CPython — PyPy documentation

This page documents the few differences and incompatibilities between the PyPy Python interpreter and CPython. Some of these differences are “by design”, since we think that there are cases in which the behaviour of CPython is buggy, and we do not want to copy bugs.

Differences that are not listed here should be considered bugs of PyPy.

Missing sys.getrefcount

Because PyPy's garbage collection strategy is not based on reference counting, sys.getrefcount() would return an unreliable number. PyPy therefore does not implement it; trying to use it will raise AttributeError: module 'sys' has no attribute 'getrefcount'. Note that newer versions of CPython also change the meaning of sys.getrefcount().
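Code that must run on both interpreters can probe for the function instead of assuming it exists. The following is a minimal sketch; the helper name refcount_or_none is hypothetical and not part of any PyPy or CPython API.

```python
import sys


def refcount_or_none(obj):
    """Return sys.getrefcount(obj) where available, otherwise None.

    refcount_or_none is a hypothetical helper used for illustration only.
    """
    getrefcount = getattr(sys, "getrefcount", None)
    if getrefcount is None:
        # Interpreters without sys.getrefcount (such as PyPy) end up here.
        return None
    return getrefcount(obj)


count = refcount_or_none(object())
print("refcount:", count)
```

Callers should treat the returned number as a debugging aid at most, since its meaning differs between interpreters and versions.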

Subclasses of built-in types

Officially, CPython has no rule at all for when exactly overridden methods of subclasses of built-in types get implicitly called or not. As an approximation, these methods are never called by other built-in methods of the same object. For example, an overridden __getitem__() in a subclass of dict will not be called by e.g. the built-in get() method.
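The approximation above can be illustrated with a short sketch. The class name Shouting is made up for this example.

```python
class Shouting(dict):
    """dict subclass with an overridden __getitem__ (illustrative only)."""

    def __getitem__(self, key):
        return "overridden"


d = Shouting(a=1)

# Subscription goes through the overridden method.
via_subscript = d["a"]

# The built-in get() method reads the stored value directly and does
# not call the overridden __getitem__ of the same object.
via_get = d.get("a")

print(via_subscript, via_get)
```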

The above is true both in CPython and in PyPy. Differences can occur in whether a built-in function or method will call an overridden method of an object other than self. In PyPy, such methods are often called in cases where CPython would not call them. Two examples:

    class D(dict):
        def __getitem__(self, key):
            if key == 'print':
                return print
            return "%r from D" % (key,)

    class A(object):
        pass

    a = A()
    a.__dict__ = D()
    a.foo = "a's own foo"
    print(a.foo)
    # CPython => a's own foo
    # PyPy => 'foo' from D

    print('==========')

    glob = D(foo="base item")
    loc = {}
    exec("print(foo)", glob, loc)
    # CPython => base item, and never looks up "print" in D
    # PyPy => 'foo' from D, and looks up "print" in D

Mutating classes of objects which are already used as dictionary keys

Consider the following snippet of code:

    class X(object):
        pass

    def __evil_eq__(self, other):
        print('hello world')
        return False

    def evil(y):
        d = {X(): 1}
        X.__eq__ = __evil_eq__
        d[y]  # might trigger a call to __eq__?

In CPython, __evil_eq__ might be called, although there is no way to write a test which reliably calls it. It happens if y is not x and hash(y) == hash(x), where x is the X() instance stored as the key and hash(x) is computed when x is inserted into the dictionary. If by chance the condition is satisfied, then __evil_eq__ is called.

PyPy uses a special strategy to optimize dictionaries whose keys are instances of user-defined classes which do not override the default __hash__, __eq__ and __cmp__: when using this strategy, __eq__ and __cmp__ are never called; instead the lookup is done by identity, so in the case above it is guaranteed that __eq__ won’t be called.

Note that in all other cases (e.g., if you have a custom __hash__ and __eq__ in y) the behavior is exactly the same as CPython.
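For contrast, here is a sketch of the "all other cases" situation, where the key class defines its own __hash__ and __eq__. The class Key and its eq_calls counter are invented for illustration. Because both keys hash to the same value, the lookup has to fall back on __eq__.

```python
class Key(object):
    """Key type with custom __hash__ and __eq__ (illustrative only)."""

    eq_calls = 0  # counts how often __eq__ runs

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Constant hash: every Key collides with every other Key.
        return 42

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.name == other.name


d = {Key("a"): 1}

# Different object, same hash: the dictionary must call __eq__ to decide.
found = d.get(Key("b"))

print(found, Key.eq_calls)
```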

Ignored exceptions

In many corner cases, CPython can silently swallow exceptions. The precise list of when this occurs is rather long, even though most cases are very uncommon. The most well-known places are custom rich comparison methods (like __eq__); dictionary lookup; calls to some built-in functions like isinstance().

Unless this behavior is clearly present by design and documented as such (as e.g. for hasattr()), in most cases PyPy lets the exception propagate instead.
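As an illustration of swallowing that is present by design, the sketch below assumes Python 3 semantics, where hasattr() is documented to treat only AttributeError as "attribute missing". The class Lazy and its properties are hypothetical.

```python
class Lazy(object):
    """Object whose properties raise on access (illustrative only)."""

    @property
    def ready(self):
        raise AttributeError("not computed yet")

    @property
    def broken(self):
        raise ValueError("unexpected failure")


obj = Lazy()

# Documented behaviour: AttributeError is swallowed and reported as False.
has_ready = hasattr(obj, "ready")

# Any other exception is not swallowed and reaches the caller.
try:
    hasattr(obj, "broken")
    propagated = False
except ValueError:
    propagated = True

print(has_ready, propagated)
```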

Object Identity of Primitive Values, is and id

Object identity of primitive values works by value equality, not by identity of the wrapper. This means that x + 1 is x + 1 is always true, for arbitrary integers x. The rule applies for the following types:

This change requires some changes to id as well. id fulfills the following condition: x is y <=> id(x) == id(y). Therefore id of the above types will return a value that is computed from the argument, and can thus be larger than sys.maxint (i.e. it can be an arbitrary long).
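The relationship between is and id can be checked with a small sketch. Under the rule described above, PyPy should report the two integers as identical; CPython typically creates two separate objects for integers of this size, so the result there is usually False. In both cases the is test and the id comparison should agree.

```python
import platform

x = 10 ** 20  # well outside CPython's small-integer cache
a = x + 1
b = x + 1

same_object = a is b
same_id = id(a) == id(b)

# x is y <=> id(x) == id(y) is expected to hold on either interpreter.
print(platform.python_implementation(), same_object, same_id)
```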

Note that strings of length 2 or greater can be equal without being identical. Similarly, x is (2,) is not necessarily true even if x contains a tuple and x == (2,). The uniqueness rules apply only to the particular cases described above. The str, unicode, tuple and frozenset rules were added in PyPy 5.4; before that, a test like if x is "?" or if x is () could fail even if x was equal to "?" or (). The new behavior added in PyPy 5.4 is closer to CPython’s, which caches precisely the empty tuple/frozenset, and (generally but not always) the strings and unicodes of length <= 1.

Note that for floats there “is” only one object per “bit pattern” of the float. So float('nan') is float('nan') is true on PyPy, but not on CPython because they are two objects; but 0.0 is -0.0 is always False, as the bit patterns are different. As usual, float('nan') == float('nan') is always False. When used in containers (as list items or in sets for example), the exact rule of equality used is “if x is y or x == y” (on both CPython and PyPy); as a consequence, because all nans are identical in PyPy, you cannot have several of them in a set, unlike in CPython. (Issue #1974). Another consequence is that cmp(float('nan'), float('nan')) == 0, because cmp checks with is first whether the arguments are identical (there is no good value to return from this call to cmp, because cmp pretends that there is a total order on floats, but that is wrong for NaNs).
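The float rules above can be explored with the following sketch. The variable names are illustrative, and the interpreter-dependent results are printed rather than asserted.

```python
nan = float("nan")

# Equality: a NaN never compares equal, not even to itself.
nan_equal = nan == nan

# Containers use "x is y or x == y", so the same object is found.
nan_in_list = nan in [nan]

# Two separately created NaNs: identical on PyPy, distinct on CPython,
# so this set is expected to hold 1 element on PyPy and 2 on CPython.
nan_count = len({float("nan"), float("nan")})

# Positive and negative zero have different bit patterns.
zeros_identical = float("0.0") is float("-0.0")

print(nan_equal, nan_in_list, nan_count, zeros_identical)
```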

C-API Differences

The external C-API has been reimplemented in PyPy as an internal cpyext module. We support most of the documented C-API, but sometimes internal C-abstractions leak out on CPython and are abused, perhaps even unknowingly. For instance, assignment to a PyTupleObject is not supported after the tuple is used internally, even by another C-API function call. On CPython this will succeed as long as the refcount is 1. On PyPy this will always raise a SystemError('PyTuple_SetItem called on tuple after use of tuple') exception (explicitly listed here for search engines).

Another similar problem is assignment of a new function pointer to any of the tp_as_* structures after calling PyType_Ready. For instance, overriding tp_as_number.nb_int with a different function after calling PyType_Ready on CPython will result in the old function being called for x.__int__() (via class __dict__ lookup) and the new function being called for int(x) (via slot lookup). On PyPy we will always call the new function, not the old one. This quirky behaviour is unfortunately necessary to fully support NumPy.

The cpyext layer adds complexity and is slow. If possible, use cffi or HPy.

Performance Differences

CPython has an optimization that can make repeated string concatenation not quadratic. For example, this kind of code runs in O(n) time:

    s = ''
    for string in mylist:
        s += string

In PyPy, this code will always have quadratic complexity. Note also that the CPython optimization is brittle and can break with slight variations in your code. You should therefore replace the code with:

    parts = []
    for string in mylist:
        parts.append(string)
    s = "".join(parts)
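The two approaches can be wrapped in functions to confirm that they produce the same result. This is a minimal sketch; the function names concat_naive and concat_join are made up for illustration.

```python
def concat_naive(items):
    """Repeated +=: may be quadratic, depending on the interpreter."""
    s = ""
    for item in items:
        s += item
    return s


def concat_join(items):
    """Single join over the parts: linear in the total length."""
    return "".join(items)


words = ["py", "py", " ", "is", " ", "fast"]
print(concat_naive(words) == concat_join(words))
```

When the parts are already in a list or produced by a generator, passing them straight to "".join() avoids the intermediate list entirely.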

Miscellaneous

Extension modules

List of extension modules that we support:

The extension modules (i.e. modules written in C, in the standard CPython) that are neither mentioned above nor in lib_pypy/ are not available in PyPy.