[Python-Dev] Comparison speed (original) (raw)

Tim Peters tim.one@home.com
Mon, 14 May 2001 23:36:42 -0400

Previous message: [Python-Dev] Comparison speed
Next message: [Python-Dev] Comparison speed
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

[Martin v. Loewis]

... When stepping through the code, I also missed support for the relationship between identity and equality. E.g. in PyObjectRichCompare, I'd expect

if (v == w) { switch (op) case PyEQ:case PyLE:case PyGE: PyINCREF(PyTrue); return PyTrue; case PyNE:case PyLT:case PyGT: PyINCREF(PyFalse); return PyFalse; } } That would not help in your case, of course. I don't even know how frequent comparing identical objects is in real life - but this is something that PyObjectCompare has that PyObjectRichCompare currently doesn't.

Guido insisted (with cause ) on these four pairs as being equivalent:

x <  y  iff  y >  x
x <= y       y >= x
x == y       y == x
x != y       y != x

but beyond that, in the presence of rich comparisons, agreed not to make any other assumptions about what those pixel-bags "mean". In particular, there's no implication that "x <= y" iff "x < y or x == y", or that "x < y" implies "x != y", etc.

Applying that to the above leaves you with nothing but

if (v == w && op == Py_EQ) /* then return Py_True */

Which is about all PyObject_Compare's

if (v == w)
    return 0;

assumes too. So I don't see much future in that.

[later, a patch to fill in the richcmp slot for strings]

+static PyObject* +stringrichcompare(PyStringObject *a, PyStringObject *b, int op) +{ + int c; + PyObject *result; + if (!PyStringCheck(b)) { + result = PyNotImplemented; + goto out; + } + if (op == PyEQ) { + if (a->obsize != b->obsize) { + result = PyFalse; + goto out; + } +#ifdef CACHEHASH + if (a->obshash != b->obshash + && a->obshash != -1 + && b->obshash != -1) { + result = PyFalse; + goto out; + } +#endif + } + c = stringcompare(a, b); + switch (op) { + case PyLT: c = c < 0; break;_ _+ case PyLE: c = c <= 0; break;_ _+ case PyEQ: c = c == 0; break;_ _+ case PyNE: c = c != 0; break;_ _+ case PyGT: c = c > 0; break; + case PyGE: c = c >= 0; break; + default: + result = PyNotImplemented; + goto out; + } + result = c ? PyTrue : PyFalse; + out: + PyINCREF(result); + return result;

[and that yields about an 8% speedup in the "<" case]

That looks on the right track, but maybe at the wrong level: why is it necessary? That is, the bulk of the "smarts" here in the switch stmt are type-independent: if there's no specific implementation of individual comparisons, but there is a tp_compare, then the switch stmt applies verbatim to any such type. Do we have to fill in the richcmp slot for everything to get Python to realize that? I mean "just about everything", too: while, e.g., ceval special-cases "<" for ints, that doesn't do sorting or max or min etc on ints a lick of good (they don't go thru the COMPARE_OP opcode then, but thru the general comparison routines).

The "speed problem" appears to be:

COMPARE_OP calls cmp_outcome()
which calls PyObject_RichCompare()
```
which calls do_richcmp()
```

  which calls try_rich_compare() (unsuccessfully now,
                                  successfully after your patch)
    which fails to find a richcmp slot on either operand (now)
    so says "not implemented"

```
  then calls try_3way_to_rich_compare()
```
```
    which calls try_3way_compare()
```

       which finally calls the tp_compare slot

       then runs exactly the same
     switch (op) {
  case Py_LT: c = c <  0; break;
  case Py_LE: c = c <= 0; break;
  case Py_EQ: c = c == 0; break;
  case Py_NE: c = c != 0; break;
  case Py_GT: c = c >  0; break;
  case Py_GE: c = c >= 0; break;
  }
      result = c ? Py_True : Py_False;
       switch as your patch

and things unwind. So we've got 7 function calls there, not even counting calls to PyErr_Occurred() and PyObject_IsTrue(), all to find about 3 machine instructions that actually do the compare .

You got an 8% speedup for one type by tricking the switch stmt into appearing 3 calls earlier. What if the implementation were smarter, and did it for all relevant types even a call or two before that?

I don't see any reason "in principle" that compares couldn't be much faster, and via the usual gimmicks: bigger, smarter functions that remember what they've already determined so don't need to figure it out over and over again, and fast paths to favor common cases at the expense of comparisons from Mars. One thing to note here: the workhorse comparisons are "like strings" in having no logical need for richcmps at all; and the objects for which richcmps were introduced were numerical arrays, which can much better afford a longer code path to find them (one matrix compare will trigger many vanilla element compares anyway, so even for arrays it's much more important that the latter be fast). The code now is approximately backwards in that respect (it takes gobs of work before we even look for a cmp now -- indeed, if a type has both cmp and richcmp slots now, and we're doing an explict "cmp" compare, the code now tries to simulate cmp first via a long sequence of richcmp calls!).

I don't have time to uglify this code, but Python would benefit from it.

and-no-matter-what-guido-may-say-ly y'rs - tim

Previous message: [Python-Dev] Comparison speed
Next message: [Python-Dev] Comparison speed
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]