[Python-Dev] binary trees. Review obmalloc.c (original) (raw)

Vladimir 'Yu' Stepanov vys at renet.ru
Thu Apr 27 11:01:26 CEST 2006


Josiah Carlson wrote:

"Vladimir 'Yu' Stepanov" <vys at renet.ru> wrote:

Josiah Carlson wrote:

There exists various C and Python implementations of both AVL and Red-Black trees. For users of Python who want to use AVL and/or Red-Black trees, I would urge them to use the Python implementations. In the case of needing the speed of a C extension, there already exists a CPython extension module for AVL trees: http://www.python.org/pypi/pyavl/1.1

I would suggest you look through some suggested SoC projects in the wiki: http://wiki.python.org/moin/SummerOfCode - Josiah

Thanks for the answer! I already saw pyavl-1.1. But for this reason I would like to see the module in a standard package python. Functionality for pyavl and dict to compare difficultly. Functionality of my module will differ from functionality dict in the best party. I have lead some tests on for work with different types both for a package pyavl-1.1, and for the prototype of own module. The script of check is resulted in attached a file avl-timeit.py In files timeit-result--.txt results of this test. The first in the name of a file means quantity of the added elements, the second - argument of a method timeit. There it is visible, that in spite of the fact that the module xtree is more combined in comparison with pyavl the module (for everyone again inserted pair [the key, value], is created two elements: python object - pair, and an internal element of a tree), even in this case it works a little bit more quickly. Besides the module pyavl is unstable for work in multithread appendices (look the attached file module-avl-is-thread-unsafe.py). I'm not concerned about the speed of the external AVL module, and I'm not concerned about the speed of trees in Python; so far people have found that dictionaries are generally sufficient. More specifically, while new lambda syntaxes are presented almost weekly, I haven't heard anyone complain about Python's lack of a tree module in months. As a result, I don't find the possible addition of any tree structure to the collections module as being a generally compelling addition. The object dict is irreplaceable in most cases, it so. I do not assume, that binary trees can sometime replace standard `dict' object. Sorting of the list also is and will be irreplaceable function.

But there is a number of problems which can be rather easily solved by means of binary trees. Most likely for them there will be an alternative. But, as a rule, it is less obvious to the programmer. And most likely realization will be not so fast.

Probably that nobody mentioned necessity of binary trees. In my opinion it is not a reason in an estimation of necessity of the given type.

To take for example PHP. The array and the dictionary there is realized as the list. There is no even a hint on hash object. Those users to whom was necessary hash already have chosen perl/python/ruby/etc. The similar situation can arise and concerning binary trees.

Again, I believe that the existance of 3rd party extension modules which implement AVL and red-black trees, regardless of their lack of thread safety, slower than your module by a constant, or lack of particular functionality to be basically immaterial to this discussion. In my mind, there are only two relevant items to this discussion:

1. Is having a tree implementation (AVL or Red-Black) important, and if so, is it important enough to include in the collections module? 2. Is a tree implementation important enough for google to pay for its inclusion, given that there exists pyavl and a collectionsmodule.c patch for a red-black tree implementation? Then again, you already have your own implementation of a tree module, and it seems as though you would like to be paid for this already-done work. I don't know how google feels about such things, but you should remember to include this information in your application. I have looked a patch for collectionsmodule.c. In my opinion it is fine realization of binary trees. If there is this class that to me there is no need to create one more copy.

My persistence speaks a demand of this type of data. As to my message on the beginning of the project for SoC I ask to consider that this theme has not been lifted :) For me it was a minor question.

I think, that creation of this type (together with type of pair), will make programming more convenient since sorting by function sort will be required less often.

How often are you sorting data? I've found that very few of my operations involve sorting data of any kind, and even if I did have need of sorting, Python's sorting algorithm is quite fast, likely faster by nearly a factor of 2 (in the number of comparisons) than the on-line construction of an AVL or Red-Black tree. Really it is not frequent. Only when there is one of problems which is difficult for solving by means of already available means, whether means it that I should look aside C, C ++? It not seems to me that presence of structures of self-balanced binary trees are distinctive feature of any language.

Comparison of functions of sorting and binary trees not absolutely correctly. I think that function sort will lose considerably on greater lists. Especially after an insert or removal of all one element.

I can probably borrow in this module beyond the framework of the project google. The demand of such type of data is interesting. Because of necessity of processing gcmodule.c' and obmalloc.c' this module cannot be realized as the external module.

It seems to me (after checking collectionsmodule.c and a few others) that proper C modules only seem to import Python.h and maybe structmember.h . If you are manually including gcmodule.c and obmalloc.c, you may be doing something wrong; I'll leave it to others to further comment on this aspect. Thanks that at once has not shot me :) I was not going to work directly neither with that, nor with another. There were difficulties with translation :)

When I discussed obmalloc.c, I meant its internal code. Ways of its improvement:



More information about the Python-Dev mailing list