[Python-Dev] Subclassing varying length types (What's a PyStructSequence ?)

M.-A. Lemburg mal@lemburg.com
Mon, 10 Dec 2001 11:57:10 +0100


Tim Peters wrote:

[MAL]
> Have you tried disabling all free lists and using pymalloc
> instead?

No, but I haven't tried anything -- it's a 2.3 issue.

> If this pays off, I agree, we should get rid of all of them.

When I do try it, it will be slower but more memory-efficient (both data and code) than the type-specific free lists, and faster and much more memory-efficient than using malloc().

Well, let's do some pybench runs next year and see what the results look like.

> ...
> I would consider moving from 8-bit strings to Unicode an
> improvement in flexibility.

Sure. Moving from one malloc to two is orthogonal.

You know that I know that you knew what I was talking about :-)

> It also results in better algorithms (== simpler, less error-prone,
> etc. in this case).

Unclear what "it" means; assuming it means using two mallocs instead of one for a Unicode string object, the 8-bit string algorithms haven't been a particular source of bugs. People mutating strings at the C level has been.

If you ever try to support more than ASCII text in a user program, you'll find that having to deal with only one encoding saves you a whole lot of trouble. I won't even start talking about variable-length encodings, encodings with built-in shift state and other goodies which are a complete nightmare to handle (e.g. various character properties such as title case, upper/lower mappings, different ways to encode a single character, collation,...).
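As a rough illustration of why variable-length encodings are awkward to handle, the sketch below compares indexing a Unicode string with indexing its UTF-8 bytes. It assumes a modern Python 3 interpreter (which postdates this thread), and the sample text is made up for the example.

```python
# Illustrative sketch only; assumes Python 3, where str is a Unicode type.
text = "na\u00efve caf\u00e9"   # hypothetical sample: "naive cafe" with two accented letters

# With a single Unicode representation, length and indexing are per character.
assert len(text) == 10
assert text[2] == "\u00ef"

# UTF-8 is variable-length: each accented letter takes two bytes here,
# so byte offsets no longer line up with character positions.
utf8 = text.encode("utf-8")
assert len(utf8) == 12
assert utf8[2:4] == "\u00ef".encode("utf-8")

# Cutting at an arbitrary byte offset can split a character in half.
try:
    utf8[:3].decode("utf-8")
    truncated_ok = True
except UnicodeDecodeError:
    truncated_ok = False
```

Code that works on the encoded bytes has to track character boundaries itself, which is the kind of bookkeeping a single internal encoding avoids.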

> As I said, it's a tradeoff flexibility vs. memory consumption.
> Whether it pays off depends on your application environment. It
> certainly does for companies like Micron and pays off stock-wise
> for a lot of people... uhm, getting off-topic here :-)

I've got nothing against Unicode (apart from the larger issue that the whole world would obviously be a lot better off if they switched to American English).

I suppose Mandarin would reach a larger share of the world population ... and they need Unicode :-)

>> Subclassing seems easy enough to me from the Python level; I
>> don't have time to revisit C-level subclassing here (and I don't
>> know that it's hackish there either, but do think it's in need of
>> docs).

> It is beautifully easy for non-varying-length types. Unfortunately,
> it happens that some of the basic types which would be attractive
> for subclassing are varying length types (such as strings and
> tuples).

It's easy to subclass from str and tuple in Python -- even to add your own instance data.

Yeah, but that's not the point. I want to do this in C...

> In my case, I'm looking for a way to subclass strings, but I haven't
> yet found an elegant solution to the problem of adding extra
> data to the instances.

It's easy if you're willing to use a dict:

I would be willing to use a dictionary. It's only that even the dictionary trick doesn't seem to work at the C level.

class STR(str):
    def __new__(cls, strguts, n):
        self = str.__new__(cls, strguts)
        self.n = n
        return self

s = STR('abc', 42)
print s    # abc
print s.n  # 42

__slots__ doesn't work here, though. I admit I personally don't see much attraction to subclassing from str and tuple, apart from adding additional methods. I suppose someone could code up two-malloc variants ...
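The difference between the dict approach and slots can be sketched as follows. This is only an illustration, written for a modern CPython 3 interpreter rather than the 2.2-era one discussed in the thread; it uses tuple as the varying-length base, and the names Pair and SlottedPair are hypothetical.

```python
# Illustrative sketch only; assumes a modern CPython 3 interpreter.

class Pair(tuple):
    """Tuple subclass that keeps extra data in the per-instance dict."""
    def __new__(cls, items, label):
        self = tuple.__new__(cls, items)
        self.label = label          # lands in self.__dict__
        return self

p = Pair((1, 2), "point")

# Non-empty __slots__ is rejected for a varying-length base such as tuple,
# because the item storage already occupies the tail of the object.
try:
    class SlottedPair(tuple):
        __slots__ = ("label",)
    slots_error = None
except TypeError as exc:
    slots_error = str(exc)
```

The dict-based subclass works because the instance dictionary is reached through a separate pointer, while fixed slots would have to sit at a fixed offset inside an object whose size varies per instance.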

If you look at mxURL you'll find an extension type which tries to play nice with strings -- it would be a good candidate for a string subtype.

A string type which carries along an encoding attribute would be another good candidate for a string subtype.

Both need extra attributes/data fields.
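At the Python level, the second candidate might look roughly like the sketch below. It is not the C-level solution asked for here, only an illustration of the kind of extra data field involved; it assumes Python 3, and the class name EncodedStr and its to_bytes method are made up for the example.

```python
# Illustrative sketch only; assumes Python 3. EncodedStr is a hypothetical name.

class EncodedStr(str):
    """str subclass that remembers which encoding it should be written in."""
    def __new__(cls, value, encoding="utf-8"):
        self = str.__new__(cls, value)
        self.encoding = encoding    # extra data field, stored in the instance dict
        return self

    def to_bytes(self):
        # Encode using the encoding carried by the instance.
        return self.encode(self.encoding)

s = EncodedStr("caf\u00e9", encoding="latin-1")
```

Note that inherited string methods such as upper() return plain str objects, so the attribute is lost unless those methods are overridden as well.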

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH

Consulting & Company: http://www.egenix.com/
Python Software: http://www.lemburg.com/python/