msg21859 - (view) |
Author: Thomas Wouters (twouters) *  |
Date: 2004-07-31 00:08 |
Joining a list of string subtype instances usually results in a single string instance:

    >>> class mystr(str): pass
    ...
    >>> type("".join([mystr("a"), mystr("b")]))
    <type 'str'>

But if the list contains only one object, and that object is a string subtype instance, that instance is returned unchanged:

    >>> type("".join([mystr("a")]))
    <class '__main__.mystr'>

This can have odd effects, for instance when the result of "".join(lst) is used as the return value of a __str__ hook. "".join should perhaps return the type of the joining string, but it definitely should not vary its type based on the *number* of items it's joining. |
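The report can be turned into a small consistency check. The sketch below assumes a current CPython 3 interpreter, where this issue has long been fixed; `join_result_types` is a hypothetical helper name used only for illustration.

```python
class mystr(str):
    """Trivial str subclass, as in the report."""


def join_result_types(sep, item_counts=(1, 2, 3)):
    """Map each item count n to the type of sep.join() over n mystr items."""
    return {n: type(sep.join([mystr("a")] * n)) for n in item_counts}


types_by_count = join_result_types("")

# A consistent join yields the same result type for every item count.
is_consistent = len(set(types_by_count.values())) == 1
print(types_by_count, is_consistent)
```

On the interpreter described in the report, the entry for a single item would have been `mystr` rather than `str`, so the check would have failed.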
|
|
msg21860 - (view) |
Author: Michael Hudson (mwh)  |
Date: 2004-08-02 14:25 |
Logged In: YES user_id=6656 What are you asking? I agree it's a bug. I'm sure you're competent to write a patch :-) |
|
|
msg21861 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2004-08-04 19:39 |
Logged In: YES user_id=593130 This behavior does not, to me, clearly violate the current doc: "Return a string which is the concatenation of the strings in the sequence seq", where "string" is a byte string or Unicode string. If one takes 'string' narrowly, then your subclass instances should be rejected as input. If one takes 'string' more broadly as isinstance(s, basestring), then your subclass should be equally acceptable as input or output. If neither consistent interpretation of 'string' is meant, then there is a doc bug, or at least an underspecification.

Workaround 0: if len(seq) == 1: ...
Workaround 1: map(str, seq) to force str out.

*However*, in playing around (in 2.2), I discovered:

    >>> type(''.join((a)))
    <type 'str'>
    >>> type(''.join([a]))
    <class '__main__.ms'>
    >>> type(''.join({a:None}))
    <class '__main__.ms'>

Having the type of the join of a singleton depend on the type (mutability?) of the singleton wrapper is definitely disquieting.

Workaround 2: tuple(seq) |
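The map(str, seq) workaround can be wrapped in a small helper. This is only a sketch written for Python 3; `join_plain` is a hypothetical name, not an existing function.

```python
class ms(str):
    """Trivial str subclass, as in the thread."""


def join_plain(sep, seq):
    """Join seq after converting every item to an exact str."""
    return sep.join(map(str, seq))


single = join_plain("", [ms("a")])
mixed = join_plain("-", [ms("a"), "b"])
print(type(single).__name__, mixed)
```

One trade-off to keep in mind: str() also converts non-string items such as integers, so this helper silently accepts input that a plain join would reject with a TypeError.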
|
|
msg21862 - (view) |
Author: Marc-Andre Lemburg (lemburg) *  |
Date: 2004-08-04 20:28 |
Logged In: YES user_id=38388 I agree with Terry. The result type is defined by the semantics of the list elements and the length of the list:

    len(list) > 1:  sep.join(list) := list[0] + sep + ... + sep + list[n]
    len(list) == 1: sep.join(list) := list[0]
    len(list) == 0: sep.join(list) := sep[:0] |
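The three cases in this definition can be written out directly. The following is a sketch of that definition in Python 3, not the real join implementation; `join_by_concat` is a hypothetical name.

```python
class ms(str):
    """Trivial str subclass, as in the thread."""


def join_by_concat(sep, items):
    """Join by repeated concatenation, following the three cases literally."""
    items = list(items)
    if not items:
        return sep[:0]       # len == 0: an empty slice of the separator
    result = items[0]        # len == 1: the single element itself
    for item in items[1:]:   # len > 1: items[0] + sep + ... + sep + items[n]
        result = result + sep + item
    return result


one = join_by_concat("", [ms("a")])
two = join_by_concat("", [ms("a"), ms("b")])
print(type(one).__name__, type(two).__name__)
```

Under this definition the single-element case hands back the subclass instance, while two or more elements go through str concatenation and come back as a plain str. That is exactly the length-dependent result type the original report objects to.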
|
|
msg21863 - (view) |
Author: Michael Hudson (mwh)  |
Date: 2004-08-05 12:04 |
Logged In: YES user_id=6656 A clue for Terry: think about what "(a)" isn't :-) I initially agreed that this was a bug because, e.g. str_subclass()[:] returns a str. Isn't this the same sort of thing? |
|
|
msg21864 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2004-08-05 16:10 |
Logged In: YES user_id=593130 Duh, my turn to forget. For any beginners reading this ...

    >>> class ms(str): pass
    ...
    >>> a=ms('a')
    >>> type(''.join((a,)))
    <class '__main__.ms'>

Expanding mwh's second point:

    >>> e=ms()
    >>> type(e)
    <class '__main__.ms'>
    >>> import copy
    >>> e2=copy.copy(e)
    >>> type(e2)
    <class '__main__.ms'>
    >>> e3=e[:]
    >>> type(e3)
    <type 'str'>
    >>> id(e),id(e2),id(e3)
    (9494608, 9009936, 8577440)

so [:] is not exactly an abbreviated synonym for copy(). Is this a bug? (I haven't rechecked the respective docs yet.) One reason I hesitate to call the OP's original observation a bug is that the whole subject of operations on subtype instances seems not completely baked. Knowing the result types in all cases may require experiments as well as doc reading. |
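Both points in this message can be reproduced in a few lines. The sketch below assumes CPython 3, so the printed type names differ from the Python 2 output quoted in the thread; all variable names are illustrative.

```python
import copy


class ms(str):
    """Trivial str subclass, as in the thread."""


a = ms("abc")

# Parentheses alone do not build a tuple; the trailing comma does.
not_a_tuple = (a)    # just another name for a
one_tuple = (a,)     # a one-element tuple

# Joining (a) therefore iterates over the characters of the string itself.
joined_chars = "".join(not_a_tuple)

# copy.copy() preserves the subclass, while a full slice yields a plain str.
copied = copy.copy(a)
sliced = a[:]
print(type(one_tuple).__name__, type(copied).__name__, type(sliced).__name__)
```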
|
|
msg21865 - (view) |
Author: Gustavo Niemeyer (niemeyer) *  |
Date: 2004-08-07 15:48 |
Logged In: YES user_id=7887 If this were considered a bug:

    >>> type(ms("a")+ms("b"))
    <type 'str'>
    >>> type(ms("a")[:])
    <type 'str'>

Are these bugs as well? I believe this is how the implementation was intended to be, even if not optimal for subclasses. I suggest closing this bug as invalid, and writing a PEP about the possible new subclass support change (for all classes), if there's enough interest. |
|
|
msg21866 - (view) |
Author: Thomas Wouters (twouters) *  |
Date: 2004-08-07 22:17 |
Logged In: YES user_id=34209 The point of the original bug report is not that some operations return strings instead of subtypes. The point is that *one* operation *sometimes* returns subtypes. It's inconsistent and unexpected behaviour and, since you clearly don't write 'sep.join(seq)' when 'seq' is commonly a single item, it is something you will only occasionally trigger. I don't have an emotional investment in this bug, it's just something that came up on #python. I also don't care which way it's fixed -- but treating the single-element-sequence case the same as the multiple-element-sequence seems logical to me. Regardless of how the multiple-element-sequence is handled exactly :) As for why I didn't write a patch myself, Michael, if I had time for that, I would've spent it writing a good decorator proposal >:-) |
|
|
msg21867 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2004-08-11 07:00 |
Logged In: YES user_id=1038590 Don't know about anyone else, but the shortcut in str.join that returns a reference to the *original* string in the case of a single item sequence strikes me as very bad ju-ju:

    >>> class mystr(str): pass
    ...
    >>> s1 = mystr('fred')
    >>> s1
    'fred'
    >>> s1.mutable = 42
    >>> s1.mutable
    42
    >>> s2 = ''.join([s1])
    >>> s2.mutable
    42

When I call join, I expect to get a *new* object back, not a reference to an old one (this is safe for standard strings, but not for subclasses which may have mutable state). |
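The aliasing concern can be checked directly. This sketch assumes a current CPython 3 interpreter, where the single-item shortcut no longer returns subclass instances; the variable names are illustrative.

```python
class mystr(str):
    """str subclass whose instances can carry extra attributes."""


s1 = mystr("fred")
s1.mutable = 42            # per-instance state on the subclass instance

s2 = "".join([s1])

same_object = s2 is s1                    # would be True with the old shortcut
carries_state = hasattr(s2, "mutable")    # would be True with the old shortcut
print(same_object, carries_state)
```

With the behaviour shown in the session above, both flags would be True, because s2 would be the very same object as s1.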
|
|
msg21868 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2004-08-11 08:04 |
Logged In: YES user_id=1038590 New patch (#1007087) created with a test for this bug, as well as a fix for it (the fix simply removes the 'sequence of 1' shortcut). Checks for the unicode case as well, although unicode didn't have this bug (due to a different join implementation). |
|
|
msg21869 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2004-08-19 17:46 |
Logged In: YES user_id=80475 Was this one fixed? |
|
|
msg21870 - (view) |
Author: Tim Peters (tim.peters) *  |
Date: 2004-08-19 18:22 |
Logged In: YES user_id=31435 I think the patch is still awaiting review. |
|
|
msg21871 - (view) |
Author: Alyssa Coghlan (ncoghlan) *  |
Date: 2004-08-19 21:13 |
Logged In: YES user_id=1038590 I've just assigned the relevant patch to Tim for review. The latest version should address his concerns with the original fix (which didn't use the optimised path, even when it was safe). |
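The idea of keeping the optimised path only where it is safe can be sketched at the Python level. This is not the C patch itself, whose contents do not appear in this thread; it assumes that "safe" means the single item is an exact str, and `join_sketch` is a hypothetical name.

```python
class mystr(str):
    """Trivial str subclass, as in the thread."""


def join_sketch(sep, iterable):
    """Join strings, returning the item itself only for a lone exact str."""
    items = list(iterable)
    for item in items:
        if not isinstance(item, str):
            raise TypeError("sequence item is not a string")

    # Optimised path (assumption): an exact str is immutable and carries
    # no instance state, so returning it unchanged cannot leak anything.
    if len(items) == 1 and type(items[0]) is str:
        return items[0]

    # General path: concatenation always builds a new plain str.
    # Quadratic in the worst case, which is acceptable for a sketch.
    result = ""
    for index, item in enumerate(items):
        if index:
            result = result + sep
        result = result + item
    return result


plain = "abc"
from_plain = join_sketch("", [plain])
from_subclass = join_sketch("", [mystr("a")])
print(from_plain is plain, type(from_subclass).__name__)
```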
|
|
msg21872 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2004-08-23 23:25 |
Logged In: YES user_id=80475 Fixed. See: Objects/stringobject.c 2.225 Lib/test/test_string.py 1.26 |
|
|