"'%s' % unicode_string" produces a unicode result. I think the following code should also return a unicode string:

    class Wrapper:
        def __str__(self):
            return unicode_string

    '%s' % Wrapper()

That behavior would make it easier to write library code that can work with either str objects or unicode objects. The fix is pretty simple (see the attached patch). Perhaps the PyObject_Text function should be called _PyObject_Text instead. Alternatively, if the function is made public then we should document it and perhaps also provide a builtin function called 'text' that uses it.
Logged In: YES user_id=38388 Nice patch. Only nit: PyObject_Text() should check that the result of tp_str() is indeed either a string or unicode instance (possibly from a subclass). Otherwise, the function wouldn't be able to guarantee this feature - which is what it's all about.
Logged In: YES user_id=35752 Attaching a better patch. Add a builtin function called "text". Change PyObject_Text to check the return types as suggested by Mark. Update the documentation and the tests.
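The behavior discussed so far can be modelled roughly in pure Python. This is only a sketch under stated assumptions, not the code from the patch: the real PyObject_Text is written in C, and the names TEXT_TYPES and Wrapper(value) below are illustrative. The try/except is there only so the sketch also runs on interpreters that have no separate unicode type.

```python
# Hypothetical pure-Python model of the proposed text() builtin.
# The actual patch implements this logic in C as PyObject_Text.
try:
    TEXT_TYPES = (str, unicode)  # Python 2: 8-bit strings and unicode strings
except NameError:
    TEXT_TYPES = (str,)          # later Pythons have a single text type


def text(obj):
    """Return obj as a string, keeping whichever string type __str__ produced."""
    if isinstance(obj, TEXT_TYPES):
        # Already a string (or a subclass instance); assumed to pass through.
        return obj
    # Look up __str__ on the type, mirroring how tp_str is found.
    result = type(obj).__str__(obj)
    # The check requested in review: only str or unicode results are accepted.
    if not isinstance(result, TEXT_TYPES):
        raise TypeError("__str__ returned non-string (type %s)"
                        % type(result).__name__)
    return result


class Wrapper(object):
    """Illustrative wrapper whose __str__ returns the wrapped string as-is."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value
```

With this model, text(Wrapper(u"caf\xe9")) hands back the unicode string unchanged instead of forcing it into an 8-bit string, while an object whose __str__ returns a non-string is rejected with TypeError.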
Logged In: YES user_id=35752 Here's a quote from him:

> I'm beginning to think that we need an extra method (__text__), that
> can return any kind of string that's compatible with Python's text model.
>
> (in today's CPython, that's an 8-bit string with ASCII only, or a Unicode
> string. future Python's may support more string types, at least at
> the C implementation level).
>
> I'm not sure we can change __str__ or __unicode__ without breaking
> code in really obscure ways (but I'd be happy to be proven wrong).

My idea is that we can change __str__ without breaking code. The reason is that no one should be calling tp_str directly. Instead they use PyObject_Str. I don't know what he meant by "string that's compatible with Python's text model". With my change, Python can only deal with str or unicode instances. I have no idea how we could support other string implementations.

I don't want to introduce a text() builtin that calls __str__ and then later realize that __text__ would be useful. Perhaps this change is big enough to require a PEP.