[Python-Dev] Divorcing str and unicode (no more implicit conversions).

Antoine Pitrou solipsis at pitrou.net
Mon Oct 3 14:32:48 CEST 2005


On Monday, 3 October 2005 at 02:09 -0400, Martin Blais wrote:

> What if we could completely disable the implicit conversions between unicode and str?

This would be very annoying when dealing with some modules or libraries where the type (str / unicode) returned by a function depends on the context, build, or platform.

A good rule of thumb is to convert to unicode everything that is semantically textual, and to only use str for what is to be semantically treated as a string of bytes (network packets, identifiers...). This is also, AFAIU, the semantic model which is favoured for a hypothetical future version of Python.
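The rule of thumb above amounts to decoding at the input boundary, working on text in between, and encoding again at the output boundary. Here is a minimal sketch, written for Python 3, where `str` is text and `bytes` is binary data. The function name `shout_reply` and the UTF-8 default are assumptions made for illustration and are not part of the original message:

```python
def shout_reply(packet, charset="utf-8"):
    """Decode bytes on the way in, work on text, encode on the way out."""
    if not isinstance(packet, bytes):
        raise TypeError("expected bytes, got %s" % type(packet).__name__)
    text = packet.decode(charset)    # bytes -> text at the input boundary
    reply = text.strip().upper()     # all processing happens on text
    return reply.encode(charset)     # text -> bytes at the output boundary


# Hypothetical payload: UTF-8 bytes with surrounding whitespace.
wire_data = "  h\u00e9llo\n".encode("utf-8")
print(shout_reply(wire_data))
```

Keeping the conversions at the edges means the code in the middle never has to ask which of the two types it was handed.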

This is what I use to convert safely to a given type without worrying about the type of the argument:

DEFAULT_CHARSET = 'utf-8'

def safe_unicode(s, charset=None):
    """
    Forced conversion of a string to unicode, does nothing
    if the argument is already a unicode object.
    This function is useful because the .decode method
    on a unicode object, instead of being a no-op, tries to
    do a double conversion back and forth (which often fails
    because 'ascii' is the default codec).
    """
    if isinstance(s, str):
        return s.decode(charset or DEFAULT_CHARSET)
    else:
        return s

def safe_str(s, charset=None):
    """
    Forced conversion of a unicode to string, does nothing
    if the argument is already a plain str object.
    This function is useful because the .encode method
    on a str object, instead of being a no-op, tries to
    do a double conversion back and forth (which often fails
    because 'ascii' is the default codec).
    """
    if isinstance(s, unicode):
        return s.encode(charset or DEFAULT_CHARSET)
    else:
        return s
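The two helpers above target Python 2. In Python 3, `bytes` plays the role of the old `str` and `str` the role of `unicode`, so the same idea can be sketched as follows. The names `to_text` and `to_bytes` are hypothetical, and `DEFAULT_CHARSET` is repeated so that the block stands alone:

```python
DEFAULT_CHARSET = "utf-8"


def to_text(s, charset=None):
    """Return s as text; decode only if it is bytes."""
    if isinstance(s, bytes):
        return s.decode(charset or DEFAULT_CHARSET)
    return s


def to_bytes(s, charset=None):
    """Return s as bytes; encode only if it is text."""
    if isinstance(s, str):
        return s.encode(charset or DEFAULT_CHARSET)
    return s


raw = b"caf\xc3\xa9"    # UTF-8 bytes for "cafe" with an acute accent on the e
text = to_text(raw)

# Applying a helper to a value already of the target type is a no-op.
assert to_text(text) == text
assert to_bytes(raw) == raw
# A round trip through both helpers preserves the data.
assert to_bytes(to_text(raw)) == raw
```

Because each helper passes through values that are already of the target type, it can be applied at every API boundary without tracking which type a library happened to return.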

Good luck

Antoine.


