Issue 635398: re.sub() coerces u'' to ''
Using Python 2.2.1 on FreeBSD, these work as expected:
re.sub(u'f', u'b', u'foo') # keep string as Unicode
u'boo'
re.sub(u'f', u'b', 'foo') # coerce string to Unicode
u'boo'
But this doesn't work the way I think it should:
re.sub(u'f', u'b', u'') # coerce string to non-Unicode?!
''
That is, an empty Unicode string does not survive as Unicode after going through re.sub().
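A minimal way to check for this behaviour, assuming only the standard re module. The helper name sub_keeps_type is made up for this illustration and is not part of re. On an interpreter that has the bug, the second call should report a type mismatch; on one with the fix, both calls should agree.

```python
import re

def sub_keeps_type(pattern, repl, string):
    # Hypothetical helper: True if re.sub() returns an object of the
    # same type as the input string, False if the type was changed.
    result = re.sub(pattern, repl, string)
    return type(result) is type(string)

# Non-empty Unicode input: the case reported as working.
print(sub_keeps_type(u'f', u'b', u'foo'))

# Empty Unicode input: the case reported as coming back non-Unicode.
print(sub_keeps_type(u'f', u'b', u''))
```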
Logged In: YES user_id=38376
this buglet has already been fixed in the SRE master repository. here's the relevant portion:
*** 1802,1808 ****
      switch (PyList_GET_SIZE(list)) {
      case 0:
          Py_DECREF(list);
!         return PyString_FromString("");
      case 1:
          result = PyList_GET_ITEM(list, 0);
          Py_INCREF(result);
--- 1785,1791 ----
      switch (PyList_GET_SIZE(list)) {
      case 0:
          Py_DECREF(list);
!         return PySequence_GetSlice(pattern, 0, 0);
      case 1:
          result = PyList_GET_ITEM(list, 0);
          Py_INCREF(result);
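For readers less familiar with the C API, the idea behind the change can be sketched in Python. This is only a rough analogue, not the actual SRE code: the function name join_pieces and its arguments are invented for illustration, and the general (more than one piece) branch is an assumption, since the diff shows only the first two cases. The point is that an empty slice of the pattern has the pattern's own type, whereas a hard-coded "" is always a plain string.

```python
def join_pieces(pieces, pattern):
    # Hypothetical Python analogue of the switch statement in the diff.
    if len(pieces) == 0:
        # Before the fix: the equivalent of `return ""`, regardless of
        # the pattern's type.
        # After the fix: a zero-length slice, which keeps the pattern's type.
        return pattern[0:0]
    if len(pieces) == 1:
        # A single piece is returned unchanged.
        return pieces[0]
    # Assumption (not shown in the diff): join the pieces using an
    # empty separator of the pattern's type.
    return pattern[0:0].join(pieces)
```

Note that the empty result takes its type from the pattern, not from the string being searched.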
I'll update the Python repository asap (once I've gotten around to merging in some changes made in the Python repository).
PS. also see my post on comp.lang.python on this topic. well-written unicode code shouldn't care about things like this...