Issue 635398: re.sub() coerces u'' to ''

Using Python 2.2.1 on FreeBSD, these work as expected:

>>> re.sub(u'f', u'b', u'foo')   # keep string as Unicode
u'boo'
>>> re.sub(u'f', u'b', 'foo')    # coerce string to Unicode
u'boo'

But this doesn't work the way I think it should:

>>> re.sub(u'f', u'b', u'')      # coerce string to non-Unicode?!
''

That is, an empty Unicode string does not survive as Unicode after going through re.sub().
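
For anyone re-checking this on a current interpreter, the report boils down to a type-preservation check. The sketch below is a translation under assumptions: it targets Python 3, where str and bytes stand in for the Python 2 unicode and str types, and the helper name sub_preserves_type is hypothetical (it is not part of the re module).

```python
import re


def sub_preserves_type(pattern, repl, string):
    """Return True if re.sub() gives back the same type as the input string.

    Hypothetical helper for illustration only; not part of the re module.
    """
    result = re.sub(pattern, repl, string)
    return type(result) is type(string)


# Non-empty and empty inputs, for both text (str) and bytes.
# Assumption: str/bytes here play the roles of unicode/str in Python 2.
cases = [
    ("f", "b", "foo"),
    ("f", "b", ""),
    (b"f", b"b", b"foo"),
    (b"f", b"b", b""),
]
results = [sub_preserves_type(*case) for case in cases]
```

The empty-input cases are the interesting ones, since that is the path the report says loses the type.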

Logged In: YES user_id=38376

this buglet has already been fixed in the SRE master repository. here's the relevant portion:

*** 1802,1808 ****
      switch (PyList_GET_SIZE(list)) {
      case 0:
          Py_DECREF(list);
!         return PyString_FromString("");
      case 1:
          result = PyList_GET_ITEM(list, 0);
          Py_INCREF(result);
--- 1785,1791 ----
      switch (PyList_GET_SIZE(list)) {
      case 0:
          Py_DECREF(list);
!         return PySequence_GetSlice(pattern, 0, 0);
      case 1:
          result = PyList_GET_ITEM(list, 0);
          Py_INCREF(result);

I'll update the Python repository asap (once I've gotten around to merging in some changes made in the Python repository).

PS. also see my post on comp.lang.python on this topic. well-written unicode code shouldn't care about things like this...
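
The one-line change in the patch above replaces a hard-coded empty byte string with a zero-length slice of the pattern, so the empty result takes its type from the pattern. A rough Python-level analogue is sketched below. It is an illustration, not the actual _sre.c logic: join_pieces is a hypothetical name, Python 3 str/bytes stand in for the Python 2 unicode/str types, and the general (two or more pieces) branch is an assumption, since the patch does not show it.

```python
def join_pieces(pieces, pattern):
    """Join substituted pieces into one result (hypothetical sketch).

    `pattern` is used only to pick the type of an empty result.
    """
    if len(pieces) == 0:
        # Analogue of PySequence_GetSlice(pattern, 0, 0): an empty slice
        # has the same type as the object it was taken from.
        return pattern[0:0]
    if len(pieces) == 1:
        # A single piece is returned as-is, mirroring "case 1" in the patch.
        return pieces[0]
    # Assumption: the general case is not shown in the patch; joining on
    # an empty separator of the matching type is one plausible approach.
    return pattern[0:0].join(pieces)


empty_text = join_pieces([], "f")      # empty str, because the pattern is str
empty_bytes = join_pieces([], b"f")    # empty bytes, because the pattern is bytes
joined = join_pieces(["b", "oo"], "f")
```

Slicing rather than constructing a literal means the code never has to name a concrete string type, which is why the same line works for both kinds of pattern.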