I think that the cached default encoded version of the unicode object should be limited in size. It's probably a bad idea to cache 100MB of data. For large strings and unicode objects, the user should do explicit caching if required.
I don't see a patch. And I think you cannot do this without compromising correctness, since _PyUnicode_AsDefaultEncodedString() returns the cached value without incrementing its refcount. (The only refcount that keeps it alive is the cache entry.)
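For reference, the relevant shape of the function today (simplified from the py3k sources, with error handling trimmed, so read it as a sketch rather than the exact code):

    PyObject *
    _PyUnicode_AsDefaultEncodedString(PyObject *unicode, const char *errors)
    {
        PyUnicodeObject *u = (PyUnicodeObject *)unicode;
        if (u->defenc == NULL)
            /* may fail and return NULL; defenc then stays NULL */
            u->defenc = PyUnicode_AsEncodedString(unicode, NULL, errors);
        return u->defenc;   /* borrowed: only the cache entry keeps it alive */
    }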
The default encoded version is generated lazily, and only from a couple of places (if I believe my grepping through the py3k sources). So we can:
* choose not to care, as the conversion looks rather rare
* incref the return value of _PyUnicode_AsDefaultEncodedString(), and convert the 20 or so places in which that function is used to properly decref the value when done (sketched below)
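Roughly, the second option would look like this (untested sketch; the call site is purely illustrative):

    PyObject *
    _PyUnicode_AsDefaultEncodedString(PyObject *unicode, const char *errors)
    {
        PyUnicodeObject *u = (PyUnicodeObject *)unicode;
        if (u->defenc == NULL) {
            PyObject *b = PyUnicode_AsEncodedString(unicode, NULL, errors);
            if (b == NULL)
                return NULL;
            u->defenc = b;            /* the cache keeps one reference */
        }
        Py_INCREF(u->defenc);         /* new: the caller owns the result */
        return u->defenc;
    }

    /* a converted call site: */
    PyObject *bytes = _PyUnicode_AsDefaultEncodedString(unicode, NULL);
    if (bytes == NULL)
        return -1;
    /* ... use bytes ... */
    Py_DECREF(bytes);                 /* new: release when done */

Once callers own their reference, evicting (or never caching) oversized entries as suggested above also becomes safe.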
> * choose not to care, as the conversion looks rather rare

Yes.

> * incref the return value of _PyUnicode_AsDefaultEncodedString(), and
>   convert the 20 or so places in which that function is used to
>   properly decref the value when done

No. I suspect you'll find it quite difficult to pick a place where to do the decref in some cases.
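E.g. any place that only keeps the char* around (hypothetical helper, not an actual call site):

    static const char *
    default_encoded_chars(PyObject *unicode)
    {
        PyObject *bytes = _PyUnicode_AsDefaultEncodedString(unicode, NULL);
        if (bytes == NULL)
            return NULL;
        /* With an owned reference we would have to Py_DECREF(bytes) before
           returning -- but that may free the object and invalidate the
           pointer we return, and the caller has no handle to release later.
           The borrowed-reference contract is what makes this pattern work. */
        return PyBytes_AS_STRING(bytes);
    }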
For Py3k you can get rid of the cached default encoded version of the Unicode object altogether: it was only needed to make the Unicode/string auto-coercion mechanism efficient in Python 2.x. In Py3k, you only do such conversions explicitly at the I/O boundaries, so caching the converted value is no longer necessary.
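That is, the conversion becomes an explicit operation whose result the caller owns, e.g. (write_utf8() is a hypothetical helper, just to illustrate the pattern):

    #include <stdio.h>
    #include "Python.h"

    static int
    write_utf8(FILE *fp, PyObject *unicode)
    {
        PyObject *bytes = PyUnicode_AsEncodedString(unicode, "utf-8",
                                                    "strict");
        if (bytes == NULL)
            return -1;
        size_t n = (size_t)PyBytes_GET_SIZE(bytes);
        int ok = (fwrite(PyBytes_AS_STRING(bytes), 1, n, fp) == n);
        Py_DECREF(bytes);   /* the caller owns the result and releases it */
        return ok ? 0 : -1;
    }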