Issue 18723: shorten function of textwrap module is susceptible to non-normalized whitespaces (original) (raw)

Created on 2013-08-13 06:26 by vajrasky, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
fix_for_non_normalized_whitespaces_in_placeholder.patch vajrasky,2013-08-13 06:26 review
fix_for_non_normalized_whitespaces_in_placeholder_v2.patch vajrasky,2013-08-13 11:05 review
fix_for_non_normalized_whitespaces_in_placeholder_v3.patch vajrasky,2013-08-13 13:22 review
fix_for_non_normalized_whitespaces_in_placeholder_v4.patch vajrasky,2013-08-13 15:10 review
fix_for_non_normalized_whitespaces_in_placeholder_v5.patch vajrasky,2013-08-13 15:48 review
Messages (11)
msg195045 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-08-13 06:26
In shorten function of textwrap module, the placeholder becomes a hole where we can inject non-normalized whitespaces to the text. >>> text = "Hello there, how are you this fine day? I'm glad to hear it!" >>> from textwrap import shorten >>> shorten(text, 40, placeholder=" ") 'Hello there, how are you' We normalize the only-whitespaces placeholder. But.... >>> shorten(text, 17, placeholder=" \n\t(...) \n\t[...] \n\t") '(...) \n\t[...]' >>> shorten(text, 40, placeholder=" \n\t(...) \n\t[...] \n\t") 'Hello there, how \n\t(...) \n\t[...]' Attached the patch to normalize the non-normalized whitespaces in placeholder.
msg195046 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-13 06:44
I'm not convinced this is a bug. The whitespace right-stripping is more of an implementation detail. You can really put what you want inside the placeholder.
msg195047 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-08-13 06:54
Okay, nevermind about non-normalized whitespaces in placeholder, but what about this case? >>> text = "Hello there, how are you this fine day? I'm glad to hear it!" >>> from textwrap import shorten >>> shorten(text, 10, placeholder=" ") 'Hello' >>> shorten(text, 9, placeholder=" ") '' >>> len('Hello') 5 Isn't that weird?
msg195059 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-13 09:23
> Isn't that weird? Agreed, this one is a bug. The stripping in shorten() should be smarter, i.e. it should not affect the placeholder's own spaces. Do you want to write a patch?
msg195060 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-13 09:24
Correcting myself: > i.e. it should not affect the placeholder's own spaces. ... except for leading whitespace in case the placeholder ends up alone in the result. i.e. shorten("somethingtoolong") should return "(...)", not " (...)".
msg195067 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-08-13 11:05
> Do you want to write a patch? My pleasure. Attached the second version of the patch to accomodate Pitrou's request.
msg195073 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-08-13 13:19
Attached more refined patch. Removed unnecessary test. Added more robust test. Added shorten in __all__.
msg195079 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-08-13 15:10
I missed this case: >>> from textwrap import shorten >>> shorten('hell', 4) Traceback (most recent call last): File "", line 1, in File "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py", line 386, in shorten return w.shorten(text, placeholder=placeholder) File "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py", line 322, in shorten raise ValueError("placeholder too large for max width") ValueError: placeholder too large for max width Also, in this patch, I removed the unnecessary stripping of the text part.
msg195080 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-08-13 15:34
> I missed this case: > > >>> from textwrap import shorten > >>> shorten('hell', 4) > Traceback (most recent call last): > File "", line 1, in > File > "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py", > line 386, in shorten > return w.shorten(text, placeholder=placeholder) > File > "/home/sky/Code/python/programming_language/cpython/Lib/textwrap.py", > line 322, in shorten > raise ValueError("placeholder too large for max width") > ValueError: placeholder too large for max width This is by design. Passing a placeholder larger than the width is a programming error, regardless of whether the text is small enough.
msg195081 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-08-13 15:48
Okay, attached the fifth version not to care about the case where text is smaller than the placeholder.
msg200038 - (view) Author: Vajrasky Kok (vajrasky) * Date: 2013-10-16 07:38
Serhiy's commit http://hg.python.org/cpython/rev/2e8c424dc638 fixed this issue already. So I closed this ticket.
History
Date User Action Args
2022-04-11 14:57:49 admin set github: 62923
2013-10-16 07:38:56 vajrasky set status: open -> closedresolution: fixedmessages: +
2013-08-13 15:48:41 vajrasky set files: + fix_for_non_normalized_whitespaces_in_placeholder_v5.patchmessages: +
2013-08-13 15:34:15 pitrou set messages: + title: shorten function of textwrap module is susceptible to non-normalized whitespaces -> shorten function of textwrap module is susceptible to non-normalized whitespaces
2013-08-13 15:10:07 vajrasky set files: + fix_for_non_normalized_whitespaces_in_placeholder_v4.patchmessages: +
2013-08-13 13:22:45 vajrasky set files: + fix_for_non_normalized_whitespaces_in_placeholder_v3.patch
2013-08-13 13:21:33 vajrasky set files: - fix_for_non_normalized_whitespaces_in_placeholder_v3.patch
2013-08-13 13:19:47 vajrasky set files: + fix_for_non_normalized_whitespaces_in_placeholder_v3.patchmessages: +
2013-08-13 11:05:29 vajrasky set files: + fix_for_non_normalized_whitespaces_in_placeholder_v2.patchmessages: +
2013-08-13 09:24:07 pitrou set messages: +
2013-08-13 09:23:09 pitrou set messages: +
2013-08-13 09:12:42 ezio.melotti set nosy: + ezio.melottistage: patch review
2013-08-13 06:54:54 vajrasky set messages: +
2013-08-13 06:44:06 pitrou set messages: +
2013-08-13 06:26:52 vajrasky create