Issue 36549: str.capitalize should titlecase the first character not uppercase

Created on 2019-04-07 10:40 by steven.daprano, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 12804 merged kingsley,2019-04-12 14:07
Messages (10)
msg339568 - (view) Author: Steven D'Aprano (steven.daprano) * (Python committer) Date: 2019-04-07 10:40
str.capitalize appears to uppercase the first character of the string, which is okay for ASCII but not for non-English letters.

For example, the letter Ǌ in Croatian appears as ǋ at the start of words when the first character is capitalized: ǋemačka ('Germany'), not Ǌemačka. (In ASCII, that's Njemacka not NJemacka.)

https://en.wikipedia.org/wiki/Gaj's_Latin_alphabet#Digraphs

But using any of:

U+01CA LATIN CAPITAL LETTER NJ
U+01CB LATIN CAPITAL LETTER N WITH SMALL LETTER J
U+01CC LATIN SMALL LETTER NJ

we get the wrong result with capitalize:

py> 'Ǌemačka'.capitalize()
'Ǌemačka'
py> 'ǋemačka'.capitalize()
'Ǌemačka'
py> 'ǌemačka'.capitalize()
'Ǌemačka'

I believe that the correct behaviour is to titlecase the first code point and lowercase the rest, which is what the Apache library here does:

https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#capitalize-java.lang.String-
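The proposed rule (titlecase the first code point, lowercase the rest) can be approximated in pure Python. This is only a sketch, not the CPython implementation: the helper name capitalize_titlecase is hypothetical, it relies on str.title() applying the titlecase mapping to a one-character string, and it lowercases the tail in isolation, so context-sensitive cases such as the Greek final sigma may differ from what the built-in method does.

```python
def capitalize_titlecase(s: str) -> str:
    """Titlecase the first code point of s and lowercase the rest.

    Hypothetical helper for illustration only. The tail is lowercased
    without seeing the first character, which is a simplification.
    """
    # str.title() on a single character applies the titlecase mapping,
    # so U+01CA and U+01CC both become U+01CB.
    return s[:1].title() + s[1:].lower()


# The three digraph code points named in the report.
CAPITAL_NJ = "\u01ca"    # LATIN CAPITAL LETTER NJ
TITLECASE_NJ = "\u01cb"  # LATIN CAPITAL LETTER N WITH SMALL LETTER J
SMALL_NJ = "\u01cc"      # LATIN SMALL LETTER NJ

for first in (CAPITAL_NJ, TITLECASE_NJ, SMALL_NJ):
    word = first + "emačka"
    # Every variant should start with the titlecase digraph U+01CB.
    print(ascii(word), "->", ascii(capitalize_titlecase(word)))
```

Printing with ascii() makes the code point of the first character visible even when the font renders the three digraphs almost identically.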
msg339570 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-04-07 10:57
I think this is a reasonable change. Also the docs for str.title() should be fixed.
msg339804 - (view) Author: Kingsley McDonald (kingsley) * Date: 2019-04-09 20:34
Hello there, I'm an absolute beginner here and this whole thing is a little overwhelming, so please bear with me. I think this would be a suitable first task for me to take on because it appears to be a simple one-line change (correct me if I'm mistaken, though).
msg339878 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-04-10 18:15
This issue is easy if you know C.

* Find the implementation of str.capitalize in unicodeobject.c and make it use title case. See the implementation of str.title for an example.
* Find the tests for str.capitalize and add new cases. Finding the proper place for the tests may be the hardest part.
* Update the documentation for str.capitalize. Add the versionchanged directive.
* Fix the documentation for str.title. Use str.capitalize in the example.
* Add the news and What's New entries.
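As a rough idea of what the "add new cases" step might look like, here is a standalone unittest sketch. It is not the test that was added to CPython: the class name, the test names and the run() helper are hypothetical, and the assertions only pass on an interpreter that already has the new behaviour (Python 3.8 or later, per the version tagged on this issue).

```python
import unittest

# U+01CA, U+01CB, U+01CC: the three forms of the NJ digraph.
DIGRAPHS = ("\u01ca", "\u01cb", "\u01cc")


class CapitalizeTitlecaseTest(unittest.TestCase):
    """Hypothetical test cases for the titlecasing behaviour."""

    def test_digraph_first_character_is_titlecased(self):
        for first in DIGRAPHS:
            with self.subTest(first=ascii(first)):
                # The first code point becomes U+01CB, not U+01CA.
                self.assertEqual(
                    (first + "emačka").capitalize(), "\u01cbemačka"
                )

    def test_ascii_behaviour_is_unchanged(self):
        self.assertEqual("hello WORLD".capitalize(), "Hello world")


def run():
    """Run the cases above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(
        CapitalizeTitlecaseTest
    )
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run().wasSuccessful() reports whether the running interpreter titlecases the first character.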
msg339890 - (view) Author: Kingsley McDonald (kingsley) * Date: 2019-04-10 20:49
Thanks for clarifying all of that! I now have the patch and tests working locally. However, I'm not too sure what documentation needs to be changed for str.title. Should it specify that only the first letter of a digraph is capitalised, rather than the full character? I sure hope I get the hang of this soon :-D
msg340066 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-04-12 15:35
New changeset b015fc86f7b1f35283804bfee788cce0a5495df7 by Steve Dower (Kingsley M) in branch 'master': bpo-36549: str.capitalize now titlecases the first character instead of uppercasing it (GH-12804) https://github.com/python/cpython/commit/b015fc86f7b1f35283804bfee788cce0a5495df7
msg340067 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-04-12 15:36
Thanks! I'm a big fan of this change :)
msg340076 - (view) Author: Zackery Spytz (ZackerySpytz) * (Python triager) Date: 2019-04-12 16:14
I think that the PR may have been merged too quickly. Serhiy had made a list, and I think that the PR was missing some necessary changes.
msg340095 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-04-12 18:42
What is missing? It looks like everything on Serhiy's list was done.
msg340096 - (view) Author: Steve Dower (steve.dower) * (Python committer) Date: 2019-04-12 18:43
Oh, apart from the What's New section. But this looks enough like a bugfix (previous behaviour "wasn't capitalizing my name correctly" - new behaviour "now capitalizes my name correctly") that it's hardly critical to advertise it on that page.
History
Date User Action Args
2022-04-11 14:59:13 admin set github: 80730
2019-04-12 18:43:26 steve.dower set messages: +
2019-04-12 18:42:11 steve.dower set messages: +
2019-04-12 16:14:06 ZackerySpytz set nosy: + ZackerySpytz; messages: +
2019-04-12 15:36:11 steve.dower set status: open -> closed; resolution: fixed; messages: +; stage: patch review -> resolved
2019-04-12 15:35:48 steve.dower set nosy: + steve.dower; messages: +
2019-04-12 14:07:20 kingsley set keywords: + patch; stage: needs patch -> patch review; pull_requests: + pull_request12731
2019-04-10 20:49:41 kingsley set messages: +
2019-04-10 18:15:03 serhiy.storchaka set messages: +
2019-04-10 12:50:50 vstinner set nosy: - vstinner
2019-04-09 20:34:54 kingsley set nosy: + kingsley; messages: +
2019-04-07 10:57:18 serhiy.storchaka set type: enhancement; components: + Interpreter Core, Unicode; versions: + Python 3.8; keywords: + easy (C); nosy: + serhiy.storchaka, ezio.melotti, vstinner; messages: +; stage: needs patch
2019-04-07 10:40:51 steven.daprano create