msg339568 - (view) |
Author: Steven D'Aprano (steven.daprano) *  |
Date: 2019-04-07 10:40 |
str.capitalize appears to uppercase the first character of the string, which is okay for ASCII but not for non-English letters.

For example, the letter NJ in Croatian appears as Nj at the start of words when the first character is capitalized: Njemačka ('Germany'), not NJemačka. (In ASCII, that's Njemacka not NJemacka.)

https://en.wikipedia.org/wiki/Gaj's_Latin_alphabet#Digraphs

But using any of:

U+01CA LATIN CAPITAL LETTER NJ
U+01CB LATIN CAPITAL LETTER N WITH SMALL LETTER J
U+01CC LATIN SMALL LETTER NJ

we get the wrong result with capitalize:

py> 'Ǌemačka'.capitalize()
'Ǌemačka'
py> 'ǋemačka'.capitalize()
'Ǌemačka'
py> 'ǌemačka'.capitalize()
'Ǌemačka'

I believe that the correct behaviour is to titlecase the first code point and lowercase the rest, which is what the Apache library here does:

https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#capitalize-java.lang.String- |
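For illustration only, the proposed behaviour ("titlecase the first code point and lowercase the rest") can be approximated in pure Python. This is a rough sketch, not the CPython implementation: the helper name capitalize_titlecase is hypothetical, and it relies on str.title() applying the titlecase mapping to a one-character string. Context-sensitive lowercasing of the remainder may differ from what the C code does in edge cases.

```python
def capitalize_titlecase(text: str) -> str:
    """Titlecase the first code point and lowercase the rest.

    Hypothetical helper for illustration; not part of the stdlib.
    """
    if not text:
        return text
    # str.title() on a single character applies the titlecase mapping,
    # so U+01CA, U+01CB and U+01CC all become U+01CB here.
    return text[0].title() + text[1:].lower()


# All three spellings of the digraph give the titlecase form U+01CB.
for first in ("\u01ca", "\u01cb", "\u01cc"):
    print(ascii(capitalize_titlecase(first + "emačka")))
```

For plain ASCII input the result matches what str.capitalize already returned, since uppercase and titlecase coincide there.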
|
|
msg339570 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2019-04-07 10:57 |
I think this is a reasonable change. Also the docs for str.title() should be fixed. |
|
|
msg339804 - (view) |
Author: Kingsley McDonald (kingsley) * |
Date: 2019-04-09 20:34 |
Hello there, I'm an absolute beginner here and this whole thing is a little overwhelming, so please bear with me. I think this would be a suitable first task for me to take on because it appears to be a simple one-line change (correct me if I'm mistaken, though). |
|
|
msg339878 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2019-04-10 18:15 |
This issue is easy if you know C.

* Find the implementation of str.capitalize in unicodeobject.c and make it use title case. See the implementation of str.title for an example.
* Find the tests for str.capitalize and add new cases. Finding the proper place for the tests may be the hardest part.
* Update the documentation for str.capitalize. Add the versionchanged directive.
* Fix the documentation for str.title. Use str.capitalize in the example.
* Add the news and What's New entries. |
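As a rough idea of what the new test cases could look like, here is a standalone sketch. The class and test names are made up for illustration, and this is not the test that was actually added to CPython. It assumes an interpreter that already includes the change (3.8 or later), and is skipped otherwise.

```python
import sys
import unittest

# The three code points from the original report.
NJ_UPPER = "\u01ca"  # LATIN CAPITAL LETTER NJ
NJ_TITLE = "\u01cb"  # LATIN CAPITAL LETTER N WITH SMALL LETTER J
NJ_LOWER = "\u01cc"  # LATIN SMALL LETTER NJ


@unittest.skipIf(sys.version_info < (3, 8),
                 "str.capitalize uppercases the first character before 3.8")
class CapitalizeTitlecaseTest(unittest.TestCase):
    """Hypothetical test class; names are illustrative only."""

    def test_digraph_is_titlecased(self):
        expected = NJ_TITLE + "emačka"
        for first in (NJ_UPPER, NJ_TITLE, NJ_LOWER):
            with self.subTest(first=ascii(first)):
                self.assertEqual((first + "emačka").capitalize(), expected)

    def test_ascii_behaviour_is_unchanged(self):
        self.assertEqual("njemacka".capitalize(), "Njemacka")
        self.assertEqual("NJEMACKA".capitalize(), "Njemacka")
```

Saved to a file, this can be run with python -m unittest.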
|
|
msg339890 - (view) |
Author: Kingsley McDonald (kingsley) * |
Date: 2019-04-10 20:49 |
Thanks for clarifying all of that! I now have the patch and tests working locally. However, I'm not too sure what documentation needs to be changed for str.title. Should it specify that only the first letter of a digraph is capitalised, rather than the full character? I sure hope I get the hang of this soon :-D |
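To make the digraph question concrete, the three case forms of a single code point can be inspected directly. This is a small sketch: the helper name case_forms is hypothetical, while unicodedata.name and the str methods are standard.

```python
import unicodedata


def case_forms(ch: str) -> dict:
    """Return the upper, title and lower forms of one character.

    Hypothetical helper for illustration only.
    """
    return {"upper": ch.upper(), "title": ch.title(), "lower": ch.lower()}


# For the NJ digraph the uppercase and titlecase forms are different
# code points, which is why uppercasing the first character is wrong.
for form, value in case_forms("\u01cc").items():
    print(form, ascii(value), unicodedata.name(value))
```

For most letters the "upper" and "title" entries are identical, which is why the difference only shows up with characters like these digraphs.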
|
|
msg340066 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2019-04-12 15:35 |
New changeset b015fc86f7b1f35283804bfee788cce0a5495df7 by Steve Dower (Kingsley M) in branch 'master': bpo-36549: str.capitalize now titlecases the first character instead of uppercasing it (GH-12804) https://github.com/python/cpython/commit/b015fc86f7b1f35283804bfee788cce0a5495df7 |
|
|
msg340067 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2019-04-12 15:36 |
Thanks! I'm a big fan of this change :) |
|
|
msg340076 - (view) |
Author: Zackery Spytz (ZackerySpytz) *  |
Date: 2019-04-12 16:14 |
I think that the PR may have been merged too quickly. Serhiy had made a list, and I think that the PR was missing some necessary changes. |
|
|
msg340095 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2019-04-12 18:42 |
What is missing? It looks like everything on Serhiy's list was done. |
|
|
msg340096 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2019-04-12 18:43 |
Oh, apart from the What's New section. But this looks enough like a bugfix (previous behaviour "wasn't capitalizing my name correctly" - new behaviour "now capitalizes my name correctly") that it's hardly critical to advertise it on that page. |
|
|