msg339317 - (view) |
Author: Jun (Jun) * |
Date: 2019-04-02 06:36 |
I was looking for a list of Unicode code points for which str.isspace() returns True. According to https://docs.python.org/3/library/stdtypes.html#str.isspace, "Whitespace characters are those characters defined in the Unicode character database as “Other” or “Separator” and those with bidirectional property being one of “WS”, “B”, or “S”." However, for U+202F (https://www.fileformat.info/info/unicode/char/202f/index.htm), which is a "Separator" and whose bidirectional property is "CS", str.isspace() returns True, which it shouldn't if we follow the definition above.

>>> "\u202f".isspace()
True

I'm not sure whether the documentation or the behavior should be updated, but at least the two should be consistent. |
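The properties cited in the report can be inspected with the standard unicodedata module. This is only a minimal sketch for reproducing the observation; the `info` dictionary is an illustrative name and is not part of the original report.

```python
import unicodedata

ch = "\u202f"  # NARROW NO-BREAK SPACE, the character from the report

# Collect the Unicode properties relevant to the documented definition.
info = {
    "name": unicodedata.name(ch),
    "category": unicodedata.category(ch),            # general category
    "bidirectional": unicodedata.bidirectional(ch),  # bidirectional class
    "isspace": ch.isspace(),
}
print(info)
```

The general category falls in the "Separator" group while the bidirectional class is not one of WS, B, or S, which is the mismatch with the documentation that the report describes.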
|
|
msg339318 - (view) |
Author: SilentGhost (SilentGhost) *  |
Date: 2019-04-02 06:59 |
I think you have to read that "and" as "or". It's sufficient that '\u202f' is a separator for it to be considered a whitespace character. |
|
|
msg339336 - (view) |
Author: Jun_ (Jun_) |
Date: 2019-04-02 14:32 |
Do you mean the statement should be read as follows? Whitespace characters are characters that satisfy any one of:
1. Character type is "Other"
2. Character type is "Separator"
3. Characters with "WS", "B", or "S" bidirectional property
If that's the case, this also does not reflect the behavior, as most characters in "Other" are not whitespace characters, and in fact str.isspace() returns False for those characters. |
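The point about "Other" can be checked on a few hand-picked characters. This is a hedged sketch: the sample list and the `results` name are chosen here for illustration, and "Other" is taken to mean the general categories starting with "C".

```python
import unicodedata

# A few characters whose general category starts with "C" ("Other").
samples = ["\x00", "\u200b", "\ue000", "\t"]

results = {}
for ch in samples:
    results[ch] = (
        unicodedata.category(ch),
        unicodedata.bidirectional(ch),
        ch.isspace(),
    )
    print(f"U+{ord(ch):04X}", *results[ch])
```

Among these samples only the tab is reported as whitespace, and it is the only one whose bidirectional class is in the documented set, so membership in "Other" alone does not appear to be what str.isspace() tests.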
|
|
msg339339 - (view) |
Author: SilentGhost (SilentGhost) *  |
Date: 2019-04-02 14:56 |
According to the comment for _PyUnicode_IsWhitespace, it's supposed to include the Zs category plus the documented BIDI properties. So I'm not sure where "Other" came from. |
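The rule described in that comment can be written out and compared against str.isspace() over every code point. This is a sketch under the assumption that the rule is exactly "general category Zs, or bidirectional class WS, B, or S"; the helper name `is_whitespace_by_rule` is hypothetical and is not a CPython function.

```python
import sys
import unicodedata


def is_whitespace_by_rule(ch: str) -> bool:
    """Return True if ch is in category Zs or has bidi class WS, B, or S."""
    return (
        unicodedata.category(ch) == "Zs"
        or unicodedata.bidirectional(ch) in ("WS", "B", "S")
    )


# Code points where str.isspace() disagrees with the rule above.
mismatches = [
    cp
    for cp in range(sys.maxunicode + 1)
    if chr(cp).isspace() != is_whitespace_by_rule(chr(cp))
]
print(len(mismatches), "mismatching code points")
```

An empty `mismatches` list would indicate that the behavior follows the comment rather than the documentation wording quoted earlier in this thread.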
|
|
msg348947 - (view) |
Author: Greg Price (Greg Price) * |
Date: 2019-08-03 08:18 |
The actual behavior turns out to match that comment. See attached PR, which adds a test confirming that and also corrects the documentation. (A related issue is #18236 -- we should probably adjust the definition to match the one Unicode now provides. But meanwhile we'll want to correct the docs.) |
|
|
msg349678 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2019-08-14 11:05 |
New changeset 6bccbe7dfb998af862a183f2c36f0d4603af2c29 by Victor Stinner (Greg Price) in branch 'master': bpo-36502: Correct documentation of str.isspace() (GH-15019) https://github.com/python/cpython/commit/6bccbe7dfb998af862a183f2c36f0d4603af2c29 |
|
|
msg349947 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2019-08-19 09:53 |
New changeset 8c1c426a631ba02357112657193f82c58d3e08b4 by Victor Stinner (Greg Price) in branch '3.8': bpo-36502: Correct documentation of str.isspace() (GH-15019) (GH-15296) https://github.com/python/cpython/commit/8c1c426a631ba02357112657193f82c58d3e08b4 |
|
|
msg349948 - (view) |
Author: miss-islington (miss-islington) |
Date: 2019-08-19 10:10 |
New changeset 0fcdd8d6d67f57733203fc79e6a07a89b924a390 by Miss Islington (bot) in branch '3.7': bpo-36502: Correct documentation of str.isspace() (GH-15019) (GH-15296) https://github.com/python/cpython/commit/0fcdd8d6d67f57733203fc79e6a07a89b924a390 |
|
|
msg349950 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2019-08-19 10:14 |
The str.isspace() documentation has been fixed; thanks to Greg Price for the fix! I'm closing the issue. |
|
|
msg349983 - (view) |
Author: Greg Price (Greg Price) * |
Date: 2019-08-20 01:33 |
Thanks Victor for the reviews and merges! (Unmarking 2.7, because https://docs.python.org/2/library/stdtypes.html seems to not have this issue.) |
|
|
msg351526 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2019-09-09 16:37 |
New changeset 64c6ac74e254d31f93fcc74bf02b3daa7d3e3f25 by Benjamin Peterson (Greg Price) in branch 'master': bpo-36502: Update link to UAX #44, the Unicode doc on the UCD. (GH-15301) https://github.com/python/cpython/commit/64c6ac74e254d31f93fcc74bf02b3daa7d3e3f25 |
|
|
msg351536 - (view) |
Author: Benjamin Peterson (benjamin.peterson) *  |
Date: 2019-09-09 17:10 |
New changeset 58d61efd4cdece3b026868a66d829001198d29b1 by Benjamin Peterson in branch '2.7': [2.7] bpo-36502: Update link to UAX GH-44, the Unicode doc on the UCD. (GH-15808) https://github.com/python/cpython/commit/58d61efd4cdece3b026868a66d829001198d29b1 |
|
|
msg351545 - (view) |
Author: miss-islington (miss-islington) |
Date: 2019-09-09 18:40 |
New changeset 0a86da87da82c4a28d7ec91eb54c0b9ca40bbea7 by Miss Islington (bot) in branch '3.7': bpo-36502: Update link to UAX GH-44, the Unicode doc on the UCD. (GH-15301) https://github.com/python/cpython/commit/0a86da87da82c4a28d7ec91eb54c0b9ca40bbea7 |
|
|
msg351546 - (view) |
Author: miss-islington (miss-islington) |
Date: 2019-09-09 18:41 |
New changeset c1c04cbc24c11cd7a47579af3faffee05a16acd7 by Miss Islington (bot) in branch '3.8': bpo-36502: Update link to UAX GH-44, the Unicode doc on the UCD. (GH-15301) https://github.com/python/cpython/commit/c1c04cbc24c11cd7a47579af3faffee05a16acd7 |
|
|