msg71201
Author: Matt Giuca (mgiuca)
Date: 2008-08-16 09:14
A few weeks ago I fixed the struct module's documentation, which wasn't 3.0 compliant (basically renaming "strings" to "bytes" and "unicode" to "string"). Now I've had a look at the array module, and it has similar problems: http://docs.python.org/dev/3.0/library/array.html

Unfortunately, the method names are wrong as far as Py3K is concerned. "tostring" returns what is now called a "bytes", and "tounicode" returns what is now called a "string". There are a few other errors in the documentation too, like the 'c' type code (which no longer exists, but is still documented), and examples using Python 2 syntax. Those are trivial to fix.

I suggest a 3-step process for fixing this:

1. Update the documentation to describe the 3.0 behaviour using 3.0 terminology, even though the method names are wrong (I've done this already).
2. Rename "tostring" and "fromstring" methods to "tobytes" and "frombytes". I think this is quite important, as the value being returned can no longer be described as a "string".
3. Rename "tounicode" and "fromunicode" methods to "tostring" and "fromstring". I think this is less important, as the name "unicode" isn't ambiguous, and potentially undesirable, as we'd be re-using method names which previously did something else.

I'm aware we've got the final beta in 4 days, and there's no way phases 2-3 can be done after that. I think we should aim to do phase 2, but probably not phase 3.

I've fixed the documentation to accurately describe the current behaviour, using Python 3 terminology. This doesn't change any behaviour at all, so it should be possible to commit it immediately. I'll have a go at a "phase 2" patch shortly. Is it feasible to even think about renaming a method at this stage?

Commit log:

Doc/library/array.rst, Modules/arraymodule.c: Updated array module documentation to be Python 3.0 compliant.
* Removed references to 'c' type code (no longer valid).
* References to "string" changed to "bytes".
* References to "unicode" changed to "string".
* Updated examples to use Python 3.0 syntax (and show the output of evaluating them).
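For comparison, a minimal sketch of what Python 3 spellings of typical array examples look like, plus a check that the Python 2 'c' type code is rejected. It assumes a current CPython 3 interpreter and is not the text of the patch itself.

```python
from array import array

# Python 3 spellings of typical array examples; repr() shows the
# type code followed by the contents.
print(repr(array('l')))                    # array('l')
print(repr(array('l', [1, 2, 3, 4, 5])))   # array('l', [1, 2, 3, 4, 5])
print(repr(array('d', [1.0, 2.0, 3.14])))  # array('d', [1.0, 2.0, 3.14])

# The Python 2 'c' (character) type code no longer exists; the byte-sized
# type codes are 'b' (signed char) and 'B' (unsigned char).
try:
    array('c')
except ValueError as exc:
    print('rejected:', exc)
```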
|
|
msg71202
Author: Martin v. Löwis (loewis) *
Date: 2008-08-16 09:22
> 2. Rename "tostring" and "fromstring" methods to "tobytes" and "frombytes". I think this is quite important as the value being returned can no longer be described as a "string".

I'm not a native speaker (of English), but my understanding is that the noun "string", in itself, can very well be used to describe this type: the result is a "byte string", as opposed to a "character string". Merriam-Webster's seems to agree; meaning 5b(2) is "a sequence of like items (as bits, characters, or words)".
|
|
msg71203
Author: Matt Giuca (mgiuca)
Date: 2008-08-16 09:59
> I'm not a native speaker (of English), but my understanding is that the noun "string", in itself, can very well be used to describe this type: the result is a "byte string", as opposed to a "character string". Merriam-Webster's seems to agree; meaning 5b(2) is "a sequence of like items (as bits, characters, or words)"

Ah yes, that's quite right (and computer science literature strongly supports that claim as well). However, the word "string", unqualified, and in Python 3.0 terminology (as described in PEP 358) now refers only to the "str" type (formerly known as "unicode"), so it is very confusing to have a method "tostring" which returns a bytes object. For array to become a good Py3k citizen, I'd strongly argue that tostring/fromstring should be renamed to tobytes/frombytes.

I'm currently writing a patch for that; it looks like there's very little damage. However, as a separate issue, I think the documentation update should be approved first.
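To make the naming point concrete, a minimal sketch on a current Python 3 interpreter, where these methods are spelled tobytes()/frombytes(): the serialised form is a bytes object, a different type from str.

```python
from array import array

a = array('i', [1, 2, 3])

# The serialised form is a bytes object (a "byte string"), not a str.
raw = a.tobytes()
print(type(raw).__name__)               # bytes
print(isinstance(raw, str))             # False
print(len(raw) == len(a) * a.itemsize)  # True: raw machine values

# frombytes() appends items decoded from that buffer.
b = array('i')
b.frombytes(raw)
print(b == a)                           # True
```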
|
|
msg71204
Author: Matt Giuca (mgiuca)
Date: 2008-08-16 10:00
(Fixed issue title)
|
|
msg71205
Author: Matt Giuca (mgiuca)
Date: 2008-08-16 10:26
I renamed tostring/fromstring to tobytes/frombytes in the array module, as described above. I then grepped the entire py3k tree for "tostring" and "fromstring", and carefully replaced all references which pertain to array objects. The relatively small number of these references suggests this won't be a big problem. All the test cases pass.

I haven't (yet) renamed tounicode/fromunicode to tostring/fromstring. The more I think about it, the more that sounds like a bad idea (it could create confusion as to whether this is a character string or a byte string, as Martin pointed out).

The patch (doc+bytesmethods.patch) includes everything in the original doc-only.patch, plus the renaming and the updating of all usages. Use the above commit log, plus:

Renamed array.tostring to array.tobytes, and array.fromstring to array.frombytes, to reflect the Python 3.0 terminology. Updated all references to these methods in Lib to the new names.
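The grep-then-review step can be sketched as a small helper. find_candidates is a hypothetical illustration, not part of the patch. A textual search cannot tell an array method from other APIs with the same name, such as xml.etree.ElementTree.tostring, so every hit still needs a manual check.

```python
import re

# Method-call syntax such as "arr.tostring(" or "arr.fromstring(".
CALL_PATTERN = re.compile(r"\.(?:tostring|fromstring)\(")


def find_candidates(source):
    """Return (line number, stripped line) for each line calling
    .tostring( or .fromstring(; a human decides which are arrays."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(source.splitlines(), start=1)
        if CALL_PATTERN.search(line)
    ]


sample = """\
data = arr.tostring()
xml = ElementTree.tostring(elem)
arr.fromstring(data)
print('tostring')
"""

for lineno, line in find_candidates(sample):
    print(lineno, line)
```

Line 2 of the sample is a false positive (an ElementTree call), and line 4 is not matched because it is not a method call.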
|
|
msg71206
Author: Matt Giuca (mgiuca)
Date: 2008-08-16 10:30
Oops, I forgot to update the array.rst docs with the new method names. I've replaced doc+bytesmethods.patch with a fixed version.
|
|
msg71555
Author: Matt Giuca (mgiuca)
Date: 2008-08-20 16:15
A similar issue came up in another bug (http://bugs.python.org/issue3613), and Guido said:

"IMO it's okay to add encodebytes(), but let's leave encodestring() around with a deprecation warning, since it's so late in the release cycle."

I think that's probably wise for this bug as well; my original suggestion to REPLACE tostring/fromstring with tobytes/frombytes was probably a bit over-zealous. I'll have another go at this during some spare cycles tomorrow: basically taking my current patch and adding tostring/fromstring back in, calling tobytes/frombytes with deprecation warnings. Does this sound like a good plan?

(Also a policy question: when you have deprecated functions, how do you document them? I assume you say "deprecated" in the docs; is there a standard template for this?)
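The plan of keeping the old names as thin wrappers that warn and then delegate can be sketched in pure Python. This is only an illustration: CompatArray is a hypothetical subclass, whereas the real array methods are implemented in C.

```python
import warnings
from array import array


class CompatArray(array):
    """Hypothetical array subclass keeping the old names as deprecated aliases."""

    def tostring(self):
        # stacklevel=2 attributes the warning to the caller's line.
        warnings.warn("tostring() is deprecated; use tobytes()",
                      DeprecationWarning, stacklevel=2)
        return self.tobytes()

    def fromstring(self, data):
        warnings.warn("fromstring() is deprecated; use frombytes()",
                      DeprecationWarning, stacklevel=2)
        self.frombytes(data)


a = CompatArray('B', [1, 2])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(a.tostring())        # b'\x01\x02'
    a.fromstring(b'\x03')
print(list(a))                 # [1, 2, 3]
print([w.category.__name__ for w in caught])
```

Delegating to the new names (rather than the reverse) means the deprecated spellings can later be deleted without touching the real implementation.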
|
|
msg72439
Author: Matt Giuca (mgiuca)
Date: 2008-09-04 00:02
Can I just remind people that I have had a documentation patch ready here for about a month? Of course, doc+bytesmethods.patch may be debatable and is probably too late to go into 3.0. But you should be able to commit doc-only.patch with no problems. The current array documentation (http://docs.python.org/dev/3.0/library/array.html) is clearly wrong for Python 3.0 (it even contains syntax errors).
|
|
msg83664
Author: Antoine Pitrou (pitrou) *
Date: 2009-03-17 10:31
Benjamin, do you think this should be fixed in 3.1?
|
|
msg83668
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2009-03-17 11:48
It would be nice to deprecate the old names in 3.1 and remove them in 3.2, but I think it should get approval on python-dev.
|
|
msg83670
Author: Matt Giuca (mgiuca)
Date: 2009-03-17 12:15
Note that, irrespective of the changes to the library itself, the documentation is out of date, since it still uses the old "string/unicode" nomenclature rather than the new "bytes/string". I have provided a separate documentation patch which should apply with relatively little fuss. (It's from August, so it will probably conflict, but I can update it if necessary.)
|
|
msg86295
Author: Daniel Diniz (ajaksu2) *
Date: 2009-04-22 14:37
The doc patch is in scope for the Bug Day.
|
|
msg86393
Author: Matt Giuca (mgiuca)
Date: 2009-04-24 03:36
OK, since the patches I submitted are now eight months old, I just did an update and re-applied them. I am submitting new patch files which contain the same changes, but are made against revision 71822 (so they should be much easier to apply). I'd still like to see doc+bytesmethods.patch applied (since it fixes method names which make no sense at all in a Python 3.0 context), but since it's getting a bit late for that, I'll be happy for the doc-only patch to be accepted (it merely corrects the documentation, which is still using Python 2.x terminology).
|
|
msg86394
Author: Matt Giuca (mgiuca)
Date: 2009-04-24 03:36
Full method renaming patch.
|
|
msg86397
Author: Martin v. Löwis (loewis) *
Date: 2009-04-24 05:04
I think this patch is unacceptable for Python 3.1. It is an incompatible change (removing a method), one would have to deprecate the method to be removed first. I also agree with Benjamin that a wider-audience approval of the deprecation would be required. I, myself, remain opposed to this change.
|
|
msg86398
Author: Matt Giuca (mgiuca)
Date: 2009-04-24 05:12
I agree with that -- too big a change to make now. But can we please get the documentation patch accepted? It's been waiting here for eight months with corrections to clearly-incorrect documentation.
|
|
msg130404
Author: Terry J. Reedy (terry.reedy) *
Date: 2011-03-09 02:55
In 3.2, a change *was* committed (by whom?) but not recorded here: .from/.tostring were renamed .from/.tobytes and kept as deprecated aliases. Is there anything more to this issue other than removing the deprecated aliases in 3.3 (which could be done now if that was the intention) or in 3.4 (if not)?

Is there still any idea/intention of renaming .from/.tounicode to .from/.tostring? That would require at least one version in which the 'string' names were absent, and would bring little gain, as 'unicode' is clear.
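Terry's point that 'unicode' is already unambiguous is easy to see in practice: tounicode() returns a str and fromunicode() accepts one. A minimal sketch, assuming Python 3.13 or later for the 'w' type code and falling back to the older, deprecated 'u' code otherwise.

```python
import warnings
from array import array, typecodes

# 'w' (Py_UCS4) exists from Python 3.13; earlier versions only offer the
# deprecated 'u' (wchar_t) code, whose DeprecationWarning is silenced here.
code = 'w' if 'w' in typecodes else 'u'
with warnings.catch_warnings():
    warnings.simplefilter('ignore', DeprecationWarning)
    chars = array(code, 'spam')
    text = chars.tounicode()       # a str, i.e. a Python 3 "string"
    chars.fromunicode(' eggs')     # appends the characters of a str

print(type(text).__name__, text)   # str spam
print(chars.tounicode())           # spam eggs
```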
|
|
msg140189
Author: Éric Araujo (eric.araujo) *
Date: 2011-07-12 13:43
It was Antoine in fa8b57f987c5, for #8990.
|
|
msg140210
Author: Antoine Pitrou (pitrou) *
Date: 2011-07-12 19:51
> Is there still any idea/intention of renaming .from/.tounicode to .from/.tostring? That would have to be done at least one version with the 'string' names absent, and would have little gain as 'unicode' is clear.

Indeed, not only would it bring little benefit, but it might also confuse users porting from 2.x (since the from/tostring methods would then have a totally different meaning).
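Code that has to run on both 2.x and 3.x can sidestep the naming question with feature detection. A minimal sketch; array_to_bytes and array_from_bytes are hypothetical helper names, and the fallback branch only matters on interpreters that lack tobytes()/frombytes() (Python 2, and the 3.x releases before the rename).

```python
from array import array


def array_to_bytes(arr):
    """Hypothetical helper: return the array's raw machine values as bytes."""
    if hasattr(arr, 'tobytes'):      # Python 3.2 and later
        return arr.tobytes()
    return arr.tostring()            # older interpreters


def array_from_bytes(arr, data):
    """Hypothetical helper: append items decoded from a bytes-like object."""
    if hasattr(arr, 'frombytes'):
        arr.frombytes(data)
    else:
        arr.fromstring(data)


a = array('B', [1, 2])
array_from_bytes(a, b'\x03')
print(array_to_bytes(a))             # b'\x01\x02\x03'
```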
|
|
msg140224
Author: Matt Giuca (mgiuca)
Date: 2011-07-13 03:57
There are still some inconsistencies in the documentation (in particular, incorrectly using the word "string" to refer to a bytes object, which made sense in Python 2 but not 3), which I fixed in my doc-only.patch file that's coming up to its third birthday. Most of it has been fixed by the previous change, which added 'tobytes' and 'frombytes' and made tostring and fromstring aliases. But there are some places which don't make sense:

array: "If given a list or string" needs to be "If given a list, bytes or string" (since a bytes is not a string).
frombytes: "Appends items from the string" needs to be "Appends items from the bytes object", since this does not work if you give it a string.

Less importantly, I also recommended renaming "unicode string" to just "string", since in Python 3 there is no such thing as a non-unicode string. For instance, there is an example that uses a variable named "unicodestring" that could be renamed to just "string".

> Indeed, not only would it bring little benefit, but it might also confuse users porting from 2.x (since the from/tostring methods would then have a totally different meaning).

Well, by that logic, you shouldn't have renamed "unicode" to "str", since that would also confuse users porting from 2.x. It generally seems like a good idea in Python 3 to rename all mentions of "string" to "bytes" and all mentions of "unicode" to "string", so as to be consistent with the new names of the types (it is better to be internally consistent than consistent with the previous version). That said, I do agree that it would be chaos to rename array.from/tounicode to from/tostring now, given that array.from/tostring already has a different meaning in Python 3.
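Both documentation points can be checked directly. A minimal sketch on a current Python 3 interpreter: the constructor accepts a list or a bytes object (a bytes initializer is passed to frombytes()), rejects a str for non-character type codes, and frombytes() itself rejects a str.

```python
from array import array

from_list = array('B', [104, 105])
from_bytes = array('B', b'hi')     # bytes initializer: handled by frombytes()
print(from_list == from_bytes)     # True

# A str initializer is only valid for the character type codes.
try:
    array('B', 'hi')
except TypeError as exc:
    print('constructor rejected str:', exc)

# frombytes() needs a bytes-like object, not a str.
try:
    from_list.frombytes('hi')
except TypeError as exc:
    print('frombytes rejected str:', exc)
```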
|
|