[Python-Dev] bytes.from_hex()
Ron Adam rrr at ronadam.com
Sat Feb 18 13:17:42 CET 2006
- Previous message: [Python-Dev] bytes.from_hex()
- Next message: [Python-Dev] bytes.from_hex()
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Josiah Carlson wrote:
> Ron Adam <rrr at ronadam.com> wrote:
>> Josiah Carlson wrote:
>>> Bengt Richter had a good idea with bytes.recode() for strictly bytes transformations (and the equivalent for text), though it is ambiguous as to the direction; are we encoding or decoding with bytes.recode()? In my opinion, this is why .encode() and .decode() make sense to keep on both bytes and text: the direction is unambiguous, and if one has even a remote idea of what the heck the codec is, they know their result.
>>>
>>> - Josiah
>>
>> I like the bytes.recode() idea a lot. +1
>>
>> It seems to me it's a far more useful idea than encoding and decoding by overloading, and it could do both and more. It has a lot of potential to be an intermediate step for encoding as well as being used for many other translations to byte data.
>
> Indeed it does.
>
>> I think I would prefer that encode and decode be just functions with well defined names and arguments instead of being methods or arguments to string and Unicode types.
>
> Attaching it to string and unicode objects is a useful convenience, just like x.replace(y, z) is a convenience for string.replace(x, y, z). Tossing the encode/decode somewhere else, like encodings, or even string, I see as a backwards step.
>
>> I'm not sure exactly how this would work. Maybe it would need two sets of encodings, i.e. decoders and encoders. An exception would be raised if the codec wasn't found for the direction one was going in.
>>
>> Roughly... something or other like:
>>
>>     # Proposed additions to the encodings module, which would hold
>>     # two registries keyed by codec name: encoders and decoders.
>>
>>     def tostr(obj, encoding):              # encodings.tostr()
>>         if encoding not in encoders:
>>             raise LookupError('encoding not found in encoders')
>>         # check if obj works with encoding to string
>>         # ...
>>         b = bytes(obj).recode(encoding)    # proposed bytes.recode()
>>         return str(b)
>>
>>     def tounicode(obj, decoding):          # encodings.tounicode()
>>         if decoding not in decoders:
>>             raise LookupError('decoding not found in decoders')
>>         # check if obj works with decoding to unicode
>>         # ...
>>         b = bytes(obj).recode(decoding)    # proposed bytes.recode()
>>         return unicode(b)
>>
>> Anyway... food for thought.
>
> Again, the problem is ambiguity; what does bytes.recode(something) mean? Are we encoding to something, or are we decoding from something?
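For comparison only (this postdates the thread): Python 3's standard codecs.encode() and codecs.decode() functions sidestep the direction question by putting the direction in the function name while sharing a single codec name. A minimal sketch with the bytes-to-bytes 'base64' and 'hex' codecs, assuming Python 3.4 or later, where those aliases are available:

```python
import codecs

data = b"hello"

# One codec name, two directions: the function name says which way we go.
encoded = codecs.encode(data, "base64")     # bytes -> bytes, base64 text
decoded = codecs.decode(encoded, "base64")  # bytes -> bytes, original data

hexed = codecs.encode(data, "hex")          # bytes -> bytes, hex digits
unhexed = codecs.decode(hexed, "hex")       # bytes -> bytes, original data

print(encoded)   # b'aGVsbG8=\n'
print(decoded)   # b'hello'
print(hexed)     # b'68656c6c6f'
```

Note that the base64 codec appends a trailing newline, because it is built on base64.encodebytes().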
This was just an example of one way that might work, but here are my thoughts on why I think it might be good.
(In this case, the ambiguity is reduced as far as the encoding and decoding operations are concerned.)

    somestring = encodings.tostr(someunicodestr, 'latin-1')
It's pretty clear to me what is happening: it will encode the object named someunicodestr to a string with the 'latin-1' encoder.

It would also result in clear errors if the specified encoding is unavailable, or, if it is available, if it's not compatible with the type of the given someunicodestr object.
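A minimal Python 3 sketch of the behaviour described here, where Python 3's str plays the role of unicode and bytes the role of the old str. The name tostr() is taken from the proposal, but this body is a hypothetical illustration, not an existing API: it reports an unknown codec name with LookupError and input the codec cannot represent with UnicodeEncodeError.

```python
import codecs


def tostr(obj, encoding):
    """tostr(<str>, <encoding>) -> bytes

    Hypothetical sketch of the proposed encodings.tostr(): encode
    text to bytes with the named text codec.
    """
    if not isinstance(obj, str):
        raise TypeError("tostr() expects str, got " + type(obj).__name__)
    # Raises LookupError right away if no codec has this name.
    codecs.lookup(encoding)
    # str.encode() also raises LookupError for non-text codecs such as
    # 'base64', and UnicodeEncodeError for characters the codec can't map.
    return obj.encode(encoding)


print(tostr("café", "latin-1"))   # b'caf\xe9'

# An unknown codec name, then a character latin-1 cannot represent.
for args in [("café", "no-such-codec"), ("\u20ac", "latin-1")]:
    try:
        tostr(*args)
    except (LookupError, UnicodeEncodeError) as exc:
        print(type(exc).__name__)   # LookupError, then UnicodeEncodeError
```

Because the sketch carries a docstring, help(tostr) would print it, which is the kind of hint described next.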
Further hints could be gained with:

    help(encodings.tostr)
Which could result in something like:

    """
    encodings.tostr(<string|unicode>, <encoding>) -> string

    Encode a unicode string to a non-unicode string using an
    encoder codec, or transform a non-unicode string to another
    non-unicode string using an encoder codec.
    """
And if that's not enough, then help(encodings) could give more clues. Those are the steps I would take, and the next would be to find the Python docs entry on encodings.

The encodings module seems like a natural place to look for these functions if you are working with encodings, so I find that just as convenient as having them be string methods.
There is no intermediate default encoding involved above (the bytes object is used instead), so you wouldn't get some of the error messages the present system produces when ASCII is the default encoding.
(Yes, I know those won't happen once P3K is here, either.)
> Are we going to need to embed the direction in the encoding/decoding name (tobase64, frombase64, etc.)? That doesn't look any better than binascii.b2a_base64.
No, that's why I suggested two separate lists (or dictionaries might be better). They can contain the same names, but the lists they are in determine the context and point to the needed codec. And that step is abstracted out by putting it inside the encodings.tostr() and encodings.tounicode() functions.
So either function would look up 'base64' in the correct codec list and get the encoding or decoding codec it needs.
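The two-table idea can be sketched in a few lines of Python 3. Everything here is hypothetical: the ENCODERS and DECODERS dictionaries, the names encode_with() and decode_with(), and the choice of base64 and hex transformations from the standard base64 and binascii modules. The same name appears in both tables, and the table a function consults fixes the direction, much as tostr() and tounicode() would in the proposal.

```python
import base64
import binascii

# Hypothetical registries: the same names appear in both tables,
# and the table consulted decides the direction.
ENCODERS = {
    "base64": base64.b64encode,
    "hex": binascii.hexlify,
}
DECODERS = {
    "base64": base64.b64decode,
    "hex": binascii.unhexlify,
}


def _lookup(table, name, table_name):
    """Fetch a transformation, raising LookupError if this direction lacks it."""
    try:
        return table[name]
    except KeyError:
        raise LookupError(f"{name!r} not found in {table_name}") from None


def encode_with(data: bytes, name: str) -> bytes:
    """Apply the named transformation in the encoding direction."""
    return _lookup(ENCODERS, name, "encoders")(data)


def decode_with(data: bytes, name: str) -> bytes:
    """Apply the named transformation in the decoding direction."""
    return _lookup(DECODERS, name, "decoders")(data)


print(encode_with(b"hello", "base64"))     # b'aGVsbG8='
print(decode_with(b"68656c6c6f", "hex"))   # b'hello'
```

A name missing from one table fails only in that direction; decode_with(b"x", "rot13"), for example, raises LookupError("'rot13' not found in decoders").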
> What about .reencode and .redecode? It seems as though the 're' added as a prefix to .encode and .decode makes it clearer that you get the same type back as you put in, and it is also unambiguous to direction.
But then wouldn't we end up with a multitude of ways to do things?
s.encode(codec) == s.redecode(codec)
s.decode(codec) == s.reencode(codec)
unicode(s, codec) == s.decode(codec)
str(u, codec) == u.encode(codec)
str(s, codec) == s.encode(codec)
unicode(s, codec) == s.reencode(codec)
str(u, codec) == s.redecode(codec)
str(s, codec) == s.redecode(codec)
Umm .. did I miss any? Which ones would you remove?
Which ones of those will succeed with which codecs?
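For what it's worth, Python 3 (which postdates this thread) kept several of these spellings and answered the second question by restricting the methods to text encodings. A small check, assuming Python 3.4 or later; accepts() is a hypothetical helper written only for this illustration:

```python
import codecs

b = b"caf\xe9"
s = "café"

# Three spellings of the same decode (bytes -> str) ...
assert str(b, "latin-1") == b.decode("latin-1") == codecs.decode(b, "latin-1") == s
# ... and three of the same encode (str -> bytes).
assert bytes(s, "latin-1") == s.encode("latin-1") == codecs.encode(s, "latin-1") == b


def accepts(func, value, codec):
    """Return True if func(value, codec) succeeds, False on LookupError."""
    try:
        func(value, codec)
        return True
    except LookupError:
        return False


# The methods only accept text encodings; bytes-to-bytes and str-to-str
# codecs have to go through codecs.encode() / codecs.decode().
print(accepts(bytes.decode, b"aGVsbG8=", "base64"))   # False
print(accepts(codecs.decode, b"aGVsbG8=", "base64"))  # True
print(accepts(str.encode, "hello", "rot13"))          # False
print(accepts(codecs.encode, "hello", "rot13"))       # True
```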
The method bytes.recode() always does a byte transformation, which can be almost anything. It's the context bytes.recode() is used in that determines what's happening. In the cases above, it's using an encoding transformation, so what it does is precisely what you would expect from its context.

There isn't a bytes.decode(), since that's just another transformation, so only the one method is needed. That makes it easier to learn.
> The question remains: is str.decode() returning a string or unicode depending on the argument passed, when the argument quite literally names the codec involved, difficult to understand? I don't believe so; am I the only one?
>
> - Josiah
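The behaviour Josiah describes for Python 2's str.decode() survives in Python 3's codecs.decode(): the type of the result depends on the codec named. A short illustration, assuming Python 3.4 or later for the 'base64' alias:

```python
import codecs

payload = codecs.encode(b"hello", "base64")   # b'aGVsbG8=\n'

# codecs.decode() returns whatever type the named codec produces.
as_bytes = codecs.decode(payload, "base64")   # base64 codec: bytes -> bytes
as_text = codecs.decode(payload, "ascii")     # text codec:   bytes -> str

print(type(as_bytes).__name__, as_bytes)        # bytes b'hello'
print(type(as_text).__name__, repr(as_text))    # str 'aGVsbG8=\n'
```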
Using help(str.decode) and help(str.encode) gives:
S.decode([encoding[,errors]]) -> object
S.encode([encoding[,errors]]) -> object
These look an awful lot alike, and the descriptions are nearly identical as well. The Python docs just reproduce the docstrings (or something close to them) with very few additional words.
Learning how the current system works comes awfully close to reverse engineering. Maybe I'm overstating it a bit, but I suspect many end up doing exactly that in order to learn how Python does it.
Or they go with the first solution that seems to work and hope for the best. I believe that's what Martin said earlier in this thread.
It's much too late (or early now) to think further on this. So until tomorrow.
(please ignore typos) ;-)
Cheers, Ronald Adam