[Python-Dev] bytes.from_hex()
Ron Adam rrr at ronadam.com
Sun Feb 19 04:54:44 CET 2006
Josiah Carlson wrote:
Ron Adam <rrr at ronadam.com> wrote:
Except that ambiguates it even further.
Is encodings.tounicode() encoding, or decoding? According to everything you have said so far, it would be decoding. But if I am decoding binary data, why should it be spending any time as a unicode string? What do I mean?
Encoding and decoding are relative concepts. It's all encoding from one thing to another. Whether it's "decoding" or "encoding" depends on the relationship of the current encoding to a standard encoding.
The confusion introduced by "decode" arises when the 'default_encoding' changes, will change, or is unknown.
x = f.read()  # x contains base-64 encoded binary data
y = encodings.tounicode(x, 'base64')
y now contains BINARY DATA, except that it is a unicode string
No, that wasn't what I was describing. You get a Unicode string object as the result, not a bytes object with binary data. See the toy example at the bottom.
z = encodings.tostr(y, 'latin-1')
Later you define a strtostr function, which I (or someone else) would use like:
z = strtostr(x, 'base64', 'latin-1')
But the trick is that I don't want some unicode string encoded in latin-1, I want my binary data unencoded. They may happen to be the same in this particular example, but that doesn't mean that it makes any sense to the user.
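To illustrate the point about latin-1, here is a minimal sketch in present-day Python 3, which postdates this thread; the variable names are illustrative. Decoding base64 yields raw bytes, and latin-1 maps each byte value 0-255 to the code point with the same number, so a latin-1 round trip reproduces the bytes exactly even though the intermediate text carries no meaning.

```python
import base64

raw = bytes(range(256))            # arbitrary binary data: every byte value once
x = base64.b64encode(raw)          # stands in for what f.read() would return

data = base64.b64decode(x)         # the binary data, unencoded
as_text = data.decode('latin-1')   # a str whose code points equal the byte values
round_trip = as_text.encode('latin-1')  # back to the identical bytes
```

The bytes survive the trip, but only because latin-1 happens to be a one-to-one byte mapping; most other text encodings would reject or alter arbitrary binary data.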
If you want bytes then you would use the bytes() type to get bytes directly. Not encode or decode.
binary_unicode = bytes(unicode_string)
The exact byte order and representation would need to be decided by the Python developers in this case. The internal representation, 'unicode-internal', is UCS-2, I believe.
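As a sketch of why the byte order has to be pinned down, assuming present-day Python 3 (where bytes() requires an explicit encoding for a str, unlike the bytes(unicode_string) call proposed above): the same text yields different byte sequences under the two UTF-16 byte orders.

```python
text = 'A\u20ac'   # 'A' followed by the euro sign (U+20AC)

little = bytes(text, 'utf-16-le')   # least significant byte of each unit first
big = bytes(text, 'utf-16-be')      # most significant byte of each unit first

# Both decode back to the same text, but the byte sequences differ.
same_text = little.decode('utf-16-le') == big.decode('utf-16-be') == text
```

Any conversion from text straight to bytes has to choose one of these layouts, which is the decision referred to above.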
It's no more ambiguous than any math operation where you can do it one way with one operation and get your original value back with the same operation by using an inverse value.
n2 = n + 1; n3 = n2 + (-1); n == n3
n2 = n * 2; n3 = n2 * (.5); n == n3
Ahh, so you are saying 'tobase64' and 'frombase64'. There is one major reason why I don't like that kind of a system: I can't just say encoding='base64' and use str.encode(encoding) and str.decode(encoding); I necessarily have to use str.recode('to'+encoding) and str.recode('from'+encoding). Seems a bit awkward.
Yes, but the encodings API could abstract out the 'to_base64' and 'from_base64' so you can just say 'base64' and have it work either way.
Maybe a toy "incomplete" example might help.
# in module bytes.py or someplace else.

class bytes(list):
    """
    bytes methods defined here
    """

# in module encodings.py

# using a dict of tuples, but other solutions would
# work just as well.
unicode_codecs = {
    'base64': ('from_base64', 'to_base64'),
    }

def tounicode(obj, from_codec):
    b = bytes(obj)
    b = b.recode(unicode_codecs[from_codec][0])
    return unicode(b)

def tostr(obj, to_codec):
    b = bytes(obj)
    b = b.recode(unicode_codecs[to_codec][1])
    return str(b)

# in your application

import encodings

# ... a bunch of code ...

u = encodings.tounicode(s, 'base64')

# or if going the other way
s = encodings.tostr(u, 'base64')
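The same idea can be sketched in runnable form in present-day Python 3. This is only an approximation of the toy example: the CODECS registry and the use of the standard base64 module in place of a recode() method are assumptions, as is treating the payload as UTF-8 text, and tostr here returns a Python 3 bytes object rather than a Python 2 str.

```python
import base64

# Hypothetical registry: codec name -> (from_codec, to_codec) functions.
# Both functions take bytes and return bytes.
CODECS = {
    'base64': (base64.b64decode, base64.b64encode),
}

def tounicode(obj, from_codec):
    """Undo from_codec on the bytes in obj and return the result as text."""
    from_func, _ = CODECS[from_codec]
    return from_func(obj).decode('utf-8')  # assumes the payload is UTF-8 text

def tostr(obj, to_codec):
    """Apply to_codec to the text in obj and return the encoded bytes."""
    _, to_func = CODECS[to_codec]
    return to_func(obj.encode('utf-8'))

s = tostr('hello', 'base64')   # text -> base64 bytes
u = tounicode(s, 'base64')     # base64 bytes -> text
```

The caller only ever names 'base64'; which direction is applied follows from the function called, not from a 'to_' or 'from_' prefix on the codec name.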
Does this help? Is the relationship between the bytes object and the encodings API clearer here? If not, maybe we should discuss it further off-list.
Cheers, Ronald Adam