[Python-Dev] email package status in 3.X
P.J. Eby pje at telecommunity.com
Mon Jun 21 19:17:57 CEST 2010
At 11:43 AM 6/21/2010 -0400, Barry Warsaw wrote:
On Jun 21, 2010, at 10:20 PM, Nick Coghlan wrote:
>Something that may make sense to ease the porting process is for some
>of these "on the boundary" I/O related string manipulation functions
>(such as os.path.join) to grow "encoding" keyword-only arguments. The
>recommended approach would be to provide all strings, but bytes could
>also be accepted if an encoding was specified. (If you want to mix
>encodings - tough, do the decoding yourself).
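For concreteness, Nick's suggestion could be sketched roughly as below. The name path_join is hypothetical (os.path.join has no "encoding" argument), and posixpath is used only so the result is the same on every platform.

```python
import posixpath


def path_join(*parts, encoding=None):
    """Hypothetical wrapper; NOT a real os.path API.

    All-str input is the recommended case. Bytes are accepted only
    when the caller names the single encoding they are all in.
    """
    decoded = []
    for part in parts:
        if isinstance(part, bytes):
            if encoding is None:
                raise TypeError("bytes argument requires an explicit encoding")
            part = part.decode(encoding)
        decoded.append(part)
    return posixpath.join(*decoded)
```

Mixed encodings are deliberately unsupported here, matching the "do the decoding yourself" remark.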
This is probably a stupid idea, and if so I'll plead Monday morning mindfuzz for it. Would it make sense to have "encoding-carrying" bytes and str types?
It's not a stupid idea, and could potentially work. It also might have a better chance of being able to actually be implemented in 3.x than my idea.
Basically, I'm thinking of types (maybe even the current ones) that carry around a .encoding attribute so that they can be automatically encoded and decoded where necessary. This at least would simplify APIs that need to do the conversion.
I'm not really sure how much use the encoding is on a unicode object - what would it actually mean?
Hm. I suppose it would effectively mean "this string can be represented in this encoding" -- which is useful, in that you could fail operations when combining with bytes of a different encoding.
Hm... no, in that case you should just encode the string to the bytes' encoding, and let that throw an error if it fails. So, really, there's no reason for a string to know its encoding. All you need is the bytes type to have an encoding attribute, and when doing mixed-type operations between bytes and strings, coerce to bytes of the same encoding.
However, if .encoding is None, then coercion would follow the same rules as now -- i.e., convert the bytes to unicode, assuming an ascii encoding. (This would be different than setting an encoding of 'ascii', because in that case, it means you want cross-type operations to result in ascii bytes, rather than a unicode string, and to fail if the unicode part can't be encoded appropriately. The 'None' setting is effectively a nod to compatibility with prior 3.x versions, since I assume we can't just throw out the old coercion behavior.)
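A minimal sketch of the coercion rule described above, assuming a hypothetical bytes subclass called ebytes (no such type exists in Python). The None branch follows the description in this message; note that actual Python 3 raises TypeError when bytes and str are concatenated.

```python
class ebytes(bytes):
    """Hypothetical encoding-carrying bytes; not a real Python type."""

    def __new__(cls, data, encoding=None):
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self

    def __add__(self, other):
        if isinstance(other, str):
            if self.encoding is None:
                # No declared encoding: the result is text, and the
                # bytes are assumed to be ASCII.
                return bytes(self).decode("ascii") + other
            # Declared encoding: the result stays bytes. This raises
            # UnicodeEncodeError if the text cannot be encoded.
            return ebytes(bytes(self) + other.encode(self.encoding),
                          self.encoding)
        return ebytes(bytes(self) + bytes(other), self.encoding)

    def __radd__(self, other):
        if isinstance(other, str):
            if self.encoding is None:
                return other + bytes(self).decode("ascii")
            return ebytes(other.encode(self.encoding) + bytes(self),
                          self.encoding)
        return NotImplemented
```

With this sketch, ebytes(b"abc") + "d" gives a str, while ebytes(b"abc", "ascii") + "d" gives ebytes, which is the distinction between None and 'ascii' drawn above.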
Then, a few more changes to the bytes type would round out the implementation:
* Allow .decode() to not specify an encoding, unless .encoding is None
* Add back in the missing string methods (e.g. .encode()), since you can transparently upgrade to a string
* Smart __str__, as shown in your proposal.
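The first and last of these changes might look roughly like the following, again using a hypothetical ebytes subclass. Raising TypeError when no encoding is available is my assumption about what "unless .encoding is None" should mean in practice; today's bytes.decode() simply defaults to UTF-8.

```python
class ebytes(bytes):
    """Hypothetical encoding-carrying bytes; not a real Python type."""

    def __new__(cls, data, encoding=None):
        self = super().__new__(cls, data)
        self.encoding = encoding
        return self

    def decode(self, encoding=None, errors="strict"):
        if encoding is None:
            if self.encoding is None:
                # Assumption: with nothing attached, the caller must
                # still name an encoding explicitly.
                raise TypeError("no encoding given and none attached")
            encoding = self.encoding
        return super().decode(encoding, errors)

    def __str__(self):
        if self.encoding is None:
            # Unknown encoding: fall back to the usual b'...' form.
            return repr(bytes(self))
        # Known encoding: str() yields the decoded text.
        return self.decode()
```

The string methods from the second item are left out to keep the sketch short.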
Would it be feasible? Dunno.
Probably, although it might mean adding back in special cases that were previously taken out, and a few new ones.
Would it help ease the bytes/str confusion? Dunno.
Not sure what confusion you mean -- Web-SIG and I at least are not confused about the difference between bytes and str, or we wouldn't be having an issue. ;-) Or maybe you mean the stdlib's API confusion? In which case, yes, definitely!
But I think it would help make APIs easier to design and use because it would cut down on the encoding-keyword function signature infection.
Not only that, but I believe it would also retroactively make the stdlib's implementation of those APIs "correct" again, and give us One Obvious Way to work with bytes of a known encoding, while constraining any unicode that gets combined with those bytes to be validly encodable. It also gives you an idempotent constructor for bytes of a specified encoding, that can take either a bytes of unspecified encoding, a bytes of the correct encoding, or a string that can be encoded as such.
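The idempotent constructor could be sketched as follows. The ebytes name, the ValueError on an encoding mismatch, and the use of codecs.lookup() to normalize encoding names are all my assumptions, not part of any existing API.

```python
import codecs


class ebytes(bytes):
    """Hypothetical bytes of one declared encoding; not a real type."""

    def __new__(cls, data, encoding):
        # Normalize aliases so that e.g. "UTF8" and "utf-8" compare equal.
        encoding = codecs.lookup(encoding).name
        if isinstance(data, str):
            # Text must be encodable, else UnicodeEncodeError is raised.
            raw = data.encode(encoding)
        else:
            known = getattr(data, "encoding", None)
            if known is not None and known != encoding:
                raise ValueError(
                    "bytes are %s, not %s" % (known, encoding))
            # Plain bytes, or ebytes already in this encoding.
            raw = bytes(data)
        self = super().__new__(cls, raw)
        self.encoding = encoding
        return self
```

Calling ebytes(x, "utf-8") on its own result returns an equal value, so an API can apply it to its arguments without caring which of the three input forms it was given.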
In short, +1. (I wish it were possible to go back and make bytes non-strings and have only this ebytes or bstr or whatever type have string methods, but I'm pretty sure that ship has already sailed.)