[Python-Dev] bytes / unicode

Toshio Kuratomi a.badger at gmail.com
Tue Jun 22 19:21:23 CEST 2010


On Tue, Jun 22, 2010 at 08:31:13PM +0900, Stephen J. Turnbull wrote:

> Toshio Kuratomi writes:
>
> > unicode handling redesign. I'm stating my reading of the RFC not to defend
> > the use case Philip has, but because I think that the outlook that non-text
> > uris (before being percentencoded) are violations of the RFC
>
> That's not what I'm saying. What I'm trying to point out is that manipulating a bytes object as an URI sort of presumes a lot about its encoding as text.

I think we're more or less in agreement now but here I'm not sure. What manipulations are you thinking about? Which stage of URI construction are you considering?

I've just taken a quick look at Python 3.1's urllib module and I see that there is a bit of confusion there. It isn't about unicode vs bytes, though; it's about whether a URI should be operated on at the real URI level or at the data-that-makes-a-URI level.

* All the functions I looked at take a Python 3 str rather than bytes, so there's nothing confusing on that front.
* urllib.request.urlopen takes a strict URI. That means you must already have a percent-encoded URI at this point.
* urllib.parse.urljoin takes regular string values.
* urllib.parse.urlparse and urllib.parse.urlunparse take regular string values.
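
To make the two levels concrete, here is a rough sketch of going from data to a strict URI and back through the parsing helpers. It assumes the urllib.parse API as it exists in current Python 3 releases; the host example.com and the file name are made up for illustration, and nothing is actually fetched.

```python
from urllib.parse import quote, urljoin, urlparse, urlunparse

# Data level: a plain str that is not yet a valid URI component
# (hypothetical file name, chosen because it contains a space).
raw_segment = "my file.txt"

# Crossing to the URI level: percent-encode the data.
encoded_segment = quote(raw_segment)

# URI level: these helpers take and return str.
base = "http://example.com/docs/"  # illustrative base URI
uri = urljoin(base, encoded_segment)

# Splitting and reassembling leaves the percent-encoding untouched.
parts = urlparse(uri)
rebuilt = urlunparse(parts)

print(encoded_segment)
print(uri)
print(parts.path)
```

Only the final `uri` is something I would expect to hand to urllib.request.urlopen; `raw_segment` is still data and has to pass through `quote` first.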

> Since many of the URIs we deal with are more or less textual, why not take advantage of that?

Cool, so to summarize what I think we agree on:

* Percent-encoded URIs are text according to the RFC.
* The data that is used to construct the URI is not defined as text by the RFC.
* However, it is very often text in an unspecified encoding.
* It is extremely convenient for programmers to be able to treat the data that is used to form a URI as text in nearly all common cases.
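
A small sketch of what those points look like in practice, assuming the quoting functions in current Python 3's urllib.parse (the sample values are invented for illustration). Arbitrary bytes and text in two different encodings all end up as plain ASCII text once percent-encoded, but the encoding of the original data is not recorded anywhere in the result.

```python
from urllib.parse import quote, quote_from_bytes, unquote, unquote_to_bytes

# Data that is not text at all (arbitrary illustrative bytes).
binary_data = b"\xff\xfe\x00data"
encoded_binary = quote_from_bytes(binary_data)

# Data that is text: "caf" followed by e-acute (U+00E9).
text_data = "caf\u00e9"
encoded_utf8 = quote(text_data, encoding="utf-8")
encoded_latin1 = quote(text_data, encoding="latin-1")

# The percent-encoded forms are ordinary ASCII str values.
print(encoded_binary)
print(encoded_utf8)
print(encoded_latin1)

# Decoding needs out-of-band knowledge of what the data was.
decoded_binary = unquote_to_bytes(encoded_binary)
decoded_utf8 = unquote(encoded_utf8, encoding="utf-8")
decoded_latin1 = unquote(encoded_latin1, encoding="latin-1")
```

The same text yields two different URIs depending on the encoding chosen, which is the "unspecified encoding" problem: treating the data as text is convenient, but somebody still has to pick the encoding.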

-Toshio
