[Python-checkins] r43546 - in python/trunk: Doc/lib/liburlparse.tex Lib/test/test_urlparse.py Lib/urlparse.py Misc/NEWS (original) (raw)

fred.drake python-checkins at python.org
Sun Apr 2 00:14:44 CEST 2006


Author: fred.drake Date: Sun Apr 2 00:14:43 2006 New Revision: 43546

Modified: python/trunk/Doc/lib/liburlparse.tex python/trunk/Lib/test/test_urlparse.py python/trunk/Lib/urlparse.py python/trunk/Misc/NEWS Log: Patch #624325: urlparse.urlparse() and urlparse.urlsplit() results now sport attributes that provide access to the parts of the result.

Modified: python/trunk/Doc/lib/liburlparse.tex

--- python/trunk/Doc/lib/liburlparse.tex (original) +++ python/trunk/Doc/lib/liburlparse.tex Sun Apr 2 00:14:43 2006 @@ -25,48 +25,74 @@ \code{nntp}, \code{prospero}, \code{rsync}, \code{rtsp}, \code{rtspu}, \code{sftp}, \code{shttp}, \code{sip}, \code{sips}, \code{snews}, \code{svn}, \code{svn+ssh}, \code{telnet}, \code{wais}. + \versionadded[Support for the \code{sftp} and \code{sips} schemes]{2.5} The \module{urlparse} module defines the following functions: -\begin{funcdesc}{urlparse}{urlstring\optional{, default_scheme\optional{, allow_fragments}}} -Parse a URL into 6 components, returning a 6-tuple: (addressing -scheme, network location, path, parameters, query, fragment -identifier). This corresponds to the general structure of a URL: +\begin{funcdesc}{urlparse}{urlstring\optional{, + default_scheme\optional{, allow_fragments}}} +Parse a URL into six components, returning a 6-tuple. This +corresponds to the general structure of a URL: \code{\var{scheme}://\var{netloc}/\var{path};\var{parameters}?\var{query}#\var{fragment}}. Each tuple item is a string, possibly empty. -The components are not broken up in smaller parts (e.g. the network +The components are not broken up in smaller parts (for example, the network location is a single string), and % escapes are not expanded. -The delimiters as shown above are not part of the tuple items, +The delimiters as shown above are not part of the result, except for a leading slash in the \var{path} component, which is -retained if present.

-Example:

-\begin{verbatim} -urlparse('http://www.cwi.nl:80/%7Eguido/Python.html') -\end{verbatim}

-yields the tuple +retained if present. For example: \begin{verbatim} +>>> from urlparse import urlparse +>>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html') +>>> o ('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', '', '') +>>> o.scheme +'http' +>>> o.port +80 +>>> o.geturl() +'http://www.cwi.nl:80/%7Eguido/Python.html' \end{verbatim} If the \var{default_scheme} argument is specified, it gives the -default addressing scheme, to be used only if the URL string does not +default addressing scheme, to be used only if the URL does not specify one. The default value for this argument is the empty string. -If the \var{allow_fragments} argument is zero, fragment identifiers +If the \var{allow_fragments} argument is false, fragment identifiers are not allowed, even if the URL's addressing scheme normally does -support them. The default value for this argument is \code{1}. -\end{funcdesc} +support them. The default value for this argument is \constant{True}. -\begin{funcdesc}{urlunparse}{tuple} -Construct a URL string from a tuple as returned by \code{urlparse()}. +The return value is actually an instance of a subclass of +\pytype{tuple}. This class has the following additional read-only +convenience attributes: + +\begin{tableiv}{l|c|l|c}{member}{Attribute}{Index}{Value}{Value if not present} + \lineiv{scheme} {0} {URL scheme specifier} {empty string} + \lineiv{netloc} {1} {Network location part} {empty string} + \lineiv{path} {2} {Hierarchical path} {empty string} + \lineiv{params} {3} {Parameters for last path element} {empty string} + \lineiv{query} {4} {Query component} {empty string} + \lineiv{fragment}{5} {Fragment identifier} {empty string} + \lineiv{username}{ } {User name} {\constant{None}} + \lineiv{password}{ } {Password} {\constant{None}} + \lineiv{hostname}{ } {Host name (lower case)} {\constant{None}} + \lineiv{port} { } {Port number as integer, if present} {\constant{None}} +\end{tableiv} + +See section\ref{urlparse-result-object}, ``Results of +\function{urlparse()} and \function{urlsplit()},'' for more +information on the result object. + +\versionchanged[Added attributes to return value]{2.5} +\end{funcdesc} + +\begin{funcdesc}{urlunparse}{parts} +Construct a URL from a tuple as returned by \code{urlparse()}. +The \var{parts} argument be any six-item iterable. This may result in a slightly different, but equivalent URL, if the -URL that was parsed originally had redundant delimiters, e.g. a ? with -an empty query (the draft states that these are equivalent). +URL that was parsed originally had unnecessary delimiters (for example, +a ? with an empty query; the RFC states that these are equivalent). \end{funcdesc} \begin{funcdesc}{urlsplit}{urlstring\optional{, @@ -79,12 +105,38 @@ separate the path segments and parameters. This function returns a 5-tuple: (addressing scheme, network location, path, query, fragment identifier). + +The return value is actually an instance of a subclass of +\pytype{tuple}. This class has the following additional read-only +convenience attributes: + +\begin{tableiv}{l|c|l|c}{member}{Attribute}{Index}{Value}{Value if not present} + \lineiv{scheme} {0} {URL scheme specifier} {empty string} + \lineiv{netloc} {1} {Network location part} {empty string} + \lineiv{path} {2} {Hierarchical path} {empty string} + \lineiv{query} {3} {Query component} {empty string} + \lineiv{fragment} {4} {Fragment identifier} {empty string} + \lineiv{username} { } {User name} {\constant{None}} + \lineiv{password} { } {Password} {\constant{None}} + \lineiv{hostname} { } {Host name (lower case)} {\constant{None}} + \lineiv{port} { } {Port number as integer, if present} {\constant{None}} +\end{tableiv} + +See section\ref{urlparse-result-object}, Results of +\function{urlparse()} and \function{urlsplit()},'' for more +information on the result object. + \versionadded{2.2} +\versionchanged[Added attributes to return value]{2.5} \end{funcdesc} -\begin{funcdesc}{urlunsplit}{tuple} +\begin{funcdesc}{urlunsplit}{parts} Combine the elements of a tuple as returned by \function{urlsplit()} into a complete URL as a string. +The \var{parts} argument be any five-item iterable. +This may result in a slightly different, but equivalent URL, if the +URL that was parsed originally had unnecessary delimiters (for example, +a ? with an empty query; the RFC states that these are equivalent). \versionadded{2.2} \end{funcdesc} @@ -93,22 +145,16 @@ (\var{base}) with a relative URL'' (\var{url}). Informally, this uses components of the base URL, in particular the addressing scheme, the network location and (part of) the path, to provide missing -components in the relative URL.

-Example:

-\begin{verbatim} -urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html') -\end{verbatim}

-yields the string +components in the relative URL. For example:

\begin{verbatim} +>>> from urlparse import urljoin +>>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html') 'http://www.cwi.nl/%7Eguido/FAQ.html' \end{verbatim}

-The \var{allow_fragments} argument has the same meaning as for -\code{urlparse()}. +The \var{allow_fragments} argument has the same meaning and default as +for \function{urlparse()}. \end{funcdesc}

\begin{funcdesc}{urldefrag}{url} @@ -133,3 +179,61 @@ both Uniform Resource Names (URNs) and Uniform Resource Locators (URLs).} \end{seealso} + + +\subsection{Results of \function{urlparse()} and \function{urlsplit()}

Modified: python/trunk/Lib/test/test_urlparse.py

--- python/trunk/Lib/test/test_urlparse.py (original) +++ python/trunk/Lib/test/test_urlparse.py Sun Apr 2 00:14:43 2006 @@ -12,15 +12,53 @@ def checkRoundtrips(self, url, parsed, split): result = urlparse.urlparse(url) self.assertEqual(result, parsed)

@@ -187,6 +225,69 @@ ]: self.assertEqual(urlparse.urldefrag(url), (defrag, frag))

Modified: python/trunk/Lib/urlparse.py

--- python/trunk/Lib/urlparse.py (original) +++ python/trunk/Lib/urlparse.py Sun Apr 2 00:14:43 2006 @@ -41,7 +41,111 @@ _parse_cache = {}

-def urlparse(url, scheme='', allow_fragments=1): +class BaseResult(tuple):

@@ -53,7 +157,7 @@ url, params = _splitparams(url) else: params = ''

def _splitparams(url): if '/' in url: @@ -73,12 +177,13 @@ delim = len(url) return url[start:delim], url[delim:]

-def urlsplit(url, scheme='', allow_fragments=1): +def urlsplit(url, scheme='', allow_fragments=True): """Parse a URL into 5 components: :///?# Return a 5-tuple: (scheme, netloc, path, query, fragment). Note that we don't break the components up in smaller bits (e.g. netloc is a single string) and we don't expand % escapes."""

@@ -111,9 +216,9 @@ url, fragment = url.split('#', 1) if scheme in uses_query and '?' in url: url, query = url.split('?', 1)

def urlunparse((scheme, netloc, url, params, query, fragment)): """Put a parsed URL back together again. This may result in a @@ -136,7 +241,7 @@ url = url + '#' + fragment return url

-def urljoin(base, url, allow_fragments = 1): +def urljoin(base, url, allow_fragments=True): """Join a base URL and a possibly relative URL to form an absolute interpretation of the latter.""" if not base:

Modified: python/trunk/Misc/NEWS

--- python/trunk/Misc/NEWS (original) +++ python/trunk/Misc/NEWS Sun Apr 2 00:14:43 2006 @@ -489,6 +489,9 @@ Library

+- Patch #624325: urlparse.urlparse() and urlparse.urlsplit() results + now sport attributes that provide access to the parts of the result. + - Patch #1462498: sgmllib now handles entity and character references in attribute values.


More information about the Python-checkins mailing list