[Python-Dev] urllib.request.urlopen struggling in Windows 7
Thom Ives thom.ives at hp.com
Tue Nov 15 00:31:12 CET 2011
- Previous message: [Python-Dev] [Python-checkins] cpython (merge 3.2 -> default): Fix memory leak with FLUFL-related syntax errors (!)
- Next message: [Python-Dev] urllib.request.urlopen struggling in Windows 7
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Previously, in Python 2.6, I made heavy use of urllib.urlopen to capture web page content and then post-process the data from the site I was downloading. Now those routines, and the new routines I am trying to use with Python 3.2, are running into what seems to be a Windows-only (maybe even Windows 7-only) problem.
Using the following code with Python 3.2.2 (64-bit) on Windows 7 ...
import urllib.request
fp = urllib.request.urlopen(URL_string_that_I_use)
string = fp.read()
fp.close()
print(string.decode("utf8"))
I get the following message:
Traceback (most recent call last):
  File "TATest.py", line 5, in <module>
    string = fp.read()
  File "d:\python32\lib\http\client.py", line 489, in read
    return self._read_chunked(amt)
  File "d:\python32\lib\http\client.py", line 553, in _read_chunked
    self._safe_read(2) # toss the CRLF at the end of the chunk
  File "d:\python32\lib\http\client.py", line 592, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(0 bytes read, 2 more expected)
Using the following code instead ...
import urllib.request
fp = urllib.request.urlopen(URL_string_that_I_use)
for Line in fp:
    print(Line.decode("utf8").rstrip('\n'))
fp.close()
I get a fair amount of the web page's content, but then the rest of the capture is thwarted by ...
Traceback (most recent call last):
  File "TATest.py", line 9, in <module>
    for Line in fp:
  File "d:\python32\lib\http\client.py", line 489, in read
    return self._read_chunked(amt)
  File "d:\python32\lib\http\client.py", line 545, in _read_chunked
    self._safe_read(2) # toss the CRLF at the end of the chunk
  File "d:\python32\lib\http\client.py", line 592, in _safe_read
    raise IncompleteRead(b''.join(s), amt)
http.client.IncompleteRead: IncompleteRead(0 bytes read, 2 more expected)
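For reference, one possible way to keep whatever data arrived before the IncompleteRead is to read in fixed-size pieces and catch the exception. This is only a sketch, not a fix for the underlying cause: the helper name read_best_effort and the chunk_size default are made up for illustration, and whether the salvaged bytes form the complete page depends on what the server actually sent.

```python
import http.client


def read_best_effort(fp, chunk_size=8192):
    """Read fp in pieces; return (data, complete).

    fp is any object with a read(n) method, such as the response
    returned by urllib.request.urlopen. If http.client.IncompleteRead
    is raised, the bytes collected so far plus the exception's
    partial data are returned with complete set to False.
    """
    pieces = []
    try:
        while True:
            chunk = fp.read(chunk_size)
            if not chunk:  # empty bytes means end of stream
                break
            pieces.append(chunk)
    except http.client.IncompleteRead as exc:
        # exc.partial holds the bytes read before the failure
        pieces.append(exc.partial)
        return b"".join(pieces), False
    return b"".join(pieces), True
```

With the code above, this would be called as data, complete = read_best_effort(fp) in place of fp.read(), checking complete before trusting the result.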
Trying to read another page yields ...
Traceback (most recent call last):
  File "TATest.py", line 11, in <module>
    print(Line.decode("utf8").rstrip('\n'))
  File "d:\python32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x92' in position 21: character maps to <undefined>
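This last traceback comes from the cp1252 encoder that print uses for its output, not from the download itself, which suggests the text contains a character that the output encoding cannot represent. A minimal sketch of one workaround, assuming that replacing such characters with "?" is acceptable; the helper name console_safe is hypothetical:

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot
    represent replaced by '?'.

    encoding defaults to the encoding of sys.stdout, falling back
    to ASCII when stdout does not report one.
    """
    encoding = encoding or sys.stdout.encoding or "ascii"
    # Round-trip through the target encoding, substituting '?'
    # for anything that cannot be encoded.
    return text.encode(encoding, errors="replace").decode(encoding)
```

In the loop above this would be used as print(console_safe(Line.decode("utf8").rstrip('\n'))).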
I do believe this is a Windows issue, but can Python be made more robust in dealing with whatever is causing it? When we try similar code on Linux, we do not encounter the problem.