[Python-Dev] Bytes path support
Marko Rauhamaa marko at pacujo.net
Sat Aug 23 10:21:57 CEST 2014
"Stephen J. Turnbull" <stephen at xemacs.org>:

> Just read as bytes and decode piecewise in one way or another. For Oleg's HTML case, there's a well-understood structure that can be used to determine retry points
HTML and XML are interesting examples since their encoding is initially unknown:
    ^
    +--- Now I know it is UTF-8

    ^
    +--- Now I know it was UTF-16 all along!
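
The two "now I know" moments can be sketched in Python as a scan over the leading bytes of a document. This is only an illustration, not a conforming XML or HTML sniffer: the function name `sniff_declared_encoding` and both regular expressions are made up for this example, and it assumes that a document without a byte order mark is at least ASCII-compatible up to its declaration.

```python
import codecs
import re

# Byte order marks checked first. UTF-32 is ignored in this sketch.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Deliberately loose, hypothetical patterns for the two in-document declarations.
_XML_DECL = re.compile(
    rb'<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)
_META = re.compile(
    rb'<meta[^>]*\bcharset\s*=\s*["\']?([A-Za-z0-9._-]+)', re.IGNORECASE
)


def sniff_declared_encoding(prefix):
    """Return the encoding name declared in the leading bytes, or None."""
    for bom, name in _BOMS:
        if prefix.startswith(bom):
            return name
    # No BOM: assume the prefix is ASCII-compatible and search it as bytes.
    for pattern in (_XML_DECL, _META):
        match = pattern.search(prefix)
        if match:
            return match.group(1).decode("ascii").lower()
    return None
```

Note that the scan has to run on bytes: the declaration that names the encoding must be read before anything can be decoded with it.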
Then we have:
    HTTP/1.1 200 OK
    Content-Type: text/html; charset=ISO-8859-1
See how deep you have to parse the TCP stream before you realize the content encoding is UTF-16.
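
A rough sketch of the first layer, assuming the whole response head is already in memory as bytes: the header block is plain enough to parse with the standard `email` package, which yields the charset the transport claims. The function name `charset_from_http_head` is invented for this example, and nothing here guarantees that the body agrees with the header.

```python
import email


def charset_from_http_head(raw):
    """Return (header_charset, body) for a raw HTTP/1.1 response in bytes.

    header_charset is lowercased, or None if no charset parameter is present.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    # Drop the status line; what remains is a block of "Name: value" lines.
    _status, _, header_block = head.partition(b"\r\n")
    headers = email.message_from_bytes(header_block + b"\r\n\r\n")
    return headers.get_content_charset(), body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=ISO-8859-1\r\n"
    b"\r\n"
    b"<html>"
)
header_charset, body = charset_from_http_head(raw)
```

The returned body is still bytes. Whatever it declares about itself has to be discovered by a second pass over those bytes, and that pass may contradict the header.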
Marko