Issue 920680: readline not implemented for UTF-16

Created on 2004-03-21 22:37 by bob.ippolito, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name | Uploaded | Description
utf16reader.py | bob.ippolito, 2004-05-18 23:22 | monkeypatch to get utf16 readline support
utf16reader.py | bob.ippolito, 2004-05-19 18:38 | second revision of monkeypatch
Messages (11)
msg54114 - Author: Bob Ippolito (bob.ippolito) (Python committer) Date: 2004-03-21 22:37
The StreamReader for UTF-16 (all three of them: UTF-16, UTF-16-LE and UTF-16-BE) doesn't implement readline.
msg54115 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2004-03-21 22:44
Patches are welcome :-)
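msg54116 - Author: Bob Ippolito (bob.ippolito) (Python committer) Date: 2004-03-21 22:54
I don't need it enough to write a patch, but this is what I used instead, and it seems like it might work:

```python
try:
    # Normal path: iterate the decoded stream line by line.
    for line in inFile:
        tline = translator(line)
        outFile.write(tline)
except NotImplementedError:
    # Fallback for StreamReaders without readline(): read fixed-size
    # chunks of decoded text and split them on u'\n' by hand.
    BUFFER = 16384
    buf = inFile.read(BUFFER)
    while buf:
        lines = buf.split(u'\n')
        buf = lines.pop()  # last piece may be an incomplete line
        for line in lines:
            # Re-append the newline that split() removed, so these lines
            # match the ones produced by the iteration path above.
            tline = translator(line + u'\n')
            outFile.write(tline)
        chunk = inFile.read(BUFFER)
        buf += chunk
        if not chunk and buf:
            # EOF with a trailing partial line: terminate it so the next
            # pass of the loop flushes it.
            buf += u'\n'
```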
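A self-contained sketch of the same fallback idea in modern Python 3, for readers who want to try it: `iter_lines` is a hypothetical helper name, the decoded stream is assumed to offer only `read(n)` (simulated here with `io.StringIO`), and, unlike the loop above, a final unterminated line is yielded as-is instead of gaining a newline.

```python
import io


def iter_lines(stream, bufsize=16384):
    """Yield lines from a text stream using only stream.read(bufsize).

    Each complete line keeps its trailing newline; text after the last
    newline at EOF is yielded unchanged. Hypothetical helper, not part
    of the codecs module.
    """
    pending = u""
    while True:
        chunk = stream.read(bufsize)
        if not chunk:
            break
        pending += chunk
        # Everything before the last newline is complete; keep the rest.
        *complete, pending = pending.split(u"\n")
        for line in complete:
            yield line + u"\n"
    if pending:
        # Data after the last newline at EOF is still a line.
        yield pending


# A tiny buffer forces lines to straddle chunk boundaries.
src = io.StringIO(u"alpha\nbeta\n\ngamma")
print(list(iter_lines(src, bufsize=3)))
# ['alpha\n', 'beta\n', '\n', 'gamma']
```

Like the original, this only recognises LF as a line end; the next comments in the thread address that.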
msg54117 - Author: Bob Ippolito (bob.ippolito) (Python committer) Date: 2004-05-18 23:22
I've attached a monkeypatch to get readline support for the utf-16 codecs:

```python
import utf16reader
utf16reader.install()
```

It can be trivially inserted into the utf16 encodings implementation. It would be really cool if someone would audit the implementation and sneak it in before Python 2.4 :)
msg54118 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2004-05-19 08:19
Thanks for the patch. Some comments:

* Unicode has a lot more line-end markers than just LF; you should use .splitlines() to break lines at all of them.
* Please collapse both methods (sized + unsized) into one method and default to 256 bytes for the buffer size.
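Lemburg's first point can be checked directly in Python 3: `str.splitlines()` treats CR, CRLF, NEL (U+0085) and LINE SEPARATOR (U+2028), among others, as line ends, while `split(u"\n")` only sees LF. Passing `keepends=True` keeps each terminator, which matters for a `readline()` that has to return every line with its ending intact.

```python
# Line boundaries that str.splitlines() recognises but split(u"\n") does not.
text = u"one\ntwo\r\nthree\u2028four\x85five"

print(text.split(u"\n"))
# ['one', 'two\r', 'three\u2028four\x85five']

print(text.splitlines())
# ['one', 'two', 'three', 'four', 'five']

# keepends=True preserves each terminator, so joining with u"" round-trips.
parts = text.splitlines(True)
print(parts)
# ['one\n', 'two\r\n', 'three\u2028', 'four\x85', 'five']
assert u"".join(parts) == text
```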
msg54119 - Author: Bob Ippolito (bob.ippolito) (Python committer) Date: 2004-05-19 18:38
Attaching a revised monkeypatch:

* splitlines is used (I wasn't aware of the other Unicode EOL markers)
* 256 bytes is the new default buffer size

Why do you want sized and unsized to be in the same function? They're both dispatched from readline as appropriate, and they are very different code paths. It would be much uglier as one function, so I'm not going to do it in my own code.
msg54120 - Author: Jim Jewett (jimjjewett) Date: 2004-05-19 23:10
It might be just an upload/download quirk, but when I tried it, short lines were concatenated. u"\n".join(...) worked better, but I'm not sure how that plays with other line breaks. It might work better to put a class around the readline functions, so that self.buff can always be a (state-preserved) list: just return the first row until the list length gets down to one, then concatenate new data to that last row and resplit.
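Jim's suggestion can be sketched as a small class. This is an illustration under stated assumptions, not the attached patch: the wrapped object is assumed to expose `read(n)` returning already-decoded text, `LineBufferedReader` and `_fill` are hypothetical names, and the 256 default follows Lemburg's buffer size. A single `readline(size=None)` serves both the sized and unsized cases. Rows are split with `splitlines(True)` so each keeps its terminator; splitting without `keepends` and then rejoining with no separator is one plausible way to end up with concatenated short lines, though the thread never pinned down the cause.

```python
import io


class LineBufferedReader:
    """Hypothetical wrapper adding readline() to a stream offering read(n)."""

    def __init__(self, stream, bufsize=256):
        self.stream = stream
        self.bufsize = bufsize
        self.buff = []      # decoded text split into rows, terminators kept
        self.eof = False

    def _fill(self):
        # Read one more chunk, append it to the last (possibly partial)
        # row, and resplit the combined tail.
        chunk = self.stream.read(self.bufsize)
        if not chunk:
            self.eof = True
            return
        tail = self.buff.pop() if self.buff else u""
        self.buff.extend((tail + chunk).splitlines(True))

    def readline(self, size=None):
        # Keep filling until the first row is known to be complete (a
        # second row exists, so a split "\r" + "\n" has been rejoined),
        # EOF is reached, or enough characters are buffered for `size`.
        while not self.eof and len(self.buff) < 2:
            if size is not None and self.buff and len(self.buff[0]) >= size:
                break
            self._fill()
        if not self.buff:
            return u""
        line = self.buff.pop(0)
        if size is not None and len(line) > size:
            # Return only `size` characters; keep the rest for the next call.
            self.buff.insert(0, line[size:])
            line = line[:size]
        return line


reader = LineBufferedReader(io.StringIO(u"ab\r\ncd\u2028\nlast"), bufsize=3)
print(list(iter(reader.readline, u"")))
# ['ab\r\n', 'cd\u2028', '\n', 'last']
```

Waiting for a second row before handing out the first is what keeps a CRLF that straddles a chunk boundary from being reported as two line ends.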
msg54121 - Author: Marc-Andre Lemburg (lemburg) (Python committer) Date: 2004-05-26 19:41
I don't have time to review this now, but will get back to it after EuroPython if you ping me. Thanks.
msg54122 - Author: Bob Ippolito (bob.ippolito) (Python committer) Date: 2004-05-26 19:46
Can you please give an example of a case where short lines get concatenated? I can't fix it if I don't know what's wrong.
msg54123 - Author: Bob Ippolito (bob.ippolito) (Python committer) Date: 2004-05-26 19:52
Also, I've moved the latest copy of the code to my public repository (http://svn.red-bean.com/bob/unicode/trunk/utf16reader.py). This should be free of any quirks, but I still can't reproduce whatever problem Jim is having.
msg65415 - Author: Benjamin Peterson (benjamin.peterson) (Python committer) Date: 2008-04-12 20:23
It seems this is no longer true: the UTF-16 StreamReaders now implement readline.
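A quick check, in a modern Python 3 rather than the 2.x releases discussed above, that the StreamReaders returned by `codecs.getreader()` for all three UTF-16 codecs now support `readline()`. It only demonstrates current behaviour and says nothing about which release introduced the change.

```python
import codecs
import io

text = u"first\nsecond\r\nthird"

# The three UTF-16 codecs mentioned at the top of this issue.
results = {}
for name in ("utf-16", "utf-16-le", "utf-16-be"):
    raw = io.BytesIO(text.encode(name))      # bytes as they would sit in a file
    reader = codecs.getreader(name)(raw)     # a codecs.StreamReader subclass
    results[name] = list(iter(reader.readline, u""))

for name, lines in results.items():
    print(name, lines)
# Expected for every codec: ['first\n', 'second\r\n', 'third']
```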
History
Date | User | Action | Args
2022-04-11 14:56:03 | admin | set | github: 40061
2008-04-12 20:23:54 | benjamin.peterson | set | status: open -> closed; resolution: fixed; messages: +; nosy: + benjamin.peterson
2004-03-21 22:37:17 | bob.ippolito | create |