[Python-Dev] PEP 393 Summer of Code Project

Glenn Linderman v+python at g.nevcal.com
Thu Sep 1 11:20:59 CEST 2011


On 9/1/2011 2:15 AM, Stephen J. Turnbull wrote:

Glenn Linderman writes:

> How many different iterators into the same text would be concurrently
> needed by an application? And why?

A WYSIWYG editor for structured text (TeX, HTML) might want two (at least), one for the "source" window and one for the "rendered" window. One might want to save the state of the iterators (if that's possible) and cache it as one moves the "window" forward to make short backward motion fast, giving you two (or four, etc) more.

Sure. But those are probably all the same type of iterator, most likely (since they are WYSIWYG) dealing with multi-codepoint characters (Guido's recent definition of grapheme, which seems to subsume both grapheme clusters and composed characters).

Hence all of them would be using/requiring the same sort of representation, index, analysis, or some combination of those.

> Seems like if it is dealing with text at the level of grapheme
> clusters, it needs that type of iterator. Of course, if it does
> I/O it needs codec access, but that is by nature sequential from
> the starting point to the end point.

`save-region'? `save-text-remove-markup'?

Yes, save-region sounds like exactly what I was speaking of.
save-text-remove-markup I would infer needs to process the text to remove the markup characters... since you used TeX and HTML as examples, markup is text, not binary (which would be a different problem). Since the TeX and HTML markup is mostly ASCII, markup removal (or more likely, text extraction) could be performed via either a grapheme iterator, or a codepoint iterator, or even a code unit iterator.
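
To make the last point concrete, here is a rough sketch (Python 3) of stripping ASCII-delimited markup at each of the three levels. Everything in it is an assumption for illustration: the helper names (code_units, code_points, graphemes, strip_markup) are hypothetical, the code unit view is simulated by encoding to UTF-16, and the grapheme iterator is only an approximation that glues combining marks onto the preceding base character, not full Unicode text segmentation. It also treats anything between '<' and '>' as markup, which is far cruder than real HTML or TeX handling.

```python
import unicodedata

def code_units(text):
    """Yield UTF-16 code units as ints (simulated by encoding the str)."""
    data = text.encode("utf-16-le")
    for i in range(0, len(data), 2):
        yield int.from_bytes(data[i:i + 2], "little")

def code_points(text):
    """Yield one code point at a time (plain str iteration in Python 3)."""
    return iter(text)

def graphemes(text):
    """Approximate grapheme iteration: attach combining marks to the
    preceding base character. This is NOT full UAX #29 segmentation."""
    cluster = ""
    for ch in text:
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster

def strip_markup(units, lt, gt):
    """Drop everything from an 'lt' unit through the next 'gt' unit.
    Works on any iterable of units, whatever the level of iteration."""
    kept = []
    in_tag = False
    for u in units:
        if u == lt:
            in_tag = True
        elif in_tag:
            if u == gt:
                in_tag = False
        else:
            kept.append(u)
    return kept

# 'e' + combining acute accent, then a non-BMP character (U+1D11E).
sample = "<b>e\u0301</b> \U0001D11E"

by_grapheme = "".join(strip_markup(graphemes(sample), "<", ">"))
by_codepoint = "".join(strip_markup(code_points(sample), "<", ">"))

kept_units = strip_markup(code_units(sample), ord("<"), ord(">"))
by_code_unit = b"".join(
    u.to_bytes(2, "little") for u in kept_units
).decode("utf-16-le")
```

All three passes give the same text here because the delimiters are ASCII: a surrogate or a combining mark is never mistaken for '<' or '>', so multi-unit characters are kept or dropped as a whole. The shortcut would stop being safe if a delimiter could itself carry a combining mark, or if the markup were not ASCII.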


