[Python-Dev] PEP 393 Summer of Code Project

Glenn Linderman v+python at g.nevcal.com
Thu Sep 1 11:20:59 CEST 2011


On 9/1/2011 2:15 AM, Stephen J. Turnbull wrote:
> Glenn Linderman writes:
>
>   >  How many different iterators into the same text would be concurrently
>   >  needed by an application?  And why?
>
> A WYSIWYG editor for structured text (TeX, HTML) might want two (at
> least), one for the "source" window and one for the "rendered" window.
> One might want to save the state of the iterators (if that's possible)
> and cache it as one moves the "window" forward to make short backward
> motion fast, giving you two (or four, etc) more.

Sure.  But those are probably all the same type of iterator: probably
(since they are WYSIWYG) ones dealing with multi-codepoint characters
(Guido's recent definition of grapheme, which seems to subsume both
grapheme clusters and composed characters).

Hence all of them would use or require the same sort of
representation, index, analysis, or some combination of those.
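
To make "multi-codepoint character" concrete, here is a rough sketch of
such an iterator.  It is only an approximation that I am assuming for
illustration: it groups a base character with any following combining
marks using the stdlib unicodedata module, and it ignores the full
grapheme cluster segmentation rules (Hangul jamo, ZWJ sequences, and so
on).  The name approx_graphemes is hypothetical.

```python
import unicodedata


def approx_graphemes(text):
    """Yield each base character together with its combining marks.

    Approximation only: real grapheme cluster segmentation has more
    rules than "base character plus combining marks".
    """
    cluster = ""
    for ch in text:
        # A character with combining class 0 starts a new cluster,
        # unless nothing has been collected yet.
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster


# 'e' followed by COMBINING ACUTE ACCENT, then a plain 'a':
# three code points, but two user-perceived characters.
sample = "e\u0301a"
clusters = list(approx_graphemes(sample))
```

Every iterator in the editor scenario (source window, rendered window,
cached positions) could be an instance of this one kind.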

>   >  Seems like if it is dealing with text at the level of grapheme
>   >  clusters, it needs that type of iterator.  Of course, if it does
>   >  I/O it needs codec access, but that is by nature sequential from
>   >  the starting point to the end point.
>
> `save-region' ?  `save-text-remove-markup' ?

Yes, save-region sounds like exactly what I was speaking of.
save-text-remove-markup, I would infer, needs to process the text to
remove the markup characters.  Since you used TeX and HTML as examples,
the markup is text, not binary (which would be a different problem).
Since TeX and HTML markup is mostly ASCII, markup removal (or more
likely, text extraction) could be performed with a grapheme iterator, a
codepoint iterator, or even a code unit iterator.
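
A minimal sketch of why the iterator level does not matter for ASCII
markup, under a deliberately naive assumption: tags are anything between
'<' and '>', with no handling of comments, entities, or '>' inside
attribute values.  Both function names are hypothetical.  One version
walks code points, the other walks UTF-16 code units; since surrogate
code units can never equal 0x3C or 0x3E, the two agree even with
non-BMP text.

```python
import struct


def strip_tags_codepoints(text):
    """Drop '<...>' spans while iterating over code points."""
    out = []
    in_tag = False
    for ch in text:
        if ch == "<":
            in_tag = True
        elif ch == ">" and in_tag:
            in_tag = False
        elif not in_tag:
            out.append(ch)
    return "".join(out)


def strip_tags_utf16(data):
    """Drop '<...>' spans while iterating over UTF-16-LE code units."""
    units = struct.unpack("<%dH" % (len(data) // 2), data)
    out = []
    in_tag = False
    for u in units:
        if u == 0x3C:                 # '<'
            in_tag = True
        elif u == 0x3E and in_tag:    # '>'
            in_tag = False
        elif not in_tag:
            out.append(u)             # surrogates pass through untouched
    return struct.pack("<%dH" % len(out), *out)


# U+1D11E (MUSICAL SYMBOL G CLEF) needs a surrogate pair in UTF-16.
sample = "<b>caf\u00e9 \U0001D11E</b>"
via_codepoints = strip_tags_codepoints(sample)
via_units = strip_tags_utf16(sample.encode("utf-16-le")).decode("utf-16-le")
```

A grapheme-level version would give the same text too, because the ASCII
delimiters are never combining marks.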

