Pure python implementation of string-like class

Akihiro KAYAMA kayama at st.rim.or.jp
Sun Feb 26 17:49:46 EST 2006


Hi Ross. 

Thanks a lot for the clarification. I didn't think my post could turn
into a Unicode flame.

I don't know whether this mailing list is the right place to talk about
Unicode issues, but for me, the codespace of about a million codepoints
that UTF-16 provides is not enough. Unicode presumes that the same
character always has the same codepoint. But unlike the simple and
beautiful Roman alphabet, it is sometimes difficult to decide whether
two kanji characters are the "same" or not, because a glyph varies for
many reasons (e.g. who wrote it, when, and where). So first of all we
assign codepoints, and only then do we consider whether "this character
which appears in this Chinese historical book may be the same character
as this character in Unicode CJK Extension A". Identifying characters in
this way is also one of my project's tasks. I think this explains why
UTF-16 is enough for the majority but not for everyone.

Anyway, I suppose that implementing string-like classes is a generic
Python issue. For example, it would be useful if a rich text class
that carries style attributes such as bold on each character also
had string-like methods and could be handled like a string.
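
A minimal sketch of such a rich text class, written for current
Python 3. The names RichText and styled() are hypothetical and chosen
only for illustration; the sketch assumes one set of style names per
character and wraps a plain str rather than subclassing it.

```python
class RichText:
    """Hypothetical rich text: a str plus one style set per character."""

    def __init__(self, text="", styles=None):
        self.text = text
        # One frozenset of style names (e.g. {"bold"}) per character.
        if styles is None:
            styles = [frozenset()] * len(text)
        self.styles = list(styles)
        if len(self.styles) != len(self.text):
            raise ValueError("need exactly one style entry per character")

    def __len__(self):
        return len(self.text)

    def __str__(self):
        return self.text

    def __getitem__(self, index):
        # Indexing and slicing keep text and styles in step.
        if isinstance(index, slice):
            return RichText(self.text[index], self.styles[index])
        return RichText(self.text[index], [self.styles[index]])

    def __add__(self, other):
        # A plain str is treated as unstyled text.
        if isinstance(other, str):
            other = RichText(other)
        return RichText(self.text + other.text, self.styles + other.styles)

    def find(self, sub, *args):
        # Searching ignores styles and delegates to str.find.
        return self.text.find(str(sub), *args)

    def styled(self, name, start=0, stop=None):
        # Return a copy with style `name` added to characters start..stop-1.
        stop = len(self) if stop is None else stop
        styles = [s | {name} if start <= i < stop else s
                  for i, s in enumerate(self.styles)]
        return RichText(self.text, styles)


r = RichText("hello world").styled("bold", 0, 5)
print(str(r[6:]), r.find("world"), sorted(r[0].styles[0]))
```

Methods that can change the length of the text, such as upper(), are
left out because the style list would have to be adjusted as well.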

In article <1140975976.471949.20940 at t39g2000cwt.googlegroups.com>,
"Ross Ridge" <rridge at csclub.uwaterloo.ca> writes:

rridge> thinking about it, it might actually make sense to use strings as the
rridge> internal representation, as a lot of operations can be implemented by
rridge> using the standard string operations but multiplying the offsets and
rridge> lengths by 4.

Ah, COOL! It sounds very nice. I'll try it.
Thanks again.
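
A rough sketch of the idea quoted above, written for current Python 3,
where the byte-string type is bytes. The class name WideString and its
methods are hypothetical; the sketch assumes each character is an
integer code stored as 4 big-endian bytes, so codes outside the Unicode
range also fit. One detail beyond the quoted suggestion: a byte-level
search can match in the middle of a character, so find() skips matches
that are not aligned on a 4-byte boundary.

```python
class WideString:
    """Hypothetical string-like class storing each character as 4 bytes."""

    WIDTH = 4  # bytes per character

    def __init__(self, codepoints=()):
        # Pack each integer code big-endian; codes may exceed 0x10FFFF.
        self._data = b"".join(cp.to_bytes(self.WIDTH, "big")
                              for cp in codepoints)

    @classmethod
    def _from_bytes(cls, data):
        obj = cls()
        obj._data = data
        return obj

    def __len__(self):
        return len(self._data) // self.WIDTH

    def __getitem__(self, index):
        w = self.WIDTH
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            if step != 1:
                raise NotImplementedError("only step 1 in this sketch")
            # Character offsets become byte offsets by multiplying by 4.
            return self._from_bytes(self._data[start * w:stop * w])
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        return int.from_bytes(self._data[index * w:(index + 1) * w], "big")

    def __add__(self, other):
        return self._from_bytes(self._data + other._data)

    def __eq__(self, other):
        return isinstance(other, WideString) and self._data == other._data

    def find(self, sub, start=0):
        # Delegate to bytes.find, skipping hits that straddle characters.
        pos = self._data.find(sub._data, start * self.WIDTH)
        while pos != -1 and pos % self.WIDTH:
            pos = self._data.find(sub._data, pos + 1)
        return pos if pos == -1 else pos // self.WIDTH


s = WideString([0x41, 0x3042, 0x200000, 0x41])
print(len(s), hex(s[2]), s.find(WideString([0x41]), 1))
```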

-- kayama


