Pure python implementation of string-like class
Steve Holden
steve at holdenweb.com
Sat Feb 25 10:08:34 EST 2006
Akihiro KAYAMA wrote:
> Hi all.
>
> I would like to ask how I can implement a string-like class using a tuple
> or list. Does anyone know of any example code for a pure Python
> implementation of a string-like class?
>
> I ask because I am trying to use Python for text processing that
> involves a large character set. As the character set is wider than
> UTF-16 (U+10FFFF), I can't use Python's native unicode string class.
>
"Wider than UTF-16" doesn't make sense.
> So I want to prepare my own string class, which provides convenient
> string methods such as split, join, find and others like the usual string
> class, but uses a sequence of integers as its internal representation
> instead of a native string. Obviously, subclassing str doesn't
> help.
>
> The implementation of each string method in the Python source
> tree (stringobject.c) is far from Python code, so I have started from
> scratch, like below:
>
> def startswith(self, prefix, start=-1, end=-1):
>     assert start < 0, "not implemented"
>     assert end < 0, "not implemented"
>     if isinstance(prefix, (str, unicode)):
>         prefix = MyString(prefix)
>     n = len(prefix)
>     return self[0:n] == prefix
>
> but I found it's not a trivial task for me to achieve correctness
> and completeness. It also smells like "reinventing the wheel", though I
> can't find any hints on Google and/or in the Python Cookbook.
>
> I don't care about efficiency as a starting point. Any comments are welcome.
> Thanks.
>
The UTF-16 encoding is capable of representing the whole of Unicode.
There should be no need to do anything special to use UTF-16.
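
To illustrate the point, here is a minimal sketch of how UTF-16 reaches the top of the Unicode range: code points above U+FFFF are encoded as two 16-bit units (a surrogate pair). It assumes a Python 3 interpreter, which is newer than the Python discussed in this thread, and the helper names utf16_units and surrogate_pair are hypothetical, chosen only for this example.

```python
def utf16_units(code_point):
    """Return the UTF-16 code units (as ints) for one Unicode code point."""
    data = chr(code_point).encode("utf-16-be")  # big-endian, no BOM
    # Split the encoded bytes into 16-bit units.
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def surrogate_pair(code_point):
    """Compute the surrogate pair by hand for a code point above U+FFFF."""
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


top = 0x10FFFF  # the highest code point Unicode defines

print([hex(u) for u in utf16_units(0x41)])  # one unit for a BMP character
print([hex(u) for u in utf16_units(top)])   # two units: a surrogate pair
print(utf16_units(top) == surrogate_pair(top))

# Round trip: decoding the encoded bytes gives back the original code point.
print(ord(chr(top).encode("utf-16-be").decode("utf-16-be")) == top)
```

If the character set in question really needs values above U+10FFFF, then it is outside Unicode altogether, and that would be a separate problem from the choice of encoding.
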
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC www.holdenweb.com
PyCon TX 2006 www.python.org/pycon/