Pure python implementation of string-like class

Steve Holden steve at holdenweb.com
Sat Feb 25 10:08:34 EST 2006


Akihiro KAYAMA wrote:
> Hi all.
> 
> I would like to ask how I can implement a string-like class using a tuple
> or list. Does anyone know of any example code for a pure Python
> implementation of a string-like class?
> 
> I ask because I am trying to use Python for text processing that
> involves a large character set. As the character set is wider than
> UTF-16 (U+10FFFF), I can't use Python's native unicode string class.
> 
"Wider than UTF-16" doesn't make sense.

> So I want to prepare my own string class, which provides convenient
> string methods such as split, join, find and others like the usual string
> class, but uses a sequence of integers as its internal representation
> instead of a native string.  Obviously, subclassing str doesn't
> help.
> 
> The implementation of the string methods in the Python source
> tree (stringobject.c) is far from Python code, so I have started from
> scratch, like below:
> 
>     def startswith(self, prefix, start=-1, end=-1):
>         assert start < 0, "not implemented"
>         assert end < 0, "not implemented"
>         if isinstance(prefix, (str, unicode)):
>             prefix = MyString(prefix)
>         n = len(prefix)
>         return self[0:n] == prefix
> 
> but I found it's not a trivial task for me to achieve correctness
> and completeness. It also smells like "reinventing the wheel", though I
> can't find any hints on Google or in the Python Cookbook.
> 
> I don't care about efficiency as a starting point. Any comments are welcome.
> Thanks.
> 
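For illustration only, a string-like class built on a tuple of integers
could be sketched roughly as follows. This assumes Python 3 syntax; the
class name MyString comes from the quoted code, but every method body here
is a hypothetical sketch, not a complete or tuned implementation, and some
edge cases (for example an empty prefix with an out-of-range start) may
differ from the built-in str.

```python
class MyString:
    """Immutable string-like sequence of integer code points (sketch)."""

    def __init__(self, source=()):
        if isinstance(source, MyString):
            self._codes = source._codes
        elif isinstance(source, str):
            self._codes = tuple(ord(ch) for ch in source)
        else:
            # Any iterable of integers; values above 0x10FFFF are allowed.
            self._codes = tuple(int(c) for c in source)

    def __len__(self):
        return len(self._codes)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return MyString(self._codes[index])
        return self._codes[index]

    def __eq__(self, other):
        if isinstance(other, (str, MyString)):
            return self._codes == MyString(other)._codes
        return NotImplemented

    def __hash__(self):
        return hash(self._codes)

    def __repr__(self):
        return "MyString(%r)" % (list(self._codes),)

    def find(self, sub, start=0, end=None):
        sub = MyString(sub)
        # slice.indices() clamps start/end the way slicing does.
        start, end, _ = slice(start, end).indices(len(self))
        for i in range(start, end - len(sub) + 1):
            if self._codes[i:i + len(sub)] == sub._codes:
                return i
        return -1

    def startswith(self, prefix, start=0, end=None):
        prefix = MyString(prefix)
        start, end, _ = slice(start, end).indices(len(self))
        return self._codes[start:end][:len(prefix)] == prefix._codes

    def split(self, sep):
        sep = MyString(sep)
        if len(sep) == 0:
            raise ValueError("empty separator")
        parts, pos = [], 0
        while True:
            hit = self.find(sep, pos)
            if hit < 0:
                parts.append(self[pos:])
                return parts
            parts.append(self[pos:hit])
            pos = hit + len(sep)

    def join(self, parts):
        codes = []
        for i, part in enumerate(parts):
            if i:
                codes.extend(self._codes)
            codes.extend(MyString(part)._codes)
        return MyString(codes)
```

Because the storage is a plain tuple of ints, nothing stops a value such as
0x110000 from being stored, e.g. MyString([0x110000, 0x2C, 0x110001]).split(",").
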
The UTF-16 encoding is capable of representing the whole of Unicode. 
There should be no need to do anything special to use UTF-16.
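
As a rough illustration of that point, the sketch below shows how UTF-16
reaches code points above U+FFFF by using surrogate pairs, up to the
Unicode maximum of U+10FFFF. It assumes Python 3 syntax, and the helper
name utf16_code_units is made up for this example.

```python
def utf16_code_units(code_point):
    """Return the UTF-16 code units (as ints) for one Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points are not encodable")
    if code_point < 0x10000:
        # Basic Multilingual Plane: one 16-bit unit.
        return [code_point]
    # Supplementary planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


highest = utf16_code_units(0x10FFFF)
print([hex(u) for u in highest])   # ['0xdbff', '0xdfff']

# Cross-check against the built-in codec.
assert chr(0x10FFFF).encode("utf-16-be") == bytes([0xDB, 0xFF, 0xDF, 0xFF])
```
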

regards
  Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC                     www.holdenweb.com
PyCon TX 2006                  www.python.org/pycon/



