Where does str class represent its data?

James Stroud jstroud at mbi.ucla.edu
Wed Jul 11 22:49:29 EDT 2007


ChrisEdgemon at gmail.com wrote:
> I'd like to implement a subclass of string that works like this:
> 
> >>> m = MyString('mail')
> >>> m == 'fail'
> True
> >>> m == 'mail'
> False
> >>> m in ['fail', 'hail']
> True
> 
> My best attempt for something like this is:
> 
> class MyString(str):
>   def __init__(self, seq):
>     if self == self.clean(seq): pass
>     else: self = MyString(self.clean(seq))
> 
>   def clean(self, seq):
>     seq = seq.replace("m", "f")
> 
> but this doesn't work.  Nothing gets changed.
> 
> I understand that I could just remove the clean function from the
> class and call it every time, but I use this class in several
> locations, and I think it would be much safer to have it do the
> cleaning itself.
> 

The "flat is better than nested" principle from the Zen of Python suggests 
that clean should be a module-level function and that you should initialize 
a MyString like so:

   m = MyString(clean(s))

Where clean is

   def clean(astr):
     return astr.replace('m', 'f')

Although it may seem burdensome to call clean every time you instantiate 
MyString, note that your __init__ calls it anyway to do its check. Here, 
the call is simply explicit. This approach also removes the obligation to 
clean strings you already know are clean, for example when deserializing 
them.
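A minimal sketch of this explicit style, assuming Python 3. MyString here is just an empty str subclass and clean is the module-level function above; the comparisons are the ones from the original question.

```python
def clean(astr):
    """Module-level cleaning rule from the question: every 'm' becomes 'f'."""
    return astr.replace('m', 'f')


class MyString(str):
    """Plain str subclass; callers clean the value before constructing it."""


m = MyString(clean('mail'))
print(m == 'fail')            # True
print(m == 'mail')            # False
print(m in ['fail', 'hail'])  # True

# Input already known to be clean (e.g. just deserialized) can skip clean().
n = MyString('fail')
print(n == m)                 # True
```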

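Also, clean above doesn't return anything, so self.clean(seq) evaluates to 
None, and it is None that you pass to MyString here:

    self = MyString(self.clean(seq))

Even with the return fixed, this line would not help: assigning to self 
inside __init__ only rebinds a local name, and by the time __init__ runs, 
str.__new__ has already created the immutable string the caller receives.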
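A short sketch, assuming Python 3, that isolates both problems: a function that only assigns its result to a local name returns None, and rebinding self inside __init__ does not change the string the constructor returns. The names clean_without_return and Broken are hypothetical, chosen for this illustration.

```python
def clean_without_return(seq):
    # The result is bound to the local name `seq` and then discarded;
    # with no return statement, the function returns None.
    seq = seq.replace('m', 'f')


class Broken(str):
    def __init__(self, seq):
        # str.__new__ has already built the immutable value 'mail'.
        # This only rebinds the local name `self`; the caller never sees it.
        self = str(seq).replace('m', 'f')


print(clean_without_return('mail'))  # None
print(Broken('mail'))                # mail
```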

Additionally, it has been suggested that you use __new__. E.g.:

py> class MyString(str):
...   def __new__(cls, astr):
...     astr = astr.replace('m', 'f')
...     return super(MyString, cls).__new__(cls, astr)
...
py> MyString('mail')
'fail'
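The same __new__ approach, sketched for Python 3 (where the zero-argument super() is available) and checked against the comparisons from the question. One caveat: inherited str operations return plain str rather than MyString, so the cleaning happens only at construction.

```python
class MyString(str):
    """str subclass whose value is cleaned once, when it is created."""

    def __new__(cls, astr):
        # str is immutable, so the value has to be fixed here, before the
        # instance exists; by the time __init__ runs it is too late.
        return super().__new__(cls, astr.replace('m', 'f'))


m = MyString('mail')
print(repr(m))                 # 'fail'
print(m == 'fail')             # True
print(m == 'mail')             # False
print(m in ['fail', 'hail'])   # True

# Inherited operations return plain str and are not cleaned again.
joined = m + 'm'
print(type(joined).__name__, joined)  # str failm
```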

But this is an abuse of the str class if you intend to populate your 
subclasses with self-modifying methods such as your clean method: str 
instances are immutable, so no method can change one in place. In that 
case, you might consider composition, wherein each instance of your class 
holds a str as an attribute. The Python standard library makes this easy 
with the UserString class, whose subclasses can add custom methods:

py> from UserString import UserString
py> class MyClass(UserString):
...   def __init__(self, astr):
...     self.data = self.clean(astr)
...   def clean(self, astr):
...     return astr.replace('m', 'f')
...
py> MyClass('mail')
'fail'
py> type(_)
<type 'instance'>
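In Python 3 the UserString module is gone and the class lives in collections as collections.UserString; it is also a new-style class, so type() reports the subclass rather than instance. A sketch of the same MyClass under those assumptions. Because UserString builds most results through self.__class__, this subclass re-runs clean on derived values.

```python
from collections import UserString


class MyClass(UserString):
    """Holds the real str in self.data (composition) instead of being a str."""

    def __init__(self, astr):
        # UserString keeps the wrapped string in the `data` attribute.
        self.data = self.clean(astr)

    def clean(self, astr):
        # str() also unwraps another UserString into a plain str.
        return str(astr).replace('m', 'f')


s = MyClass('mail')
print(repr(s))              # 'fail'
print(type(s).__name__)     # MyClass
print(isinstance(s, str))   # False: it wraps a str, it is not one
print(s == 'fail')          # True

# UserString builds results via self.__class__, so clean runs again here.
t = s + 'm'
print(t.data)               # failf
```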

This class is much slower than str, since its methods are Python-level 
wrappers around the str stored in data, but you can always access an 
instance's data attribute directly if you want fast read-only behavior.

py> astr = MyClass('mail').data
py> astr
'fail'
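The speed difference is easy to check for yourself. This sketch, assuming Python 3 and collections.UserString, times one method call on a plain str, on a UserString, and on the UserString's data attribute; absolute numbers depend on your machine and Python version, so none are quoted here.

```python
import timeit
from collections import UserString

plain = 'fail'
wrapped = UserString('fail')

n = 200_000
# Same operation three ways: on str, through the wrapper, and on .data.
t_str = timeit.timeit(lambda: plain.upper(), number=n)
t_wrapped = timeit.timeit(lambda: wrapped.upper(), number=n)
t_data = timeit.timeit(lambda: wrapped.data.upper(), number=n)

print(f"str.upper()             {t_str:.4f} s")
print(f"UserString.upper()      {t_wrapped:.4f} s")
print(f"UserString.data.upper() {t_data:.4f} s")
```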

But now you are back to a built-in type, which is actually the 
point: not everything needs to be in a class. This isn't Java.

James


-- 
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/


