Where does str class represent its data?

James Stroud jstroud at mbi.ucla.edu
Thu Jul 12 19:04:48 EDT 2007


ChrisEdgemon at gmail.com wrote:
> On Jul 11, 9:49 pm, James Stroud <jstr... at mbi.ucla.edu> wrote:

>>The "flat is better than nested" philosophy suggests that clean should
>>be module level and you should initialize a MyString like such:
>>
>>   m = MyString(clean(s))
>>
>>Where clean is
>>
>>   def clean(astr):
>>     return astr.replace('m', 'f')
>>
>>Although it appears compulsory to call clean each time you instantiate
>>MyString, note that you do it anyway when you check in your __init__.
>>Here, you are explicit.
> 
> Initially, I tried simply calling a clean function on a regular
> string, without any of this messy subclassing.  However, I would end
> up accidentally cleaning it more than once, and transforming the
> string was just very messy.

It's not clear what you mean here. A code snippet might help. In theory, 
you can encapsulate any amount of cleaning inside a single function, so 
it shouldn't be messy. You need only return the result.


import string

def fix_whitespace(astr):
   # Trim the ends first; stripping after the replacement would be a no-op.
   astr = astr.strip()
   return ''.join('-' if c in string.whitespace else c for c in astr)

def fix_m(astr):
   return astr.replace('m', 'f')

def clean_up(astr):
   return fix_m(fix_whitespace(astr))
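
On the worry about accidentally cleaning a string more than once: with functions like these, a second pass is harmless, because the cleaned result contains nothing left to fix. A minimal sketch follows; it repeats the definitions so that it runs on its own, and the sample input is only an illustration.

```python
import string

def fix_whitespace(astr):
    # Trim the ends, then turn each remaining whitespace character into '-'.
    return ''.join('-' if c in string.whitespace else c for c in astr.strip())

def fix_m(astr):
    return astr.replace('m', 'f')

def clean_up(astr):
    return fix_m(fix_whitespace(astr))

once = clean_up('  mail man\n')
twice = clean_up(once)      # cleaning an already-clean string

print(once)                 # fail-fan
print(once == twice)        # True
```

This holds for these particular transformations; a cleaning step that is not idempotent (say, one that appends a suffix) would need a guard.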


In theory, if you didn't want a custom string to actually change its own 
value, this could be semantically equivalent to:

new_str = astr.fix_whitespace().fix_m()

The latter might be a little more readable than the former.

> I thought that it would be much easier to
> just clean the string once, and then add methods that would give me
> the various transformations that I wanted from the cleaned string.

If you intended these transformations to be new instances of MyString, 
then this would probably not be an abuse of the built-in str type.

> Using __new__ seems to be the solution I was looking for.

>>Additionally, it has been suggested that you use __new__. E.g.:
>>
>>py> class MyString(str):
>>...   def __new__(cls, astr):
>>...     astr = astr.replace('m', 'f')
>>...     return super(MyString, cls).__new__(cls, astr)
>>...
>>py> MyString('mail')
>>'fail'
>>
>>But this is an abuse of the str class if you intend to populate your
>>subclasses with self-modifying methods such as your clean method. In
>>this case, you might consider composition, wherein you access an
>>instance of str as an attribute of class instances. The Python standard
>>library makes this easy with the UserString class and the ability to add
>>custom methods to its subclasses:


> What constitutes an abuse of the str class?

Changing its value as if it were mutable. This might be ok (though not 
recommended) during instantiation, but you wouldn't want something like 
this:

py> class MyString(str):
...   [etc.]
...
py> s = MyString('mail man')
py> s
'fail fan'
py> # kind-of ok up till now, but...
py> s.fix_whitespace()
py> s
'fail-fan'
py> # abusive to str
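
As a quick check of why the session above fights the type: a str (and any subclass of it) is immutable, so the inherited methods never alter the object; they build a new one. The sketch below uses a bare, hypothetical MyString only to show that behavior.

```python
class MyString(str):
    """Hypothetical bare subclass, used only for illustration."""

s = MyString('mail man')
t = s.replace('m', 'f')   # inherited method: returns a new object

print(s)                  # mail man  (s itself is unchanged)
print(t)                  # fail fan
print(type(t) is str)     # True: the result is a plain str, not a MyString
```

Note that the inherited method hands back a plain str, so a subclass that wants to keep its own type has to wrap the result explicitly.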

Probably better, if subclassing str, would be something more explicit:

py> s = MyString('mail man')
py> s
'mail man'
py> s = s.fix_m()
py> s
'fail fan'
py> s = s.fix_whitespace()
py> s
'fail-fan'
py> s = MyString('mail man')
py> s
'mail man'
py> s.clean_up()
'fail-fan'

In this way, users would not need to understand the implementation of 
MyString (i.e. that it gets cleaned by default), and its behavior more 
intuitively resembles the built-in str class--except that MyString has 
added functionality.
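
One possible way to write such a class is sketched below. The method names mirror the module-level functions earlier in this message (fix_m, fix_whitespace, clean_up); the bodies are assumptions made for illustration, not code from the original poster.

```python
import string

class MyString(str):
    # Sketch only: every method returns a new MyString and leaves self alone.
    def fix_m(self):
        return MyString(self.replace('m', 'f'))

    def fix_whitespace(self):
        trimmed = self.strip()
        return MyString(
            ''.join('-' if c in string.whitespace else c for c in trimmed)
        )

    def clean_up(self):
        # Chaining works because each step returns a MyString.
        return self.fix_whitespace().fix_m()

s = MyString('mail man')
cleaned = s.clean_up()

print(s)                        # mail man
print(cleaned)                  # fail-fan
print(type(cleaned).__name__)   # MyString
```

Because nothing happens at instantiation, MyString('mail man') compares equal to 'mail man', just as a plain str would.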

> Is there some performance
> decrement that results from subclassing str like this? (Unfortunately
> my implementation seems to have a pretty large memory footprint, 400mb
> for about 400,000 files.) Or do you just mean from a philosophical
> standpoint?

Philosophical, from the standpoint of wanting intuitively usable, 
reusable, and maintainable code.

> I guess I don't understand what benefits come from using
> UserString instead of just str.

Probably not many if you think of MyString as I suggest above. But if 
you want it to be magic, as you described originally, then you might 
think about UserString.
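
For the "magic" variant, a rough sketch with UserString might look like the following. It assumes a current Python, where the class lives in collections (in 2007 it was in the UserString module), and MagicString and its clean method are hypothetical names chosen for the example.

```python
from collections import UserString  # Python 3 location of UserString

class MagicString(UserString):
    def clean(self):
        # The wrapped str is held in self.data; rebinding it changes what
        # the wrapper represents without touching any str object in place.
        self.data = self.data.replace('m', 'f')

s = MagicString('mail man')
alias = s          # a second name for the same wrapper object
s.clean()

print(s)           # fail fan
print(alias)       # fail fan
```

Here the self-modifying method is not an abuse, because the object that changes is the wrapper, which was never promised to be immutable.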

James

-- 
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/


