How can I create customized classes that have similar properties as 'str'?

Licheng Fang fanglicheng at gmail.com
Sat Nov 24 06:44:59 EST 2007


On Nov 24, 7:05 pm, Bjoern Schliessmann <usenet-
mail-0306.20.chr0n... at spamgourmet.com> wrote:
> Licheng Fang wrote:
> > I find myself frequently in need of classes like this for two
> > reasons. First, it's efficient in memory.
>
> Are you using millions of objects, or MB size objects? Otherwise,
> this is no argument.

Yes, millions. In my natural language processing tasks, I almost
always need to define patterns, identify their occurrences in a huge
data set, and count them. Say I have a big text file consisting of
millions of words, and I want to count the frequency of trigrams:

trigrams([1,2,3,4,5]) == [(1,2,3),(2,3,4),(3,4,5)]

I can save the counts in a dict D1. Later, I may want to recount the
trigrams with some minor modification, say, using only every other
line of the input file, and save the counts in a dict D2. The problem
is that D1 and D2 have almost the same set of keys (trigrams of the
text), yet the keys in D2 are new instances, even though equal keys
have probably already been inserted into D1. So I end up with
unnecessary duplicates of keys, which can be a great waste of memory
with huge input data.
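
To make the duplication problem concrete, here is a minimal sketch of one way to share key objects between the two dicts. The names `_pool`, `canonical` and `count_trigrams` are made up for illustration, and the pool keeps a strong reference to every trigram it has seen, so this assumes the set of distinct trigrams fits in memory once.

```python
from collections import Counter

# Hypothetical pool: maps each trigram to the first instance seen with that value.
_pool = {}


def canonical(key):
    """Return the pooled instance equal to key, registering key if it is new."""
    return _pool.setdefault(key, key)


def trigrams(seq):
    """Return the consecutive 3-tuples of seq, each replaced by its pooled instance."""
    return [canonical(tuple(seq[i:i + 3])) for i in range(len(seq) - 2)]


def count_trigrams(words):
    """Count trigram frequencies; the keys are pooled tuples."""
    return Counter(trigrams(words))


words = "the cat sat on the mat".split()
D1 = count_trigrams(words)
D2 = count_trigrams(words)  # a second pass reuses the key objects already in D1
```

With this arrangement a second count only pays for the new dict's table and its counts, not for a second copy of every key tuple.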

>
> BTW, what happens if you, by some operation, make a == b, and
> afterwards change b so another object instance must be created?
> This instance management is quite a runtime overhead.
>

I probably need this class to be immutable.
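
A rough sketch of what such an immutable, one-instance-per-value class might look like follows. The class name `Interned` and its `value` attribute are hypothetical, and the wrapped value is assumed to be hashable. Instances are cached in a `weakref.WeakValueDictionary`, so an instance can be garbage collected once nothing else refers to it.

```python
import weakref


class Interned:
    """Hypothetical immutable wrapper with at most one live instance per value."""

    __slots__ = ("_value", "__weakref__")

    # Cache of live instances, keyed by the wrapped (hashable) value.
    _instances = weakref.WeakValueDictionary()

    def __new__(cls, value):
        existing = cls._instances.get(value)
        if existing is not None:
            return existing
        self = super().__new__(cls)
        # Bypass our own __setattr__, which forbids assignment.
        object.__setattr__(self, "_value", value)
        cls._instances[value] = self
        return self

    def __setattr__(self, name, val):
        raise AttributeError("Interned instances are immutable")

    @property
    def value(self):
        return self._value

    def __repr__(self):
        return f"Interned({self._value!r})"


a = Interned((1, 2, 3))
b = Interned((1, 2, 3))
c = Interned((2, 3, 4))
```

Because equal values always yield the same object, the default identity-based `__eq__` and `__hash__` already behave like value comparison here, which is the pointer-comparison property asked for. The price is a dictionary lookup on every construction.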

> > Second, when two instances are compared for equality only their
> > pointers are compared.
>
> I state that the object management will often eat more performance
> than equality testing. Except you have a huge number of equal
> objects. If the latter was the case you should rethink your program
> design.
>

Yeah, counting is all about whether two things are equal or not.

> > (I think that's how Python compares 'str's.
>
> Generally not. In CPython, just very short strings are created only
> once.
>
> >>> a=" "
> >>> b=" "
> >>> a is b
> True
> >>> a="  "
> >>> b="  "
> >>> a is b
>

Wow, I didn't know this. But how exactly does Python manage these
strings? My interpreter gave me these results:

>>> a = 'this'
>>> b = 'this'
>>> a is b
True
>>> a = 'this is confusing'
>>> b = 'this is confusing'
>>> a is b
False
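
The difference above is most likely down to CPython interning string constants that look like identifiers ('this') but not ones containing spaces. That is an implementation detail and should not be relied on. When sharing is actually wanted, it can be requested explicitly. The sketch below assumes Python 3, where the function is `sys.intern` (in Python 2 it is the builtin `intern`).

```python
import sys

# Build the strings at run time so the compiler cannot share a single constant.
a = " ".join(["this", "is", "confusing"])
b = " ".join(["this", "is", "confusing"])

print(a == b)  # compares by value
print(a is b)  # identity of un-interned strings is an implementation detail

# After explicit interning, equal strings are guaranteed to be the same object.
a = sys.intern(a)
b = sys.intern(b)
print(a is b)
```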


> False
>
> Regards,
>
> Björn



