PEP 218 Re: ANN: set-0.1 module available

James J. Besemer jb at cascade-sys.com
Fri May 17 17:36:41 EDT 2002


"Denis S. Otkidach" wrote:

> We often need a collection that is extended step-by-step avoiding
> duplicates.  Certainly we can use immutable sets:

That's the position I took initially when first thinking about the
problem.  But as I thought more of it, I had to abandon
'efficiency'.  In practice, I routinely manipulate immutable strings
100K long.  For a lot of text processing I read the whole file into a
string and whack away at it.  Yup, each string replace makes a copy
of the string but performance has never been an issue.

> s = {-}
> for item in some_sequence:
>     if is_good(item):
>         s ?= {item}  # create new set if it's immutable or just
>                      # add otherwise

It's a matter of English semantics.

A smart compiler/interpreter would recognize that the modify-in-place
does not require that the original value of s be saved and thus
modification could be performed in place and reuse the data rep.
even though the language Semantics said sets were immutable.

I onced worked on an APL interpreter that worked this way.  You made
deep copies of aggregate results on the stack only if you absolutely
had to and wherever possible you reused temporary results in place.
This was strictly an interpreter issue.  The compiler didn't even
have to know.  'Language semantics' were essentially that everything
was immutable.

> but such code will be too expensive for large sets.

> If we write similar code for strings we'll use StringIO (consider
> it's some kind of mutable string) or intermidiate list (and join
> items afterwards).  Collecting in list is OK with sets too if we
> don't need intermediate results as sets.  But what about StringIO
> equivalent for sets?  Other immutable types are not agregates
> (except complex numbers, but this is a special case) and we need
> not mutable form for them.

IF the proposal on the table is (a) sets are immutable but (b) we
also have a SetIO library function that allows us to manipulate them
as if they were mutable THEN I wholeheartedly agree.

IF you're saying the base implementation of sets MUST be mutable
because of performance reasons, then I don't get it, as it seems
there are a variety of solutions to minimize the performance issue
(which is very small for most practical applications in the first
place).

Regards

--jb
--
James J. Besemer  503-280-0838 voice
http://cascade-sys.com  503-280-0375 fax
mailto:jb at cascade-sys.com







More information about the Python-list mailing list