Interface of the set classes

Fri Oct 29 19:18:02 EDT 2004

Pierre Barbier de Reuille <pierre.barbier at cirad.fr> wrote:
   ...
> > So, you cannot "switch freely" -- there are constraints.
> 
> Right ... but take hash_set and you have exactly the same kind of "set".
> In my case, I already had the '<' operator ... so it wasn't a problem.
> Then, you'll argue that you need to be hashable. True, but as I said (I
> don't mean here I was clear about that :o) ), the choice of the 
> container usually depend on the data set.

Sure.

> > call it S.  You can write V[3], but you can't write S[3], for example.
> > Here, too, your assertion that "you can switch freely" is utterly
> > baffling.
> 
> Yes, that's why I needed __getitem__ only because some things cannot be
> done (in Python 2.3) without.

Such as...?  'tee' is only needed if you need to "move forward a bit" on
one branch and then _backtrack_ to a previous snapshot, or the like.  If
all you're doing is suspend an iteration and then resume it, say:

iter1 = iter(container1)
iter2 = iter(container2)

print 'here is the part of ct 1 up to the first FOO if any'
for item1 in iter1:
    print item1
    if item1 == 'FOO': break
else:
    print 'Ah well, no FOO!-('

print 'and here is the part of ct 2 up to the first BAR if any'
for item2 in iter2:
    print item2
    if item2 == 'BAR': break
else:
    print 'Sigh, no BAR!-('

if item1=='FOO':
    print 'And here is the rest of ct1'
    for item1 in iter1: print item1

if item2=='BAR':
    print 'And here is the rest of ct2'
    for item2 in iter2: print item2

IOW, as long as you only need forward iteration (w/o
snapshot/backtrack), you're in clover.

If you do need to "snapshot and backtrack", in the _general_ case you're
best off snapshotting into a list anyway - tee, in general, can't do
better than that (until and unless the PEP on copyable iterators lets
suitable iterator objects cooperate more deeply with tee...).  So
"cannot be done" is a definite overbid -- you may just be spending a
little more time or memory than you'd like, worst case.

> > It's not: neither in Python nor in C++ can you switch freely.
> 
> True, but in part ^_^ In C++, you can easyly design algorithm working
> with whatever sequence you have. But for that you'll have to use 
> templates, iterators, probably tags, and on. But it's possible ... I 
> don't see a way to do it in Python ... and that's my whole point.

Python has iterators, and doesn't need templates, since signature based
polymorphism is intrinsic.  Python only has forward iterators (actually
only _input_ iterators, since you can never alter the underlying
container directly through an iterator) so what would it do w/tags?

If your complaint is that Python needs more categories of iterators,
please express your complaint in these terms -- rather than in unrelated
terms criticizing the design of container types, a cri de coeur that's
actually quite unrelated to the complaint.

I used to wish for 'snapshottable'/'restartable' iterators, but 'tee'
basically gives me the equivalent (though I'd still dearly love to know
that all suitable iterators _cooperate optimally_ with 'tee', that, at
least for builtin ones, is just a matter of implementation -- exposing
it as _design_ is ``just'' an issue of letting *user-coded* iterators
cooperate with 'tee' just as fully, if and when feasible).

I still do wish (somewhat) for some architecture for output iterators,
and (more) for one for forward iterators.  (I would imagine that, if we
had forward iterators, an output iterator would just be a forward
iterator that doesn't return anything useful upon .next() and doesn't
raise StopIteration -- just offers something like a .set(newval) method,
or whatever a forward iterator would offer for that purpose).

Funny enough, not once since C++ stopped being my main programming
language have I found myself wishing for any other category of iterator
in any other language.  Perhaps just because the kind of container that
naturally can support a forward-backward iterator isn't common anyway,
and a random-access iterator basically means indexing -- and then I'd
rather use indexing (and slicing...).

> Yes, but use insert_iterator and you can insert with both vector and set
> with the same syntax ... even if, obviously, the position you gave for
> set will be ignored. So :
> 
>    inserter(result, result.end()) = 3;
> 
> will insert 3 into result, at the end if result is "ordered" (as the 
> list in python), or where it has to be for "sorted" or "unordered" 
> containers. Genericity in C++ comes from template functions most of the
> time.

Again, this basically seems to me to be a wish for more categories of
iterator.  The key reason this cannot be done directly in Python is that
there is no output iterator concept.

Of course, it's not rocket science to 'roll your own' in Python (in
Python style, of course).  The equivalent of "specializing a template"
requires some kind of typetesting or featuretesting; and we don't have a
"just past the end" ``bookmark'' to use, so, for example, the insert
method of a list can never be used to mean 'append' (sigh).  So, we
probably need name distinctions between insert_at_end and
insert_before_position -- I don't think anybody will ever convince GvR
that the latter should accept and ignore a position argument, btw.  But
at least insert_at_end should be uncontroversial:

def insert_at_end(container):
    try: return container.append
    except AttributeError: pass
    try: return container.add
    except AttributeError: pass
    try: update = container.update
    except AttributeError: pass
    else: return lambda xy: update((xy,))

    raise TypeError, "can't insert-at-end in %s' % type(container)

and then of course the equivalent of your snippet above is

   insert_at_end(result)(3)

[[ You'll never manage to get 'assign to a function call' into Python,
   but the syntax sugar of a call is pretty reasonable here anyhoo. ]]

Of course, if result is a dict this will fail (you'd need a 2-item
tuple, or something that can pass for it in a dim light, as the
argument), just like your C++ snippet will fail if result is a std::map
(you'd need an std::pair as the RHS) -- though C++ will fail at compile
time, and Python at runtime, as usual.

> > Not when the semantics are too different, as in the case of S and V.
> 
> Ok, that's something I don't agree ... for me there is a common semantic
> between S and V which is "storing values in a linear container". After
> that, you can put the properties you need in that container, but if I
> need to write an algorithm whose only assumption is "a linear container"
> I want to be able to use whatever linear container I have ... and I 
> don't want to reimplement it for each.

So write that insert_at_end function -- it IS pretty simple, after all.
Put it in your myneatutils.py module and import it everywhere.  Or put
it into a module that you call std.py instead, so the parallel with C++
will be even clearer -- std::inserter vs std.insert_at_end, etc.

Faking that you can insert at any point for a container in which you
CAN'T is extremely dubious, btw.  Even the _at_end part of the name of
the function I suggest may be going overboard.  After all, any algorithm
that depends on the common semantics between S and V had better NOT
believe anything it inserts does go 'at end', so maybe a name such as
'insert_wherever_it_is_most_convenient_for_you' might be better
(although a little bit verbose).

> > Well, you surely didn't get operator[] on C++'s std::set<...>s, that's
> > for sure...
> 
> Well, no ... but it's not needed ^_^ And if I really need it you have
> the "advance" template function that do the __getitem__ stuff in the 
> most efficient way, whatever your _linear_ container is.

Again, that's obviously trivial to write for Python, since Python
doesn't have random access iterators, so the most efficient way for any
container c to get an iterator starting at its 5th item will be to start
with iter(c) and step forward 5 times (with itertools.islice)...;-)
(Well, with a list you could slice -- if you never needed to write into
the original list, which, as I mentioned, Python has no iterator kind
for doing, anyway; if you want to prefer this solution, try/except is
just friend -- feature-testing, basically).

Again, a reasonable complaint (if you can convince Raymond Hettinger,
basically) is that itertools.islice should be able to cooperate
specially with some privileged iterators (builtin ones only, or more
ambitiously, user-coded ones too), like itertools.tee should, again by
feature-testing.  Essentially an implementation issue as long as you
stick to builtins, becomes a design one if you want usercoded ones too.

> So I hope I made my point clearer this time ^_^ I used Python for a bit

Sure!  If you had whined about the povery of kinds of iterators that
Python supplies, rather than (as per subject) about the interface of the
set classes, it would have been quite clear from the start.

> lists, sets and dictionnary. And I don't really see any other way to do
> that (but to add methods to the sets ... and I don't want to do so).

No need: the Python equivalent of C++'s specializing of templates (just
like that of C++'s overloading) is feature-testing (or type-testing, but
generally that's _not_ warranted -- feature-testing allows user-coded
types to cooperate just as well!).  So, the above-exemplified function
is the Python equivalent of a C++ template function with
specializations.  Of course Python does more at runtime that C++ does at
compiletime -- but that applies in several other ways, anyhow;-).

> Ok, so how do you write an algorithm on a container without specifying
> which, knowing that you cannot use the iterators, neither the signature
> of the class, nor the derivation ?

You encapsulate the feature-testing (looking for suitable methods to
either return directly, or wrap in inner functions -- not necessarily
lambdas, an inner 'def' will often be preferable -- to return) into
functions, once -- equivalent to a C++ template functions
w/specializations.  Then you write your algorithms in terms of those
functions, just like, in C++, you'd write them in terms of template
functions (either coded by yourself, or taken from some library coded by
others, such as the excellent Boost people).

> That was all the methods I imagined for Python's genericity. But none of
> them work when you see the base containers.

"The signature of the class" (or, rather, some specific methods thereof)
is really all you need.  But it's best to push that detail into a
function, as above.  Should you ever need to optimize away the process
for some given types, it's easy, too:

at_end_inserters_by_type = {
    list: list.append,
    collections.deque: collections.deque.append,
    set: set.add,
}
def insert_at_end(container):
    m = at_end_inserters_by_type.get(type(container))
    if m is not None: return m.__get__(container)
    ...same as before...

Note that I have kept the helper dictionary exposed so it's easy to add
optimized customizations, just as in C++ it's easy to add template
specializations.  Not just for user-coded types, but also for existing
ones which this particular utility module isn't optimizing for, e.g.:

import std
import array
std.at_end_inserters_by_type[array.array] = array.array.append

and the like (assuming std.py is the module where we have insert_at_end
and its supporting optimizations-helper-dictionary).

If you have several such functions needing helper dictionaries, you may
make a dictionary of dictionaries, keyed first by the function, and
maybe expose a helper function to register optimization-specializations
for a given function, rather than let user-level code deal with directly
indexing dictionaries.  Then the above specialization might become:

std.specialize(insert_at_end, array.array, array.array.append)

As long as, after checking the specializations, the function continues
with feature-testing as before, the specializations may only be needed
for performance, or for existing types (maybe from some library) that
supply the needed functionality but need interface adaptation, basically
like in C++.

Alex