Algorithms Library - Asking for Pointers

Fri Sep 2 21:39:21 EDT 2011

On Sep 2, 6:49 pm, Travis Parks <jehugalea... at gmail.com> wrote:
> On Sep 2, 4:09 pm, Ian Kelly <ian.g.ke... at gmail.com> wrote:
>
>
>
>
>
> > On Fri, Sep 2, 2011 at 10:59 AM, Travis Parks <jehugalea... at gmail.com> wrote:
> > > Hello:
>
> > > I am working on an algorithms library. It provides LINQ like
> > > functionality to Python iterators. Eventually, I plan on having
> > > feaures that work against sequences and mappings.
>
> > > I have the code up athttp://code.google.com/p/py-compass.
>
> > > This is my first project in Python, so I'd like some feedback. I want
> > > to know if I am following conventions (overall style and quality of
> > > code).
>
> > Sure, here are my comments.
>
> > In the "forever" and "__forever" functions, your use of the term
> > "generator" is confusing.  "__forever" is a generator function,
> > because it has a yield statement.  Its argument, called "generator",
> > appears to be a callable, not a generator or even necessarily a
> > generator function.  Also, note that __forever(lambda: value) is
> > functionally equivalent to the more efficient itertools.repeat(value).
>
> > The staticmethod __next(iterator) accesses the class it is defined in,
> > which suggests that it might be better written as a classmethod
> > __next(cls, iterator).
>
> > Each of the LINQ-style methods is divided into two parts: the public
> > method that contains the docstring and some argument checks, and a
> > private staticmethod that contains the implementation.  I'm not
> > certain what the purpose of that is.  If it's to facilitate overriding
> > the implementation in subclasses, then you need to change the names of
> > the private methods to start with only one _ character so that they
> > won't be mangled by the compiler.
>
> > The comments before each method that only contain the name of the
> > immediately following method are redundant.
>
> > aggregate: the default aggregator is unintuitive to me.  I would make
> > it a required field and add a separate method called sum that calls
> > aggregate with the operator.add aggregator.
> > Also, the implementation doesn't look correct.  Instead of passing in
> > each item to the aggregator, you're passing in the number of items
> > seen so far?  The LINQ Aggregate method is basically reduce, so rather
> > than reinvent the wheel I would suggest this:
>
> > # MISSING is a unique object solely defined to represent missing arguments.
> > # Unlike None we can safely assume it will never be passed as actual data.
> > MISSING = object()
>
> > def aggregate(self, aggregator, seed=MISSING):
> >     if seed is self.MISSING:
> >         return reduce(aggregator, self._iterable)
> >     else:
> >         return reduce(aggregator, self._iterable, seed)
>
> > Note for compatibility that in Python 3 the reduce function has been
> > demoted from a builtin to a member of the functools module.
>
> > any: the name of this method could cause some confusion with the "any"
> > builtin that does something a bit different.
>
> > compare: the loop would more DRY as a for loop:
>
> > def __compare(first, second, comparison):
> >     for firstval, secondval in itertools.izip_longest(first, second,
> > fillvalue=self.MISSING):
> >         if firstval is self.MISSING:
> >             return -1
> >         elif secondval is self.MISSING:
> >             return 1
> >         else:
> >             result = comparison(firstval, secondval)
> >             if result != 0:
> >                 return result
> >     return 0
>
> > concatenate: again, no need to reinvent the wheel.  This should be
> > more efficient:
>
> > def concatenate(self, other):
> >     return extend(itertools.chain(self.__iterable, other))
>
> > equals: could be just "return self.compare(other, comparison) == 0"
>
> > __last: the loop could be a for loop:
>
> >         # assume we're looking at the last item and try moving to the next
> >         item = result.Value
> >         for item in iterator: pass
> >         return item
>
> > lastOrDefault: there's a lot of repeated logic here.  This could just be:
>
> > def lastOrDefault(self, default=None):
> >     try:
> >         return self.last()
> >     except ValueError:
> >         return default
>
> > map / forEach: .NET has to separate these into separate methods due to
> > static typing.  It seems a bit silly to have both of them in Python.
> > Also, map would be more efficient as "return itertools.imap(mapper,
> > self.__iterable)"
>
> > max / min: it would be more efficient to use the builtin:
> > def max(self, key):
> >     return max(self.__iterable, key=key)
> > If somebody really needs to pass a comparison function instead of a
> > key function, they can use functools.cmp_to_key.
>
> > randomSamples: a more canonical way to pass the RNG would be to pass
> > an instance of random.Random rather than an arbitrary function. Then
> > to get a random integer you can just call generator.randrange(total).
> > Note that for the default you can just use the random module itself,
> > which provides default implementations of all the Random methods.
> > Also, for loops:
>
> >     def __randomSamples(iterable, sampleCount, generator):
> >         samples = []
> >         iterator = iter(iterable)
> >         # fill up the samples with the items from the beginning of the iterable
> >         for i, value in zip(xrange(sampleCount), iterator):
> >             samples.append(value)
> >         # replace items if the generated number is less than the total
> >         total = len(samples)
> >         for value in iterator:
> >             total += 1
> >             index = generator.randrange(total)
> >             if index < len(samples):
> >                 samples[index] = result
> >         return samples
>
> > __reverse: you could just "return reversed(list(iterable))"
>
> > __rotateLeft:
> > def __rotateLeft(iterable, shift):
> >     iterator = iter(iterable)
> >     front = list(itertools.islice(iterator, shift))
> >     return itertools.chain(iterator, front)
>
> > skipWhile: suggest using itertools.dropwhile
> > take: suggest using itertools.islice
> > takeWhile: suggest using itertools.takewhile
> > __toList: "return list(iterable)"
> > __toSet: "return set(iterable)"
> > __toTuple: "return tuple(iterable)".  Note that as currently written
> > this actually returns a generator, not a tuple.
> > __where: suggest using itertools.ifilter
> > __zip: suggest using a for loop over itertools.izip(first, second)
>
> > Lookup: is inconsistent.  The overridden __iter__ method returns an
> > iterator over the values of the groups, but all the inherited methods
> > are going to iterate over the keys.  Probably you want to pass
> > groups.values() to the superclass __init__ method.
>
> > Cheers,
> > Ian
>
> Awesome tips. I appreciate the time you spent commenting on just about
> every function. I really like your suggestions about using itertools
> more, and for loops. I was feeling like some things were becoming way
>too complicated.
>
> I also noted the bugs you discovered. I will incorporate your
> suggestions.
>
> Thanks again!- Hide quoted text -
>
> - Show quoted text -

You commented that the itertools algorithms will perform faster than
the hand-written ones. Are these algorithms optimized internally?