[python-win32] Identify unique data from sequence array

Mike Diehn mike.diehn at ansys.com
Wed Dec 22 17:01:52 CET 2010


I'm a Unix guy.  What you're describing is what we call a sort-uniq operation,
after the pipeline we'd use: sort datafile | uniq > uniq-lines.txt.  So I
searched for that together with "python", and ....
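
For comparison, here is a rough Python analogue of that pipeline, offered as a sketch rather than a drop-in tool. The function name sort_uniq is hypothetical, and it works on an in-memory list instead of reading datafile. It sorts the input and then collapses runs of equal adjacent items with itertools.groupby, which is exactly what uniq does after sort:

```python
from itertools import groupby


def sort_uniq(items):
    """Hypothetical helper mimicking `sort datafile | uniq`:
    sort the items, then keep one representative of each run
    of equal adjacent items."""
    return [key for key, _run in groupby(sorted(items))]


print(sort_uniq(["b", "a", "b", "c", "a"]))            # ['a', 'b', 'c']
print(sort_uniq([[3, 4], [1, 2], [3, 4], [1, 2]]))     # [[1, 2], [3, 4]]
```

Because this approach only needs the items to be sortable and comparable, not hashable, it also accepts a list of lists directly.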

As Jason Petrone wrote when he withdrew PEP 270
(http://www.python.org/dev/peps/pep-0270/):


"creating a sequence without duplicates is just a matter of
choosing a different data structure: a set instead of a list."


At the time, sets.py was a nifty new thing.  Since then, set has been
added to Python as a built-in type.

set() can consume a list of tuples, but not a list of lists like the X you
showed us, because set elements must be hashable and lists are not.  Your
job will be getting your massive list of lists into a list of tuples.
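
A minimal sketch of that restriction, using the small X from this message as stand-in data: passing the lists straight to set() raises TypeError, while converting each row to a tuple first succeeds.

```python
X = [[1, 2], [1, 2], [3, 4], [3, 4]]

# Lists are mutable and therefore unhashable, so set() rejects them.
failed = False
try:
    set(X)
except TypeError as exc:
    failed = True
    print("set(X) failed:", exc)

# A tuple of hashable items is hashable, so converting each row works.
unique_rows = set(tuple(row) for row in X)
print(sorted(unique_rows))  # [(1, 2), (3, 4)]
```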

This works, but for your very large arrays it may take a long time:

X = [[1,2], [1,2], [3,4], [3,4]]

Y = set( [tuple(x) for x in X] )
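
If you need a list of lists again afterwards, one possible follow-up step (an assumption about what you want, not something you asked for) is to sort the unique tuples and convert them back. A set has no defined order, so sorting keeps the result deterministic. Passing a generator expression to set() also skips building the intermediate list:

```python
X = [[1, 2], [1, 2], [3, 4], [3, 4]]

# Generator expression: no temporary list of tuples is built.
Y = set(tuple(x) for x in X)

# Hypothetical follow-up: back to a sorted list of lists.
unique_lists = [list(t) for t in sorted(Y)]
print(unique_lists)  # [[1, 2], [3, 4]]
```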


There may be faster methods.  The map() function might help, but I really
don't know.  Here's something to try:

Y = set(map(tuple, X))
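
Rather than guessing which is faster, you can time both on data shaped like yours. The sketch below uses made-up random rows as a stand-in for your arrays and the standard timeit module. It prints timings without claiming a winner, since the result depends on your Python version and data:

```python
import random
import timeit

# Hypothetical test data: 100,000 two-element rows drawn from a
# small range, so there are plenty of duplicates.
random.seed(0)
X = [[random.randrange(100), random.randrange(100)] for _ in range(100_000)]


def with_listcomp(rows):
    return set([tuple(x) for x in rows])


def with_map(rows):
    return set(map(tuple, rows))


# Both approaches must produce the same set of unique rows.
assert with_listcomp(X) == with_map(X)

for func in (with_listcomp, with_map):
    seconds = timeit.timeit(lambda: func(X), number=10)
    print(f"{func.__name__}: {seconds:.3f} s for 10 runs")
```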


Or you can go the old-school route, from before the days of set():

http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/
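
The recipe above predates the built-in set.  The sketch below is not that recipe's code.  It is a common variant that combines a set with a loop so that the first occurrence of each row keeps its original position, which neither set() nor sort | uniq does.  The name unique_in_order is hypothetical:

```python
def unique_in_order(rows):
    """Drop duplicate rows, keeping the first occurrence of each in
    its original position.  Each row is converted to a tuple for the
    membership check because lists are unhashable."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


X = [[3, 4], [1, 2], [3, 4], [1, 2], [5, 6]]
print(unique_in_order(X))  # [[3, 4], [1, 2], [5, 6]]
```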


Best,
Mike

On Wed, Dec 22, 2010 at 10:28 AM, Aahz <aahz at pythoncraft.com> wrote:

> On Wed, Dec 22, 2010, otrov wrote:
> >
> > I failed in my first idea to solve this problem with matlab/octave,
> > as I just started using this tools for data manipulation, and then
> > thought to try python as more feature rich descriptive language and
> > post this problem to python group I'm subscribed already
>
> You may get better answers posting to a general Python group (e.g.
> comp.lang.python).
> --
> Aahz (aahz at pythoncraft.com)           <*>
> http://www.pythoncraft.com/
>
> "Think of it as evolution in action."  --Tony Rand
>



-- 
Mike Diehn
Senior Systems Administrator
ANSYS, Inc - Lebanon, NH Office
mike.diehn at ansys.com, (603) 727-5492

