[python-win32] python-win32 Digest, Vol 93, Issue 26

Gerard Blais gerard.blais at gmail.com
Wed Dec 22 19:22:21 CET 2010


One thought is a default dictionary:

import collections

counts = collections.defaultdict(int)

for pair in my_array:
    # rows must be hashable to serve as keys, so convert each to a tuple
    counts[tuple(pair)] += 1

duplicated_pairs = [x for x in counts if counts[x] > 1]
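
For what it's worth, here is a small self-contained sketch of the same tally using collections.Counter from the standard library, which does the counting in one call. The sample data and the name my_array are only stand-ins, and I'm assuming the rows are short lists that can be turned into tuples. Note that this reports which pairs occur more than once, not the repeated block itself.

```python
import collections

# Stand-in sample data (an assumption); the real array would be far larger.
my_array = [[1, 2], [1, 2], [2, 2], [3, 1], [2, 3], [1, 2]]

# Lists are not hashable, so each row is converted to a tuple first.
counts = collections.Counter(map(tuple, my_array))

# Keep only the pairs seen more than once, sorted for a stable result.
duplicated_pairs = sorted(p for p, n in counts.items() if n > 1)
print(duplicated_pairs)
```

With 10^7 or more rows a pure-Python tally like this will probably be memory-hungry, so treat it as an illustration rather than a tuned solution.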

Gerry

On Wed, Dec 22, 2010 at 12:21 PM, <python-win32-request at python.org> wrote:

> Today's Topics:
>
>   1. Identify unique data from sequence array (otrov)
>   2. Re: Intenet explorer using PythonWin Help (Mike Driscoll)
>   3. Re: Identify unique data from sequence array (Aahz)
>   4. Re: Identify unique data from sequence array (Mike Diehn)
>   5. Re: Identify unique data from sequence array (otrov)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 22 Dec 2010 13:11:43 +0100
> From: otrov <dejan.org at gmail.com>
> To: python-win32 at python.org
> Subject: [python-win32] Identify unique data from sequence array
> Message-ID: <1877904314.20101222131143 at gmail.com>
> Content-Type: text/plain; charset=us-ascii
>
> Hi,
> I failed in my first attempt to solve this problem with matlab/octave, as I
> have just started using these tools for data manipulation, so I thought I
> would try python, as a more feature-rich and descriptive language, and post
> the problem to a python group I'm already subscribed to.
>
> Let's consider this simple array object (scipy array):
>
> X = array([[1, 2],
>           [1, 2],
>           [2, 2],
>           [3, 1],
>           [2, 3],
>           [1, 2],
>           [1, 2],
>           [2, 2],
>           [3, 1],
>           [2, 3],
>           [1, 2],
>           [1, 2],
>           [2, 2],
>           [3, 1],
>           [2, 3],
>           ...,
>           [1, 2],
>           [1, 2],
>           [2, 2],
>           [3, 1],
>           [2, 3]])
>
> I would like to extract the repeated sequence of data:
>
> Y = array([[1, 2],
>           [1, 2],
>           [2, 2],
>           [3, 1],
>           [2, 3]])
>
> as a result.
>
> The arrays consist of 10^7 to 10^8 elements, and the unique sequence consists
> of at most 10^6 elements, usually fewer, around 10^5.
>
> Thanks for your time
>
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 22 Dec 2010 08:31:27 -0600
> From: Mike Driscoll <mdriscoll at co.marshall.ia.us>
> Cc: python-win32 at python.org
> Subject: Re: [python-win32] Intenet explorer using PythonWin Help
> Message-ID: <4D120BBF.3040108 at co.marshall.ia.us>
> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>
> On 1:59 PM, Pat McGuire wrote:
> >
> > I am new at programming with Python and am using Pythonwin.  I have a
> > couple of questions:
> >
> > 1.  The code below after doc.FormName.submit() will navigate to
> > the correct page but if I print the url it shows the url of the page I
> > logged in at.  I thought submit would be just like if I clicked on the
> > submit button.
> >
> > import win32com.client
> > import win32api
> > ie = win32com.client.Dispatch( "InternetExplorer.Application" )
> > ie.Visible = 1
> > ie.Navigate("urlhere")
> > while ie.Busy == True:
> >     win32api.Sleep(1000)
> > doc = ie.Document
> > doc.FormName.email.value = "emailaddress"
> > doc.FormName.password.value = "mypassword"
> > doc.FormName.submit()
> >
> > 2.  Can you point me to a site that shows how to access each
> > type of form element, i.e. options, hrefs, links, etc.?
> >
> >
> > Any help is greatly appreciated.
> >
>
> I've heard good things about Mechanize:
> http://mechanize.rubyforge.org/mechanize/
>
> It's not PyWin32, but it's probably easier to use than win32com methods.
>
>
>
> --
> Mike Driscoll
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 22 Dec 2010 07:28:25 -0800
> From: Aahz <aahz at pythoncraft.com>
> To: python-win32 at python.org
> Subject: Re: [python-win32] Identify unique data from sequence array
> Message-ID: <20101222152825.GB3725 at panix.com>
> Content-Type: text/plain; charset=us-ascii
>
> On Wed, Dec 22, 2010, otrov wrote:
> >
> > I failed in my first idea to solve this problem with matlab/octave,
> > as I just started using these tools for data manipulation, and then
> > thought to try python as more feature rich descriptive language and
> > post this problem to python group I'm subscribed already
>
> You may get better answers posting to a general Python group (e.g.
> comp.lang.python).
> --
> Aahz (aahz at pythoncraft.com)           <*>
> http://www.pythoncraft.com/
>
> "Think of it as evolution in action."  --Tony Rand
>
>
> ------------------------------
>
> Message: 4
> Date: Wed, 22 Dec 2010 11:01:52 -0500
> From: Mike Diehn <mike.diehn at ansys.com>
> To: Aahz <aahz at pythoncraft.com>
> Cc: python-win32 at python.org
> Subject: Re: [python-win32] Identify unique data from sequence array
> Message-ID:
>        <AANLkTi=9RVD+gcO2jR3t_YSwMwuXoVgwqRHupY4HRk=9 at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I'm a unix guy.  That's what we call a sort-uniq operation, after the
> pipeline we'd use: sort datafile | uniq > uniq-lines.txt.  So I googled that
> with python and ...
>
> As Jason Petrone wrote when he withdrew PEP 270 in
> http://www.python.org/dev/peps/pep-0270/:
>
>
> "creating a sequence without duplicates is just a matter of
> choosing a different data structure: a set instead of a list."
>
>
> At the time, sets.py was a nifty new thing.  Since then, the set datatype
> has been added to python's built-in types.
>
> set() can consume a list of tuples, but not a list of lists, like the X you
> showed us.  Your job will be getting your massive list of lists into a
> list of tuples.
>
> This works, but for your very large arrays it may take a long time:
>
> X = [[1,2], [1,2], [3,4], [3,4]]
>
> Y = set( [tuple(x) for x in X] )
>
>
> There may be faster methods.  The map() function might help, but I really
> don't know.  Here's something to try:
>
> Y = set( map(tuple, X) )
>
>
> Or you can go the old-school route, from before the days of set(), that is:
>
>
> http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/
>
>
> Best,
> Mike
>
>
> --
> Mike Diehn
> Senior Systems Administrator
> ANSYS, Inc - Lebanon, NH Office
> mike.diehn at ansys.com, (603) 727-5492
>
> ------------------------------
>
> Message: 5
> Date: Wed, 22 Dec 2010 18:21:27 +0100
> From: otrov <dejan.org at gmail.com>
> To: python-win32 at python.org
> Subject: Re: [python-win32] Identify unique data from sequence array
> Message-ID: <18010463321.20101222182127 at gmail.com>
> Content-Type: text/plain; charset=us-ascii
>
> Thanks for your reply, but perhaps there is a misunderstanding:
>
> I don't want the unique values, but the unique sequence (block) of data that
> is repeated in the array:
>
> A B C D D D A B C D D D A B C D D D
> |_________| |_________| |_________|
>     |           |           |
>   unique      unique      unique
>  sequence    sequence    sequence
>    data        data        data
>
> I tested your approach and wouldn't say it's slow. It works great, but that's
> not what I'm after. Thanks anyway.
>
> Cheers
>
>
>
> ------------------------------
>
> _______________________________________________
> python-win32 mailing list
> python-win32 at python.org
> http://mail.python.org/mailman/listinfo/python-win32
>
>
> End of python-win32 Digest, Vol 93, Issue 26
> ********************************************
>
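
On otrov's follow-up in Message 5 (the repeated block rather than the unique values), here is a rough sketch, assuming the array is purely periodic and starts at the beginning of a block. It uses the prefix (failure) table from Knuth-Morris-Pratt string matching, which is a technique swapped in here and not something proposed in the thread. The function name shortest_repeating_block is made up for illustration.

```python
def shortest_repeating_block(rows):
    """Return the shortest leading block whose repetition reproduces rows.

    Assumes rows is periodic from its first element; a trailing partial
    block is tolerated. Rows are converted to tuples so they compare cheaply.
    """
    items = [tuple(r) for r in rows]
    n = len(items)
    if n == 0:
        return []

    # failure[i] is the length of the longest proper prefix of items[:i + 1]
    # that is also a suffix of it (the KMP prefix function).
    failure = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and items[i] != items[k]:
            k = failure[k - 1]
        if items[i] == items[k]:
            k += 1
        failure[i] = k

    # The smallest period of the whole sequence is n minus its longest border.
    period = n - failure[-1]
    return items[:period]


# Stand-in data modelled on the X shown in Message 1.
block = [[1, 2], [1, 2], [2, 2], [3, 1], [2, 3]]
X = block * 4
Y = shortest_repeating_block(X)
print(Y)
```

This runs in time linear in the number of rows, but a Python list of 10^7 to 10^8 tuples will need a lot of memory, and data that is noisy or does not start on a block boundary would need a different approach.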