From simeonf at gmail.com Wed Jun 1 00:55:58 2011 From: simeonf at gmail.com (Simeon Franklin) Date: Tue, 31 May 2011 15:55:58 -0700 Subject: [Baypiggies] Video from May 26th Bay Piggies event online In-Reply-To: <4DE55FB0.4090104@marakana.com> References: <4DE55FB0.4090104@marakana.com> Message-ID: Awesome job - thanks Max! I forget who is in charge of content on baypiggies.net (thank you, btw) but can we get the videos linked from the baypiggies talks archive? -regards Simeon Franklin On Tue, May 31, 2011 at 2:37 PM, Max Walker - Marakana wrote: > Hi all, > > Just letting you know that the video from Jeff Fischer's newbie nugget talk > on Implementing Mix-ins in Python is now online: http://mrkn.co/f/345 > > Coming soon is the video for Alan DuBoff's presentation on Writing Titanium > Desktop Applications with Python. I'll shoot another email to the list later > this week when it's up. > > Cheers!! > > - Max > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From bdbaddog at gmail.com Wed Jun 1 08:35:23 2011 From: bdbaddog at gmail.com (William Deegan) Date: Tue, 31 May 2011 23:35:23 -0700 Subject: [Baypiggies] Video from May 26th Bay Piggies event online In-Reply-To: References: <4DE55FB0.4090104@marakana.com> Message-ID: <64DEA821-75AC-4E92-A4FD-7A7EA296E363@gmail.com> Simeon, I'll try and do that in the next few days. -Bill On May 31, 2011, at 3:55 PM, Simeon Franklin wrote: > Awesome job - thanks Max! I forget who is in charge of content on > baypiggies.net (thank you, btw) but can we get the videos linked from > the baypiggies talks archive? > > -regards > Simeon Franklin > > On Tue, May 31, 2011 at 2:37 PM, Max Walker - Marakana > wrote: >> Hi all, >> >> Just letting you know that the video from Jeff Fischer's newbie nugget talk >> on Implementing Mix-ins in Python is now online: http://mrkn.co/f/345 >> >> Coming soon is the video for Alan DuBoff's presentation on Writing Titanium >> Desktop Applications with Python. I'll shoot another email to the list later >> this week when it's up. >> >> Cheers!! >> >> - Max >> _______________________________________________ >> Baypiggies mailing list >> Baypiggies at python.org >> To change your subscription options or unsubscribe: >> http://mail.python.org/mailman/listinfo/baypiggies >> > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies From max.walker at marakana.com Fri Jun 3 02:13:45 2011 From: max.walker at marakana.com (Max Walker - Marakana) Date: Thu, 02 Jun 2011 17:13:45 -0700 Subject: [Baypiggies] video for Alan DuBoff's Preso on Writing Titanium Desktop Apps in Python Message-ID: <4DE82739.40107@marakana.com> Hi guys, Just letting you know that the video for Alan DuBoff's presentation on Writing Titanium Desktop apps in Python from the May 26th Bay Piggies Meetup is now online: http://mrkn.co/f/347 - so check it out! Cheers! - Max From jjinux at gmail.com Fri Jun 3 19:11:42 2011 From: jjinux at gmail.com (Shannon -jj Behrens) Date: Fri, 3 Jun 2011 10:11:42 -0700 Subject: [Baypiggies] getting a fair flip out of an unfair coin Message-ID: A couple months ago at BayPiggies, someone asked for an algorithm to get a fair flip out of an unfair coin. 
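For reference, the classic answer here is von Neumann's trick: flip the biased coin twice, keep the result only when the two flips differ, and retry otherwise, since P(heads, tails) equals P(tails, heads). The sketch below is that standard approach, not necessarily the algorithm in the post linked just below; biased_flip and fair_flip are made-up names for illustration.

    import random

    def biased_flip(p=0.3):
        # A biased coin: returns True ("heads") with probability p.
        return random.random() < p

    def fair_flip(flip=biased_flip):
        # Von Neumann's trick: only unequal pairs count, and both orders of an
        # unequal pair are equally likely, so the first flip of such a pair is fair.
        while True:
            first, second = flip(), flip()
            if first != second:
                return first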
My buddy Hy Carrinski and I came up with the following algorithm: http://jjinux.blogspot.com/2011/06/python-getting-fair-flip-out-of-unfair.html Happy Hacking! -jj -- In this life we cannot do great things. We can only do small things with great love. -- Mother Teresa From japerk at gmail.com Mon Jun 6 16:35:46 2011 From: japerk at gmail.com (Jacob Perkins) Date: Mon, 6 Jun 2011 07:35:46 -0700 Subject: [Baypiggies] job at weotta Message-ID: Hi, http://www.weotta.com, which just launched at TechCrunch Disrupt NY, is looking for some experienced Python developers to be our first key engineering hires. You'll work closely with me and the rest of the founding team (http://www.weotta.com/about/) to make Weotta even more awesome :) We're currently based in Los Gatos, but will be moving to San Francisco once our funding round closes. There are two main areas of focus, and we're looking for strong technical devs who can quickly dive in to at least one of these: * frontend web app development with Django & jQuery * backend NLP with NLTK We also use the following tech, so it's ideal if you're familiar with some of these already: * Mercurial * pip & virtualenv * Fabric * South & MySQL * Nginx * EC2 deployment * MongoDB * Redis * Facebook API If you'd like to learn more about Weotta, check out our press page http://www.weotta.com/press/. And to get in and try it, you can sign up using your facebook account at http://www.weotta.com/s/4QEuIgWS/. If you like the product and want to make it better and expand our coverage, please reply with some links about you and work you've done, ideally open source projects you've created or contributed to. You can also contact me at https://github.com/japerk/ and http://www.linkedin.com/in/jacobperkins. Jacob --- http://www.weotta.com/ http://streamhacker.com/ http://twitter.com/japerk -------------- next part -------------- An HTML attachment was scrubbed... URL: From c1 at caseyc.net Tue Jun 7 06:32:21 2011 From: c1 at caseyc.net (Casey Callendrello) Date: Mon, 06 Jun 2011 21:32:21 -0700 Subject: [Baypiggies] Pythonic way to iterate over two lists? Message-ID: <4DEDA9D5.1090906@caseyc.net> Hi there, I've got a simple problem that I've already solved effectively, but I can't help thinking that there must be a more "pythonic" way to do it. Especially because my solution uses a list index, which I *know* can't possibly be the Python way ;-). In any case, I have two lists: one of machines, and one of jobs. Either one can be of arbitrary length, including zero. I want to generate (machine, job) pairs where every machine gets at most one job, each job is only executed once, and as much work as possible is done. The actual index or order is irrelevant. The simple, C-inspired solution is:

    i = 0
    while i < len(jobs) and i < len(machines):
        do_job(jobs[i], machines[i])
        i += 1

There has to be a cleaner way than that! Any suggestions? --Casey From c1 at caseyc.net (Casey Callendrello) Subject: Re: [Baypiggies] Pythonic way to iterate over two lists? In-Reply-To: <4DEDA9D5.1090906@caseyc.net> References: <4DEDA9D5.1090906@caseyc.net> Message-ID: <4DEDAA2F.6010105@caseyc.net> I should add, I'm actually more interested in a list of (job, machine) tuples, since that's added to a queue and sent to a threadpool. --Casey On 6/6/11 9:32 PM, Casey Callendrello wrote: > Hi there, > I've got a simple problem that I've already solved effectively, but I > can't help thinking that there must be a more "pythonic" way to do it. > Especially because my solution uses a list index, which I *know* can't > possibly be the Python way ;-). > > In any case, I have two lists: one of machines, and one of jobs. > Either one can be of arbitrary length, including zero.
I want to > generate (machine, job) pairs where every machine gets at most one > job, each job is only executed once, and as much work as possible is > done. The actual index or order is irrelevant. > > The simple, C-inspired solution is: > > i = 0 > while i do_job(jobs[i], machines[i]) > i += 1 > > There has to be a cleaner way than that! Any suggestions? > > --Casey > > > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies From me at rpatterson.net Tue Jun 7 06:37:04 2011 From: me at rpatterson.net (Ross Patterson) Date: Mon, 6 Jun 2011 21:37:04 -0700 Subject: [Baypiggies] Pythonic way to iterate over two lists? In-Reply-To: <4DEDA9D5.1090906@caseyc.net> References: <4DEDA9D5.1090906@caseyc.net> Message-ID: I suspect you could use itertools.izip_longest: http://docs.python.org/library/itertools.html#itertools.izip_longest Ross On Mon, Jun 6, 2011 at 9:32 PM, Casey Callendrello wrote: > Hi there, > I've got a simple problem that I've already solved effectively, but I can't > help thinking that there must be a more "pythonic" way to do it. Especially > because my solution uses a list index, which I *know* can't possibly be the > Python way ;-). > > In any case, I have two lists: one of machines, and one of jobs. Either one > can be of arbitrary length, including zero. I want to generate (machine, > job) pairs where every machine gets at most one job, each job is only > executed once, and as much work as possible is done. The actual index or > order is irrelevant. > > The simple, C-inspired solution is: > > i = 0 > while i do_job(jobs[i], machines[i]) > i += 1 > > There has to be a cleaner way than that! Any suggestions? > > --Casey > > > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hcarrinski at gmail.com Tue Jun 7 06:44:19 2011 From: hcarrinski at gmail.com (Hy Carrinski) Date: Mon, 6 Jun 2011 21:44:19 -0700 Subject: [Baypiggies] Pythonic way to iterate over two lists? In-Reply-To: References: <4DEDA9D5.1090906@caseyc.net> Message-ID: The following will work. Does it fully solve your problem? from itertools import izip for (job, machine) in izip(jobs, machines): do_job(job, machine) On Mon, Jun 6, 2011 at 9:37 PM, Ross Patterson wrote: > I suspect you could use itertools.izip_longest: > http://docs.python.org/library/itertools.html#itertools.izip_longest > Ross > On Mon, Jun 6, 2011 at 9:32 PM, Casey Callendrello wrote: >> >> Hi there, >> I've got a simple problem that I've already solved effectively, but I >> can't help thinking that there must be a more "pythonic" way to do it. >> Especially because my solution uses a list index, which I *know* can't >> possibly be the Python way ;-). >> >> In any case, I have two lists: one of machines, and one of jobs. Either >> one can be of arbitrary length, including zero. I want to generate (machine, >> job) pairs where every machine gets at most one job, each job is only >> executed once, and as much work as possible is done. The actual index or >> order is irrelevant. >> >> The simple, C-inspired solution is: >> >> i = 0 >> while i> ? ?do_job(jobs[i], machines[i]) >> ? ?i += 1 >> >> There has to be a cleaner way than that! Any suggestions? 
>> >> --Casey >> >> >> >> _______________________________________________ >> Baypiggies mailing list >> Baypiggies at python.org >> To change your subscription options or unsubscribe: >> http://mail.python.org/mailman/listinfo/baypiggies > > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From max at theslimmers.net Tue Jun 7 06:55:27 2011 From: max at theslimmers.net (Max Slimmer) Date: Mon, 6 Jun 2011 21:55:27 -0700 Subject: [Baypiggies] Pythonic way to iterate over two lists? In-Reply-To: <4DEDA9D5.1090906@caseyc.net> References: <4DEDA9D5.1090906@caseyc.net> Message-ID: A more realistic and in some ways interesting problem is to deal with potentially more jobs than machines. I would think that you want all the jobs to get done therefore any one machine might need to do more than one job, Then for fun some machines are might be more efficient, in either time or cost. :-) max On Mon, Jun 6, 2011 at 9:32 PM, Casey Callendrello wrote: > Hi there, > I've got a simple problem that I've already solved effectively, but I can't > help thinking that there must be a more "pythonic" way to do it. Especially > because my solution uses a list index, which I *know* can't possibly be the > Python way ;-). > > In any case, I have two lists: one of machines, and one of jobs. Either one > can be of arbitrary length, including zero. I want to generate (machine, > job) pairs where every machine gets at most one job, each job is only > executed once, and as much work as possible is done. The actual index or > order is irrelevant. > > The simple, C-inspired solution is: > > i = 0 > while i ? ?do_job(jobs[i], machines[i]) > ? ?i += 1 > > There has to be a cleaner way than that! Any suggestions? > > --Casey > > > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From jeremy.r.fishman at gmail.com Tue Jun 7 07:27:14 2011 From: jeremy.r.fishman at gmail.com (Jeremy Fishman) Date: Mon, 6 Jun 2011 22:27:14 -0700 Subject: [Baypiggies] Pythonic way to iterate over two lists? In-Reply-To: References: <4DEDA9D5.1090906@caseyc.net> Message-ID: More information: izip() is the iterative version of the core Python builtin zip(), which returns a list. http://docs.python.org/library/functions.html#zip In Python 3+ zip() returns an iterable object (it's a type) http://docs.python.org/release/3.2/library/functions.html#zip Cheers, Jeremy On Mon, Jun 6, 2011 at 9:44 PM, Hy Carrinski wrote: > The following will work. > Does it fully solve your problem? > > from itertools import izip > > for (job, machine) in izip(jobs, machines): > do_job(job, machine) > > On Mon, Jun 6, 2011 at 9:37 PM, Ross Patterson wrote: > > I suspect you could use itertools.izip_longest: > > http://docs.python.org/library/itertools.html#itertools.izip_longest > > Ross > > On Mon, Jun 6, 2011 at 9:32 PM, Casey Callendrello > wrote: > >> > >> Hi there, > >> I've got a simple problem that I've already solved effectively, but I > >> can't help thinking that there must be a more "pythonic" way to do it. > >> Especially because my solution uses a list index, which I *know* can't > >> possibly be the Python way ;-). > >> > >> In any case, I have two lists: one of machines, and one of jobs. 
Either > >> one can be of arbitrary length, including zero. I want to generate > (machine, > >> job) pairs where every machine gets at most one job, each job is only > >> executed once, and as much work as possible is done. The actual index or > >> order is irrelevant. > >> > >> The simple, C-inspired solution is: > >> > >> i = 0 > >> while i >> do_job(jobs[i], machines[i]) > >> i += 1 > >> > >> There has to be a cleaner way than that! Any suggestions? > >> > >> --Casey > >> > >> > >> > >> _______________________________________________ > >> Baypiggies mailing list > >> Baypiggies at python.org > >> To change your subscription options or unsubscribe: > >> http://mail.python.org/mailman/listinfo/baypiggies > > > > > > _______________________________________________ > > Baypiggies mailing list > > Baypiggies at python.org > > To change your subscription options or unsubscribe: > > http://mail.python.org/mailman/listinfo/baypiggies > > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cbc at unc.edu Tue Jun 7 07:28:51 2011 From: cbc at unc.edu (Chris Calloway) Date: Tue, 07 Jun 2011 01:28:51 -0400 Subject: [Baypiggies] Seattle PyCamp 2011 Message-ID: <4DEDB713.8000201@unc.edu> University of Washington Marketing and the Seattle Plone Gathering host the inaugural Seattle PyCamp 2011 at The Paul G. Allen Center for Computer Science & Engineering on Monday, August 29 through Friday, September 2, 2011. Register today at http://trizpug.org/boot-camp/seapy11/ For beginners, this ultra-low-cost Python Boot Camp makes you productive so you can get your work done quickly. PyCamp emphasizes the features which make Python a simpler and more efficient language. Following along with example Python PushUps? speeds your learning process. Become a self-sufficient Python developer in just five days at PyCamp! PyCamp is conducted on the campus of the University of Washington in a state of the art high technology classroom. -- Sincerely, Chris Calloway http://nccoos.org/Members/cbc office: 3313 Venable Hall phone: (919) 599-3530 mail: Campus Box #3300, UNC-CH, Chapel Hill, NC 27599 From simeonf at gmail.com Tue Jun 7 07:43:41 2011 From: simeonf at gmail.com (Simeon Franklin) Date: Mon, 6 Jun 2011 22:43:41 -0700 Subject: [Baypiggies] Pythonic way to iterate over two lists? In-Reply-To: References: <4DEDA9D5.1090906@caseyc.net> Message-ID: I taught a Python Fundamentals class last week for Marakana and noticed that other programmers coming from languages that are not specifically functionally oriented were unfamiliar with zip as a concept. Most explanations of zip tend to focus on the two case (given two lists it returns paired elements) and the more general Python documentation explanation was met with thoughtful incomprehension: >This function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. When I paraphrased this as "zip will take arguments that represent rows of input data and return a list whose elements are the columns of the input data" mental lightbulbs went on all over the room. YMMV but I thought it made for an intuitive explanation... It also leads me to think more naturally of possible applications of zip and iterative friends. 
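A small illustration of that rows-versus-columns reading, with made-up data (Python 2 style, where zip returns a list of tuples):

    # Each tuple is a "row": (name, score)
    rows = [('alice', 88), ('bob', 72), ('carol', 95)]
    names, scores = zip(*rows)   # the two "columns"
    # names  == ('alice', 'bob', 'carol')
    # scores == (88, 72, 95)
    # Zipping the columns back together recovers the rows:
    assert zip(names, scores) == rows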
-regards Simeon Franklin From kwgoodman at gmail.com Tue Jun 7 18:31:07 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 7 Jun 2011 09:31:07 -0700 Subject: [Baypiggies] [job] Python Job at Hedge Fund Message-ID: We are looking for help to predict tomorrow's stock returns. The challenge is model selection in the presence of noisy data. The tools are ubuntu, python, cython, c, numpy, scipy, la, bottleneck, git. A quantitative background and experience or interest in model selection, machine learning, and software development are a plus. This is a full time position in Berkeley, California, two blocks from UC Berkeley. If you are interested send a CV or similar (or questions) to '.'.join(['htiek','scitylanayelekreb at namdoog','moc'][::-1])[::-1] From mvoorhie at yahoo.com Tue Jun 7 18:34:41 2011 From: mvoorhie at yahoo.com (Mark Voorhies) Date: Tue, 7 Jun 2011 09:34:41 -0700 Subject: [Baypiggies] Pythonic way to iterate over two lists? In-Reply-To: References: <4DEDA9D5.1090906@caseyc.net> Message-ID: <201106070934.41605.mvoorhie@yahoo.com> On Monday, June 06, 2011 10:43:41 pm Simeon Franklin wrote: > I taught a Python Fundamentals class last week for Marakana and > noticed that other programmers coming from languages that are not > specifically functionally oriented were unfamiliar with zip as a > concept. Most explanations of zip tend to focus on the two case (given > two lists it returns paired elements) and the more general Python > documentation explanation was met with thoughtful incomprehension: > > >This function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. > > When I paraphrased this as "zip will take arguments that represent > rows of input data and return a list whose elements are the columns of > the input data" mental lightbulbs went on all over the room. YMMV but > I thought it made for an intuitive explanation... It also leads me to > think more naturally of possible applications of zip and iterative > friends. Yes! transpose_A = zip(*A) # if A is, e.g., a rectangular matrix as a list of lists Thanks for the very useful point of view =) --Mark > > -regards > Simeon Franklin > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From hcarrinski at gmail.com Tue Jun 7 21:40:09 2011 From: hcarrinski at gmail.com (Hy Carrinski) Date: Tue, 7 Jun 2011 12:40:09 -0700 Subject: [Baypiggies] Question about breaking out of a loop Message-ID: I am working on code to solve a combinatorial probability problem, and plan to send a link to the full code in a few days. There is generator that yields tuples in a defined order into a loop that performs a calculation. I would like to provide an option to stop the calculation when the threshold is reached. I have put a simplified sample of this on github: https://gist.github.com/1012945 My questions are: 1. Is it an antipattern to change a datatype to cause an exception? 2. If so, how would you improve on my version 3 function? The function in version 3 is pretty close to my current solution, but the functions combinations(), f() and g() are standing in for more computationally intensive functions. The potential antipattern involves temporarily setting a value to None in a dictionary of integers. 
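To make the two shapes being compared concrete, here is a generic sketch only, not the code in the gist; items(), weights and the limit parameter are invented stand-ins. The first version pays an if-test on every pass; the second poisons one entry of an integer dict with None so the addition raises TypeError, which is caught once outside the loop:

    def items():
        # Stand-in for the real generator of values in a defined order.
        for i in range(10):
            yield i

    def total_with_check(limit=None):
        # Explicit per-iteration test.
        weights = dict((i, 1) for i in range(10))
        total = 0
        for i in items():
            if limit is not None and i == limit:
                break
            total += weights[i]
        return total

    def total_with_sentinel(limit=None):
        # Sentinel: a None in a dict of integers makes the addition blow up.
        weights = dict((i, 1) for i in range(10))
        if limit is not None:
            weights[limit] = None
        total = 0
        try:
            for i in items():
                total += weights[i]   # TypeError when weights[i] is None
        except TypeError:
            pass
        return total

    assert total_with_check(4) == total_with_sentinel(4) == 4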
Thank you, Hy From jeremy.r.fishman at gmail.com Tue Jun 7 22:32:44 2011 From: jeremy.r.fishman at gmail.com (Jeremy Fishman) Date: Tue, 7 Jun 2011 13:32:44 -0700 Subject: [Baypiggies] Question about breaking out of a loop In-Reply-To: References: Message-ID: Your second and third definitions aren't really different from each other. Both incur a "per-iteration penalty", the first with an if-statement and the second with a dictionary lookup. I bet you are not going to get a noticeable speedup over a simple if-statement check, but an alternative approach is to solve the problem you are checking for up-front: >>> # warning: not a proof ... >>> def count(seq): ... return sum(1 for e in seq) ... >>> def f(n, w, t): ... return (c for c in combinations(range(n), w) if c[0] < t) ... >>> def g(n, w, t): ... for i in range(t): ... for c in combinations(range(i + 1, n), w - 1): ... yield (i,) + c ... >>> [count(f(10, 5, i)) for i in range(5)] [0, 126, 196, 231, 246] >>> [count(g(10, 5, i)) for i in range(5)] [0, 126, 196, 231, 246] >>> list(f(5, 3, 2)) [(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 2, 3), (0, 2, 4), (0, 3, 4), (1, 2, 3), (1, 2, 4), (1, 3, 4)] >>> list(g(5, 3, 2)) [(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 2, 3), (0, 2, 4), (0, 3, 4), (1, 2, 3), (1, 2, 4), (1, 3, 4)] - Jeremy On Tue, Jun 7, 2011 at 12:40 PM, Hy Carrinski wrote: > I am working on code to solve a combinatorial probability problem, and > plan to send a link to the full code in a few days. > > There is generator that yields tuples in a defined order into a loop > that performs a calculation. I would like to provide an option to stop > the calculation when the threshold is reached. > > I have put a simplified sample of this on github: > https://gist.github.com/1012945 > > My questions are: > 1. Is it an antipattern to change a datatype to cause an exception? > 2. If so, how would you improve on my version 3 function? > > The function in version 3 is pretty close to my current solution, but > the functions combinations(), f() and g() are standing in for more > computationally intensive functions. The potential antipattern > involves temporarily setting a value to None in a dictionary of > integers. > > Thank you, > Hy > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > -------------- next part -------------- An HTML attachment was scrubbed... URL: From krid at otisbean.com Tue Jun 7 22:06:04 2011 From: krid at otisbean.com (Dirk Bergstrom) Date: Tue, 07 Jun 2011 13:06:04 -0700 Subject: [Baypiggies] Question about breaking out of a loop In-Reply-To: References: Message-ID: <4DEE84AC.7040401@otisbean.com> On 06/07/2011 12:40 PM, Hy Carrinski wrote: > There is generator that yields tuples in a defined order into a loop > that performs a calculation. I would like to provide an option to stop > the calculation when the threshold is reached. > The function in version 3 is pretty close to my current solution, but > the functions combinations(), f() and g() are standing in for more > computationally intensive functions. This seems like a perfect example of premature optimization. You've got a loop with two computationally intensive operations per cycle, and you're worried about optimizing away a single if-equals check per cycle. Will that single if statement really make so much difference once you put the real (and presumably much more time consuming) functions in place? 
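One way to answer that question empirically is a rough timeit micro-benchmark (Python 2 style, a sketch only); the loop body here is deliberately trivial, so the difference between the two numbers is roughly the cost of the per-iteration check that the real, much slower functions would dwarf:

    import timeit

    plain = timeit.timeit('for i in xrange(1000): pass', number=10000)
    checked = timeit.timeit('for i in xrange(1000):\n    if i == limit: break',
                            setup='limit = -1', number=10000)
    print plain, checked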
-- -------------------------------------- Dirk Bergstrom krid at otisbean.com http://otisbean.com/ From hcarrinski at gmail.com Tue Jun 7 22:42:53 2011 From: hcarrinski at gmail.com (Hy Carrinski) Date: Tue, 7 Jun 2011 13:42:53 -0700 Subject: [Baypiggies] Question about breaking out of a loop In-Reply-To: <4DEE84AC.7040401@otisbean.com> References: <4DEE84AC.7040401@otisbean.com> Message-ID: I agree that this optimization is not critical. Premature optimization is certainly something that I try to avoid, and is an easy path to follow. My actual code has gone through a few rounds of refactoring and some optimization. This is question involves a specific area which my profiling has shown can introduce around a 10% decrease in runtime based only on eliminating this conditional. These rounds have involved effort to maintain and increase clarity. The computationally intensive functions actually make use of caching so they do not consume much time. I did not include many of these details in the original posting. By introducing the if statement, the runtime of the profiled actual code increases by around 20%. While that increase is not too important, the structure with the if statement does not seem right to me because it introduces the overhead whether or not the threshold option is exercised. Is there a more Pythonic way to introduce the break without the commensurate increase in overhead? By the way, I do think that this loop is the most appropriate place to introduce a threshold. w.r.t. Jeremy's recent interesting suggestions. I think that filtering on the generator may not actually result in StopIteration without computing the values. My actual generator makes a sequence of only the values that I care about. After writing it, I found a posting by Tim Peters from a few years ago at http://code.activestate.com/recipes/218332/ which uses an algorithm similar to mine (for the generator). But, please note that link is pretty far astray from the present questions. Thank you, Hy On Tue, Jun 7, 2011 at 1:06 PM, Dirk Bergstrom wrote: > On 06/07/2011 12:40 PM, Hy Carrinski wrote: >> >> There is generator that yields tuples in a defined order into a loop >> that performs a calculation. I would like to provide an option to stop >> the calculation when the threshold is reached. >> The function in version 3 is pretty close to my current solution, but >> the functions combinations(), f() and g() are standing in for more >> computationally intensive functions. > > This seems like a perfect example of premature optimization. ?You've got a > loop with two computationally intensive operations per cycle, and you're > worried about optimizing away a single if-equals check per cycle. ?Will that > single if statement really make so much difference once you put the real > (and presumably much more time consuming) functions in place? > > -- > ? ? ? -------------------------------------- > ? ? ?Dirk Bergstrom ? ? ? ? ? krid at otisbean.com > ? ? ? ? ? ? http://otisbean.com/ > From jeremy.r.fishman at gmail.com Tue Jun 7 22:57:09 2011 From: jeremy.r.fishman at gmail.com (Jeremy Fishman) Date: Tue, 7 Jun 2011 13:57:09 -0700 Subject: [Baypiggies] Question about breaking out of a loop In-Reply-To: References: <4DEE84AC.7040401@otisbean.com> Message-ID: Yes thank you Hy for pointing out f() function will not break early. I am not sure why I changed the implementation as I had intended f() to be a copy of your for-loop from loop_fcn_v2() to demonstrate equivalency. 
I believe g() generates exactly the values expected and no more. - Jeremy On Tue, Jun 7, 2011 at 1:42 PM, Hy Carrinski wrote: > I agree that this optimization is not critical. Premature optimization > is certainly something that I try to avoid, and is an easy path to > follow. > > My actual code has gone through a few rounds of refactoring and some > optimization. This is question involves a specific area which my > profiling has shown can introduce around a 10% decrease in runtime > based only on eliminating this conditional. These rounds have involved > effort to maintain and increase clarity. The computationally intensive > functions actually make use of caching so they do not consume much > time. > > I did not include many of these details in the original posting. > > By introducing the if statement, the runtime of the profiled actual > code increases by around 20%. While that increase is not too > important, the structure with the if statement does not seem right to > me because it introduces the overhead whether or not the threshold > option is exercised. Is there a more Pythonic way to introduce the > break without the commensurate increase in overhead? By the way, I do > think that this loop is the most appropriate place to introduce a > threshold. > > w.r.t. Jeremy's recent interesting suggestions. I think that filtering > on the generator may not actually result in StopIteration without > computing the values. My actual generator makes a sequence of only the > values that I care about. After writing it, I found a posting by Tim > Peters from a few years ago at > http://code.activestate.com/recipes/218332/ which uses an algorithm > similar to mine (for the generator). But, please note that link is > pretty far astray from the present questions. > > Thank you, > Hy > > > On Tue, Jun 7, 2011 at 1:06 PM, Dirk Bergstrom wrote: > > On 06/07/2011 12:40 PM, Hy Carrinski wrote: > >> > >> There is generator that yields tuples in a defined order into a loop > >> that performs a calculation. I would like to provide an option to stop > >> the calculation when the threshold is reached. > >> The function in version 3 is pretty close to my current solution, but > >> the functions combinations(), f() and g() are standing in for more > >> computationally intensive functions. > > > > This seems like a perfect example of premature optimization. You've got > a > > loop with two computationally intensive operations per cycle, and you're > > worried about optimizing away a single if-equals check per cycle. Will > that > > single if statement really make so much difference once you put the real > > (and presumably much more time consuming) functions in place? > > > > -- > > -------------------------------------- > > Dirk Bergstrom krid at otisbean.com > > http://otisbean.com/ > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hcarrinski at gmail.com Wed Jun 8 02:03:26 2011 From: hcarrinski at gmail.com (Hy Carrinski) Date: Tue, 7 Jun 2011 17:03:26 -0700 Subject: [Baypiggies] Question about breaking out of a loop In-Reply-To: References: <4DEE84AC.7040401@otisbean.com> Message-ID: Jeremy's function g() does produce outputs equivalent to the sample code without using a conditional. Thank you also for including examples. As a generator, it actually could serve as a three parameter wrapper for itertools.combinations(). I hope that this present email can better specify the question. I very much appreciate the good thoughts thus far. 1. 
The only information we have about the generator is that its outputs have a defined order. I find this abstraction useful because it may help any answers to this question to be more generally applicable, and also contains the complexity of the code for the generator itself by not considering multiple starting or ending points. > There is generator that yields tuples in a defined order into a loop > that performs a calculation. 2. I wrote the gist in order to answer the primary question: Is it an antipattern to change a datatype to cause an exception? 3. I also began to think that itertools.takewhile() might be a better option, but does not seem to be in this case. First, it introduces a need for something larger than an integer to compare, for the case where the threshold is None. Second, while it looked nice, it also caused a significant slowdown (~50%) Thanks, Hy On Tue, Jun 7, 2011 at 1:57 PM, Jeremy Fishman wrote: > Yes thank you Hy for pointing out f() function will not break early. ?I am > not sure why I changed the implementation as I had intended f() to be a copy > of your for-loop from?loop_fcn_v2() to demonstrate equivalency. > I believe g() generates exactly the values expected and no more. > ? - Jeremy > > > On Tue, Jun 7, 2011 at 1:42 PM, Hy Carrinski wrote: >> >> I agree that this optimization is not critical. Premature optimization >> is certainly something that I try to avoid, and is an easy path to >> follow. >> >> My actual code has gone through a few rounds of refactoring and some >> optimization. This is question involves a specific area which my >> profiling has shown can introduce around a 10% decrease in runtime >> based only on eliminating this conditional. These rounds have involved >> effort to maintain and increase clarity. The computationally intensive >> functions actually make use of caching so they do not consume much >> time. >> >> I did not include many of these details in the original posting. >> >> By introducing the if statement, the runtime of the profiled actual >> code increases by around 20%. While that increase is not too >> important, the structure with the if statement does not seem right to >> me because it introduces the overhead whether or not the threshold >> option is exercised. Is there a more Pythonic way to introduce the >> break without the commensurate increase in overhead? By the way, I do >> think that this loop is the most appropriate place to introduce a >> threshold. >> >> w.r.t. Jeremy's recent interesting suggestions. I think that filtering >> on the generator may not actually result in StopIteration without >> computing the values. My actual generator makes a sequence of only the >> values that I care about. After writing it, I found a posting by Tim >> Peters from a few years ago at >> http://code.activestate.com/recipes/218332/ which uses an algorithm >> similar to mine (for the generator). But, please note that link is >> pretty far astray from the present questions. >> >> Thank you, >> Hy >> >> >> On Tue, Jun 7, 2011 at 1:06 PM, Dirk Bergstrom wrote: >> > On 06/07/2011 12:40 PM, Hy Carrinski wrote: >> >> >> >> There is generator that yields tuples in a defined order into a loop >> >> that performs a calculation. I would like to provide an option to stop >> >> the calculation when the threshold is reached. >> >> The function in version 3 is pretty close to my current solution, but >> >> the functions combinations(), f() and g() are standing in for more >> >> computationally intensive functions. 
>> > >> > This seems like a perfect example of premature optimization. ?You've got >> > a >> > loop with two computationally intensive operations per cycle, and you're >> > worried about optimizing away a single if-equals check per cycle. ?Will >> > that >> > single if statement really make so much difference once you put the real >> > (and presumably much more time consuming) functions in place? >> > >> > -- >> > ? ? ? -------------------------------------- >> > ? ? ?Dirk Bergstrom ? ? ? ? ? krid at otisbean.com >> > ? ? ? ? ? ? http://otisbean.com/ >> > > > From mvoorhie at yahoo.com Wed Jun 8 02:22:14 2011 From: mvoorhie at yahoo.com (Mark Voorhies) Date: Tue, 7 Jun 2011 17:22:14 -0700 Subject: [Baypiggies] Question about breaking out of a loop In-Reply-To: References: Message-ID: <201106071722.14580.mvoorhie@yahoo.com> On Tuesday, June 07, 2011 05:03:26 pm Hy Carrinski wrote: > 2. I wrote the gist in order to answer the primary question: > > Is it an antipattern to change a datatype to cause an exception? A different way to phrase this might be: What are reasonable sentinel patterns? In Python, None is a reasonable sentinel value in a container of references, in the same way that a null pointer is a reasonable sentinel value in C/C++. It is also reasonable to use an exception to handle an "exceptional" case of control flow (encountering the sentinel value), and you've shown that this doesn't introduce overhead in Python. So, I don't think there's anything inherently objectionable about your implementation (comments about premature optimization notwithstanding). It might be useful to think of what you're doing as the special case: "marking a reference as null" rather than the more general and potentially hackier: "changing a datatype". Mark From shally at indosys.com Thu Jun 9 03:02:52 2011 From: shally at indosys.com (Shally Singh) Date: Wed, 8 Jun 2011 18:02:52 -0700 Subject: [Baypiggies] Please add to mailing list Message-ID: <00c501cc2640$f73ec430$e5bc4c90$@com> Thanks & Regards, Shally Singh Sr. Recruiter Indosys Corporation 408-627-8008 shally at indosys.com www.indosys.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: pythonjob.txt URL: From hcarrinski at gmail.com Thu Jun 9 07:02:56 2011 From: hcarrinski at gmail.com (Hy Carrinski) Date: Wed, 8 Jun 2011 22:02:56 -0700 Subject: [Baypiggies] Question about breaking out of a loop In-Reply-To: <201106071722.14580.mvoorhie@yahoo.com> References: <201106071722.14580.mvoorhie@yahoo.com> Message-ID: Thank you for the advice. I have updated the gist to include each of the suggestions and to serve as a set of examples rather than a question. Finally, I found that this may be a good application for itertools.groupby(). https://gist.github.com/1012945 Thanks, Hy On Tue, Jun 7, 2011 at 5:22 PM, Mark Voorhies wrote: > On Tuesday, June 07, 2011 05:03:26 pm Hy Carrinski wrote: >> 2. I wrote the gist in order to answer the primary question: >> >> ? ? Is it an antipattern to change a datatype to cause an exception? > > A different way to phrase this might be: > ? What are reasonable sentinel patterns? > > In Python, None is a reasonable sentinel value in a container of references, > in the same way that a null pointer is a reasonable sentinel value in C/C++. 
> > It is also reasonable to use an exception to handle an "exceptional" case of > control flow (encountering the sentinel value), and you've shown that this > doesn't introduce overhead in Python. > > So, I don't think there's anything inherently objectionable about your implementation > (comments about premature optimization notwithstanding). ?It might be useful to > think of what you're doing as the special case: "marking a reference as null" > rather than the more general and potentially hackier: "changing a datatype". > > Mark > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From annaraven at gmail.com Thu Jun 9 07:57:33 2011 From: annaraven at gmail.com (Anna Ravenscroft) Date: Wed, 8 Jun 2011 22:57:33 -0700 Subject: [Baypiggies] What hourly rate are you charging a startup? Message-ID: Hi folks: I'm going to be chatting with a startup next week. I'd love to hear what hourly rate folks are charging startups these days. (I can also ask for options, so I'd like some ballpark range on the cash portion of the comp. ) I'd love to hear from new programmers, as well as experienced consultants, to get a good range. Please contact me offlist and I promise to keep your answers confidential. -- cordially, Anna From jason at mischievous.org Thu Jun 9 09:16:58 2011 From: jason at mischievous.org (Jason Culverhouse) Date: Thu, 9 Jun 2011 00:16:58 -0700 Subject: [Baypiggies] What hourly rate are you charging a startup? In-Reply-To: References: Message-ID: On Jun 8, 2011, at 10:57 PM, Anna Ravenscroft wrote: > Hi folks: > > I'm going to be chatting with a startup next week. I'd love to hear > what hourly rate folks are charging startups these days. (I can also > ask for options, so I'd like some ballpark range on the cash portion > of the comp. ) It's hard to guess cash at a startup, it depends on their funding. If you work for stock, I would start at a rate where working 40 hours a week for a year allowed me to accrue 1% of the company. > I'd love to hear from new programmers, as well as experienced > consultants, to get a good range. > > Please contact me offlist and I promise to keep your answers confidential. Jason From dineshbvadhia at hotmail.com Thu Jun 9 12:14:55 2011 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Thu, 9 Jun 2011 03:14:55 -0700 Subject: [Baypiggies] What hourly rate are you charging a startup? Message-ID: a. Do developers work for stock only these days (post dot.com bubble) in the Bay Area? b. Doesn't a minimum wage have to be paid even when working for stock only in California? -------------- next part -------------- An HTML attachment was scrubbed... URL: From camembert at gmail.com Thu Jun 9 17:34:40 2011 From: camembert at gmail.com (Elizabeth Leddy) Date: Thu, 09 Jun 2011 08:34:40 -0700 Subject: [Baypiggies] What hourly rate are you charging a startup? In-Reply-To: References: Message-ID: <4DF0E810.8000202@gmail.com> On 6/9/11 3:14 AM, Dinesh B Vadhia wrote: > a. Do developers work for stock only these days (post dot.com bubble) > in the Bay Area? Base salary plus stock is the new "poor startup". And from what I can tell the base salary still has to be pretty high. > b. Doesn't a minimum wage have to be paid even when working for stock > only in California? Pretty sure it does but the SBA would be a better reference for that. 
Liz > > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies -- Elizabeth Leddy elizabeth.leddy at gmail.com 707.776.6797 -------------- next part -------------- An HTML attachment was scrubbed... URL: From keith at dart.us.com Fri Jun 17 02:55:36 2011 From: keith at dart.us.com (Keith Dart) Date: Thu, 16 Jun 2011 17:55:36 -0700 Subject: [Baypiggies] Looking for a Python automation developer, contract job Message-ID: <20110616175536.6f390ec1@dart.us.com> Greetings everyone, I'm currently working a contract job at Thales e-security. We need another person to write automated test cases and tools. The test cases and tools will of course be written in Python. This is a contract job (at least for now), and you may have to sign up with Oxford and Associates. If you're interested please contact me. The requirements are as follows. Required: * Reasonably proficient with Python * Familiar with OO concepts Nice to have: * Familiarity with test plans, test cases, and automated testing. * Proficient with Unix, especially Linux. * Have some knowledge of data networks. * Knowledge of cryptography The product is a crypto key management product being developed by Thales e-security. -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Keith Dart ===================================================================== From glen at glenjarvis.com Fri Jun 17 04:24:14 2011 From: glen at glenjarvis.com (Glen Jarvis) Date: Thu, 16 Jun 2011 19:24:14 -0700 Subject: [Baypiggies] The company I work at is hiring like gangbusters Message-ID: Although I posted something similar before, I want to throw one more out there. The company I'm with is aggressively hiring. I participated in four interviews today and have two more for tomorrow - and that's just me.. I'd love to see some BayPIGgie caliber step up to the plate... especially since I'd be working fairly closely with the people that we hire... W00t Glen P.S. Please contact me off list. -- Things which matter most must never be at the mercy of things which matter least. -- Goethe -------------- next part -------------- An HTML attachment was scrubbed... URL: From gracelaw at mac.com Sat Jun 18 03:56:21 2011 From: gracelaw at mac.com (Grace Law) Date: Fri, 17 Jun 2011 18:56:21 -0700 Subject: [Baypiggies] JOB: Python Server / Scalability Engineer for online games with 10+M users Message-ID: Hi there My company is looking to add 2 more python server engineers in SF- see below for more info and feel free to pass this along. Aside from being a python fan, you should like small teams, smart engineers, fun and collaborative work environment, optimized codes, and iterating quickly. Cheers Grace ------------------------ Do you want to write and scale high availability servers in Python handling millions of users? Do you want to work with smart and fun people and make an impact in the social gaming industry? If so, our _server_ team want to talk to you. About Lolapps: - Our 20 engineers are responsible for Ravenwood Fair on Facebook with 10+ millions of users - Our **3** people server team is dealing with 100s of servers handling 12K simultaneous requests and growing quickly. - Our core technology stack consists of Python(Pylons), AS3, MySQL, and Mongo. - We believe in small teams, smart engineers, fun and collaborative work environment, optimized codes, and iterating quickly. 
- People say we have the best engineers in the gaming space in the bay area. We say, that can't be true. There are tons of smart people and we want to work with more of them so we can all grow. - People say our game run faster and play better than Zynga's. We like it. We're looking for 2 more server / performance engineers to work onsite in SF with our fun team. About you: - a go getter and a team player - can bang out high quality codes quickly and have personal side projects - Love python and can code it in your sleep - Superior knowledge of Linux, scripting, and SQL - Understand when MySQL is great and experiment with NoSQL solutions (Memcached/MongoDB/Redis/Cassandra) - Know how to put together a web-application stack (We use Pylons/Paste.) - Strong in CS, have the capacity / experience to work with tons of data, write caching solutions, deal with scalability challenges of high transaction, high availability servers, tinker on real-time solutions. - Enjoy bouncing ideas off of your teammates to build up solutions no one person could of thought up by himself - Care about your implementations and find yourself compulsively checking that your latest experimental deploy is working the way you thought it would - Prefer the pace and excitement of building consumer internet applications over enterprise solutions - Definitely prefers making an impact at start-ups You'll get to: - Work in a 55 people company, strategically positioned in an innovative space that is expanding into a billion dollar industry. - Work with 3 really smart software engineers and own the infrastructure of very high transactions servers - Design and implement large chunks of scalability features that will take Lolapps' games to the next level. - Help make key infrastructure decisions (databases, replication layouts, caching solutions, etc.) - Experiment with the newest emerging open-source technologies. - Test your ideas and strategies out on millions of users and enormous data sets. - Have fun. Play ping pong, foosball, video games - Eat. We buy your lunches. - Be healthy. We offer free pilates classes onsite. Sounds intriguing? Play our latest game and see why millions are returning to Ravenwood Fair, one of Facebook's Top Social Games: (http://www.facebook.com/RavenwoodFair) To apply: http://hire.jobvite.com/j/?aj=oXmLVfwJ&s=BayPIGgies or Write to grace at lolapps.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at zachary.com Tue Jun 21 00:38:16 2011 From: david at zachary.com (David Creemer) Date: Mon, 20 Jun 2011 15:38:16 -0700 Subject: [Baypiggies] (off-topic) data wiring contractors? Message-ID: <8380C5DF-65A7-4766-A249-C65B14E6D82F@zachary.com> Hi Folks -- sorry for the mostly off topic post. Does anyone have any experience with local data wiring / networking contractors? My startup is looking into new office spaces, and I'm trying to get an idea of the costs associated with running cable, setting up patch-panels, etc. I'd very much appreciate any recommendations and information. Thanks! -- David From kpguy1975 at gmail.com Tue Jun 21 16:08:42 2011 From: kpguy1975 at gmail.com (Vikram K) Date: Tue, 21 Jun 2011 10:08:42 -0400 Subject: [Baypiggies] itemgetter function unavailable in linux Message-ID: I have a nested list of the type [['dog,10], ['cat',5], ['dragon',7]] I need to sort this nested list based on the second element of each element in the nested list so that i end up with: [['cat',5], ['dragon',7], ['dog',10]] This used to be easy. 
Just use the itemgetter function in the itertools module. But on my linux machine, to my horror, there is no itemgetter function to be found in the itertools module. ----- Python 2.7.2 (default, Jun 21 2011, 09:56:35) [GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import itertools >>> from itertools import itemgetter Traceback (most recent call last): File "", line 1, in ImportError: cannot import name itemgetter >>> dir(itertools) ['__doc__', '__file__', '__name__', '__package__', 'chain', 'combinations', 'combinations_with_replacement', 'compress', 'count', 'cycle', 'dropwhile', 'groupby', 'ifilter', 'ifilterfalse', 'imap', 'islice', 'izip', 'izip_longest', 'permutations', 'product', 'repeat', 'starmap', 'takewhile', 'tee'] >>> ---------- I am now going to grind away and do it the hard way, but can someone tell me why the itemgetter function is not available on linux although i have been using it when programming in windows? -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Tue Jun 21 16:22:49 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 21 Jun 2011 07:22:49 -0700 Subject: [Baypiggies] itemgetter function unavailable in linux In-Reply-To: References: Message-ID: On Tue, Jun 21, 2011 at 7:08 AM, Vikram K wrote: > I have a nested list of the type [['dog,10], ['cat',5], ['dragon',7]] > > I need to sort this nested list based on the second element of each element > in the nested list so that i end up with: > > [['cat',5], ['dragon',7], ['dog',10]] > > This used to be easy. Just use the itemgetter function in the itertools > module. But on my linux machine, to my horror, there is no itemgetter > function to be found in the itertools module. Is this the one you want: from operator import itemgetter? From kpguy1975 at gmail.com Tue Jun 21 16:23:57 2011 From: kpguy1975 at gmail.com (Vikram K) Date: Tue, 21 Jun 2011 10:23:57 -0400 Subject: [Baypiggies] itemgetter function unavailable in linux In-Reply-To: References: Message-ID: That's correct. Thanks. On Tue, Jun 21, 2011 at 10:22 AM, Keith Goodman wrote: > On Tue, Jun 21, 2011 at 7:08 AM, Vikram K wrote: > > I have a nested list of the type [['dog,10], ['cat',5], ['dragon',7]] > > > > I need to sort this nested list based on the second element of each > element > > in the nested list so that i end up with: > > > > [['cat',5], ['dragon',7], ['dog',10]] > > > > This used to be easy. Just use the itemgetter function in the itertools > > module. But on my linux machine, to my horror, there is no itemgetter > > function to be found in the itertools module. > > Is this the one you want: from operator import itemgetter? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdbaddog at gmail.com Thu Jun 23 00:19:14 2011 From: bdbaddog at gmail.com (William Deegan) Date: Wed, 22 Jun 2011 15:19:14 -0700 Subject: [Baypiggies] off topic, but maybe interesting to the group.. C++ presentation by Herb Sutter 6/29 in Santa Clara Message-ID: <4CCC1430-74E6-4ACF-B2F3-EFAB978E3638@gmail.com> http://blogs.msdn.com/b/matt-harrington/archive/2011/06/08/herb-sutter-c-now-and-forever-june-29-in-santa-clara.aspx -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From simeonf at gmail.com Thu Jun 23 02:43:25 2011 From: simeonf at gmail.com (Simeon Franklin) Date: Wed, 22 Jun 2011 17:43:25 -0700 Subject: [Baypiggies] Aaron Maxwell offering a ride Message-ID: Aaron Maxwell is offering a ride from SF to Baypiggies - for some reason his message bounced as spam and given that this is time sensitive and I don't see any spam filter management features in mailman I'm just forwarding it on to the list myself (see below). For rides please contact Aaron at amax at redsymbol.net -regards Simeon Franklin ------ Hi all, I'm going to be driving in from San Francisco for this month's meeting, and have space for at least a couple of people. If you'd like a ride there and back, contact me off list. -- Aaron Maxwell http://redsymbol.net/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bibha.tripathi at jpmchase.com Thu Jun 23 15:03:05 2011 From: bibha.tripathi at jpmchase.com (Tripathi, Bibha) Date: Thu, 23 Jun 2011 14:03:05 +0100 Subject: [Baypiggies] sorting a table by column Message-ID: <3E758131F03CFC4B8076F6ECC9B5C2760F47C2D690@EMAZC217VS01.exchad.jpmchase.net> a huge table like an excel sheet, saved and accumulating more data more rows, may be more tables user chooses which column to sort on what's the best python data structure to use? and which sorting method to make it look like real time as the user enters her choice of column to sort on? thanks. This communication is for informational purposes only. It is not intended as an offer or solicitation for the purchase or sale of any financial instrument or as an official confirmation of any transaction. All market prices, data and other information are not warranted as to completeness or accuracy and are subject to change without notice. Any comments or statements made herein do not necessarily reflect those of JPMorgan Chase & Co., its subsidiaries and affiliates. This transmission may contain information that is privileged, confidential, legally privileged, and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of the information contained herein (including any reliance thereon) is STRICTLY PROHIBITED. Although this transmission and any attachments are believed to be free of any virus or other defect that might affect any computer system into which it is received and opened, it is the responsibility of the recipient to ensure that it is virus free and no responsibility is accepted by JPMorgan Chase & Co., its subsidiaries and affiliates, as applicable, for any loss or damage arising in any way from its use. If you received this transmission in error, please immediately contact the sender and destroy the material in its entirety, whether in electronic or hard copy format. Thank you. Please refer to http://www.jpmorgan.com/pages/disclosures for disclosures relating to European legal entities. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kwgoodman at gmail.com Thu Jun 23 15:40:35 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 23 Jun 2011 06:40:35 -0700 Subject: [Baypiggies] sorting a table by column In-Reply-To: <3E758131F03CFC4B8076F6ECC9B5C2760F47C2D690@EMAZC217VS01.exchad.jpmchase.net> References: <3E758131F03CFC4B8076F6ECC9B5C2760F47C2D690@EMAZC217VS01.exchad.jpmchase.net> Message-ID: On Thu, Jun 23, 2011 at 6:03 AM, Tripathi, Bibha wrote: > a huge table like an excel sheet, saved and accumulating more data more > rows, may be more tables > > user chooses which column to sort on > > what's the best python data structure to use? and which sorting method to > make it look like real time as the user enters her choice of column to sort > on? I haven't tried it, but you may want to take a look at tabular: http://pypi.python.org/pypi/tabular From david.berthelot at gmail.com Thu Jun 23 15:47:40 2011 From: david.berthelot at gmail.com (David Berthelot) Date: Thu, 23 Jun 2011 06:47:40 -0700 Subject: [Baypiggies] sorting a table by column In-Reply-To: <3E758131F03CFC4B8076F6ECC9B5C2760F47C2D690@EMAZC217VS01.exchad.jpmchase.net> References: <3E758131F03CFC4B8076F6ECC9B5C2760F47C2D690@EMAZC217VS01.exchad.jpmchase.net> Message-ID: Looks like a typical SQL problem. If I was to solve it in Python, assuming that due to computational time motivations the data cannot be resorted completely with: sort(key=itemgetter(column)) Then I would keep the table unsorted and I would create an index structure per column. index = [[] for c in xrange(columns)] When a row is added to the data table, I would add it to the index lists in sorted manner using the bisect module: row_id = len(data) data.append(row) for c in xrange(columns): bisect.insort_right(index[c],(data[row_id][c],row_id)) To lookup the table in sorted order according to column c, you would get the table indexes: ilist = map(itemgetter(1),index[c]) for x in ilist: print data[x] I just typed this on top of my head, so it's more to give the general principle than a robust implementation obviously. Similarly you could implement multi-column indexes, by replacing the tuple (data[row_id][x],row_id) with (data[row_id][col_1],data[row_id][col_2],...,data[row_id][col_n],row_id) assuming you desire a multi-column index on col_1,...,col_n On Thu, Jun 23, 2011 at 6:03 AM, Tripathi, Bibha wrote: > a huge table like an excel sheet, saved and accumulating more data more > rows, may be more tables > > user chooses which column to sort on > > > > what's the best python data structure to use? and which sorting method to > make it look like real time as the user enters her choice of column to sort > on? > > > > thanks. > > This communication is for informational purposes only. It is not intended as > an offer or solicitation for the purchase or sale of any financial > instrument or as an official confirmation of any transaction. All market > prices, data and other information are not warranted as to completeness or > accuracy and are subject to change without notice. Any comments or > statements made herein do not necessarily reflect those of JPMorgan Chase & > Co., its subsidiaries and affiliates. This transmission may contain > information that is privileged, confidential, legally privileged, and/or > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any disclosure, copying, > distribution, or use of the information contained herein (including any > reliance thereon) is STRICTLY PROHIBITED. 
Although this transmission and any > attachments are believed to be free of any virus or other defect that might > affect any computer system into which it is received and opened, it is the > responsibility of the recipient to ensure that it is virus free and no > responsibility is accepted by JPMorgan Chase & Co., its subsidiaries and > affiliates, as applicable, for any loss or damage arising in any way from > its use. If you received this transmission in error, please immediately > contact the sender and destroy the material in its entirety, whether in > electronic or hard copy format. Thank you. Please refer to > http://www.jpmorgan.com/pages/disclosures for disclosures relating to > European legal entities. > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From david.berthelot at gmail.com Thu Jun 23 16:05:44 2011 From: david.berthelot at gmail.com (David Berthelot) Date: Thu, 23 Jun 2011 07:05:44 -0700 Subject: [Baypiggies] sorting a table by column In-Reply-To: References: <3E758131F03CFC4B8076F6ECC9B5C2760F47C2D690@EMAZC217VS01.exchad.jpmchase.net> Message-ID: Alternatively to bisect module which has log2(n) insertion cost, you could look into B+trees which have an insertion cost of logB(n): http://en.wikipedia.org/wiki/B%2B_tree There's a Python implementation linked on that page. I used it before and it seemed to have quite some problems while the performance over bisect was non-existent (for my particular needs). But that being said, it's worth checking. On Thu, Jun 23, 2011 at 6:47 AM, David Berthelot wrote: > Looks like a typical SQL problem. > > If I was to solve it in Python, assuming that due to computational > time motivations the data cannot be resorted completely with: > sort(key=itemgetter(column)) > > Then I would keep the table unsorted and I would create an index > structure per column. > index = [[] for c in xrange(columns)] > > When a row is added to the data table, I would add it to the index > lists in sorted manner using the bisect module: > row_id = len(data) > data.append(row) > for c in xrange(columns): > ?bisect.insort_right(index[c],(data[row_id][c],row_id)) > > To lookup the table in sorted order according to column c, you would > get the table indexes: > ilist = map(itemgetter(1),index[c]) > for x in ilist: > ?print data[x] > > I just typed this on top of my head, so it's more to give the general > principle than a robust implementation obviously. > > Similarly you could implement multi-column indexes, by replacing the > tuple (data[row_id][x],row_id) with > (data[row_id][col_1],data[row_id][col_2],...,data[row_id][col_n],row_id) > assuming you desire a multi-column index on col_1,...,col_n > > On Thu, Jun 23, 2011 at 6:03 AM, Tripathi, Bibha > wrote: >> a huge table like an excel sheet, saved and accumulating more data more >> rows, may be more tables >> >> user chooses which column to sort on >> >> >> >> what's the best python data structure to use? and which sorting method to >> make it look like real time as the user enters her choice of column to sort >> on? >> >> >> >> thanks. >> >> This communication is for informational purposes only. It is not intended as >> an offer or solicitation for the purchase or sale of any financial >> instrument or as an official confirmation of any transaction. 
All market >> prices, data and other information are not warranted as to completeness or >> accuracy and are subject to change without notice. Any comments or >> statements made herein do not necessarily reflect those of JPMorgan Chase & >> Co., its subsidiaries and affiliates. This transmission may contain >> information that is privileged, confidential, legally privileged, and/or >> exempt from disclosure under applicable law. If you are not the intended >> recipient, you are hereby notified that any disclosure, copying, >> distribution, or use of the information contained herein (including any >> reliance thereon) is STRICTLY PROHIBITED. Although this transmission and any >> attachments are believed to be free of any virus or other defect that might >> affect any computer system into which it is received and opened, it is the >> responsibility of the recipient to ensure that it is virus free and no >> responsibility is accepted by JPMorgan Chase & Co., its subsidiaries and >> affiliates, as applicable, for any loss or damage arising in any way from >> its use. If you received this transmission in error, please immediately >> contact the sender and destroy the material in its entirety, whether in >> electronic or hard copy format. Thank you. Please refer to >> http://www.jpmorgan.com/pages/disclosures for disclosures relating to >> European legal entities. >> >> _______________________________________________ >> Baypiggies mailing list >> Baypiggies at python.org >> To change your subscription options or unsubscribe: >> http://mail.python.org/mailman/listinfo/baypiggies >> > From Chris.Clark at ingres.com Thu Jun 23 19:31:44 2011 From: Chris.Clark at ingres.com (Chris Clark) Date: Thu, 23 Jun 2011 10:31:44 -0700 Subject: [Baypiggies] sorting a table by column In-Reply-To: References: <3E758131F03CFC4B8076F6ECC9B5C2760F47C2D690@EMAZC217VS01.exchad.jpmchase.net> Message-ID: <4E037880.9040100@ingres.com> Tripathi, Bibha wrote: > > a huge table like an excel sheet, saved and accumulating more data > more rows, may be more tables > user chooses which column to sort on > > what's the best python data structure to use? It probably depends on what "huge" means. If the "table" data fits in memory (either physical or virtual) it probably isn't that big and doing operations in Python (e.g. using list comprehension) is probably appropriate. If it doesn't fit in memory you can't easily use list comprehension and need to look into loops and generators, see http://danielrech.net/2011/python-generators-presentation-by-david-beazley/ (I think Alex might have done something similar at PyCon a few years ago too). There are some third party libs on PyPi that are worth checking out. I keep wanting to find an excuse to kick the tires on http://pypi.python.org/pypi/blist/ but I've not had cause to do so yet. Search PyPi for "tree" and there are a lot of hits. You can even use good old Schwartzian transforms (aka decorate-sort-undecorate) to handle changes in columns if for some reason there isn't a key argument to sort() provided by the structure you choose. David Berthelot wrote: > Looks like a typical SQL problem. > Agreed, without more information this sounds like a classic "ORDERY BY" clause on a SELECT statement. Relational database really excel, lower case "e", rather than upper case "E" :-), at this sort of thing..... If you check my email address domain name, of course I'm going to say that ;-) Databases do a lot of the heavy lifting for you. 
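To make the in-memory options concrete (a one-off sorted() call with a key function, or the bisect-maintained per-column index David described), here is a minimal sketch; the three-column layout and the sample rows are invented purely for illustration:

import bisect
from operator import itemgetter

NUM_COLUMNS = 3                            # invented width, for illustration only
data = []                                  # the unsorted table, one tuple per row
index = [[] for _ in range(NUM_COLUMNS)]   # one sorted index per column

def add_row(row):
    # Append the row, then keep every per-column index in sorted order
    # so any column can be read back "already sorted" later.
    row_id = len(data)
    data.append(row)
    for c in range(NUM_COLUMNS):
        bisect.insort_right(index[c], (row[c], row_id))

for row in [('alice', 34, 88.5), ('bob', 25, 91.0), ('carol', 41, 79.2)]:
    add_row(row)

# Walk the table ordered by column 1 without re-sorting anything.
for _, row_id in index[1]:
    print data[row_id]

# If the table comfortably fits in memory, a one-off sort is even simpler:
print sorted(data, key=itemgetter(2))

The index route pays a small insertion cost per row (a binary search plus a list shift in each column's index) so that reading the table back in any column's order requires no sorting at all, which is what keeps switching the sort column cheap as the user clicks around.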
If you're doing some sort of BI analytics a database is probably your best bet (I'm taking a guess you are based on your email address domain name). Shameless promotion, take a gander at http://www.thevirtualcircle.com/2011/02/vectorwise-theres-a-disturbance-in-the-force/ and http://www.ingres.com/products/vectorwise (I don't work on Vectorwise but I'm always blown away at how fast it is). If you are avoiding a traditional DBMS for performance reasons VW may well surprise you.. Chris From cappy2112 at gmail.com Fri Jun 24 00:57:11 2011 From: cappy2112 at gmail.com (Tony Cappellini) Date: Thu, 23 Jun 2011 15:57:11 -0700 Subject: [Baypiggies] Looking for 1 reviewer to review an ebook copy of The Python Standard Library by Example" by Doug Hellmann. Message-ID: Pearson is looking for *** 1 *** reviewer to review an eBook version of The Python Standard Library by Example" by Doug Hellmann. Please reply OFF-LIST if interested. Thanks "The Python Standard Library by Example" (Doug Hellmann), published by Addison-Wesley Professional, June 2011, Copyright 2011 Pearson Education, Inc. Publisher page: www.informit.com/title/0321767349 Introduction 1 ** Chapter 1: Text (page 3) 1.1 string?Text Constants and Templates 1.2 textwrap?Formatting Text Paragraphs 1.3 re?Regular Expressions 1.4 difflib?Compare Sequences ** Chapter 2: Data Structures (page 69) 2.1 collections?Container Data Types 2.2 array?Sequence of Fixed-Type Data 2.3 heapq?Heap Sort Algorithm 2.4 bisect?Maintain Lists in Sorted Order 2.5 Queue?Thread-Safe FIFO Implementation 2.6 struct?Binary Data Structures 2.7 weakref?Impermanent References to Objects 2.8 copy?Duplicate Objects 2.9 pprint?Pretty-Print Data Structures **Chapter 3: Algorithms (page 129) 3.1 functools?Tools for Manipulating Functions 3.2 itertools?Iterator Functions 3.3 operator?Functional Interface to Built-in Operators 3.4 contextlib?Context Manager Utilities Chapter 4: Dates and Times (page 173) 4.1 time?Clock Time 173 4.2 datetime?Date and Time Value Manipulation 180 4.3 calendar?Work with Dates 191 **Chapter 5: Mathematics (page 197) 5.1 decimal?Fixed and Floating-Point Math 5.2 fractions?Rational Numbers 5.3 random?Pseudorandom Number Generators 5.4 math?Mathematical Functions ** Chapter 6: The File System (page 247) 6.1 os.path?Platform-Independent Manipulation of Filenames 6.2 glob?Filename Pattern Matching 6.3 linecache?Read Text Files Efficiently 6.4 tempfile?Temporary File System Objects 6.5 shutil?High-Level File Operations 6.6 mmap?Memory-Map Files 6.7 codecs?String Encoding and Decoding 6.8 StringIO?Text Buffers with a File-like API 6.9 fnmatch?UNIX-Style Glob Pattern Matching 6.10 dircache?Cache Directory Listings 6.11 filecmp?Compare Files ** Chapter 7: Data Persistence and Exchange (page 333) 7.1 pickle?Object Serialization 7.2 shelve?Persistent Storage of Objects 7.3 anydbm?DBM-Style Databases 7.4 whichdb?Identify DBM-Style Database Formats 7.5 sqlite3?Embedded Relational Database 7.6 xml.etree.ElementTree?XML Manipulation API 7.7 csv?Comma-Separated Value Files ** Chapter 8: Data Compression and Archiving (page 421) 8.1 zlib?GNU zlib Compression 8.2 gzip?Read and Write GNU Zip Files 8.3 bz2?bzip2 Compression 8.4 tarfile?Tar Archive Access 8.5 zipfile?ZIP Archive Access ** Chapter 9: Cryptography (page 469) 9.1 hashlib?Cryptographic Hashing 9.2 hmac?Cryptographic Message Signing and Verification ** Chapter 10: Processes and Threads (page 481) 10.1 subprocess?Spawning Additional Processes 10.2 signal?Asynchronous System Events 10.3 threading?Manage 
Concurrent Operations 10.4 multiprocessing?Manage Processes like Threads ** Chapter 11: Networking (page 561) 11.1 socket?Network Communication 11.2 select?Wait for I/O Efficiently 11.3 SocketServer?Creating Network Servers 11.4 asyncore?Asynchronous I/O 11.5 asynchat?Asynchronous Protocol Handler Chapter 12: The Internet (page 637) 12.1 urlparse?Split URLs into Components 12.2 BaseHTTPServer?Base Classes for Implementing Web Servers 12.3 urllib?Network Resource Access 12.4 urllib2?Network Resource Access 12.5 base64?Encode Binary Data with ASCII 12.6 robotparser?Internet Spider Access Control 12.7 Cookie?HTTP Cookies 12.8 uuid?Universally Unique Identifiers 12.9 json?JavaScript Object Notation 12.10 xmlrpclib?Client Library for XML-RPC 12.11 SimpleXMLRPCServer?An XML-RPC Server ** Chapter 13: Email (page 727) 13.1 smtplib?Simple Mail Transfer Protocol Client 13.2 smtpd?Sample Mail Servers 13.3 imaplib?IMAP4 Client Library 13.4 mailbox?Manipulate Email Archives **Chapter 14: Application Building Blocks (page 769) 14.1 getopt?Command-Line Option Parsing 14.2 optparse?Command-Line Option Parser 14.3 argparse?Command-Line Option and Argument Parsing 14.4 readline?The GNU Readline Library 14.5 getpass?Secure Password Prompt 14.6 cmd?Line-Oriented Command Processors 14.7 shlex?Parse Shell-Style Syntaxes 14.8 ConfigParser?Work with Configuration Files 14.9 logging?Report Status, Error, and Informational Messages 14.10 fileinput?Command-Line Filter Framework 14.11 atexit?Program Shutdown Callbacks 14.12 sched?Timed Event Scheduler ** Chapter 15: Internationalization and Localization (page 899) 15.1 gettext?Message Catalogs 15.2 locale?Cultural Localization API ** Chapter 16: Developer Tools (page 919) 16.1 pydoc?Online Help for Modules 16.2 doctest?Testing through Documentation 16.3 unittest?Automated Testing Framework 16.4 traceback?Exceptions and Stack Traces 16.5 cgitb?Detailed Traceback Reports 16.6 pdb?Interactive Debugger 16.7 trace?Follow Program Flow 16.8 profile and pstats?Performance Analysis 16.9 timeit?Time the Execution of Small Bits of Python Code 16.10 compileall?Byte-Compile Source Files 16.11 pyclbr?Class Browser ** Chapter 17: Runtime Features (page 1045) 17.1 site?Site-Wide Configuration 17.2 sys?System-Specific Configuration 17.3 os?Portable Access to Operating System Specific Features 17.4 platform?System Version Information 17.5 resource?System Resource Management 17.6 gc?Garbage Collector 17.7 sysconfig?Interpreter Compile-Time Configuration ** Chapter 18: Language Tools (page 1169) 18.1 warnings?Nonfatal Alerts 18.2 abc?Abstract Base Classes 18.3 dis?Python Bytecode Disassembler 18.4 inspect?Inspect Live Objects 18.5 exceptions?Built-in Exception Classes ** Chapter 19: Modules and Packages (page 1235) 19.1 imp?Python?s Import Mechanism 19.2 zipimport?Load Python Code from ZIP Archives 19.3 pkgutil?Package Utilities *Index of Python Modules (page 1259)* Index (page 1261) -------------- next part -------------- An HTML attachment was scrubbed... URL: From cappy2112 at gmail.com Fri Jun 24 01:29:11 2011 From: cappy2112 at gmail.com (Tony Cappellini) Date: Thu, 23 Jun 2011 16:29:11 -0700 Subject: [Baypiggies] Reviewer found for The Python Standard Library by Example- no more replies are necessary Message-ID: Reviewer found for The Python Standard Library by Example- no more replies are necessary Thanks -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jim at systemateka.com Fri Jun 24 02:28:38 2011 From: jim at systemateka.com (jim) Date: Thu, 23 Jun 2011 17:28:38 -0700 Subject: [Baypiggies] Ride down and back from SF to Dan Robert's PyPy talk Message-ID: <1308875318.1681.19.camel@jim-LAPTOP> I'm at Noisebridge. It's about 5:30. I'll leave for the BayPIGgies meeting around 6:00 PM. You wanna ride down and back, ask via email between now and 6 PM. jim From jim at well.com Fri Jun 24 20:44:19 2011 From: jim at well.com (jim) Date: Fri, 24 Jun 2011 11:44:19 -0700 Subject: [Baypiggies] (off-topic) data wiring contractors? In-Reply-To: <8380C5DF-65A7-4766-A249-C65B14E6D82F@zachary.com> References: <8380C5DF-65A7-4766-A249-C65B14E6D82F@zachary.com> Message-ID: <1308941059.1736.10.camel@jim-LAPTOP> If the job is primarily screwing equipment into racks and labelling and pulling cables to switches for local groups, Systemateka can do much of the work at $40 per hour. This assumes appliances and standard configuration for firewalls, gateways, and network configuration. Simple architecture (usually the case) is between $60 and $80 per hour. Some specialized jobs such as custom firewalls and special network configuration may be billed at $80 per hour or more. Without knowing the number of seats and boxes and so forth, there's no way to estimate the total cost of the job. On Mon, 2011-06-20 at 15:38 -0700, David Creemer wrote: > Hi Folks -- sorry for the mostly off topic post. > > Does anyone have any experience with local data wiring / networking contractors? My startup is looking into new office spaces, and I'm trying to get an idea of the costs associated with running cable, setting up patch-panels, etc. I'd very much appreciate any recommendations and information. > > Thanks! > -- David > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From ademan555 at gmail.com Sat Jun 25 21:56:18 2011 From: ademan555 at gmail.com (Dan Roberts) Date: Sat, 25 Jun 2011 12:56:18 -0700 Subject: [Baypiggies] PyPy 101 Talk Slides from Thursday Message-ID: Hi Baypiggies, At least a couple of people wanted to see slides from my presentation on Thursday. I've hosted them temporarily at http://codespeak.net/~dan/talk.pdf I'm also happy to answer any questions that weren't adequately answered during my talk, and of course over in #pypy on irc.freenode.net there are even more answers. Cheers everyone, Dan From spmcinerney at hotmail.com Sat Jun 25 22:42:16 2011 From: spmcinerney at hotmail.com (Stephen McInerney) Date: Sat, 25 Jun 2011 13:42:16 -0700 Subject: [Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup? Message-ID: What do people use for scraping on a website requiring (login form-based) authentication? - BeautifulSoup: does not handle authentication or cookies - Scrapy: does but more heavyweight paradigm to learn, incl. XPath Some discussion: http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python Thanks, Stephen -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tbibha at gmail.com Fri Jun 24 07:40:02 2011 From: tbibha at gmail.com (Bibha Tripathi) Date: Fri, 24 Jun 2011 06:40:02 +0100 Subject: [Baypiggies] sorting a table by column In-Reply-To: <4E037880.9040100@ingres.com> References: <3E758131F03CFC4B8076F6ECC9B5C2760F47C2D690@EMAZC217VS01.exchad.jpmchase.net> <4E037880.9040100@ingres.com> Message-ID: Has anyone used PYNQ? how does it perform compared with dict seart? Cheers, BT ~~~~~ "Every sound ends in music: The edge of every surface is tinged with prismatic rays."- RWE Sent from my iPhone On 23 Jun 2011, at 06:31 PM, Chris Clark wrote: > Tripathi, Bibha wrote: >> >> a huge table like an excel sheet, saved and accumulating more data more rows, may be more tables >> user chooses which column to sort on >> >> what's the best python data structure to use? > > It probably depends on what "huge" means. If the "table" data fits in memory (either physical or virtual) it probably isn't that big and doing operations in Python (e.g. using list comprehension) is probably appropriate. If it doesn't fit in memory you can't easily use list comprehension and need to look into loops and generators, see http://danielrech.net/2011/python-generators-presentation-by-david-beazley/ (I think Alex might have done something similar at PyCon a few years ago too). > > There are some third party libs on PyPi that are worth checking out. I keep wanting to find an excuse to kick the tires on http://pypi.python.org/pypi/blist/ but I've not had cause to do so yet. Search PyPi for "tree" and there are a lot of hits. > > You can even use good old Schwartzian transforms (aka decorate-sort-undecorate) to handle changes in columns if for some reason there isn't a key argument to sort() provided by the structure you choose. > > > David Berthelot wrote: >> Looks like a typical SQL problem. >> > > Agreed, without more information this sounds like a classic "ORDERY BY" clause on a SELECT statement. Relational database really excel, lower case "e", rather than upper case "E" :-), at this sort of thing..... If you check my email address domain name, of course I'm going to say that ;-) Databases do a lot of the heavy lifting for you. > > If you're doing some sort of BI analytics a database is probably your best bet (I'm taking a guess you are based on your email address domain name). Shameless promotion, take a gander at http://www.thevirtualcircle.com/2011/02/vectorwise-theres-a-disturbance-in-the-force/ and http://www.ingres.com/products/vectorwise (I don't work on Vectorwise but I'm always blown away at how fast it is). If you are avoiding a traditional DBMS for performance reasons VW may well surprise you.. > > Chris > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies From peter.borocz at gmail.com Sat Jun 25 23:38:27 2011 From: peter.borocz at gmail.com (Peter Borocz) Date: Sat, 25 Jun 2011 14:38:27 -0700 Subject: [Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup? In-Reply-To: References: Message-ID: While usually thought of only for testing, I've happily used twillfor the authentication/cookie/form-handling portion then beautifulsoup for the parsing. Twill can be configured to use beautifulsoup directly but with direct access to the underlying page, you can use any parsing library you like. 
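If you would rather keep the login step in the standard library, the same split works with urllib2 + cookielib handling the session cookie and BeautifulSoup doing only the parsing. A rough Python 2 sketch follows; the URLs and form field names are invented placeholders, so substitute whatever the real login form expects:

import cookielib
import urllib2
from urllib import urlencode
from BeautifulSoup import BeautifulSoup    # BeautifulSoup 3.x

# Both URLs and the form field names are made-up placeholders.
LOGIN_URL = 'http://example.com/accounts/login'
DATA_URL = 'http://example.com/members/report'

jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

# POSTing the login form stores the session cookie in the jar; every later
# request made through the same opener sends it back automatically.
opener.open(LOGIN_URL, urlencode({'username': 'me', 'password': 'secret'}))

html = opener.open(DATA_URL).read()
soup = BeautifulSoup(html)
for link in soup.findAll('a', href=True):
    print link['href']

Twill or mechanize can stand in for the first half if you prefer working with form objects instead of hand-built POST data; the parsing half stays the same either way.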
PeterB On Sat, Jun 25, 2011 at 1:42 PM, Stephen McInerney wrote: > > What do people use for scraping on a website requiring (login form-based) > authentication? > > - BeautifulSoup: does not handle authentication or cookies > - Scrapy: does but more heavyweight paradigm to learn, incl. XPath > > > Some discussion: > http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python > > Thanks, > Stephen > > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > -- peter.borocz at gmail dot com -------------- next part -------------- An HTML attachment was scrubbed... URL: From glen at glenjarvis.com Sun Jun 26 03:48:54 2011 From: glen at glenjarvis.com (Glen Jarvis) Date: Sat, 25 Jun 2011 18:48:54 -0700 Subject: [Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup? In-Reply-To: References: Message-ID: <68EB5F99-9A2F-4C33-BB33-0D86F813650E@glenjarvis.com> Stephen, Beautiful soup really just parses the HTML. It doesn't (have to) retrieve the page for you. You can use the built-in httplib2, urllib libraries to retrieve the page (also with authentication) and then use BeautifulSoup to parse the page. Cheers, Glen On Jun 25, 2011, at 1:42 PM, Stephen McInerney wrote: > > What do people use for scraping on a website requiring (login form-based) authentication? > BeautifulSoup: does not handle authentication or cookies > Scrapy: does but more heavyweight paradigm to learn, incl. XPath > > Some discussion: http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python > > Thanks, > Stephen > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron at midnightresearch.com Sun Jun 26 04:14:19 2011 From: aaron at midnightresearch.com (Aaron Peterson) Date: Sat, 25 Jun 2011 19:14:19 -0700 Subject: [Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup? In-Reply-To: References: Message-ID: Hello: Mechanize is another good module for automating this kind of thing. HTH, Aaron On Jun 25, 2011 1:43 PM, "Stephen McInerney" wrote: > > > What do people use for scraping on a website requiring (login form-based) authentication? > BeautifulSoup: does not handle authentication or cookiesScrapy: does but more heavyweight paradigm to learn, incl. XPath > Some discussion: http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python > > Thanks, > Stephen > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ademan555 at gmail.com Mon Jun 27 06:40:43 2011 From: ademan555 at gmail.com (Dan Roberts) Date: Sun, 26 Jun 2011 21:40:43 -0700 Subject: [Baypiggies] PyPy 101 Talk Slides from Thursday In-Reply-To: References: Message-ID: Well, like I said that was before my time. My impression was that it just didn't yield any benefits. http://doc.pypy.org/en/latest/project-ideas.html?highlight=llvm suggests that LLVM just wasn't ready to be used with PyPy. More recently we heard from the Unladen Swallow project that LLVM wasn't ready for that either. 
I confirmed this with one of the older PyPy developers, LLVM was just too buggy at the time, and PyPy has been burned by it multiple times. I personally don't know what LLVM would bring to the table. I'm far from an expert on LLVM, so I may be ignoring important features that it has, so feel free to chime in if you think there's something I'm missing. For the translation process (where currently we generate either C, Jasmin JVM assembler, or CLI bytecode, this is the offline process analagous to "compile time") I don't think LLVM would be beneficial, executables produced by GCC still tend to beat clang's binaries. However, for the JIT it might have worthwhile code generation features. PyPy implements a fairly effective set of optimizations on the JIT code it emits. It's possible that layering LLVM's optimizations would produce better code at runtime. I'm fairly confident in saying that no core PyPy developer would be interested in pursuing this again, however, the door is always wide open for people to try "crazy" things with PyPy. If someone is willing to devote time to bring LLVM support up to par, and didn't break a bunch of other things (no reason why it should), it would definitely be accepted. One could probably implement LLVM as a JIT backend similar to how different CPU architectures are supported, and that would be "fairly trivial". Anyways, the short answer is that it was too immature for several attempts. On Sat, Jun 25, 2011 at 1:26 PM, Tony Cappellini wrote: > Dan > When I asked you about pyp using the LLVM, you said they tried before you > got involved with the project, > but then just let that branch go to bitrot. > Do you know why they stopped using the LLVM? > It seems as though it would save you a lot of work- but if the performance > wasn't good enough, that would be reason enough. > I'm quite impressed with the speed that you demonstrated. > > On Sat, Jun 25, 2011 at 12:56 PM, Dan Roberts wrote: >> >> Hi Baypiggies, >> ? ?At least a couple of people wanted to see slides from my >> presentation on Thursday. I've hosted them temporarily at >> http://codespeak.net/~dan/talk.pdf I'm also happy to answer any >> questions that weren't adequately answered during my talk, and of >> course over in #pypy on irc.freenode.net there are even more answers. >> >> Cheers everyone, >> Dan >> _______________________________________________ >> Baypiggies mailing list >> Baypiggies at python.org >> To change your subscription options or unsubscribe: >> http://mail.python.org/mailman/listinfo/baypiggies > > From kpguy1975 at gmail.com Mon Jun 27 23:06:58 2011 From: kpguy1975 at gmail.com (Vikram K) Date: Mon, 27 Jun 2011 17:06:58 -0400 Subject: [Baypiggies] nested list question Message-ID: Suppose i have the following nested list: >>> x [['19600894', '1', 'chr15_76136768', 'MISSENSE'], ['19600894', '2', 'chr15_76136768', 'MISSENSE'], ['18467762', '1', 'chr14_23354066', 'MISSENSE']] How do i obtain from nested list x (given above), the following nested list z: >>> z [['chr15_76136768', 'MISSENSE'], ['chr14_23354066', 'MISSENSE']] ------ In other words, if the third element of an element of x is the same, then i wish to combine it into a single element. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwight_hubbard at yahoo.com Tue Jun 28 00:07:40 2011 From: dwight_hubbard at yahoo.com (Dwight Hubbard) Date: Mon, 27 Jun 2011 15:07:40 -0700 (PDT) Subject: [Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup? 
In-Reply-To: <68EB5F99-9A2F-4C33-BB33-0D86F813650E@glenjarvis.com> References: <68EB5F99-9A2F-4C33-BB33-0D86F813650E@glenjarvis.com> Message-ID: <1309212460.39951.YahooMailNeo@web112520.mail.gq1.yahoo.com> For scraping with authentication I find the twill module is very good. >________________________________ >From: Glen Jarvis >To: Stephen McInerney >Cc: "" >Sent: Saturday, June 25, 2011 6:48 PM >Subject: Re: [Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup? > > >Stephen, >?? ?Beautiful soup really just parses the HTML. It doesn't (have to) retrieve the page for you. > > >?? ?You can use the built-in httplib2, urllib libraries to retrieve the page (also with authentication) and then use BeautifulSoup to parse the page. > >Cheers, > > > > >Glen > >On Jun 25, 2011, at 1:42 PM, Stephen McInerney wrote: > > > >>What do people use for scraping on a website requiring (login form-based) authentication? >> >> * BeautifulSoup: does not handle authentication or cookies >> * Scrapy: does but more heavyweight paradigm to learn, incl. XPath >>Some discussion: http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python >> >>Thanks, >>Stephen >> >> >_______________________________________________ >>Baypiggies mailing list >>Baypiggies at python.org >>To change your subscription options or unsubscribe: >>http://mail.python.org/mailman/listinfo/baypiggies >_______________________________________________ >Baypiggies mailing list >Baypiggies at python.org >To change your subscription options or unsubscribe: >http://mail.python.org/mailman/listinfo/baypiggies > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason at mischievous.org Tue Jun 28 00:29:30 2011 From: jason at mischievous.org (Jason Culverhouse) Date: Mon, 27 Jun 2011 15:29:30 -0700 Subject: [Baypiggies] nested list question In-Reply-To: References: Message-ID: <16E5CAD9-33A1-4E82-A55A-B98BE4802335@mischievous.org> On Jun 27, 2011, at 2:06 PM, Vikram K wrote: > Suppose i have the following nested list: > > >>> x > [['19600894', '1', 'chr15_76136768', 'MISSENSE'], ['19600894', '2', 'chr15_76136768', 'MISSENSE'], ['18467762', '1', 'chr14_23354066', 'MISSENSE']] > > > How do i obtain from nested list x (given above), the following nested list z: > > >>> z > [['chr15_76136768', 'MISSENSE'], ['chr14_23354066', 'MISSENSE']] > How about: list(unique_everseen((y[2:4] for y in x), operator.itemgetter(0))) or the whole nested list with just list(unique_everseen(x, operator.itemgetter(2))) where : unique_everseen is from http://docs.python.org/library/itertools.html If you data is already sorted by the key then unique_justseen might be more efficient? Jason > ------ > In other words, if the third element of an element of x is the same, then i wish to combine it into a single element. > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies -------------- next part -------------- An HTML attachment was scrubbed... URL: From ryan at larrabure.org Tue Jun 28 17:25:18 2011 From: ryan at larrabure.org (Ryan Larrabure) Date: Tue, 28 Jun 2011 08:25:18 -0700 Subject: [Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup? 
In-Reply-To: <1309212460.39951.YahooMailNeo@web112520.mail.gq1.yahoo.com> References: <68EB5F99-9A2F-4C33-BB33-0D86F813650E@glenjarvis.com> <1309212460.39951.YahooMailNeo@web112520.mail.gq1.yahoo.com> Message-ID: If you're scraping HTML, all reasonable roads seem to lead to xpath. I'd use httplib2 and lxml. Avoid mechanize. It's form handling is very poor (it'll read forms stored inline within javascript tags). On Mon, Jun 27, 2011 at 3:07 PM, Dwight Hubbard wrote: > For scraping with authentication I find the twill module is very good. > > ________________________________ > From: Glen Jarvis > To: Stephen McInerney > Cc: "" > Sent: Saturday, June 25, 2011 6:48 PM > Subject: Re: [Baypiggies] Scraping with authentication: Scrapy vs > BeautifulSoup? > > Stephen, > ?? ?Beautiful soup really just parses the HTML. It doesn't (have to) > retrieve the page for you. > ?? ?You can use the built-in httplib2, urllib libraries to retrieve the page > (also with authentication) and then use BeautifulSoup to parse the page. > Cheers, > > Glen > On Jun 25, 2011, at 1:42 PM, Stephen McInerney > wrote: > > > What do people use for scraping on a website requiring (login form-based) > authentication? > > BeautifulSoup: does not handle authentication or cookies > Scrapy: does but more heavyweight paradigm to learn, incl. XPath > > Some discussion: > http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python > > Thanks, > Stephen > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From kpguy1975 at gmail.com Tue Jun 28 17:43:46 2011 From: kpguy1975 at gmail.com (Vikram K) Date: Tue, 28 Jun 2011 11:43:46 -0400 Subject: [Baypiggies] nested list question In-Reply-To: <16E5CAD9-33A1-4E82-A55A-B98BE4802335@mischievous.org> References: <16E5CAD9-33A1-4E82-A55A-B98BE4802335@mischievous.org> Message-ID: Thanks Jason. Could you (or someone else) suggest some approach for the following: >>> x [['19600894', '1', 'chr15_76136768', 'MISSENSE'], ['19600894', '2', 'chr15_76136768', 'MISSENSE'], ['18467762', '1', 'chr14_23354066', 'MISSENSE']] How do i obtain from nested list x (given above), the following nested list z: >>> z [[19600894','1/2','chr15_76136768', 'MISSENSE', 'homozygous'], ['18467762', '1','chr14_23354066', 'MISSENSE', 'heterozygous']] In list x, the first element is loci, second element is allele, third element is chromosome_positionofchange, fourth is type of change. Based on the value of the second and third element a new element has to be created --'homozygous' if both allele 1 and allele 2 have the change and 'heterozygous' if only one allele has the change. 
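One straightforward way to sketch that grouping is a plain dict keyed on (loci, position, change type) that collects the alleles seen for each key and then applies the two-allele test:

x = [['19600894', '1', 'chr15_76136768', 'MISSENSE'],
     ['19600894', '2', 'chr15_76136768', 'MISSENSE'],
     ['18467762', '1', 'chr14_23354066', 'MISSENSE']]

groups = {}      # (loci, position, change) -> set of alleles seen
order = []       # remember first-seen order of the keys
for loci, allele, pos, change in x:
    key = (loci, pos, change)
    if key not in groups:
        groups[key] = set()
        order.append(key)
    groups[key].add(allele)

z = []
for loci, pos, change in order:
    alleles = sorted(groups[(loci, pos, change)])
    zygosity = 'homozygous' if len(alleles) > 1 else 'heterozygous'
    z.append([loci, '/'.join(alleles), pos, change, zygosity])

print z
# [['19600894', '1/2', 'chr15_76136768', 'MISSENSE', 'homozygous'],
#  ['18467762', '1', 'chr14_23354066', 'MISSENSE', 'heterozygous']]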
On Mon, Jun 27, 2011 at 6:29 PM, Jason Culverhouse wrote: > On Jun 27, 2011, at 2:06 PM, Vikram K wrote: > > Suppose i have the following nested list: > > >>> x > [['19600894', '1', 'chr15_76136768', 'MISSENSE'], ['19600894', '2', > 'chr15_76136768', 'MISSENSE'], ['18467762', '1', 'chr14_23354066', > 'MISSENSE']] > > > How do i obtain from nested list x (given above), the following nested list > z: > > >>> z > [['chr15_76136768', 'MISSENSE'], ['chr14_23354066', 'MISSENSE']] > > > How about: > > list(unique_everseen((y[2:4] for y in x), operator.itemgetter(0))) > > or the whole nested list with just > > list(unique_everseen(x, operator.itemgetter(2))) > > where : > > unique_everseen is from > http://docs.python.org/library/itertools.html > > If you data is already sorted by the key then > unique_justseen > > might be more efficient? > > Jason > > ------ > In other words, if the third element of an element of x is the same, then i > wish to combine it into a single element. > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.curtin at gmail.com Wed Jun 29 05:11:24 2011 From: brian.curtin at gmail.com (Brian Curtin) Date: Tue, 28 Jun 2011 22:11:24 -0500 Subject: [Baypiggies] Python User Group International Survey Message-ID: The PSF is happy to launch today an international survey of Pythonuser group organizers to help it better serve the large and ever-expanding international Python user community. The survey contains questions on user group organization, events, demographics, and growth. There are some questions with numerical answers, and while your best guess is fine, you may find it helpful to gather some statistics on your user group membership before starting the survey (example statistics include the number of active members and the size and topics for recent user group events). We expect this survey to take around 30 minutes to complete. We appreciate your time and honesty in answering these questions. The PSF blog post announcing the survey: http://pyfound.blogspot.com/2011/06/tell-us-about-your-user-group.html The survey was written by Jessica McKellar (http://jesstess.com), organizer for the Boston Python Meetup (http://meetup.bostonpython.com), and Jesse Noller (http://jessenoller.com/), PSF board member and PyCon chair with input and feedback from survey specialists and others. https://www.surveymonkey.com/s/BWLG8SZ The survey was pretested with a handful of user group organizers, and their answers were phenomenal. Organizers have tons to say about these topics, and we hope to get a lot of great, actionable data for strengthening the relationship between the PSF and Python user groups out of this effort. Outreach, education, diversity and community building are critical for Python as a community, and the Foundation - this data should greatly assist in our targeting our resources and furthering the mission of the Foundation in all ways. Thank you The Python Software Foundation Jessica McKellar Jesse Noller -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jason at mischievous.org Wed Jun 29 07:06:39 2011 From: jason at mischievous.org (Jason Culverhouse) Date: Tue, 28 Jun 2011 22:06:39 -0700 Subject: [Baypiggies] nested list question In-Reply-To: References: <16E5CAD9-33A1-4E82-A55A-B98BE4802335@mischievous.org> Message-ID: <83932618-CB73-4EBA-874B-6310B7EFFFDE@mischievous.org> On Jun 28, 2011, at 8:43 AM, Vikram K wrote: > Thanks Jason. Could you (or someone else) suggest some approach for the following: > > >>> x > [['19600894', '1', 'chr15_76136768', 'MISSENSE'], ['19600894', '2', 'chr15_76136768', 'MISSENSE'], ['18467762', '1', 'chr14_23354066', 'MISSENSE']] > > > How do i obtain from nested list x (given above), the following nested list z: > > >>> z > [[19600894','1/2','chr15_76136768', 'MISSENSE', 'homozygous'], ['18467762', '1','chr14_23354066', 'MISSENSE', 'heterozygous']] > > In list x, the first element is loci, second element is allele, third element is chromosome_positionofchange, fourth is type of change. Based on the value of the second and third element a new element has to be created --'homozygous' if both allele 1 and allele 2 have the change and 'heterozygous' if only one allele has the change. > > Just for kicks... Is this an employment test? Does anyone have a better way to code the inside of the for loop? ---- from operator import itemgetter from itertools import groupby from somewhere import unique_justseen # http://docs.python.org/library/itertools.html key_func = itemgetter(0,2,3) output = [] # you need to sort to make group by work properly for k, v in groupby(sorted(x, key=key_func), key_func): #these are sorted to unique_justseen is a good option # as long as there are not that many allele inner = list(unique_justseen(v)) output.append([k[0], '/'.join(i[1] for i in inner), k[1], k[2], len(inner) and 'homozygous' or 'heterozygous']) print output [['18467762', '1', 'chr14_23354066', 'MISSENSE', 'homozygous'], ['19600894', '1/2', 'chr15_76136768', 'MISSENSE', 'homozygous']] Jason > > On Mon, Jun 27, 2011 at 6:29 PM, Jason Culverhouse wrote: > On Jun 27, 2011, at 2:06 PM, Vikram K wrote: > >> Suppose i have the following nested list: >> >> >>> x >> [['19600894', '1', 'chr15_76136768', 'MISSENSE'], ['19600894', '2', 'chr15_76136768', 'MISSENSE'], ['18467762', '1', 'chr14_23354066', 'MISSENSE']] >> >> >> How do i obtain from nested list x (given above), the following nested list z: >> >> >>> z >> [['chr15_76136768', 'MISSENSE'], ['chr14_23354066', 'MISSENSE']] >> > > How about: > > list(unique_everseen((y[2:4] for y in x), operator.itemgetter(0))) > > or the whole nested list with just > > list(unique_everseen(x, operator.itemgetter(2))) > > where : > > unique_everseen is from > http://docs.python.org/library/itertools.html > > If you data is already sorted by the key then > unique_justseen > > might be more efficient? > > Jason > >> ------ >> In other words, if the third element of an element of x is the same, then i wish to combine it into a single element. 
>> _______________________________________________ >> Baypiggies mailing list >> Baypiggies at python.org >> To change your subscription options or unsubscribe: >> http://mail.python.org/mailman/listinfo/baypiggies > > From jason at mischievous.org Wed Jun 29 07:20:40 2011 From: jason at mischievous.org (Jason Culverhouse) Date: Tue, 28 Jun 2011 22:20:40 -0700 Subject: [Baypiggies] nested list question In-Reply-To: <83932618-CB73-4EBA-874B-6310B7EFFFDE@mischievous.org> References: <16E5CAD9-33A1-4E82-A55A-B98BE4802335@mischievous.org> <83932618-CB73-4EBA-874B-6310B7EFFFDE@mischievous.org> Message-ID: On Jun 28, 2011, at 10:06 PM, Jason Culverhouse wrote: > > On Jun 28, 2011, at 8:43 AM, Vikram K wrote: > >> Thanks Jason. Could you (or someone else) suggest some approach for the following: >> >>>>> x >> [['19600894', '1', 'chr15_76136768', 'MISSENSE'], ['19600894', '2', 'chr15_76136768', 'MISSENSE'], ['18467762', '1', 'chr14_23354066', 'MISSENSE']] >> >> >> How do i obtain from nested list x (given above), the following nested list z: >> >>>>> z >> [[19600894','1/2','chr15_76136768', 'MISSENSE', 'homozygous'], ['18467762', '1','chr14_23354066', 'MISSENSE', 'heterozygous']] >> >> In list x, the first element is loci, second element is allele, third element is chromosome_positionofchange, fourth is type of change. Based on the value of the second and third element a new element has to be created --'homozygous' if both allele 1 and allele 2 have the change and 'heterozygous' if only one allele has the change. >> >> > > Just for kicks... Is this an employment test? > > Does anyone have a better way to code the inside of the for loop? > ---- > from operator import itemgetter > from itertools import groupby > > from somewhere import unique_justseen # http://docs.python.org/library/itertools.html > > key_func = itemgetter(0,2,3) > > output = [] > # you need to sort to make group by work properly > for k, v in groupby(sorted(x, key=key_func), key_func): > #these are sorted to unique_justseen is a good option > # as long as there are not that many allele > inner = list(unique_justseen(v)) > output.append([k[0], '/'.join(i[1] for i in inner), k[1], k[2], len(inner) and 'homozygous' or 'heterozygous']) A "minor fix" to paste the correct 'homozygous' or 'heterozygous' computation below.... output.append([k[0], '/'.join(i[1] for i in inner), k[1], k[2], len(inner) > 1 and 'homozygous' or 'heterozygous']) > print output > > [['18467762', '1', 'chr14_23354066', 'MISSENSE', 'homozygous'], ['19600894', '1/2', 'chr15_76136768', 'MISSENSE', 'homozygous']] > > Jason > > > >> >> On Mon, Jun 27, 2011 at 6:29 PM, Jason Culverhouse wrote: >> On Jun 27, 2011, at 2:06 PM, Vikram K wrote: >> >>> Suppose i have the following nested list: >>> >>>>>> x >>> [['19600894', '1', 'chr15_76136768', 'MISSENSE'], ['19600894', '2', 'chr15_76136768', 'MISSENSE'], ['18467762', '1', 'chr14_23354066', 'MISSENSE']] >>> >>> >>> How do i obtain from nested list x (given above), the following nested list z: >>> >>>>>> z >>> [['chr15_76136768', 'MISSENSE'], ['chr14_23354066', 'MISSENSE']] >>> >> >> How about: >> >> list(unique_everseen((y[2:4] for y in x), operator.itemgetter(0))) >> >> or the whole nested list with just >> >> list(unique_everseen(x, operator.itemgetter(2))) >> >> where : >> >> unique_everseen is from >> http://docs.python.org/library/itertools.html >> >> If you data is already sorted by the key then >> unique_justseen >> >> might be more efficient? 
>> >> Jason >> >>> ------ >>> In other words, if the third element of an element of x is the same, then i wish to combine it into a single element. >>> _______________________________________________ >>> Baypiggies mailing list >>> Baypiggies at python.org >>> To change your subscription options or unsubscribe: >>> http://mail.python.org/mailman/listinfo/baypiggies >> >> > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies
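For anyone pasting this in later, here is a self-contained version of the groupby approach from this thread with the '> 1' correction applied, and with the duplicate handling done by a set of alleles instead of the unique_justseen recipe; the sample data is the list from the original question:

from itertools import groupby
from operator import itemgetter

x = [['19600894', '1', 'chr15_76136768', 'MISSENSE'],
     ['19600894', '2', 'chr15_76136768', 'MISSENSE'],
     ['18467762', '1', 'chr14_23354066', 'MISSENSE']]

key_func = itemgetter(0, 2, 3)   # (loci, position, change type)
z = []
# groupby only merges adjacent rows, so sort on the same key first.
for (loci, pos, change), rows in groupby(sorted(x, key=key_func), key_func):
    alleles = sorted(set(row[1] for row in rows))
    zygosity = 'homozygous' if len(alleles) > 1 else 'heterozygous'
    z.append([loci, '/'.join(alleles), pos, change, zygosity])

print z
# [['18467762', '1', 'chr14_23354066', 'MISSENSE', 'heterozygous'],
#  ['19600894', '1/2', 'chr15_76136768', 'MISSENSE', 'homozygous']]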