From simeonf at gmail.com Wed Jun 1 00:55:58 2011 From: simeonf at gmail.com (Simeon Franklin) Date: Tue, 31 May 2011 15:55:58 -0700 Subject: [Baypiggies] Video from May 26th Bay Piggies event online In-Reply-To: <4DE55FB0.4090104@marakana.com> References: <4DE55FB0.4090104@marakana.com> Message-ID: Awesome job - thanks Max! I forget who is in charge of content on baypiggies.net (thank you, btw) but can we get the videos linked from the baypiggies talks archive? -regards Simeon Franklin On Tue, May 31, 2011 at 2:37 PM, Max Walker - Marakana wrote: > Hi all, > > Just letting you know that the video from Jeff Fischer's newbie nugget talk > on Implementing Mix-ins in Python is now online: http://mrkn.co/f/345 > > Coming soon is the video for Alan DuBoff's presentation on Writing Titanium > Desktop Applications with Python. I'll shoot another email to the list later > this week when it's up. > > Cheers!! > > - Max > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From bdbaddog at gmail.com Wed Jun 1 08:35:23 2011 From: bdbaddog at gmail.com (William Deegan) Date: Tue, 31 May 2011 23:35:23 -0700 Subject: [Baypiggies] Video from May 26th Bay Piggies event online In-Reply-To: References: <4DE55FB0.4090104@marakana.com> Message-ID: <64DEA821-75AC-4E92-A4FD-7A7EA296E363@gmail.com> Simeon, I'll try and do that in the next few days. -Bill On May 31, 2011, at 3:55 PM, Simeon Franklin wrote: > Awesome job - thanks Max! I forget who is in charge of content on > baypiggies.net (thank you, btw) but can we get the videos linked from > the baypiggies talks archive? > > -regards > Simeon Franklin > > On Tue, May 31, 2011 at 2:37 PM, Max Walker - Marakana > wrote: >> Hi all, >> >> Just letting you know that the video from Jeff Fischer's newbie nugget talk >> on Implementing Mix-ins in Python is now online: http://mrkn.co/f/345 >> >> Coming soon is the video for Alan DuBoff's presentation on Writing Titanium >> Desktop Applications with Python. I'll shoot another email to the list later >> this week when it's up. >> >> Cheers!! >> >> - Max >> _______________________________________________ >> Baypiggies mailing list >> Baypiggies at python.org >> To change your subscription options or unsubscribe: >> http://mail.python.org/mailman/listinfo/baypiggies >> > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies From max.walker at marakana.com Fri Jun 3 02:13:45 2011 From: max.walker at marakana.com (Max Walker - Marakana) Date: Thu, 02 Jun 2011 17:13:45 -0700 Subject: [Baypiggies] video for Alan DuBoff's Preso on Writing Titanium Desktop Apps in Python Message-ID: <4DE82739.40107@marakana.com> Hi guys, Just letting you know that the video for Alan DuBoff's presentation on Writing Titanium Desktop apps in Python from the May 26th Bay Piggies Meetup is now online: http://mrkn.co/f/347 - so check it out! Cheers! - Max From jjinux at gmail.com Fri Jun 3 19:11:42 2011 From: jjinux at gmail.com (Shannon -jj Behrens) Date: Fri, 3 Jun 2011 10:11:42 -0700 Subject: [Baypiggies] getting a fair flip out of an unfair coin Message-ID: A couple months ago at BayPiggies, someone asked for an algorithm to get a fair flip out of an unfair coin. 
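For reference, the classic answer here is von Neumann's trick: flip the biased coin twice, keep the result only when the two flips differ, and retry otherwise, since P(heads, tails) equals P(tails, heads). The sketch below is that standard approach, not necessarily the algorithm in the post linked just below; biased_flip and fair_flip are made-up names for illustration.

    import random

    def biased_flip(p=0.3):
        # A biased coin: returns True ("heads") with probability p.
        return random.random() < p

    def fair_flip(flip=biased_flip):
        # Von Neumann's trick: only unequal pairs count, and both orders of an
        # unequal pair are equally likely, so the first flip of such a pair is fair.
        while True:
            first, second = flip(), flip()
            if first != second:
                return first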
My buddy Hy Carrinski and I came up with the following algorithm: http://jjinux.blogspot.com/2011/06/python-getting-fair-flip-out-of-unfair.html Happy Hacking! -jj -- In this life we cannot do great things. We can only do small things with great love. -- Mother Teresa From japerk at gmail.com Mon Jun 6 16:35:46 2011 From: japerk at gmail.com (Jacob Perkins) Date: Mon, 6 Jun 2011 07:35:46 -0700 Subject: [Baypiggies] job at weotta Message-ID: Hi, http://www.weotta.com, which just launched at TechCrunch Disrupt NY, is looking for some experienced Python developers to be our first key engineering hires. You'll work closely with me and the rest of the founding team (http://www.weotta.com/about/) to make Weotta even more awesome :) We're currently based in Los Gatos, but will be moving to San Francisco once our funding round closes. There are two main areas of focus, and we're looking for strong technical devs who can quickly dive in to at least one of these: * frontend web app development with Django & jQuery * backend NLP with NLTK We also use the following tech, so it's ideal if you're familiar with some of these already: * Mercurial * pip & virtualenv * Fabric * South & MySQL * Nginx * EC2 deployment * MongoDB * Redis * Facebook API If you'd like to learn more about Weotta, check out our press page http://www.weotta.com/press/. And to get in and try it, you can sign up using your facebook account at http://www.weotta.com/s/4QEuIgWS/. If you like the product and want to make it better and expand our coverage, please reply with some links about you and work you've done, ideally open source projects you've created or contributed to. You can also contact me at https://github.com/japerk/ and http://www.linkedin.com/in/jacobperkins. Jacob --- http://www.weotta.com/ http://streamhacker.com/ http://twitter.com/japerk -------------- next part -------------- An HTML attachment was scrubbed... URL: From c1 at caseyc.net Tue Jun 7 06:32:21 2011 From: c1 at caseyc.net (Casey Callendrello) Date: Mon, 06 Jun 2011 21:32:21 -0700 Subject: [Baypiggies] Pythonic way to iterate over two lists? Message-ID: <4DEDA9D5.1090906@caseyc.net> Hi there, I've got a simple problem that I've already solved effectively, but I can't help thinking that there must be a more "pythonic" way to do it. Especially because my solution uses a list index, which I *know* can't possibly be the Python way ;-). In any case, I have two lists: one of machines, and one of jobs. Either one can be of arbitrary length, including zero. I want to generate (machine, job) pairs where every machine gets at most one job, each job is only executed once, and as much work as possible is done. The actual index or order is irrelevant. The simple, C-inspired solution is:

    i = 0
    while i < len(jobs) and i < len(machines):
        do_job(jobs[i], machines[i])
        i += 1

There has to be a cleaner way than that! Any suggestions? --Casey From c1 at caseyc.net (Casey Callendrello) Subject: Re: [Baypiggies] Pythonic way to iterate over two lists? In-Reply-To: <4DEDA9D5.1090906@caseyc.net> References: <4DEDA9D5.1090906@caseyc.net> Message-ID: <4DEDAA2F.6010105@caseyc.net> I should add, I'm actually more interested in a list of (job, machine) tuples, since that's added to a queue and sent to a threadpool. --Casey On 6/6/11 9:32 PM, Casey Callendrello wrote: > Hi there, > I've got a simple problem that I've already solved effectively, but I > can't help thinking that there must be a more "pythonic" way to do it. > Especially because my solution uses a list index, which I *know* can't > possibly be the Python way ;-). > > In any case, I have two lists: one of machines, and one of jobs. > Either one can be of arbitrary length, including zero.
I want to > generate (machine, job) pairs where every machine gets at most one > job, each job is only executed once, and as much work as possible is > done. The actual index or order is irrelevant. > > The simple, C-inspired solution is: > > i = 0 > while i do_job(jobs[i], machines[i]) > i += 1 > > There has to be a cleaner way than that! Any suggestions? > > --Casey > > > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies From me at rpatterson.net Tue Jun 7 06:37:04 2011 From: me at rpatterson.net (Ross Patterson) Date: Mon, 6 Jun 2011 21:37:04 -0700 Subject: [Baypiggies] Pythonic way to iterate over two lists? In-Reply-To: <4DEDA9D5.1090906@caseyc.net> References: <4DEDA9D5.1090906@caseyc.net> Message-ID: I suspect you could use itertools.izip_longest: http://docs.python.org/library/itertools.html#itertools.izip_longest Ross On Mon, Jun 6, 2011 at 9:32 PM, Casey Callendrello wrote: > Hi there, > I've got a simple problem that I've already solved effectively, but I can't > help thinking that there must be a more "pythonic" way to do it. Especially > because my solution uses a list index, which I *know* can't possibly be the > Python way ;-). > > In any case, I have two lists: one of machines, and one of jobs. Either one > can be of arbitrary length, including zero. I want to generate (machine, > job) pairs where every machine gets at most one job, each job is only > executed once, and as much work as possible is done. The actual index or > order is irrelevant. > > The simple, C-inspired solution is: > > i = 0 > while i do_job(jobs[i], machines[i]) > i += 1 > > There has to be a cleaner way than that! Any suggestions? > > --Casey > > > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hcarrinski at gmail.com Tue Jun 7 06:44:19 2011 From: hcarrinski at gmail.com (Hy Carrinski) Date: Mon, 6 Jun 2011 21:44:19 -0700 Subject: [Baypiggies] Pythonic way to iterate over two lists? In-Reply-To: References: <4DEDA9D5.1090906@caseyc.net> Message-ID: The following will work. Does it fully solve your problem? from itertools import izip for (job, machine) in izip(jobs, machines): do_job(job, machine) On Mon, Jun 6, 2011 at 9:37 PM, Ross Patterson wrote: > I suspect you could use itertools.izip_longest: > http://docs.python.org/library/itertools.html#itertools.izip_longest > Ross > On Mon, Jun 6, 2011 at 9:32 PM, Casey Callendrello wrote: >> >> Hi there, >> I've got a simple problem that I've already solved effectively, but I >> can't help thinking that there must be a more "pythonic" way to do it. >> Especially because my solution uses a list index, which I *know* can't >> possibly be the Python way ;-). >> >> In any case, I have two lists: one of machines, and one of jobs. Either >> one can be of arbitrary length, including zero. I want to generate (machine, >> job) pairs where every machine gets at most one job, each job is only >> executed once, and as much work as possible is done. The actual index or >> order is irrelevant. >> >> The simple, C-inspired solution is: >> >> i = 0 >> while i> ? ?do_job(jobs[i], machines[i]) >> ? ?i += 1 >> >> There has to be a cleaner way than that! Any suggestions? 
>> >> --Casey >> >> >> >> _______________________________________________ >> Baypiggies mailing list >> Baypiggies at python.org >> To change your subscription options or unsubscribe: >> http://mail.python.org/mailman/listinfo/baypiggies > > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From max at theslimmers.net Tue Jun 7 06:55:27 2011 From: max at theslimmers.net (Max Slimmer) Date: Mon, 6 Jun 2011 21:55:27 -0700 Subject: [Baypiggies] Pythonic way to iterate over two lists? In-Reply-To: <4DEDA9D5.1090906@caseyc.net> References: <4DEDA9D5.1090906@caseyc.net> Message-ID: A more realistic and in some ways interesting problem is to deal with potentially more jobs than machines. I would think that you want all the jobs to get done therefore any one machine might need to do more than one job, Then for fun some machines are might be more efficient, in either time or cost. :-) max On Mon, Jun 6, 2011 at 9:32 PM, Casey Callendrello wrote: > Hi there, > I've got a simple problem that I've already solved effectively, but I can't > help thinking that there must be a more "pythonic" way to do it. Especially > because my solution uses a list index, which I *know* can't possibly be the > Python way ;-). > > In any case, I have two lists: one of machines, and one of jobs. Either one > can be of arbitrary length, including zero. I want to generate (machine, > job) pairs where every machine gets at most one job, each job is only > executed once, and as much work as possible is done. The actual index or > order is irrelevant. > > The simple, C-inspired solution is: > > i = 0 > while i ? ?do_job(jobs[i], machines[i]) > ? ?i += 1 > > There has to be a cleaner way than that! Any suggestions? > > --Casey > > > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From jeremy.r.fishman at gmail.com Tue Jun 7 07:27:14 2011 From: jeremy.r.fishman at gmail.com (Jeremy Fishman) Date: Mon, 6 Jun 2011 22:27:14 -0700 Subject: [Baypiggies] Pythonic way to iterate over two lists? In-Reply-To: References: <4DEDA9D5.1090906@caseyc.net> Message-ID: More information: izip() is the iterative version of the core Python builtin zip(), which returns a list. http://docs.python.org/library/functions.html#zip In Python 3+ zip() returns an iterable object (it's a type) http://docs.python.org/release/3.2/library/functions.html#zip Cheers, Jeremy On Mon, Jun 6, 2011 at 9:44 PM, Hy Carrinski wrote: > The following will work. > Does it fully solve your problem? > > from itertools import izip > > for (job, machine) in izip(jobs, machines): > do_job(job, machine) > > On Mon, Jun 6, 2011 at 9:37 PM, Ross Patterson wrote: > > I suspect you could use itertools.izip_longest: > > http://docs.python.org/library/itertools.html#itertools.izip_longest > > Ross > > On Mon, Jun 6, 2011 at 9:32 PM, Casey Callendrello > wrote: > >> > >> Hi there, > >> I've got a simple problem that I've already solved effectively, but I > >> can't help thinking that there must be a more "pythonic" way to do it. > >> Especially because my solution uses a list index, which I *know* can't > >> possibly be the Python way ;-). > >> > >> In any case, I have two lists: one of machines, and one of jobs. 
Either > >> one can be of arbitrary length, including zero. I want to generate > (machine, > >> job) pairs where every machine gets at most one job, each job is only > >> executed once, and as much work as possible is done. The actual index or > >> order is irrelevant. > >> > >> The simple, C-inspired solution is: > >> > >> i = 0 > >> while i >> do_job(jobs[i], machines[i]) > >> i += 1 > >> > >> There has to be a cleaner way than that! Any suggestions? > >> > >> --Casey > >> > >> > >> > >> _______________________________________________ > >> Baypiggies mailing list > >> Baypiggies at python.org > >> To change your subscription options or unsubscribe: > >> http://mail.python.org/mailman/listinfo/baypiggies > > > > > > _______________________________________________ > > Baypiggies mailing list > > Baypiggies at python.org > > To change your subscription options or unsubscribe: > > http://mail.python.org/mailman/listinfo/baypiggies > > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cbc at unc.edu Tue Jun 7 07:28:51 2011 From: cbc at unc.edu (Chris Calloway) Date: Tue, 07 Jun 2011 01:28:51 -0400 Subject: [Baypiggies] Seattle PyCamp 2011 Message-ID: <4DEDB713.8000201@unc.edu> University of Washington Marketing and the Seattle Plone Gathering host the inaugural Seattle PyCamp 2011 at The Paul G. Allen Center for Computer Science & Engineering on Monday, August 29 through Friday, September 2, 2011. Register today at http://trizpug.org/boot-camp/seapy11/ For beginners, this ultra-low-cost Python Boot Camp makes you productive so you can get your work done quickly. PyCamp emphasizes the features which make Python a simpler and more efficient language. Following along with example Python PushUps? speeds your learning process. Become a self-sufficient Python developer in just five days at PyCamp! PyCamp is conducted on the campus of the University of Washington in a state of the art high technology classroom. -- Sincerely, Chris Calloway http://nccoos.org/Members/cbc office: 3313 Venable Hall phone: (919) 599-3530 mail: Campus Box #3300, UNC-CH, Chapel Hill, NC 27599 From simeonf at gmail.com Tue Jun 7 07:43:41 2011 From: simeonf at gmail.com (Simeon Franklin) Date: Mon, 6 Jun 2011 22:43:41 -0700 Subject: [Baypiggies] Pythonic way to iterate over two lists? In-Reply-To: References: <4DEDA9D5.1090906@caseyc.net> Message-ID: I taught a Python Fundamentals class last week for Marakana and noticed that other programmers coming from languages that are not specifically functionally oriented were unfamiliar with zip as a concept. Most explanations of zip tend to focus on the two case (given two lists it returns paired elements) and the more general Python documentation explanation was met with thoughtful incomprehension: >This function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. When I paraphrased this as "zip will take arguments that represent rows of input data and return a list whose elements are the columns of the input data" mental lightbulbs went on all over the room. YMMV but I thought it made for an intuitive explanation... It also leads me to think more naturally of possible applications of zip and iterative friends. 
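A small illustration of that rows-versus-columns reading, with made-up data (Python 2 style, where zip returns a list of tuples):

    # Each tuple is a "row": (name, score)
    rows = [('alice', 88), ('bob', 72), ('carol', 95)]
    names, scores = zip(*rows)   # the two "columns"
    # names  == ('alice', 'bob', 'carol')
    # scores == (88, 72, 95)
    # Zipping the columns back together recovers the rows:
    assert zip(names, scores) == rows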
-regards Simeon Franklin From kwgoodman at gmail.com Tue Jun 7 18:31:07 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 7 Jun 2011 09:31:07 -0700 Subject: [Baypiggies] [job] Python Job at Hedge Fund Message-ID: We are looking for help to predict tomorrow's stock returns. The challenge is model selection in the presence of noisy data. The tools are ubuntu, python, cython, c, numpy, scipy, la, bottleneck, git. A quantitative background and experience or interest in model selection, machine learning, and software development are a plus. This is a full time position in Berkeley, California, two blocks from UC Berkeley. If you are interested send a CV or similar (or questions) to '.'.join(['htiek','scitylanayelekreb at namdoog','moc'][::-1])[::-1] From mvoorhie at yahoo.com Tue Jun 7 18:34:41 2011 From: mvoorhie at yahoo.com (Mark Voorhies) Date: Tue, 7 Jun 2011 09:34:41 -0700 Subject: [Baypiggies] Pythonic way to iterate over two lists? In-Reply-To: References: <4DEDA9D5.1090906@caseyc.net> Message-ID: <201106070934.41605.mvoorhie@yahoo.com> On Monday, June 06, 2011 10:43:41 pm Simeon Franklin wrote: > I taught a Python Fundamentals class last week for Marakana and > noticed that other programmers coming from languages that are not > specifically functionally oriented were unfamiliar with zip as a > concept. Most explanations of zip tend to focus on the two case (given > two lists it returns paired elements) and the more general Python > documentation explanation was met with thoughtful incomprehension: > > >This function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. > > When I paraphrased this as "zip will take arguments that represent > rows of input data and return a list whose elements are the columns of > the input data" mental lightbulbs went on all over the room. YMMV but > I thought it made for an intuitive explanation... It also leads me to > think more naturally of possible applications of zip and iterative > friends. Yes! transpose_A = zip(*A) # if A is, e.g., a rectangular matrix as a list of lists Thanks for the very useful point of view =) --Mark > > -regards > Simeon Franklin > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From hcarrinski at gmail.com Tue Jun 7 21:40:09 2011 From: hcarrinski at gmail.com (Hy Carrinski) Date: Tue, 7 Jun 2011 12:40:09 -0700 Subject: [Baypiggies] Question about breaking out of a loop Message-ID: I am working on code to solve a combinatorial probability problem, and plan to send a link to the full code in a few days. There is generator that yields tuples in a defined order into a loop that performs a calculation. I would like to provide an option to stop the calculation when the threshold is reached. I have put a simplified sample of this on github: https://gist.github.com/1012945 My questions are: 1. Is it an antipattern to change a datatype to cause an exception? 2. If so, how would you improve on my version 3 function? The function in version 3 is pretty close to my current solution, but the functions combinations(), f() and g() are standing in for more computationally intensive functions. The potential antipattern involves temporarily setting a value to None in a dictionary of integers. 
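To make the two shapes being compared concrete, here is a generic sketch only, not the code in the gist; items(), weights and the limit parameter are invented stand-ins. The first version pays an if-test on every pass; the second poisons one entry of an integer dict with None so the addition raises TypeError, which is caught once outside the loop:

    def items():
        # Stand-in for the real generator of values in a defined order.
        for i in range(10):
            yield i

    def total_with_check(limit=None):
        # Explicit per-iteration test.
        weights = dict((i, 1) for i in range(10))
        total = 0
        for i in items():
            if limit is not None and i == limit:
                break
            total += weights[i]
        return total

    def total_with_sentinel(limit=None):
        # Sentinel: a None in a dict of integers makes the addition blow up.
        weights = dict((i, 1) for i in range(10))
        if limit is not None:
            weights[limit] = None
        total = 0
        try:
            for i in items():
                total += weights[i]   # TypeError when weights[i] is None
        except TypeError:
            pass
        return total

    assert total_with_check(4) == total_with_sentinel(4) == 4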
Thank you, Hy From jeremy.r.fishman at gmail.com Tue Jun 7 22:32:44 2011 From: jeremy.r.fishman at gmail.com (Jeremy Fishman) Date: Tue, 7 Jun 2011 13:32:44 -0700 Subject: [Baypiggies] Question about breaking out of a loop In-Reply-To: References: Message-ID: Your second and third definitions aren't really different from each other. Both incur a "per-iteration penalty", the first with an if-statement and the second with a dictionary lookup. I bet you are not going to get a noticeable speedup over a simple if-statement check, but an alternative approach is to solve the problem you are checking for up-front: >>> # warning: not a proof ... >>> def count(seq): ... return sum(1 for e in seq) ... >>> def f(n, w, t): ... return (c for c in combinations(range(n), w) if c[0] < t) ... >>> def g(n, w, t): ... for i in range(t): ... for c in combinations(range(i + 1, n), w - 1): ... yield (i,) + c ... >>> [count(f(10, 5, i)) for i in range(5)] [0, 126, 196, 231, 246] >>> [count(g(10, 5, i)) for i in range(5)] [0, 126, 196, 231, 246] >>> list(f(5, 3, 2)) [(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 2, 3), (0, 2, 4), (0, 3, 4), (1, 2, 3), (1, 2, 4), (1, 3, 4)] >>> list(g(5, 3, 2)) [(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 2, 3), (0, 2, 4), (0, 3, 4), (1, 2, 3), (1, 2, 4), (1, 3, 4)] - Jeremy On Tue, Jun 7, 2011 at 12:40 PM, Hy Carrinski wrote: > I am working on code to solve a combinatorial probability problem, and > plan to send a link to the full code in a few days. > > There is generator that yields tuples in a defined order into a loop > that performs a calculation. I would like to provide an option to stop > the calculation when the threshold is reached. > > I have put a simplified sample of this on github: > https://gist.github.com/1012945 > > My questions are: > 1. Is it an antipattern to change a datatype to cause an exception? > 2. If so, how would you improve on my version 3 function? > > The function in version 3 is pretty close to my current solution, but > the functions combinations(), f() and g() are standing in for more > computationally intensive functions. The potential antipattern > involves temporarily setting a value to None in a dictionary of > integers. > > Thank you, > Hy > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > -------------- next part -------------- An HTML attachment was scrubbed... URL: From krid at otisbean.com Tue Jun 7 22:06:04 2011 From: krid at otisbean.com (Dirk Bergstrom) Date: Tue, 07 Jun 2011 13:06:04 -0700 Subject: [Baypiggies] Question about breaking out of a loop In-Reply-To: References: Message-ID: <4DEE84AC.7040401@otisbean.com> On 06/07/2011 12:40 PM, Hy Carrinski wrote: > There is generator that yields tuples in a defined order into a loop > that performs a calculation. I would like to provide an option to stop > the calculation when the threshold is reached. > The function in version 3 is pretty close to my current solution, but > the functions combinations(), f() and g() are standing in for more > computationally intensive functions. This seems like a perfect example of premature optimization. You've got a loop with two computationally intensive operations per cycle, and you're worried about optimizing away a single if-equals check per cycle. Will that single if statement really make so much difference once you put the real (and presumably much more time consuming) functions in place? 
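One way to answer that question empirically is a rough timeit micro-benchmark (Python 2 style, a sketch only); the loop body here is deliberately trivial, so the difference between the two numbers is roughly the cost of the per-iteration check that the real, much slower functions would dwarf:

    import timeit

    plain = timeit.timeit('for i in xrange(1000): pass', number=10000)
    checked = timeit.timeit('for i in xrange(1000):\n    if i == limit: break',
                            setup='limit = -1', number=10000)
    print plain, checked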
-- -------------------------------------- Dirk Bergstrom krid at otisbean.com http://otisbean.com/ From hcarrinski at gmail.com Tue Jun 7 22:42:53 2011 From: hcarrinski at gmail.com (Hy Carrinski) Date: Tue, 7 Jun 2011 13:42:53 -0700 Subject: [Baypiggies] Question about breaking out of a loop In-Reply-To: <4DEE84AC.7040401@otisbean.com> References: <4DEE84AC.7040401@otisbean.com> Message-ID: I agree that this optimization is not critical. Premature optimization is certainly something that I try to avoid, and is an easy path to follow. My actual code has gone through a few rounds of refactoring and some optimization. This is question involves a specific area which my profiling has shown can introduce around a 10% decrease in runtime based only on eliminating this conditional. These rounds have involved effort to maintain and increase clarity. The computationally intensive functions actually make use of caching so they do not consume much time. I did not include many of these details in the original posting. By introducing the if statement, the runtime of the profiled actual code increases by around 20%. While that increase is not too important, the structure with the if statement does not seem right to me because it introduces the overhead whether or not the threshold option is exercised. Is there a more Pythonic way to introduce the break without the commensurate increase in overhead? By the way, I do think that this loop is the most appropriate place to introduce a threshold. w.r.t. Jeremy's recent interesting suggestions. I think that filtering on the generator may not actually result in StopIteration without computing the values. My actual generator makes a sequence of only the values that I care about. After writing it, I found a posting by Tim Peters from a few years ago at http://code.activestate.com/recipes/218332/ which uses an algorithm similar to mine (for the generator). But, please note that link is pretty far astray from the present questions. Thank you, Hy On Tue, Jun 7, 2011 at 1:06 PM, Dirk Bergstrom wrote: > On 06/07/2011 12:40 PM, Hy Carrinski wrote: >> >> There is generator that yields tuples in a defined order into a loop >> that performs a calculation. I would like to provide an option to stop >> the calculation when the threshold is reached. >> The function in version 3 is pretty close to my current solution, but >> the functions combinations(), f() and g() are standing in for more >> computationally intensive functions. > > This seems like a perfect example of premature optimization. ?You've got a > loop with two computationally intensive operations per cycle, and you're > worried about optimizing away a single if-equals check per cycle. ?Will that > single if statement really make so much difference once you put the real > (and presumably much more time consuming) functions in place? > > -- > ? ? ? -------------------------------------- > ? ? ?Dirk Bergstrom ? ? ? ? ? krid at otisbean.com > ? ? ? ? ? ? http://otisbean.com/ > From jeremy.r.fishman at gmail.com Tue Jun 7 22:57:09 2011 From: jeremy.r.fishman at gmail.com (Jeremy Fishman) Date: Tue, 7 Jun 2011 13:57:09 -0700 Subject: [Baypiggies] Question about breaking out of a loop In-Reply-To: References: <4DEE84AC.7040401@otisbean.com> Message-ID: Yes thank you Hy for pointing out f() function will not break early. I am not sure why I changed the implementation as I had intended f() to be a copy of your for-loop from loop_fcn_v2() to demonstrate equivalency. 
I believe g() generates exactly the values expected and no more. - Jeremy On Tue, Jun 7, 2011 at 1:42 PM, Hy Carrinski wrote: > I agree that this optimization is not critical. Premature optimization > is certainly something that I try to avoid, and is an easy path to > follow. > > My actual code has gone through a few rounds of refactoring and some > optimization. This is question involves a specific area which my > profiling has shown can introduce around a 10% decrease in runtime > based only on eliminating this conditional. These rounds have involved > effort to maintain and increase clarity. The computationally intensive > functions actually make use of caching so they do not consume much > time. > > I did not include many of these details in the original posting. > > By introducing the if statement, the runtime of the profiled actual > code increases by around 20%. While that increase is not too > important, the structure with the if statement does not seem right to > me because it introduces the overhead whether or not the threshold > option is exercised. Is there a more Pythonic way to introduce the > break without the commensurate increase in overhead? By the way, I do > think that this loop is the most appropriate place to introduce a > threshold. > > w.r.t. Jeremy's recent interesting suggestions. I think that filtering > on the generator may not actually result in StopIteration without > computing the values. My actual generator makes a sequence of only the > values that I care about. After writing it, I found a posting by Tim > Peters from a few years ago at > http://code.activestate.com/recipes/218332/ which uses an algorithm > similar to mine (for the generator). But, please note that link is > pretty far astray from the present questions. > > Thank you, > Hy > > > On Tue, Jun 7, 2011 at 1:06 PM, Dirk Bergstrom wrote: > > On 06/07/2011 12:40 PM, Hy Carrinski wrote: > >> > >> There is generator that yields tuples in a defined order into a loop > >> that performs a calculation. I would like to provide an option to stop > >> the calculation when the threshold is reached. > >> The function in version 3 is pretty close to my current solution, but > >> the functions combinations(), f() and g() are standing in for more > >> computationally intensive functions. > > > > This seems like a perfect example of premature optimization. You've got > a > > loop with two computationally intensive operations per cycle, and you're > > worried about optimizing away a single if-equals check per cycle. Will > that > > single if statement really make so much difference once you put the real > > (and presumably much more time consuming) functions in place? > > > > -- > > -------------------------------------- > > Dirk Bergstrom krid at otisbean.com > > http://otisbean.com/ > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hcarrinski at gmail.com Wed Jun 8 02:03:26 2011 From: hcarrinski at gmail.com (Hy Carrinski) Date: Tue, 7 Jun 2011 17:03:26 -0700 Subject: [Baypiggies] Question about breaking out of a loop In-Reply-To: References: <4DEE84AC.7040401@otisbean.com> Message-ID: Jeremy's function g() does produce outputs equivalent to the sample code without using a conditional. Thank you also for including examples. As a generator, it actually could serve as a three parameter wrapper for itertools.combinations(). I hope that this present email can better specify the question. I very much appreciate the good thoughts thus far. 1. 
The only information we have about the generator is that its outputs have a defined order. I find this abstraction useful because it may help any answers to this question to be more generally applicable, and also contains the complexity of the code for the generator itself by not considering multiple starting or ending points. > There is generator that yields tuples in a defined order into a loop > that performs a calculation. 2. I wrote the gist in order to answer the primary question: Is it an antipattern to change a datatype to cause an exception? 3. I also began to think that itertools.takewhile() might be a better option, but does not seem to be in this case. First, it introduces a need for something larger than an integer to compare, for the case where the threshold is None. Second, while it looked nice, it also caused a significant slowdown (~50%) Thanks, Hy On Tue, Jun 7, 2011 at 1:57 PM, Jeremy Fishman wrote: > Yes thank you Hy for pointing out f() function will not break early. ?I am > not sure why I changed the implementation as I had intended f() to be a copy > of your for-loop from?loop_fcn_v2() to demonstrate equivalency. > I believe g() generates exactly the values expected and no more. > ? - Jeremy > > > On Tue, Jun 7, 2011 at 1:42 PM, Hy Carrinski wrote: >> >> I agree that this optimization is not critical. Premature optimization >> is certainly something that I try to avoid, and is an easy path to >> follow. >> >> My actual code has gone through a few rounds of refactoring and some >> optimization. This is question involves a specific area which my >> profiling has shown can introduce around a 10% decrease in runtime >> based only on eliminating this conditional. These rounds have involved >> effort to maintain and increase clarity. The computationally intensive >> functions actually make use of caching so they do not consume much >> time. >> >> I did not include many of these details in the original posting. >> >> By introducing the if statement, the runtime of the profiled actual >> code increases by around 20%. While that increase is not too >> important, the structure with the if statement does not seem right to >> me because it introduces the overhead whether or not the threshold >> option is exercised. Is there a more Pythonic way to introduce the >> break without the commensurate increase in overhead? By the way, I do >> think that this loop is the most appropriate place to introduce a >> threshold. >> >> w.r.t. Jeremy's recent interesting suggestions. I think that filtering >> on the generator may not actually result in StopIteration without >> computing the values. My actual generator makes a sequence of only the >> values that I care about. After writing it, I found a posting by Tim >> Peters from a few years ago at >> http://code.activestate.com/recipes/218332/ which uses an algorithm >> similar to mine (for the generator). But, please note that link is >> pretty far astray from the present questions. >> >> Thank you, >> Hy >> >> >> On Tue, Jun 7, 2011 at 1:06 PM, Dirk Bergstrom wrote: >> > On 06/07/2011 12:40 PM, Hy Carrinski wrote: >> >> >> >> There is generator that yields tuples in a defined order into a loop >> >> that performs a calculation. I would like to provide an option to stop >> >> the calculation when the threshold is reached. >> >> The function in version 3 is pretty close to my current solution, but >> >> the functions combinations(), f() and g() are standing in for more >> >> computationally intensive functions. 
>> > >> > This seems like a perfect example of premature optimization. ?You've got >> > a >> > loop with two computationally intensive operations per cycle, and you're >> > worried about optimizing away a single if-equals check per cycle. ?Will >> > that >> > single if statement really make so much difference once you put the real >> > (and presumably much more time consuming) functions in place? >> > >> > -- >> > ? ? ? -------------------------------------- >> > ? ? ?Dirk Bergstrom ? ? ? ? ? krid at otisbean.com >> > ? ? ? ? ? ? http://otisbean.com/ >> > > > From mvoorhie at yahoo.com Wed Jun 8 02:22:14 2011 From: mvoorhie at yahoo.com (Mark Voorhies) Date: Tue, 7 Jun 2011 17:22:14 -0700 Subject: [Baypiggies] Question about breaking out of a loop In-Reply-To: References: Message-ID: <201106071722.14580.mvoorhie@yahoo.com> On Tuesday, June 07, 2011 05:03:26 pm Hy Carrinski wrote: > 2. I wrote the gist in order to answer the primary question: > > Is it an antipattern to change a datatype to cause an exception? A different way to phrase this might be: What are reasonable sentinel patterns? In Python, None is a reasonable sentinel value in a container of references, in the same way that a null pointer is a reasonable sentinel value in C/C++. It is also reasonable to use an exception to handle an "exceptional" case of control flow (encountering the sentinel value), and you've shown that this doesn't introduce overhead in Python. So, I don't think there's anything inherently objectionable about your implementation (comments about premature optimization notwithstanding). It might be useful to think of what you're doing as the special case: "marking a reference as null" rather than the more general and potentially hackier: "changing a datatype". Mark From shally at indosys.com Thu Jun 9 03:02:52 2011 From: shally at indosys.com (Shally Singh) Date: Wed, 8 Jun 2011 18:02:52 -0700 Subject: [Baypiggies] Please add to mailing list Message-ID: <00c501cc2640$f73ec430$e5bc4c90$@com> Thanks & Regards, Shally Singh Sr. Recruiter Indosys Corporation 408-627-8008 shally at indosys.com www.indosys.com -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: pythonjob.txt URL: From hcarrinski at gmail.com Thu Jun 9 07:02:56 2011 From: hcarrinski at gmail.com (Hy Carrinski) Date: Wed, 8 Jun 2011 22:02:56 -0700 Subject: [Baypiggies] Question about breaking out of a loop In-Reply-To: <201106071722.14580.mvoorhie@yahoo.com> References: <201106071722.14580.mvoorhie@yahoo.com> Message-ID: Thank you for the advice. I have updated the gist to include each of the suggestions and to serve as a set of examples rather than a question. Finally, I found that this may be a good application for itertools.groupby(). https://gist.github.com/1012945 Thanks, Hy On Tue, Jun 7, 2011 at 5:22 PM, Mark Voorhies wrote: > On Tuesday, June 07, 2011 05:03:26 pm Hy Carrinski wrote: >> 2. I wrote the gist in order to answer the primary question: >> >> ? ? Is it an antipattern to change a datatype to cause an exception? > > A different way to phrase this might be: > ? What are reasonable sentinel patterns? > > In Python, None is a reasonable sentinel value in a container of references, > in the same way that a null pointer is a reasonable sentinel value in C/C++. 
> > It is also reasonable to use an exception to handle an "exceptional" case of > control flow (encountering the sentinel value), and you've shown that this > doesn't introduce overhead in Python. > > So, I don't think there's anything inherently objectionable about your implementation > (comments about premature optimization notwithstanding). ?It might be useful to > think of what you're doing as the special case: "marking a reference as null" > rather than the more general and potentially hackier: "changing a datatype". > > Mark > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From annaraven at gmail.com Thu Jun 9 07:57:33 2011 From: annaraven at gmail.com (Anna Ravenscroft) Date: Wed, 8 Jun 2011 22:57:33 -0700 Subject: [Baypiggies] What hourly rate are you charging a startup? Message-ID: Hi folks: I'm going to be chatting with a startup next week. I'd love to hear what hourly rate folks are charging startups these days. (I can also ask for options, so I'd like some ballpark range on the cash portion of the comp. ) I'd love to hear from new programmers, as well as experienced consultants, to get a good range. Please contact me offlist and I promise to keep your answers confidential. -- cordially, Anna From jason at mischievous.org Thu Jun 9 09:16:58 2011 From: jason at mischievous.org (Jason Culverhouse) Date: Thu, 9 Jun 2011 00:16:58 -0700 Subject: [Baypiggies] What hourly rate are you charging a startup? In-Reply-To: References: Message-ID: On Jun 8, 2011, at 10:57 PM, Anna Ravenscroft wrote: > Hi folks: > > I'm going to be chatting with a startup next week. I'd love to hear > what hourly rate folks are charging startups these days. (I can also > ask for options, so I'd like some ballpark range on the cash portion > of the comp. ) It's hard to guess cash at a startup, it depends on their funding. If you work for stock, I would start at a rate where working 40 hours a week for a year allowed me to accrue 1% of the company. > I'd love to hear from new programmers, as well as experienced > consultants, to get a good range. > > Please contact me offlist and I promise to keep your answers confidential. Jason From dineshbvadhia at hotmail.com Thu Jun 9 12:14:55 2011 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Thu, 9 Jun 2011 03:14:55 -0700 Subject: [Baypiggies] What hourly rate are you charging a startup? Message-ID: a. Do developers work for stock only these days (post dot.com bubble) in the Bay Area? b. Doesn't a minimum wage have to be paid even when working for stock only in California? -------------- next part -------------- An HTML attachment was scrubbed... URL: From camembert at gmail.com Thu Jun 9 17:34:40 2011 From: camembert at gmail.com (Elizabeth Leddy) Date: Thu, 09 Jun 2011 08:34:40 -0700 Subject: [Baypiggies] What hourly rate are you charging a startup? In-Reply-To: References: Message-ID: <4DF0E810.8000202@gmail.com> On 6/9/11 3:14 AM, Dinesh B Vadhia wrote: > a. Do developers work for stock only these days (post dot.com bubble) > in the Bay Area? Base salary plus stock is the new "poor startup". And from what I can tell the base salary still has to be pretty high. > b. Doesn't a minimum wage have to be paid even when working for stock > only in California? Pretty sure it does but the SBA would be a better reference for that. 
Liz > > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies -- Elizabeth Leddy elizabeth.leddy at gmail.com 707.776.6797 -------------- next part -------------- An HTML attachment was scrubbed... URL: From keith at dart.us.com Fri Jun 17 02:55:36 2011 From: keith at dart.us.com (Keith Dart) Date: Thu, 16 Jun 2011 17:55:36 -0700 Subject: [Baypiggies] Looking for a Python automation developer, contract job Message-ID: <20110616175536.6f390ec1@dart.us.com> Greetings everyone, I'm currently working a contract job at Thales e-security. We need another person to write automated test cases and tools. The test cases and tools will of course be written in Python. This is a contract job (at least for now), and you may have to sign up with Oxford and Associates. If you're interested please contact me. The requirements are as follows. Required: * Reasonably proficient with Python * Familiar with OO concepts Nice to have: * Familiarity with test plans, test cases, and automated testing. * Proficient with Unix, especially Linux. * Have some knowledge of data networks. * Knowledge of cryptography The product is a crypto key management product being developed by Thales e-security. -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Keith Dart ===================================================================== From glen at glenjarvis.com Fri Jun 17 04:24:14 2011 From: glen at glenjarvis.com (Glen Jarvis) Date: Thu, 16 Jun 2011 19:24:14 -0700 Subject: [Baypiggies] The company I work at is hiring like gangbusters Message-ID: Although I posted something similar before, I want to throw one more out there. The company I'm with is aggressively hiring. I participated in four interviews today and have two more for tomorrow - and that's just me.. I'd love to see some BayPIGgie caliber step up to the plate... especially since I'd be working fairly closely with the people that we hire... W00t Glen P.S. Please contact me off list. -- Things which matter most must never be at the mercy of things which matter least. -- Goethe -------------- next part -------------- An HTML attachment was scrubbed... URL: From gracelaw at mac.com Sat Jun 18 03:56:21 2011 From: gracelaw at mac.com (Grace Law) Date: Fri, 17 Jun 2011 18:56:21 -0700 Subject: [Baypiggies] JOB: Python Server / Scalability Engineer for online games with 10+M users Message-ID: Hi there My company is looking to add 2 more python server engineers in SF- see below for more info and feel free to pass this along. Aside from being a python fan, you should like small teams, smart engineers, fun and collaborative work environment, optimized codes, and iterating quickly. Cheers Grace ------------------------ Do you want to write and scale high availability servers in Python handling millions of users? Do you want to work with smart and fun people and make an impact in the social gaming industry? If so, our _server_ team want to talk to you. About Lolapps: - Our 20 engineers are responsible for Ravenwood Fair on Facebook with 10+ millions of users - Our **3** people server team is dealing with 100s of servers handling 12K simultaneous requests and growing quickly. - Our core technology stack consists of Python(Pylons), AS3, MySQL, and Mongo. - We believe in small teams, smart engineers, fun and collaborative work environment, optimized codes, and iterating quickly. 
- People say we have the best engineers in the gaming space in the bay area. We say, that can't be true. There are tons of smart people and we want to work with more of them so we can all grow. - People say our game run faster and play better than Zynga's. We like it. We're looking for 2 more server / performance engineers to work onsite in SF with our fun team. About you: - a go getter and a team player - can bang out high quality codes quickly and have personal side projects - Love python and can code it in your sleep - Superior knowledge of Linux, scripting, and SQL - Understand when MySQL is great and experiment with NoSQL solutions (Memcached/MongoDB/Redis/Cassandra) - Know how to put together a web-application stack (We use Pylons/Paste.) - Strong in CS, have the capacity / experience to work with tons of data, write caching solutions, deal with scalability challenges of high transaction, high availability servers, tinker on real-time solutions. - Enjoy bouncing ideas off of your teammates to build up solutions no one person could of thought up by himself - Care about your implementations and find yourself compulsively checking that your latest experimental deploy is working the way you thought it would - Prefer the pace and excitement of building consumer internet applications over enterprise solutions - Definitely prefers making an impact at start-ups You'll get to: - Work in a 55 people company, strategically positioned in an innovative space that is expanding into a billion dollar industry. - Work with 3 really smart software engineers and own the infrastructure of very high transactions servers - Design and implement large chunks of scalability features that will take Lolapps' games to the next level. - Help make key infrastructure decisions (databases, replication layouts, caching solutions, etc.) - Experiment with the newest emerging open-source technologies. - Test your ideas and strategies out on millions of users and enormous data sets. - Have fun. Play ping pong, foosball, video games - Eat. We buy your lunches. - Be healthy. We offer free pilates classes onsite. Sounds intriguing? Play our latest game and see why millions are returning to Ravenwood Fair, one of Facebook's Top Social Games: (http://www.facebook.com/RavenwoodFair) To apply: http://hire.jobvite.com/j/?aj=oXmLVfwJ&s=BayPIGgies or Write to grace at lolapps.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at zachary.com Tue Jun 21 00:38:16 2011 From: david at zachary.com (David Creemer) Date: Mon, 20 Jun 2011 15:38:16 -0700 Subject: [Baypiggies] (off-topic) data wiring contractors? Message-ID: <8380C5DF-65A7-4766-A249-C65B14E6D82F@zachary.com> Hi Folks -- sorry for the mostly off topic post. Does anyone have any experience with local data wiring / networking contractors? My startup is looking into new office spaces, and I'm trying to get an idea of the costs associated with running cable, setting up patch-panels, etc. I'd very much appreciate any recommendations and information. Thanks! -- David From kpguy1975 at gmail.com Tue Jun 21 16:08:42 2011 From: kpguy1975 at gmail.com (Vikram K) Date: Tue, 21 Jun 2011 10:08:42 -0400 Subject: [Baypiggies] itemgetter function unavailable in linux Message-ID: I have a nested list of the type [['dog,10], ['cat',5], ['dragon',7]] I need to sort this nested list based on the second element of each element in the nested list so that i end up with: [['cat',5], ['dragon',7], ['dog',10]] This used to be easy. 
Just use the itemgetter function in the itertools module. But on my linux machine, to my horror, there is no itemgetter function to be found in the itertools module. ----- Python 2.7.2 (default, Jun 21 2011, 09:56:35) [GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import itertools >>> from itertools import itemgetter Traceback (most recent call last): File "", line 1, in ImportError: cannot import name itemgetter >>> dir(itertools) ['__doc__', '__file__', '__name__', '__package__', 'chain', 'combinations', 'combinations_with_replacement', 'compress', 'count', 'cycle', 'dropwhile', 'groupby', 'ifilter', 'ifilterfalse', 'imap', 'islice', 'izip', 'izip_longest', 'permutations', 'product', 'repeat', 'starmap', 'takewhile', 'tee'] >>> ---------- I am now going to grind away and do it the hard way, but can someone tell me why the itemgetter function is not available on linux although i have been using it when programming in windows? -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Tue Jun 21 16:22:49 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 21 Jun 2011 07:22:49 -0700 Subject: [Baypiggies] itemgetter function unavailable in linux In-Reply-To: References: Message-ID: On Tue, Jun 21, 2011 at 7:08 AM, Vikram K wrote: > I have a nested list of the type [['dog,10], ['cat',5], ['dragon',7]] > > I need to sort this nested list based on the second element of each element > in the nested list so that i end up with: > > [['cat',5], ['dragon',7], ['dog',10]] > > This used to be easy. Just use the itemgetter function in the itertools > module. But on my linux machine, to my horror, there is no itemgetter > function to be found in the itertools module. Is this the one you want: from operator import itemgetter? From kpguy1975 at gmail.com Tue Jun 21 16:23:57 2011 From: kpguy1975 at gmail.com (Vikram K) Date: Tue, 21 Jun 2011 10:23:57 -0400 Subject: [Baypiggies] itemgetter function unavailable in linux In-Reply-To: References: Message-ID: That's correct. Thanks. On Tue, Jun 21, 2011 at 10:22 AM, Keith Goodman wrote: > On Tue, Jun 21, 2011 at 7:08 AM, Vikram K wrote: > > I have a nested list of the type [['dog,10], ['cat',5], ['dragon',7]] > > > > I need to sort this nested list based on the second element of each > element > > in the nested list so that i end up with: > > > > [['cat',5], ['dragon',7], ['dog',10]] > > > > This used to be easy. Just use the itemgetter function in the itertools > > module. But on my linux machine, to my horror, there is no itemgetter > > function to be found in the itertools module. > > Is this the one you want: from operator import itemgetter? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdbaddog at gmail.com Thu Jun 23 00:19:14 2011 From: bdbaddog at gmail.com (William Deegan) Date: Wed, 22 Jun 2011 15:19:14 -0700 Subject: [Baypiggies] off topic, but maybe interesting to the group.. C++ presentation by Herb Sutter 6/29 in Santa Clara Message-ID: <4CCC1430-74E6-4ACF-B2F3-EFAB978E3638@gmail.com> http://blogs.msdn.com/b/matt-harrington/archive/2011/06/08/herb-sutter-c-now-and-forever-june-29-in-santa-clara.aspx -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From simeonf at gmail.com Thu Jun 23 02:43:25 2011 From: simeonf at gmail.com (Simeon Franklin) Date: Wed, 22 Jun 2011 17:43:25 -0700 Subject: [Baypiggies] Aaron Maxwell offering a ride Message-ID: Aaron Maxwell is offering a ride from SF to Baypiggies - for some reason his message bounced as spam and given that this is time sensitive and I don't see any spam filter management features in mailman I'm just forwarding it on to the list myself (see below). For rides please contact Aaron at amax at redsymbol.net -regards Simeon Franklin ------ Hi all, I'm going to be driving in from San Francisco for this month's meeting, and have space for at least a couple of people. If you'd like a ride there and back, contact me off list. -- Aaron Maxwell http://redsymbol.net/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From bibha.tripathi at jpmchase.com Thu Jun 23 15:03:05 2011 From: bibha.tripathi at jpmchase.com (Tripathi, Bibha) Date: Thu, 23 Jun 2011 14:03:05 +0100 Subject: [Baypiggies] sorting a table by column Message-ID: <3E758131F03CFC4B8076F6ECC9B5C2760F47C2D690@EMAZC217VS01.exchad.jpmchase.net> a huge table like an excel sheet, saved and accumulating more data more rows, may be more tables user chooses which column to sort on what's the best python data structure to use? and which sorting method to make it look like real time as the user enters her choice of column to sort on? thanks. This communication is for informational purposes only. It is not intended as an offer or solicitation for the purchase or sale of any financial instrument or as an official confirmation of any transaction. All market prices, data and other information are not warranted as to completeness or accuracy and are subject to change without notice. Any comments or statements made herein do not necessarily reflect those of JPMorgan Chase & Co., its subsidiaries and affiliates. This transmission may contain information that is privileged, confidential, legally privileged, and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of the information contained herein (including any reliance thereon) is STRICTLY PROHIBITED. Although this transmission and any attachments are believed to be free of any virus or other defect that might affect any computer system into which it is received and opened, it is the responsibility of the recipient to ensure that it is virus free and no responsibility is accepted by JPMorgan Chase & Co., its subsidiaries and affiliates, as applicable, for any loss or damage arising in any way from its use. If you received this transmission in error, please immediately contact the sender and destroy the material in its entirety, whether in electronic or hard copy format. Thank you. Please refer to http://www.jpmorgan.com/pages/disclosures for disclosures relating to European legal entities. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From kwgoodman at gmail.com Thu Jun 23 15:40:35 2011 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu, 23 Jun 2011 06:40:35 -0700 Subject: [Baypiggies] sorting a table by column In-Reply-To: <3E758131F03CFC4B8076F6ECC9B5C2760F47C2D690@EMAZC217VS01.exchad.jpmchase.net> References: <3E758131F03CFC4B8076F6ECC9B5C2760F47C2D690@EMAZC217VS01.exchad.jpmchase.net> Message-ID: On Thu, Jun 23, 2011 at 6:03 AM, Tripathi, Bibha wrote: > a huge table like an excel sheet, saved and accumulating more data more > rows, may be more tables > > user chooses which column to sort on > > what's the best python data structure to use? and which sorting method to > make it look like real time as the user enters her choice of column to sort > on? I haven't tried it, but you may want to take a look at tabular: http://pypi.python.org/pypi/tabular From david.berthelot at gmail.com Thu Jun 23 15:47:40 2011 From: david.berthelot at gmail.com (David Berthelot) Date: Thu, 23 Jun 2011 06:47:40 -0700 Subject: [Baypiggies] sorting a table by column In-Reply-To: <3E758131F03CFC4B8076F6ECC9B5C2760F47C2D690@EMAZC217VS01.exchad.jpmchase.net> References: <3E758131F03CFC4B8076F6ECC9B5C2760F47C2D690@EMAZC217VS01.exchad.jpmchase.net> Message-ID: Looks like a typical SQL problem. If I was to solve it in Python, assuming that due to computational time motivations the data cannot be resorted completely with: sort(key=itemgetter(column)) Then I would keep the table unsorted and I would create an index structure per column. index = [[] for c in xrange(columns)] When a row is added to the data table, I would add it to the index lists in sorted manner using the bisect module: row_id = len(data) data.append(row) for c in xrange(columns): bisect.insort_right(index[c],(data[row_id][c],row_id)) To lookup the table in sorted order according to column c, you would get the table indexes: ilist = map(itemgetter(1),index[c]) for x in ilist: print data[x] I just typed this on top of my head, so it's more to give the general principle than a robust implementation obviously. Similarly you could implement multi-column indexes, by replacing the tuple (data[row_id][x],row_id) with (data[row_id][col_1],data[row_id][col_2],...,data[row_id][col_n],row_id) assuming you desire a multi-column index on col_1,...,col_n On Thu, Jun 23, 2011 at 6:03 AM, Tripathi, Bibha wrote: > a huge table like an excel sheet, saved and accumulating more data more > rows, may be more tables > > user chooses which column to sort on > > > > what's the best python data structure to use? and which sorting method to > make it look like real time as the user enters her choice of column to sort > on? > > > > thanks. > > This communication is for informational purposes only. It is not intended as > an offer or solicitation for the purchase or sale of any financial > instrument or as an official confirmation of any transaction. All market > prices, data and other information are not warranted as to completeness or > accuracy and are subject to change without notice. Any comments or > statements made herein do not necessarily reflect those of JPMorgan Chase & > Co., its subsidiaries and affiliates. This transmission may contain > information that is privileged, confidential, legally privileged, and/or > exempt from disclosure under applicable law. If you are not the intended > recipient, you are hereby notified that any disclosure, copying, > distribution, or use of the information contained herein (including any > reliance thereon) is STRICTLY PROHIBITED. 
Although this transmission and any > attachments are believed to be free of any virus or other defect that might > affect any computer system into which it is received and opened, it is the > responsibility of the recipient to ensure that it is virus free and no > responsibility is accepted by JPMorgan Chase & Co., its subsidiaries and > affiliates, as applicable, for any loss or damage arising in any way from > its use. If you received this transmission in error, please immediately > contact the sender and destroy the material in its entirety, whether in > electronic or hard copy format. Thank you. Please refer to > http://www.jpmorgan.com/pages/disclosures for disclosures relating to > European legal entities. > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From david.berthelot at gmail.com Thu Jun 23 16:05:44 2011 From: david.berthelot at gmail.com (David Berthelot) Date: Thu, 23 Jun 2011 07:05:44 -0700 Subject: [Baypiggies] sorting a table by column In-Reply-To: References: <3E758131F03CFC4B8076F6ECC9B5C2760F47C2D690@EMAZC217VS01.exchad.jpmchase.net> Message-ID: Alternatively to bisect module which has log2(n) insertion cost, you could look into B+trees which have an insertion cost of logB(n): http://en.wikipedia.org/wiki/B%2B_tree There's a Python implementation linked on that page. I used it before and it seemed to have quite some problems while the performance over bisect was non-existent (for my particular needs). But that being said, it's worth checking. On Thu, Jun 23, 2011 at 6:47 AM, David Berthelot wrote: > Looks like a typical SQL problem. > > If I was to solve it in Python, assuming that due to computational > time motivations the data cannot be resorted completely with: > sort(key=itemgetter(column)) > > Then I would keep the table unsorted and I would create an index > structure per column. > index = [[] for c in xrange(columns)] > > When a row is added to the data table, I would add it to the index > lists in sorted manner using the bisect module: > row_id = len(data) > data.append(row) > for c in xrange(columns): > ?bisect.insort_right(index[c],(data[row_id][c],row_id)) > > To lookup the table in sorted order according to column c, you would > get the table indexes: > ilist = map(itemgetter(1),index[c]) > for x in ilist: > ?print data[x] > > I just typed this on top of my head, so it's more to give the general > principle than a robust implementation obviously. > > Similarly you could implement multi-column indexes, by replacing the > tuple (data[row_id][x],row_id) with > (data[row_id][col_1],data[row_id][col_2],...,data[row_id][col_n],row_id) > assuming you desire a multi-column index on col_1,...,col_n > > On Thu, Jun 23, 2011 at 6:03 AM, Tripathi, Bibha > wrote: >> a huge table like an excel sheet, saved and accumulating more data more >> rows, may be more tables >> >> user chooses which column to sort on >> >> >> >> what's the best python data structure to use? and which sorting method to >> make it look like real time as the user enters her choice of column to sort >> on? >> >> >> >> thanks. >> >> This communication is for informational purposes only. It is not intended as >> an offer or solicitation for the purchase or sale of any financial >> instrument or as an official confirmation of any transaction. 
All market >> prices, data and other information are not warranted as to completeness or >> accuracy and are subject to change without notice. Any comments or >> statements made herein do not necessarily reflect those of JPMorgan Chase & >> Co., its subsidiaries and affiliates. This transmission may contain >> information that is privileged, confidential, legally privileged, and/or >> exempt from disclosure under applicable law. If you are not the intended >> recipient, you are hereby notified that any disclosure, copying, >> distribution, or use of the information contained herein (including any >> reliance thereon) is STRICTLY PROHIBITED. Although this transmission and any >> attachments are believed to be free of any virus or other defect that might >> affect any computer system into which it is received and opened, it is the >> responsibility of the recipient to ensure that it is virus free and no >> responsibility is accepted by JPMorgan Chase & Co., its subsidiaries and >> affiliates, as applicable, for any loss or damage arising in any way from >> its use. If you received this transmission in error, please immediately >> contact the sender and destroy the material in its entirety, whether in >> electronic or hard copy format. Thank you. Please refer to >> http://www.jpmorgan.com/pages/disclosures for disclosures relating to >> European legal entities. >> >> _______________________________________________ >> Baypiggies mailing list >> Baypiggies at python.org >> To change your subscription options or unsubscribe: >> http://mail.python.org/mailman/listinfo/baypiggies >> > From Chris.Clark at ingres.com Thu Jun 23 19:31:44 2011 From: Chris.Clark at ingres.com (Chris Clark) Date: Thu, 23 Jun 2011 10:31:44 -0700 Subject: [Baypiggies] sorting a table by column In-Reply-To: References: <3E758131F03CFC4B8076F6ECC9B5C2760F47C2D690@EMAZC217VS01.exchad.jpmchase.net> Message-ID: <4E037880.9040100@ingres.com> Tripathi, Bibha wrote: > > a huge table like an excel sheet, saved and accumulating more data > more rows, may be more tables > user chooses which column to sort on > > what's the best python data structure to use? It probably depends on what "huge" means. If the "table" data fits in memory (either physical or virtual) it probably isn't that big and doing operations in Python (e.g. using list comprehension) is probably appropriate. If it doesn't fit in memory you can't easily use list comprehension and need to look into loops and generators, see http://danielrech.net/2011/python-generators-presentation-by-david-beazley/ (I think Alex might have done something similar at PyCon a few years ago too). There are some third party libs on PyPi that are worth checking out. I keep wanting to find an excuse to kick the tires on http://pypi.python.org/pypi/blist/ but I've not had cause to do so yet. Search PyPi for "tree" and there are a lot of hits. You can even use good old Schwartzian transforms (aka decorate-sort-undecorate) to handle changes in columns if for some reason there isn't a key argument to sort() provided by the structure you choose. David Berthelot wrote: > Looks like a typical SQL problem. > Agreed, without more information this sounds like a classic "ORDERY BY" clause on a SELECT statement. Relational database really excel, lower case "e", rather than upper case "E" :-), at this sort of thing..... If you check my email address domain name, of course I'm going to say that ;-) Databases do a lot of the heavy lifting for you. 
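To make the in-memory options concrete (a one-off sorted() call with a key function, or the bisect-maintained per-column index David described), here is a minimal sketch; the three-column layout and the sample rows are invented purely for illustration:

import bisect
from operator import itemgetter

NUM_COLUMNS = 3                            # invented width, for illustration only
data = []                                  # the unsorted table, one tuple per row
index = [[] for _ in range(NUM_COLUMNS)]   # one sorted index per column

def add_row(row):
    # Append the row, then keep every per-column index in sorted order
    # so any column can be read back "already sorted" later.
    row_id = len(data)
    data.append(row)
    for c in range(NUM_COLUMNS):
        bisect.insort_right(index[c], (row[c], row_id))

for row in [('alice', 34, 88.5), ('bob', 25, 91.0), ('carol', 41, 79.2)]:
    add_row(row)

# Walk the table ordered by column 1 without re-sorting anything.
for _, row_id in index[1]:
    print data[row_id]

# If the table comfortably fits in memory, a one-off sort is even simpler:
print sorted(data, key=itemgetter(2))

The index route pays a small insertion cost per row (a binary search plus a list shift in each column's index) so that reading the table back in any column's order requires no sorting at all, which is what keeps switching the sort column cheap as the user clicks around.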
If you're doing some sort of BI analytics a database is probably your best bet (I'm taking a guess you are based on your email address domain name). Shameless promotion, take a gander at http://www.thevirtualcircle.com/2011/02/vectorwise-theres-a-disturbance-in-the-force/ and http://www.ingres.com/products/vectorwise (I don't work on Vectorwise but I'm always blown away at how fast it is). If you are avoiding a traditional DBMS for performance reasons VW may well surprise you.. Chris From cappy2112 at gmail.com Fri Jun 24 00:57:11 2011 From: cappy2112 at gmail.com (Tony Cappellini) Date: Thu, 23 Jun 2011 15:57:11 -0700 Subject: [Baypiggies] Looking for 1 reviewer to review an ebook copy of The Python Standard Library by Example" by Doug Hellmann. Message-ID: Pearson is looking for *** 1 *** reviewer to review an eBook version of The Python Standard Library by Example" by Doug Hellmann. Please reply OFF-LIST if interested. Thanks "The Python Standard Library by Example" (Doug Hellmann), published by Addison-Wesley Professional, June 2011, Copyright 2011 Pearson Education, Inc. Publisher page: www.informit.com/title/0321767349 Introduction 1 ** Chapter 1: Text (page 3) 1.1 string?Text Constants and Templates 1.2 textwrap?Formatting Text Paragraphs 1.3 re?Regular Expressions 1.4 difflib?Compare Sequences ** Chapter 2: Data Structures (page 69) 2.1 collections?Container Data Types 2.2 array?Sequence of Fixed-Type Data 2.3 heapq?Heap Sort Algorithm 2.4 bisect?Maintain Lists in Sorted Order 2.5 Queue?Thread-Safe FIFO Implementation 2.6 struct?Binary Data Structures 2.7 weakref?Impermanent References to Objects 2.8 copy?Duplicate Objects 2.9 pprint?Pretty-Print Data Structures **Chapter 3: Algorithms (page 129) 3.1 functools?Tools for Manipulating Functions 3.2 itertools?Iterator Functions 3.3 operator?Functional Interface to Built-in Operators 3.4 contextlib?Context Manager Utilities Chapter 4: Dates and Times (page 173) 4.1 time?Clock Time 173 4.2 datetime?Date and Time Value Manipulation 180 4.3 calendar?Work with Dates 191 **Chapter 5: Mathematics (page 197) 5.1 decimal?Fixed and Floating-Point Math 5.2 fractions?Rational Numbers 5.3 random?Pseudorandom Number Generators 5.4 math?Mathematical Functions ** Chapter 6: The File System (page 247) 6.1 os.path?Platform-Independent Manipulation of Filenames 6.2 glob?Filename Pattern Matching 6.3 linecache?Read Text Files Efficiently 6.4 tempfile?Temporary File System Objects 6.5 shutil?High-Level File Operations 6.6 mmap?Memory-Map Files 6.7 codecs?String Encoding and Decoding 6.8 StringIO?Text Buffers with a File-like API 6.9 fnmatch?UNIX-Style Glob Pattern Matching 6.10 dircache?Cache Directory Listings 6.11 filecmp?Compare Files ** Chapter 7: Data Persistence and Exchange (page 333) 7.1 pickle?Object Serialization 7.2 shelve?Persistent Storage of Objects 7.3 anydbm?DBM-Style Databases 7.4 whichdb?Identify DBM-Style Database Formats 7.5 sqlite3?Embedded Relational Database 7.6 xml.etree.ElementTree?XML Manipulation API 7.7 csv?Comma-Separated Value Files ** Chapter 8: Data Compression and Archiving (page 421) 8.1 zlib?GNU zlib Compression 8.2 gzip?Read and Write GNU Zip Files 8.3 bz2?bzip2 Compression 8.4 tarfile?Tar Archive Access 8.5 zipfile?ZIP Archive Access ** Chapter 9: Cryptography (page 469) 9.1 hashlib?Cryptographic Hashing 9.2 hmac?Cryptographic Message Signing and Verification ** Chapter 10: Processes and Threads (page 481) 10.1 subprocess?Spawning Additional Processes 10.2 signal?Asynchronous System Events 10.3 threading?Manage 
Concurrent Operations 10.4 multiprocessing?Manage Processes like Threads ** Chapter 11: Networking (page 561) 11.1 socket?Network Communication 11.2 select?Wait for I/O Efficiently 11.3 SocketServer?Creating Network Servers 11.4 asyncore?Asynchronous I/O 11.5 asynchat?Asynchronous Protocol Handler Chapter 12: The Internet (page 637) 12.1 urlparse?Split URLs into Components 12.2 BaseHTTPServer?Base Classes for Implementing Web Servers 12.3 urllib?Network Resource Access 12.4 urllib2?Network Resource Access 12.5 base64?Encode Binary Data with ASCII 12.6 robotparser?Internet Spider Access Control 12.7 Cookie?HTTP Cookies 12.8 uuid?Universally Unique Identifiers 12.9 json?JavaScript Object Notation 12.10 xmlrpclib?Client Library for XML-RPC 12.11 SimpleXMLRPCServer?An XML-RPC Server ** Chapter 13: Email (page 727) 13.1 smtplib?Simple Mail Transfer Protocol Client 13.2 smtpd?Sample Mail Servers 13.3 imaplib?IMAP4 Client Library 13.4 mailbox?Manipulate Email Archives **Chapter 14: Application Building Blocks (page 769) 14.1 getopt?Command-Line Option Parsing 14.2 optparse?Command-Line Option Parser 14.3 argparse?Command-Line Option and Argument Parsing 14.4 readline?The GNU Readline Library 14.5 getpass?Secure Password Prompt 14.6 cmd?Line-Oriented Command Processors 14.7 shlex?Parse Shell-Style Syntaxes 14.8 ConfigParser?Work with Configuration Files 14.9 logging?Report Status, Error, and Informational Messages 14.10 fileinput?Command-Line Filter Framework 14.11 atexit?Program Shutdown Callbacks 14.12 sched?Timed Event Scheduler ** Chapter 15: Internationalization and Localization (page 899) 15.1 gettext?Message Catalogs 15.2 locale?Cultural Localization API ** Chapter 16: Developer Tools (page 919) 16.1 pydoc?Online Help for Modules 16.2 doctest?Testing through Documentation 16.3 unittest?Automated Testing Framework 16.4 traceback?Exceptions and Stack Traces 16.5 cgitb?Detailed Traceback Reports 16.6 pdb?Interactive Debugger 16.7 trace?Follow Program Flow 16.8 profile and pstats?Performance Analysis 16.9 timeit?Time the Execution of Small Bits of Python Code 16.10 compileall?Byte-Compile Source Files 16.11 pyclbr?Class Browser ** Chapter 17: Runtime Features (page 1045) 17.1 site?Site-Wide Configuration 17.2 sys?System-Specific Configuration 17.3 os?Portable Access to Operating System Specific Features 17.4 platform?System Version Information 17.5 resource?System Resource Management 17.6 gc?Garbage Collector 17.7 sysconfig?Interpreter Compile-Time Configuration ** Chapter 18: Language Tools (page 1169) 18.1 warnings?Nonfatal Alerts 18.2 abc?Abstract Base Classes 18.3 dis?Python Bytecode Disassembler 18.4 inspect?Inspect Live Objects 18.5 exceptions?Built-in Exception Classes ** Chapter 19: Modules and Packages (page 1235) 19.1 imp?Python?s Import Mechanism 19.2 zipimport?Load Python Code from ZIP Archives 19.3 pkgutil?Package Utilities *Index of Python Modules (page 1259)* Index (page 1261) -------------- next part -------------- An HTML attachment was scrubbed... URL: From cappy2112 at gmail.com Fri Jun 24 01:29:11 2011 From: cappy2112 at gmail.com (Tony Cappellini) Date: Thu, 23 Jun 2011 16:29:11 -0700 Subject: [Baypiggies] Reviewer found for The Python Standard Library by Example- no more replies are necessary Message-ID: Reviewer found for The Python Standard Library by Example- no more replies are necessary Thanks -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jim at systemateka.com Fri Jun 24 02:28:38 2011 From: jim at systemateka.com (jim) Date: Thu, 23 Jun 2011 17:28:38 -0700 Subject: [Baypiggies] Ride down and back from SF to Dan Robert's PyPy talk Message-ID: <1308875318.1681.19.camel@jim-LAPTOP> I'm at Noisebridge. It's about 5:30. I'll leave for the BayPIGgies meeting around 6:00 PM. You wanna ride down and back, ask via email between now and 6 PM. jim From jim at well.com Fri Jun 24 20:44:19 2011 From: jim at well.com (jim) Date: Fri, 24 Jun 2011 11:44:19 -0700 Subject: [Baypiggies] (off-topic) data wiring contractors? In-Reply-To: <8380C5DF-65A7-4766-A249-C65B14E6D82F@zachary.com> References: <8380C5DF-65A7-4766-A249-C65B14E6D82F@zachary.com> Message-ID: <1308941059.1736.10.camel@jim-LAPTOP> If the job is primarily screwing equipment into racks and labelling and pulling cables to switches for local groups, Systemateka can do much of the work at $40 per hour. This assumes appliances and standard configuration for firewalls, gateways, and network configuration. Simple architecture (usually the case) is between $60 and $80 per hour. Some specialized jobs such as custom firewalls and special network configuration may be billed at $80 per hour or more. Without knowing the number of seats and boxes and so forth, there's no way to estimate the total cost of the job. On Mon, 2011-06-20 at 15:38 -0700, David Creemer wrote: > Hi Folks -- sorry for the mostly off topic post. > > Does anyone have any experience with local data wiring / networking contractors? My startup is looking into new office spaces, and I'm trying to get an idea of the costs associated with running cable, setting up patch-panels, etc. I'd very much appreciate any recommendations and information. > > Thanks! > -- David > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From ademan555 at gmail.com Sat Jun 25 21:56:18 2011 From: ademan555 at gmail.com (Dan Roberts) Date: Sat, 25 Jun 2011 12:56:18 -0700 Subject: [Baypiggies] PyPy 101 Talk Slides from Thursday Message-ID: Hi Baypiggies, At least a couple of people wanted to see slides from my presentation on Thursday. I've hosted them temporarily at http://codespeak.net/~dan/talk.pdf I'm also happy to answer any questions that weren't adequately answered during my talk, and of course over in #pypy on irc.freenode.net there are even more answers. Cheers everyone, Dan From spmcinerney at hotmail.com Sat Jun 25 22:42:16 2011 From: spmcinerney at hotmail.com (Stephen McInerney) Date: Sat, 25 Jun 2011 13:42:16 -0700 Subject: [Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup? Message-ID: What do people use for scraping on a website requiring (login form-based) authentication? - BeautifulSoup: does not handle authentication or cookies - Scrapy: does but more heavyweight paradigm to learn, incl. XPath Some discussion: http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python Thanks, Stephen -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tbibha at gmail.com Fri Jun 24 07:40:02 2011 From: tbibha at gmail.com (Bibha Tripathi) Date: Fri, 24 Jun 2011 06:40:02 +0100 Subject: [Baypiggies] sorting a table by column In-Reply-To: <4E037880.9040100@ingres.com> References: <3E758131F03CFC4B8076F6ECC9B5C2760F47C2D690@EMAZC217VS01.exchad.jpmchase.net> <4E037880.9040100@ingres.com> Message-ID: Has anyone used PYNQ? how does it perform compared with dict seart? Cheers, BT ~~~~~ "Every sound ends in music: The edge of every surface is tinged with prismatic rays."- RWE Sent from my iPhone On 23 Jun 2011, at 06:31 PM, Chris Clark wrote: > Tripathi, Bibha wrote: >> >> a huge table like an excel sheet, saved and accumulating more data more rows, may be more tables >> user chooses which column to sort on >> >> what's the best python data structure to use? > > It probably depends on what "huge" means. If the "table" data fits in memory (either physical or virtual) it probably isn't that big and doing operations in Python (e.g. using list comprehension) is probably appropriate. If it doesn't fit in memory you can't easily use list comprehension and need to look into loops and generators, see http://danielrech.net/2011/python-generators-presentation-by-david-beazley/ (I think Alex might have done something similar at PyCon a few years ago too). > > There are some third party libs on PyPi that are worth checking out. I keep wanting to find an excuse to kick the tires on http://pypi.python.org/pypi/blist/ but I've not had cause to do so yet. Search PyPi for "tree" and there are a lot of hits. > > You can even use good old Schwartzian transforms (aka decorate-sort-undecorate) to handle changes in columns if for some reason there isn't a key argument to sort() provided by the structure you choose. > > > David Berthelot wrote: >> Looks like a typical SQL problem. >> > > Agreed, without more information this sounds like a classic "ORDERY BY" clause on a SELECT statement. Relational database really excel, lower case "e", rather than upper case "E" :-), at this sort of thing..... If you check my email address domain name, of course I'm going to say that ;-) Databases do a lot of the heavy lifting for you. > > If you're doing some sort of BI analytics a database is probably your best bet (I'm taking a guess you are based on your email address domain name). Shameless promotion, take a gander at http://www.thevirtualcircle.com/2011/02/vectorwise-theres-a-disturbance-in-the-force/ and http://www.ingres.com/products/vectorwise (I don't work on Vectorwise but I'm always blown away at how fast it is). If you are avoiding a traditional DBMS for performance reasons VW may well surprise you.. > > Chris > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies From peter.borocz at gmail.com Sat Jun 25 23:38:27 2011 From: peter.borocz at gmail.com (Peter Borocz) Date: Sat, 25 Jun 2011 14:38:27 -0700 Subject: [Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup? In-Reply-To: References: Message-ID: While usually thought of only for testing, I've happily used twillfor the authentication/cookie/form-handling portion then beautifulsoup for the parsing. Twill can be configured to use beautifulsoup directly but with direct access to the underlying page, you can use any parsing library you like. 
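If you would rather keep the login step in the standard library, the same split works with urllib2 + cookielib handling the session cookie and BeautifulSoup doing only the parsing. A rough Python 2 sketch follows; the URLs and form field names are invented placeholders, so substitute whatever the real login form expects:

import cookielib
import urllib2
from urllib import urlencode
from BeautifulSoup import BeautifulSoup    # BeautifulSoup 3.x

# Both URLs and the form field names are made-up placeholders.
LOGIN_URL = 'http://example.com/accounts/login'
DATA_URL = 'http://example.com/members/report'

jar = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

# POSTing the login form stores the session cookie in the jar; every later
# request made through the same opener sends it back automatically.
opener.open(LOGIN_URL, urlencode({'username': 'me', 'password': 'secret'}))

html = opener.open(DATA_URL).read()
soup = BeautifulSoup(html)
for link in soup.findAll('a', href=True):
    print link['href']

Twill or mechanize can stand in for the first half if you prefer working with form objects instead of hand-built POST data; the parsing half stays the same either way.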
PeterB On Sat, Jun 25, 2011 at 1:42 PM, Stephen McInerney wrote: > > What do people use for scraping on a website requiring (login form-based) > authentication? > > - BeautifulSoup: does not handle authentication or cookies > - Scrapy: does but more heavyweight paradigm to learn, incl. XPath > > > Some discussion: > http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python > > Thanks, > Stephen > > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > -- peter.borocz at gmail dot com -------------- next part -------------- An HTML attachment was scrubbed... URL: From glen at glenjarvis.com Sun Jun 26 03:48:54 2011 From: glen at glenjarvis.com (Glen Jarvis) Date: Sat, 25 Jun 2011 18:48:54 -0700 Subject: [Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup? In-Reply-To: References: Message-ID: <68EB5F99-9A2F-4C33-BB33-0D86F813650E@glenjarvis.com> Stephen, Beautiful soup really just parses the HTML. It doesn't (have to) retrieve the page for you. You can use the built-in httplib2, urllib libraries to retrieve the page (also with authentication) and then use BeautifulSoup to parse the page. Cheers, Glen On Jun 25, 2011, at 1:42 PM, Stephen McInerney wrote: > > What do people use for scraping on a website requiring (login form-based) authentication? > BeautifulSoup: does not handle authentication or cookies > Scrapy: does but more heavyweight paradigm to learn, incl. XPath > > Some discussion: http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python > > Thanks, > Stephen > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies -------------- next part -------------- An HTML attachment was scrubbed... URL: From aaron at midnightresearch.com Sun Jun 26 04:14:19 2011 From: aaron at midnightresearch.com (Aaron Peterson) Date: Sat, 25 Jun 2011 19:14:19 -0700 Subject: [Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup? In-Reply-To: References: Message-ID: Hello: Mechanize is another good module for automating this kind of thing. HTH, Aaron On Jun 25, 2011 1:43 PM, "Stephen McInerney" wrote: > > > What do people use for scraping on a website requiring (login form-based) authentication? > BeautifulSoup: does not handle authentication or cookiesScrapy: does but more heavyweight paradigm to learn, incl. XPath > Some discussion: http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python > > Thanks, > Stephen > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ademan555 at gmail.com Mon Jun 27 06:40:43 2011 From: ademan555 at gmail.com (Dan Roberts) Date: Sun, 26 Jun 2011 21:40:43 -0700 Subject: [Baypiggies] PyPy 101 Talk Slides from Thursday In-Reply-To: References: Message-ID: Well, like I said that was before my time. My impression was that it just didn't yield any benefits. http://doc.pypy.org/en/latest/project-ideas.html?highlight=llvm suggests that LLVM just wasn't ready to be used with PyPy. More recently we heard from the Unladen Swallow project that LLVM wasn't ready for that either. 
I confirmed this with one of the older PyPy developers, LLVM was just too buggy at the time, and PyPy has been burned by it multiple times. I personally don't know what LLVM would bring to the table. I'm far from an expert on LLVM, so I may be ignoring important features that it has, so feel free to chime in if you think there's something I'm missing. For the translation process (where currently we generate either C, Jasmin JVM assembler, or CLI bytecode, this is the offline process analagous to "compile time") I don't think LLVM would be beneficial, executables produced by GCC still tend to beat clang's binaries. However, for the JIT it might have worthwhile code generation features. PyPy implements a fairly effective set of optimizations on the JIT code it emits. It's possible that layering LLVM's optimizations would produce better code at runtime. I'm fairly confident in saying that no core PyPy developer would be interested in pursuing this again, however, the door is always wide open for people to try "crazy" things with PyPy. If someone is willing to devote time to bring LLVM support up to par, and didn't break a bunch of other things (no reason why it should), it would definitely be accepted. One could probably implement LLVM as a JIT backend similar to how different CPU architectures are supported, and that would be "fairly trivial". Anyways, the short answer is that it was too immature for several attempts. On Sat, Jun 25, 2011 at 1:26 PM, Tony Cappellini wrote: > Dan > When I asked you about pyp using the LLVM, you said they tried before you > got involved with the project, > but then just let that branch go to bitrot. > Do you know why they stopped using the LLVM? > It seems as though it would save you a lot of work- but if the performance > wasn't good enough, that would be reason enough. > I'm quite impressed with the speed that you demonstrated. > > On Sat, Jun 25, 2011 at 12:56 PM, Dan Roberts wrote: >> >> Hi Baypiggies, >> ? ?At least a couple of people wanted to see slides from my >> presentation on Thursday. I've hosted them temporarily at >> http://codespeak.net/~dan/talk.pdf I'm also happy to answer any >> questions that weren't adequately answered during my talk, and of >> course over in #pypy on irc.freenode.net there are even more answers. >> >> Cheers everyone, >> Dan >> _______________________________________________ >> Baypiggies mailing list >> Baypiggies at python.org >> To change your subscription options or unsubscribe: >> http://mail.python.org/mailman/listinfo/baypiggies > > From kpguy1975 at gmail.com Mon Jun 27 23:06:58 2011 From: kpguy1975 at gmail.com (Vikram K) Date: Mon, 27 Jun 2011 17:06:58 -0400 Subject: [Baypiggies] nested list question Message-ID: Suppose i have the following nested list: >>> x [['19600894', '1', 'chr15_76136768', 'MISSENSE'], ['19600894', '2', 'chr15_76136768', 'MISSENSE'], ['18467762', '1', 'chr14_23354066', 'MISSENSE']] How do i obtain from nested list x (given above), the following nested list z: >>> z [['chr15_76136768', 'MISSENSE'], ['chr14_23354066', 'MISSENSE']] ------ In other words, if the third element of an element of x is the same, then i wish to combine it into a single element. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dwight_hubbard at yahoo.com Tue Jun 28 00:07:40 2011 From: dwight_hubbard at yahoo.com (Dwight Hubbard) Date: Mon, 27 Jun 2011 15:07:40 -0700 (PDT) Subject: [Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup? 
In-Reply-To: <68EB5F99-9A2F-4C33-BB33-0D86F813650E@glenjarvis.com> References: <68EB5F99-9A2F-4C33-BB33-0D86F813650E@glenjarvis.com> Message-ID: <1309212460.39951.YahooMailNeo@web112520.mail.gq1.yahoo.com> For scraping with authentication I find the twill module is very good. >________________________________ >From: Glen Jarvis >To: Stephen McInerney >Cc: "" >Sent: Saturday, June 25, 2011 6:48 PM >Subject: Re: [Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup? > > >Stephen, >?? ?Beautiful soup really just parses the HTML. It doesn't (have to) retrieve the page for you. > > >?? ?You can use the built-in httplib2, urllib libraries to retrieve the page (also with authentication) and then use BeautifulSoup to parse the page. > >Cheers, > > > > >Glen > >On Jun 25, 2011, at 1:42 PM, Stephen McInerney wrote: > > > >>What do people use for scraping on a website requiring (login form-based) authentication? >> >> * BeautifulSoup: does not handle authentication or cookies >> * Scrapy: does but more heavyweight paradigm to learn, incl. XPath >>Some discussion: http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python >> >>Thanks, >>Stephen >> >> >_______________________________________________ >>Baypiggies mailing list >>Baypiggies at python.org >>To change your subscription options or unsubscribe: >>http://mail.python.org/mailman/listinfo/baypiggies >_______________________________________________ >Baypiggies mailing list >Baypiggies at python.org >To change your subscription options or unsubscribe: >http://mail.python.org/mailman/listinfo/baypiggies > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason at mischievous.org Tue Jun 28 00:29:30 2011 From: jason at mischievous.org (Jason Culverhouse) Date: Mon, 27 Jun 2011 15:29:30 -0700 Subject: [Baypiggies] nested list question In-Reply-To: References: Message-ID: <16E5CAD9-33A1-4E82-A55A-B98BE4802335@mischievous.org> On Jun 27, 2011, at 2:06 PM, Vikram K wrote: > Suppose i have the following nested list: > > >>> x > [['19600894', '1', 'chr15_76136768', 'MISSENSE'], ['19600894', '2', 'chr15_76136768', 'MISSENSE'], ['18467762', '1', 'chr14_23354066', 'MISSENSE']] > > > How do i obtain from nested list x (given above), the following nested list z: > > >>> z > [['chr15_76136768', 'MISSENSE'], ['chr14_23354066', 'MISSENSE']] > How about: list(unique_everseen((y[2:4] for y in x), operator.itemgetter(0))) or the whole nested list with just list(unique_everseen(x, operator.itemgetter(2))) where : unique_everseen is from http://docs.python.org/library/itertools.html If you data is already sorted by the key then unique_justseen might be more efficient? Jason > ------ > In other words, if the third element of an element of x is the same, then i wish to combine it into a single element. > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies -------------- next part -------------- An HTML attachment was scrubbed... URL: From ryan at larrabure.org Tue Jun 28 17:25:18 2011 From: ryan at larrabure.org (Ryan Larrabure) Date: Tue, 28 Jun 2011 08:25:18 -0700 Subject: [Baypiggies] Scraping with authentication: Scrapy vs BeautifulSoup? 
In-Reply-To: <1309212460.39951.YahooMailNeo@web112520.mail.gq1.yahoo.com> References: <68EB5F99-9A2F-4C33-BB33-0D86F813650E@glenjarvis.com> <1309212460.39951.YahooMailNeo@web112520.mail.gq1.yahoo.com> Message-ID: If you're scraping HTML, all reasonable roads seem to lead to xpath. I'd use httplib2 and lxml. Avoid mechanize. It's form handling is very poor (it'll read forms stored inline within javascript tags). On Mon, Jun 27, 2011 at 3:07 PM, Dwight Hubbard wrote: > For scraping with authentication I find the twill module is very good. > > ________________________________ > From: Glen Jarvis > To: Stephen McInerney > Cc: "" > Sent: Saturday, June 25, 2011 6:48 PM > Subject: Re: [Baypiggies] Scraping with authentication: Scrapy vs > BeautifulSoup? > > Stephen, > ?? ?Beautiful soup really just parses the HTML. It doesn't (have to) > retrieve the page for you. > ?? ?You can use the built-in httplib2, urllib libraries to retrieve the page > (also with authentication) and then use BeautifulSoup to parse the page. > Cheers, > > Glen > On Jun 25, 2011, at 1:42 PM, Stephen McInerney > wrote: > > > What do people use for scraping on a website requiring (login form-based) > authentication? > > BeautifulSoup: does not handle authentication or cookies > Scrapy: does but more heavyweight paradigm to learn, incl. XPath > > Some discussion: > http://stackoverflow.com/questions/4328271/best-way-for-a-beginner-to-learn-screen-scraping-with-python > > Thanks, > Stephen > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From kpguy1975 at gmail.com Tue Jun 28 17:43:46 2011 From: kpguy1975 at gmail.com (Vikram K) Date: Tue, 28 Jun 2011 11:43:46 -0400 Subject: [Baypiggies] nested list question In-Reply-To: <16E5CAD9-33A1-4E82-A55A-B98BE4802335@mischievous.org> References: <16E5CAD9-33A1-4E82-A55A-B98BE4802335@mischievous.org> Message-ID: Thanks Jason. Could you (or someone else) suggest some approach for the following: >>> x [['19600894', '1', 'chr15_76136768', 'MISSENSE'], ['19600894', '2', 'chr15_76136768', 'MISSENSE'], ['18467762', '1', 'chr14_23354066', 'MISSENSE']] How do i obtain from nested list x (given above), the following nested list z: >>> z [[19600894','1/2','chr15_76136768', 'MISSENSE', 'homozygous'], ['18467762', '1','chr14_23354066', 'MISSENSE', 'heterozygous']] In list x, the first element is loci, second element is allele, third element is chromosome_positionofchange, fourth is type of change. Based on the value of the second and third element a new element has to be created --'homozygous' if both allele 1 and allele 2 have the change and 'heterozygous' if only one allele has the change. 
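One straightforward way to sketch that grouping is a plain dict keyed on (loci, position, change type) that collects the alleles seen for each key and then applies the two-allele test:

x = [['19600894', '1', 'chr15_76136768', 'MISSENSE'],
     ['19600894', '2', 'chr15_76136768', 'MISSENSE'],
     ['18467762', '1', 'chr14_23354066', 'MISSENSE']]

groups = {}      # (loci, position, change) -> set of alleles seen
order = []       # remember first-seen order of the keys
for loci, allele, pos, change in x:
    key = (loci, pos, change)
    if key not in groups:
        groups[key] = set()
        order.append(key)
    groups[key].add(allele)

z = []
for loci, pos, change in order:
    alleles = sorted(groups[(loci, pos, change)])
    zygosity = 'homozygous' if len(alleles) > 1 else 'heterozygous'
    z.append([loci, '/'.join(alleles), pos, change, zygosity])

print z
# [['19600894', '1/2', 'chr15_76136768', 'MISSENSE', 'homozygous'],
#  ['18467762', '1', 'chr14_23354066', 'MISSENSE', 'heterozygous']]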
On Mon, Jun 27, 2011 at 6:29 PM, Jason Culverhouse wrote: > On Jun 27, 2011, at 2:06 PM, Vikram K wrote: > > Suppose i have the following nested list: > > >>> x > [['19600894', '1', 'chr15_76136768', 'MISSENSE'], ['19600894', '2', > 'chr15_76136768', 'MISSENSE'], ['18467762', '1', 'chr14_23354066', > 'MISSENSE']] > > > How do i obtain from nested list x (given above), the following nested list > z: > > >>> z > [['chr15_76136768', 'MISSENSE'], ['chr14_23354066', 'MISSENSE']] > > > How about: > > list(unique_everseen((y[2:4] for y in x), operator.itemgetter(0))) > > or the whole nested list with just > > list(unique_everseen(x, operator.itemgetter(2))) > > where : > > unique_everseen is from > http://docs.python.org/library/itertools.html > > If you data is already sorted by the key then > unique_justseen > > might be more efficient? > > Jason > > ------ > In other words, if the third element of an element of x is the same, then i > wish to combine it into a single element. > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brian.curtin at gmail.com Wed Jun 29 05:11:24 2011 From: brian.curtin at gmail.com (Brian Curtin) Date: Tue, 28 Jun 2011 22:11:24 -0500 Subject: [Baypiggies] Python User Group International Survey Message-ID: The PSF is happy to launch today an international survey of Pythonuser group organizers to help it better serve the large and ever-expanding international Python user community. The survey contains questions on user group organization, events, demographics, and growth. There are some questions with numerical answers, and while your best guess is fine, you may find it helpful to gather some statistics on your user group membership before starting the survey (example statistics include the number of active members and the size and topics for recent user group events). We expect this survey to take around 30 minutes to complete. We appreciate your time and honesty in answering these questions. The PSF blog post announcing the survey: http://pyfound.blogspot.com/2011/06/tell-us-about-your-user-group.html The survey was written by Jessica McKellar (http://jesstess.com), organizer for the Boston Python Meetup (http://meetup.bostonpython.com), and Jesse Noller (http://jessenoller.com/), PSF board member and PyCon chair with input and feedback from survey specialists and others. https://www.surveymonkey.com/s/BWLG8SZ The survey was pretested with a handful of user group organizers, and their answers were phenomenal. Organizers have tons to say about these topics, and we hope to get a lot of great, actionable data for strengthening the relationship between the PSF and Python user groups out of this effort. Outreach, education, diversity and community building are critical for Python as a community, and the Foundation - this data should greatly assist in our targeting our resources and furthering the mission of the Foundation in all ways. Thank you The Python Software Foundation Jessica McKellar Jesse Noller -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jason at mischievous.org Wed Jun 29 07:06:39 2011 From: jason at mischievous.org (Jason Culverhouse) Date: Tue, 28 Jun 2011 22:06:39 -0700 Subject: [Baypiggies] nested list question In-Reply-To: References: <16E5CAD9-33A1-4E82-A55A-B98BE4802335@mischievous.org> Message-ID: <83932618-CB73-4EBA-874B-6310B7EFFFDE@mischievous.org> On Jun 28, 2011, at 8:43 AM, Vikram K wrote: > Thanks Jason. Could you (or someone else) suggest some approach for the following: > > >>> x > [['19600894', '1', 'chr15_76136768', 'MISSENSE'], ['19600894', '2', 'chr15_76136768', 'MISSENSE'], ['18467762', '1', 'chr14_23354066', 'MISSENSE']] > > > How do i obtain from nested list x (given above), the following nested list z: > > >>> z > [[19600894','1/2','chr15_76136768', 'MISSENSE', 'homozygous'], ['18467762', '1','chr14_23354066', 'MISSENSE', 'heterozygous']] > > In list x, the first element is loci, second element is allele, third element is chromosome_positionofchange, fourth is type of change. Based on the value of the second and third element a new element has to be created --'homozygous' if both allele 1 and allele 2 have the change and 'heterozygous' if only one allele has the change. > > Just for kicks... Is this an employment test? Does anyone have a better way to code the inside of the for loop? ---- from operator import itemgetter from itertools import groupby from somewhere import unique_justseen # http://docs.python.org/library/itertools.html key_func = itemgetter(0,2,3) output = [] # you need to sort to make group by work properly for k, v in groupby(sorted(x, key=key_func), key_func): #these are sorted to unique_justseen is a good option # as long as there are not that many allele inner = list(unique_justseen(v)) output.append([k[0], '/'.join(i[1] for i in inner), k[1], k[2], len(inner) and 'homozygous' or 'heterozygous']) print output [['18467762', '1', 'chr14_23354066', 'MISSENSE', 'homozygous'], ['19600894', '1/2', 'chr15_76136768', 'MISSENSE', 'homozygous']] Jason > > On Mon, Jun 27, 2011 at 6:29 PM, Jason Culverhouse wrote: > On Jun 27, 2011, at 2:06 PM, Vikram K wrote: > >> Suppose i have the following nested list: >> >> >>> x >> [['19600894', '1', 'chr15_76136768', 'MISSENSE'], ['19600894', '2', 'chr15_76136768', 'MISSENSE'], ['18467762', '1', 'chr14_23354066', 'MISSENSE']] >> >> >> How do i obtain from nested list x (given above), the following nested list z: >> >> >>> z >> [['chr15_76136768', 'MISSENSE'], ['chr14_23354066', 'MISSENSE']] >> > > How about: > > list(unique_everseen((y[2:4] for y in x), operator.itemgetter(0))) > > or the whole nested list with just > > list(unique_everseen(x, operator.itemgetter(2))) > > where : > > unique_everseen is from > http://docs.python.org/library/itertools.html > > If you data is already sorted by the key then > unique_justseen > > might be more efficient? > > Jason > >> ------ >> In other words, if the third element of an element of x is the same, then i wish to combine it into a single element. 
>> _______________________________________________ >> Baypiggies mailing list >> Baypiggies at python.org >> To change your subscription options or unsubscribe: >> http://mail.python.org/mailman/listinfo/baypiggies > > From jason at mischievous.org Wed Jun 29 07:20:40 2011 From: jason at mischievous.org (Jason Culverhouse) Date: Tue, 28 Jun 2011 22:20:40 -0700 Subject: [Baypiggies] nested list question In-Reply-To: <83932618-CB73-4EBA-874B-6310B7EFFFDE@mischievous.org> References: <16E5CAD9-33A1-4E82-A55A-B98BE4802335@mischievous.org> <83932618-CB73-4EBA-874B-6310B7EFFFDE@mischievous.org> Message-ID: On Jun 28, 2011, at 10:06 PM, Jason Culverhouse wrote: > > On Jun 28, 2011, at 8:43 AM, Vikram K wrote: > >> Thanks Jason. Could you (or someone else) suggest some approach for the following: >> >>>>> x >> [['19600894', '1', 'chr15_76136768', 'MISSENSE'], ['19600894', '2', 'chr15_76136768', 'MISSENSE'], ['18467762', '1', 'chr14_23354066', 'MISSENSE']] >> >> >> How do i obtain from nested list x (given above), the following nested list z: >> >>>>> z >> [[19600894','1/2','chr15_76136768', 'MISSENSE', 'homozygous'], ['18467762', '1','chr14_23354066', 'MISSENSE', 'heterozygous']] >> >> In list x, the first element is loci, second element is allele, third element is chromosome_positionofchange, fourth is type of change. Based on the value of the second and third element a new element has to be created --'homozygous' if both allele 1 and allele 2 have the change and 'heterozygous' if only one allele has the change. >> >> > > Just for kicks... Is this an employment test? > > Does anyone have a better way to code the inside of the for loop? > ---- > from operator import itemgetter > from itertools import groupby > > from somewhere import unique_justseen # http://docs.python.org/library/itertools.html > > key_func = itemgetter(0,2,3) > > output = [] > # you need to sort to make group by work properly > for k, v in groupby(sorted(x, key=key_func), key_func): > #these are sorted to unique_justseen is a good option > # as long as there are not that many allele > inner = list(unique_justseen(v)) > output.append([k[0], '/'.join(i[1] for i in inner), k[1], k[2], len(inner) and 'homozygous' or 'heterozygous']) A "minor fix" to paste the correct 'homozygous' or 'heterozygous' computation below.... output.append([k[0], '/'.join(i[1] for i in inner), k[1], k[2], len(inner) > 1 and 'homozygous' or 'heterozygous']) > print output > > [['18467762', '1', 'chr14_23354066', 'MISSENSE', 'homozygous'], ['19600894', '1/2', 'chr15_76136768', 'MISSENSE', 'homozygous']] > > Jason > > > >> >> On Mon, Jun 27, 2011 at 6:29 PM, Jason Culverhouse wrote: >> On Jun 27, 2011, at 2:06 PM, Vikram K wrote: >> >>> Suppose i have the following nested list: >>> >>>>>> x >>> [['19600894', '1', 'chr15_76136768', 'MISSENSE'], ['19600894', '2', 'chr15_76136768', 'MISSENSE'], ['18467762', '1', 'chr14_23354066', 'MISSENSE']] >>> >>> >>> How do i obtain from nested list x (given above), the following nested list z: >>> >>>>>> z >>> [['chr15_76136768', 'MISSENSE'], ['chr14_23354066', 'MISSENSE']] >>> >> >> How about: >> >> list(unique_everseen((y[2:4] for y in x), operator.itemgetter(0))) >> >> or the whole nested list with just >> >> list(unique_everseen(x, operator.itemgetter(2))) >> >> where : >> >> unique_everseen is from >> http://docs.python.org/library/itertools.html >> >> If you data is already sorted by the key then >> unique_justseen >> >> might be more efficient? 
>> >> Jason >> >>> ------ >>> In other words, if the third element of an element of x is the same, then i wish to combine it into a single element. >>> _______________________________________________ >>> Baypiggies mailing list >>> Baypiggies at python.org >>> To change your subscription options or unsubscribe: >>> http://mail.python.org/mailman/listinfo/baypiggies >> >> > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies
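For anyone pasting this in later, here is a self-contained version of the groupby approach from this thread with the '> 1' correction applied, and with the duplicate handling done by a set of alleles instead of the unique_justseen recipe; the sample data is the list from the original question:

from itertools import groupby
from operator import itemgetter

x = [['19600894', '1', 'chr15_76136768', 'MISSENSE'],
     ['19600894', '2', 'chr15_76136768', 'MISSENSE'],
     ['18467762', '1', 'chr14_23354066', 'MISSENSE']]

key_func = itemgetter(0, 2, 3)   # (loci, position, change type)
z = []
# groupby only merges adjacent rows, so sort on the same key first.
for (loci, pos, change), rows in groupby(sorted(x, key=key_func), key_func):
    alleles = sorted(set(row[1] for row in rows))
    zygosity = 'homozygous' if len(alleles) > 1 else 'heterozygous'
    z.append([loci, '/'.join(alleles), pos, change, zygosity])

print z
# [['18467762', '1', 'chr14_23354066', 'MISSENSE', 'heterozygous'],
#  ['19600894', '1/2', 'chr15_76136768', 'MISSENSE', 'homozygous']]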