From roderick at sanfransystems.com Mon Nov 1 18:29:52 2010 From: roderick at sanfransystems.com (Roderick Llewellyn) Date: Mon, 01 Nov 2010 10:29:52 -0700 Subject: [Baypiggies] Elance, Guru, Contract Work etc. In-Reply-To: References: Message-ID: <4CCEF910.8010705@sanfransystems.com> I avoid sites like these like the plague. This is priceline.com for programmers, where the advantage is totally with the one offering the job. You're competing with the entire world for these jobs, so the whole game is to drive down your compensation to the lowest possible level on the planet. Do you really want to be paid the going wage in Bangalore? No problem if that reflects your cost of living (i.e., you live in Bangalore). But since this is BayPiggies, you probably live in the Bay Area. So you're paying the highest cost of living after Manhattan, and getting paid the lowest wages on Earth. That's a problem! Your client cares not that you live in the Bay Area. I agree with the caveat against fixed-price bids. These are VERY dangerous in programming. It's one thing to ask a construction contractor for a fixed-price bid on building a deck. There are not many unknowns there. In programming, once you have negotiated a fixed price, the client has a huge incentive to make endless change orders. Naturally you can refuse to take them, asking for an hourly rate on each one. But since a fixed-price bid usually means you get paid little or nothing until the job is done, if client is unsatisfied with your negotiating stance, he will probably not pay you at all. And generally forget the courts; they are so complicated, take so long, and are so advantageous to the side with more money and patience that any contract you sign is almost meaningless anyway. I have major experience here I assure you! So be wary of any contract longer than a month or two which won't pay you until completion. Of course, you could always arrange to meet under a bridge, you bringing your software, client bringing his money, and both taking no more than three armed guards.... I'm sure you've seen that movie too! If you take tiny jobs, like write an ascii-to-integer converter kind of thing, you will spend far more time looking for work, negotiating, phone calls, etc., than you will spend actually doing work. Since you will often not be willing to take the Bangalore-level wage that will be offered, you will not get or take most jobs. Look instead for longer-term contracts. Try to find something where you have unusual skills or abilities. If the job is to download a LAMP suite and get it running, writing 100 lines of glue code in the process, you're competing against every kid on the planet, because almost everybody can do that. It's not really even software engineering. If on the other hand you know how to optimize MySQL queries better than Joe the Plumber... oops I meant Programmer, you have a better chance. My two cents! - Rod L. From ken.barclay at att.net Mon Nov 1 18:56:49 2010 From: ken.barclay at att.net (ken barclay) Date: Mon, 1 Nov 2010 10:56:49 -0700 (PDT) Subject: [Baypiggies] mrjob - distributed computing in Python made easy In-Reply-To: Message-ID: <257230.74237.qm@web180713.mail.sp1.yahoo.com> +1 --- On Sat, 10/30/10, Alexandre Conrad wrote: From: Alexandre Conrad Subject: Re: [Baypiggies] mrjob - distributed computing in Python made easy To: "Jimmy Retzlaff" Cc: baypiggies at python.org Date: Saturday, October 30, 2010, 12:14 AM +1 2010/10/29 Jimmy Retzlaff : > Yelp has just open sourced mrjob. 
It's a package that makes doing > MapReduce in Python almost trivial. In as little as 10 lines of code > you can do MapReduce on your own computer. Add a handful of lines of > configuration and you can be running your code unchanged in parallel > on dozens of machines on Amazon Elastic Map Reduce for a few dollars > (a dozen "small" machines for an hour would be about $1). Dave, the > primary author just wrote up a post on our engineering blog: > > http://engineeringblog.yelp.com/2010/10/mrjob-distributed-computing-for-everybody.html > > I'm putting in a proposal to give a talk on mrjob at PyCon 2011. > Whether that's accepted or not, maybe I can give the same talk at > BayPIGgies in January or February if there's interest... > > Jimmy > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > -- Alex | twitter.com/alexconrad _______________________________________________ Baypiggies mailing list Baypiggies at python.org To change your subscription options or unsubscribe: http://mail.python.org/mailman/listinfo/baypiggies -------------- next part -------------- An HTML attachment was scrubbed... URL: From glen at glenjarvis.com Mon Nov 1 19:49:51 2010 From: glen at glenjarvis.com (Glen Jarvis) Date: Mon, 1 Nov 2010 11:49:51 -0700 Subject: [Baypiggies] Elance, Guru, Contract Work etc. In-Reply-To: <4CCEF910.8010705@sanfransystems.com> References: <4CCEF910.8010705@sanfransystems.com> Message-ID: Although there's a lot of merit in what you say, it's not universal to sites like this. I found people who paid a lot more because they knew I lived within the Silicon Valley area and there's a perception of much higher quality. So, yes, sometimes -- maybe even most of the time -- you'll find customers like this. But, it's not always the case and it'd be silly to exclude the possibility of finding those clients. My Madrid customer is a perfect example. And, he originally came from Elance until we built up a working relationship and now work outside of Elance. Cheers, Glen El Nov 1, 2010, a las 10:29 AM, Roderick Llewellyn escribió: > I avoid sites like these like the plague. This is priceline.com for programmers, where the advantage is totally with the one offering the job. You're competing with the entire world for these jobs, so the whole game is to drive down your compensation to the lowest possible level on the planet. Do you really want to be paid the going wage in Bangalore? No problem if that reflects your cost of living (i.e., you live in Bangalore). But since this is BayPiggies, you probably live in the Bay Area. So you're paying the highest cost of living after Manhattan, and getting paid the lowest wages on Earth. That's a problem! Your client cares not that you live in the Bay Area. > > I agree with the caveat against fixed-price bids. These are VERY dangerous in programming. It's one thing to ask a construction contractor for a fixed-price bid on building a deck. There are not many unknowns there. In programming, once you have negotiated a fixed price, the client has a huge incentive to make endless change orders. Naturally you can refuse to take them, asking for an hourly rate on each one. But since a fixed-price bid usually means you get paid little or nothing until the job is done, if client is unsatisfied with your negotiating stance, he will probably not pay you at all. 
And generally forget the courts; they are so complicated, take so long, and are so advantageous to the side with more money and patience that any contract you sign is almost meaningless anyway. I have major experience here I assure you! So be wary of any contract longer than a month or two which won't pay you until completion. Of course, you could always arrange to meet under a bridge, you bringing your software, client bringing his money, and both taking no more than three armed guards.... I'm sure you've seen that movie too! > > If you take tiny jobs, like write an ascii-to-integer converter kind of thing, you will spend far more time looking for work, negotiating, phone calls, etc., than you will spend actually doing work. Since you will often not be willing to take the Bangalore-level wage that will be offered, you will not get or take most jobs. Look instead for longer-term contracts. Try to find something where you have unusual skills or abilities. If the job is to download a LAMP suite and get it running, writing 100 lines of glue code in the process, you're competing against every kid on the planet, because almost everybody can do that. It's not really even software engineering. If on the other hand you know how to optimize MySQL queries better than Joe the Plumber... oops I meant Programmer, you have a better chance. > > My two cents! - Rod L. > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies From akleider at sonic.net Thu Nov 4 03:13:02 2010 From: akleider at sonic.net (akleider at sonic.net) Date: Wed, 3 Nov 2010 19:13:02 -0700 Subject: [Baypiggies] ConfigParser In-Reply-To: <1288150869.1875.4.camel@jim-laptop> References: <1288150869.1875.4.camel@jim-laptop> Message-ID: <3afbf52910d35c8a157180514aac6bda.squirrel@webmail.sonic.net> I've just discovered that the ConfigParser module converts its dictionary key:value pairs all into lower case. Why? Is there a way to get the same functionality (another module perhaps) that respects case? alexK From itz at buug.org Thu Nov 4 04:28:02 2010 From: itz at buug.org (Ian Zimmerman) Date: Wed, 03 Nov 2010 20:28:02 -0700 Subject: [Baypiggies] ConfigParser In-Reply-To: <3afbf52910d35c8a157180514aac6bda.squirrel@webmail.sonic.net> (akleider@sonic.net's message of "Wed, 3 Nov 2010 19:13:02 -0700") References: <3afbf52910d35c8a157180514aac6bda.squirrel@webmail.sonic.net> Message-ID: <87y699hinx.fsf@matica.localdomain> >>>>> "akleider" == akleider writes: akleider> I've just discovered that the ConfigParser module converts its akleider> dictionary key:value pairs all into lower case. Why? Is akleider> there a way to get the same functionality (another module akleider> perhaps) that respects case? Copied from the documentation: class ConfigParser.ConfigParser([defaults[, dict_type]]) Derived class of RawConfigParser that implements the magical interpolation feature and adds optional arguments to the get() and items() methods. The values in defaults must be appropriate for the %()s string interpolation. Note that __name__ is an intrinsic default; its value is the section name, and will override any value provided in defaults. All option names used in interpolation will be passed through the optionxform() method just like any other option name reference. 
For example, using the default implementation of optionxform() (which converts option names to lower case), the values foo %(bar)s and foo %(BAR)s are equivalent. So, you need to override the optionxform method either by subclassing or replacing it in the object instance. By the way, don't use the Reply All/Followup feature of your mailer unless that is what you're really doing. Address books are your friends. -- Ian Zimmerman gpg public key: 1024D/C6FF61AD fingerprint: 66DC D68F 5C1B 4D71 2EE5 BD03 8A00 786C C6FF 61AD Ham is for reading, not for eating. From rohan.talip at gmail.com Thu Nov 4 04:28:43 2010 From: rohan.talip at gmail.com (Rohan Talip) Date: Wed, 3 Nov 2010 20:28:43 -0700 Subject: [Baypiggies] ConfigParser In-Reply-To: <3afbf52910d35c8a157180514aac6bda.squirrel@webmail.sonic.net> References: <1288150869.1875.4.camel@jim-laptop> <3afbf52910d35c8a157180514aac6bda.squirrel@webmail.sonic.net> Message-ID: It is possible to make ConfigParser case sensitive. Search for optionxform on http://docs.python.org/library/configparser.html i.e. cfgparser = ConfigParser() ... cfgparser.optionxform = str Rohan On Wed, Nov 3, 2010 at 7:13 PM, wrote: > I've just discovered the the ConfigParser module converts it's dictionary > key:value pairs all into lower case. > Why? > Is there a way to get the same functionality (another module perhaps) that > respects case? > > alexK > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brent.tubbs at gmail.com Thu Nov 4 04:44:21 2010 From: brent.tubbs at gmail.com (Brent Tubbs) Date: Wed, 3 Nov 2010 20:44:21 -0700 Subject: [Baypiggies] ConfigParser In-Reply-To: <3afbf52910d35c8a157180514aac6bda.squirrel@webmail.sonic.net> References: <1288150869.1875.4.camel@jim-laptop> <3afbf52910d35c8a157180514aac6bda.squirrel@webmail.sonic.net> Message-ID: As to the "why": I suspect the case insensitivity is to preserve compatibility with often-case-insensitive .ini files. At least one .ini file spec (http://www.cloanto.com/specs/ini/) says that section and key names are case-insensitive. Not that I'm an expert on .ini files; your question just got me googling, which took me to http://en.wikipedia.org/wiki/INI_file, which led me to the spec. On Wed, Nov 3, 2010 at 7:13 PM, wrote: > I've just discovered the the ConfigParser module converts it's dictionary > key:value pairs all into lower case. > Why? > Is there a way to get the same functionality (another module perhaps) that > respects case? > > alexK > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From max at theslimmers.net Thu Nov 4 06:45:20 2010 From: max at theslimmers.net (Max Slimmer) Date: Wed, 3 Nov 2010 22:45:20 -0700 Subject: [Baypiggies] ConfigParser In-Reply-To: <3afbf52910d35c8a157180514aac6bda.squirrel@webmail.sonic.net> References: <1288150869.1875.4.camel@jim-laptop> <3afbf52910d35c8a157180514aac6bda.squirrel@webmail.sonic.net> Message-ID: You already have the answer as to how to keep configparser case unchanged. You also asked about alternatives. I use ConfigObj, it keeps values (keys) in the order defined and allows nested definitions along with a few other friendly features. 
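A tiny sketch of the kind of thing I mean (untested, and it assumes the third-party configobj package is installed; the file, section, and key names are only examples):

from configobj import ConfigObj

config = ConfigObj('settings.ini')             # reads the file if it already exists
config['Paths'] = {}                           # section names keep their case
config['Paths']['DataDir'] = '/tmp/data'       # ...and so do option names
config['Paths']['Cache'] = {'MaxSize': '10'}   # a nested subsection from a plain dict
config.write()                                 # writes settings.ini back out

print config['Paths']['DataDir']               # -> '/tmp/data', case preserved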
If you don't mind it not being in the standard distro, it is pretty cool. max On Wed, Nov 3, 2010 at 7:13 PM, wrote: > I've just discovered the the ConfigParser module converts it's dictionary > key:value pairs all into lower case. > Why? > Is there a way to get the same functionality (another module perhaps) that > respects case? > > alexK > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > -------------- next part -------------- An HTML attachment was scrubbed... URL: From akleider at sonic.net Sat Nov 6 00:00:03 2010 From: akleider at sonic.net (akleider at sonic.net) Date: Fri, 5 Nov 2010 16:00:03 -0700 Subject: [Baypiggies] ConfigParser In-Reply-To: <3afbf52910d35c8a157180514aac6bda.squirrel@webmail.sonic.net> References: <1288150869.1875.4.camel@jim-laptop> <3afbf52910d35c8a157180514aac6bda.squirrel@webmail.sonic.net> Message-ID: <4d3ba20b019bdf707fb0b14c0188b4d8.squirrel@webmail.sonic.net> Thanks to Ian, Rohan, Brent, and Max for answering. Addition of the cfgparser.optionxform = str line solved my problem. > I've just discovered the the ConfigParser module converts it's dictionary > key:value pairs all into lower case. > Why? > Is there a way to get the same functionality (another module perhaps) that > respects case? > > alexK > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > > From cappy2112 at gmail.com Thu Nov 11 20:06:32 2010 From: cappy2112 at gmail.com (Tony Cappellini) Date: Thu, 11 Nov 2010 11:06:32 -0800 Subject: [Baypiggies] Advancements in PyPy Message-ID: Other than the memory consumption issue, this is very encouraging. http://tinyurl.com/2flcuk9 From welch at quietplease.com Fri Nov 12 00:18:09 2010 From: welch at quietplease.com (will welch) Date: Thu, 11 Nov 2010 15:18:09 -0800 Subject: [Baypiggies] Job Posting: Riverbed Technologies, Engineering Operations Ninja (python/pylons/unix) Message-ID: Riverbed builds boxes that make your internet go fast. I'm on an internal team that is building data collection/ mining infrastructure to study how our appliances fare out in the wild -- how they are used, how they fail. We're archiving and mining streams of data from thousands of appliances every day. We need another set of hands. Much of this is built in python/pylons, and there are a few unixy and C++ bits laying around as well. One guy even writes excel macros (whatever it takes...). We're wanting someone who's smart, flexible, self-directed, and who probably doesn't fit well into an org chart. Here's the req: http://tinyurl.com/22pjryy -- Will Welch From jjinux at gmail.com Fri Nov 12 02:02:07 2010 From: jjinux at gmail.com (Shannon -jj Behrens) Date: Thu, 11 Nov 2010 17:02:07 -0800 Subject: [Baypiggies] jobs at Twilio Message-ID: Hey Guys, As I mentioned at the last meeting, I joined a company called Twilio. We make it easy for normal web developers to write voice and SMS enabled applications. If you don't know what I mean, try calling my app: (888) 877-7418. It's a lot of fun. (By the way, I built that app using TDD, and I didn't actually try calling it until it was mostly done.) Anyway, we're hiring, so if you want to come hang out with me, check us out. Jeff Lindsay, the SHDH house guy, is here too. 
DevOps Engineer Senior Software Engineer Core Team Software Engineering Leader, Organizer, Mentor Customer Advocate Developer Evangelist Product Manager We do use Python--a ton of it. We're in San Francisco. I'm not a recruiter ;) Contact "Joanna Samuels" if you're interested in applying. Here are the actual job postings: http://www.twilio.com/company/jobs By the way, we just received our second round of funding, but we're also making lots of money with a real business model. I'm really enjoying myself here. Happy Hacking! -jj -- In this life we cannot do great things. We can only do small things with great love. -- Mother Teresa http://jjinux.blogspot.com/ From fperez.net at gmail.com Mon Nov 15 09:23:31 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 15 Nov 2010 00:23:31 -0800 Subject: [Baypiggies] [ANN:Py4Science@Cal] Talk November 17: soma-workflow: an interface to parallel computing resources Message-ID: Hi folks, we have a shifted-by-a-week talk, this time. *This coming Wednesday, November 17*, a visitor from the French NeuroSpin research institute will be presenting at our meeting. Please forward this to any interested colleagues. Cheers, f *Py4Science at Cal meeting* Speaker: Soizic Laguitton, CEA NeuroSpin, France Title: soma-workflow: an interface to parallel computing resources Abstract: The research conducted at NeuroSpin involves a high demand for computing resources. Various resources are available, such as desktops, clusters or computing centers. However, the use of high performance resources remains scarce because of the lack of a unified and simple tool to access them. Within the BrainVISA project we are developing a Python interface to parallel computing resources. The application, named soma-workflow, connects to existing distributed resource management systems using implementations of DRMAA (Distributed Resource Management Application API). It provides simple access to local or remote resources through a Python API or a graphical user interface. Soma-workflow was designed to manage data transfers and to cope with user disconnections. During the talk, I will explain the motivations and choices we made in the development of soma-workflow, and I will also present an overview of its architecture, API and GUI. When: Wednesday, November 17, 2010, 2pm. Where: 508-20 Evans Hall (Redwood Center conference room) For more information: https://cirl.berkeley.edu/view/Py4Science/WebHome From jim at systemateka.com Mon Nov 15 22:30:10 2010 From: jim at systemateka.com (jim) Date: Mon, 15 Nov 2010 13:30:10 -0800 Subject: [Baypiggies] BayPIGgies meeting Thursday, November 18, 2010: Embedding Python as a Realtime Audio Scripting Engine Message-ID: <1289856610.1991.136.camel@jim-laptop> This meeting's talk is "Embedding Python as a Realtime Audio Scripting Engine" by Patrick Stinson. Topics include * separation and communication between the application and the scripting engine * why Python is "safe" for audio work, including empirical performance metrics and caveats related to multithreaded processing as performance requirements increase. I will share my experiences using the standard CPython implementation to research and develop a state-of-the-art scripting engine for the Play commercial sampling engine. Speaker: Patrick Stinson Patrick Stinson has a BSc in Computer Science from the University of London and currently lives in the North Lake Tahoe area. 
He started out working with CPython and Zope/Plone in his home town of Anchorage, Alaska and has most recently developed the user interface and scripting engine for the Hollywood-Based "Play" music platform. Play is a commercial audio engine intended for building software musical instruments. It runs in popular audio plugin formats, and provides a scripting engine that allows studio musicians to create complex musical effects and sequencing behavior. It uses Qt for the GUI, juce for audio support, and python for the scripting engine. LINKS: Find more information here: http://www.soundsonline.com/ ......................................... Meetings usually start with a Newbie Nugget, a short discussion of an essential Python feature, especially for those new to Python. Tonight's Newbie Nugget: none. LOCATION Symantec Corporation Symantec Vcafe 350 Ellis Street Mountain View, CA 94043 http://maps.google.com/maps/ms?oe=utf-8&client=firefox-a&ie=UTF8&fb=1&split=1&gl=us&ei=w6i_Sfr6MZmQsQOzlv0v&hl=en&t=h&msa=0&msid=116202735295394761637.00046550c09ff3d96bff1&ll=37.397693,-122.053707&spn=0.002902,0.004828&z=18 BayPIGgies meeting information is available at http://www.baypiggies.net/ ------------------------ Agenda ------------------------ ..... 7:30 PM ........................... General hubbub, inventory end-of-meeting announcements, any first-minute announcements. ..... 7:35 PM to 7:35 PM ................ Tonight's Newbie Nugget: none. ..... 7:35 PM to 8:25 PM (or so) ................ The talk: Embedding Python as a Realtime Audio Scripting Engine ..... 8:25 PM to 8:55 PM (or so) ................ Questions and Answers ..... 8:55 PM to 9:30 PM (or so) ................ Mapping and Random Access Mapping is a rapid-fire audience announcement of issues, hiring, events, and other topics. Random Access follows people immediately to allow follow up on the announcements and other interests. From Patrick.Newman at Reardencommerce.com Tue Nov 16 01:56:13 2010 From: Patrick.Newman at Reardencommerce.com (Patrick Newman) Date: Tue, 16 Nov 2010 00:56:13 +0000 Subject: [Baypiggies] Job posting: Internal Applications @ Rearden Commerce Message-ID: <5F067080F466964C86001C94958183DA089F68@fccorpmail03.mygazoo.com> I'm leading the Internal Applications team at Rearden Commerce. We're currently in the midst of an exciting project, building our company's new application hosting & deployment platform, and I'm looking to add a senior engineer to our team. We use Python extensively in our projects, most notably on the web front end to the system (implemented in web.py) and the api that performs actions on each target machine. Here are the attributes I am valuing most highly in this search: * Great knowledge of Python 2.5 and 2.6 * Linux systems administration experience * Solid shell scripting abilities * Strong understanding of web design patterns * A mindset for testing More on the job and company is available on our job listing: https://careers-reardencommerce.icims.com/jobs/1517/job If you're interested you can reply to me or apply online. Thanks, Patrick -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cappy2112 at gmail.com Tue Nov 16 22:35:05 2010 From: cappy2112 at gmail.com (Tony Cappellini) Date: Tue, 16 Nov 2010 13:35:05 -0800 Subject: [Baypiggies] Running large radio telescope software on top of PyPy and twisted Message-ID: This should be of interest to PyPy followers as well as the scientific python community http://tinyurl.com/33sznhy From kpguy1975 at gmail.com Wed Nov 17 21:40:21 2010 From: kpguy1975 at gmail.com (Vikram K) Date: Wed, 17 Nov 2010 15:40:21 -0500 Subject: [Baypiggies] reading files quickly and efficiently Message-ID: I need to work on a file whose size is around 6.5 GB. This file consists of a protein header information and then the corresponding protein sequence. Here are a few samples lines of this file: ----------- >gi|15674171|ref|NP_268346.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. lactis Il1403] gi|116513137|ref|YP_812044.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. cremoris SK11] gi|125625229|ref|YP_001033712.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. cremoris MG1363] gi|281492845|ref|YP_003354825.1| 50S ribosomal protein S18P [Lactococcus lactis subsp. lactis KF147] gi|13878750|sp|Q9CDN0.1|RS18_LACLA RecName: Full=30S ribosomal protein S18 gi|122939895|sp|Q02VU1.1|RS18_LACLS RecName: Full=30S ribosomal protein S18 gi|166220956|sp|A2RNZ2.1|RS18_LACLM RecName: Full=30S ribosomal protein S18 gi|12725253|gb|AAK06287.1|AE006448_5 30S ribosomal protein S18 [Lactococcus lactis subsp. lactis Il1403] gi|116108791|gb|ABJ73931.1| SSU ribosomal protein S18P [Lactococcus lactis subsp. cremoris SK11] gi|124494037|emb|CAL99037.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. cremoris MG1363] gi|281376497|gb|ADA65983.1| SSU ribosomal protein S18P [Lactococcus lactis subsp. lactis KF147] gi|300072039|gb|ADJ61439.1| 30S ribosomal protein S18 [Lactococcus lactis subsp. cremoris NZ9000] MAQQRRGGFKRRKKVDFIAANKIEVVDYKDTELLKRFISERGKILPRRVTGTSAKNQRKVVNAIKRARVMALLPFVAEDQ N >gi|66816243|ref|XP_642131.1| hypothetical protein DDB_G0277827 [Dictyostelium discoideum AX4] gi|1705556|sp|P54670.1|CAF1_DICDI RecName: Full=Calfumirin-1; Short=CAF-1 gi|793761|dbj|BAA06266.1| calfumirin-1 [Dictyostelium discoideum] gi|60470106|gb|EAL68086.1| hypothetical protein DDB_G0277827 [Dictyostelium discoideum AX4] MASTQNIVEEVQKMLDTYDTNKDGEITKAEAVEYFKGKKAFNPERSAIYLFQVYDKDNDGKITIKELAGDIDFDKALKEY KEKQAKSKQQEAEVEEDIEAFILRHNKDDNTDITKDELIQGFKETGAKDPEKSANFILTEMDTNKDGTITVKELRVYYQK VQKLLNPDQ >gi|66818355|ref|XP_642837.1| hypothetical protein DDB_G0276911 [Dictyostelium discoideum AX4] gi|60470987|gb|EAL68957.1| hypothetical protein DDB_G0276911 [Dictyostelium discoideum AX4] MKTKSSNNIKKIYYISSILVGIYLCWQIIIQIIFLMDNSIAILEAIGMVVFISVYSLAVAINGWILVGRMKKSSKKAQYE DFYKKMILKSKILLSTIIIVIIVVVVQDIVINFILPQNPQPYVYMIISNFIVGIADSFQMIMVIFVMGELSFKNYFKFKR ----------- My problem is that i need to filter this file so as to extract the relevant proteins that are of my interest based on some keywords to be applied on the header line. 
As a preliminary step, i wrote the following code to calculate the total number of lines in the file: f = open ('nr') count = 0 for i in f.readlines(): line = f.next().strip() count = count + 1 f.close() print count On running this program, i get the following error: Traceback (most recent call last): File "C:\Users\K\Downloads\nr\nr.py", line 34, in for i in f.readlines(): MemoryError A slightly modified version of the above program works fine for the first 10 or 100 or 1000 lines of the file nr: ---- Any suggestions on how i can work around this 'Memory Error' problem? -------------- next part -------------- An HTML attachment was scrubbed... URL: From recursive.cookie.jar at gmail.com Wed Nov 17 21:44:32 2010 From: recursive.cookie.jar at gmail.com (Zachary Collins) Date: Wed, 17 Nov 2010 15:44:32 -0500 Subject: [Baypiggies] reading files quickly and efficiently In-Reply-To: References: Message-ID: Yes. The read lines function will load the whole file into memory before doing the line split. How about just using read with a small buffer size and incrementally counting newlines that way? 2010/11/17 Vikram K : > I need to work on a file whose size is around 6.5 GB.? This file consists of > a protein header information and then the corresponding protein sequence. > Here are a few samples lines of this file: > > ----------- >>gi|15674171|ref|NP_268346.1| 30S ribosomal protein S18 [Lactococcus lactis >> subsp. lactis Il1403] gi|116513137|ref|YP_812044.1| 30S ribosomal protein >> S18 [Lactococcus lactis subsp. cremoris SK11] >> gi|125625229|ref|YP_001033712.1| 30S ribosomal protein S18 [Lactococcus >> lactis subsp. cremoris MG1363] gi|281492845|ref|YP_003354825.1| 50S >> ribosomal protein S18P [Lactococcus lactis subsp. lactis KF147] >> gi|13878750|sp|Q9CDN0.1|RS18_LACLA RecName: Full=30S ribosomal protein S18 >> gi|122939895|sp|Q02VU1.1|RS18_LACLS RecName: Full=30S ribosomal protein S18 >> gi|166220956|sp|A2RNZ2.1|RS18_LACLM RecName: Full=30S ribosomal protein S18 >> gi|12725253|gb|AAK06287.1|AE006448_5 30S ribosomal protein S18 [Lactococcus >> lactis subsp. lactis Il1403] gi|116108791|gb|ABJ73931.1| SSU ribosomal >> protein S18P [Lactococcus lactis subsp. cremoris SK11] >> gi|124494037|emb|CAL99037.1| 30S ribosomal protein S18 [Lactococcus lactis >> subsp. cremoris MG1363] gi|281376497|gb|ADA65983.1| SSU ribosomal protein >> S18P [Lactococcus lactis subsp. lactis KF147] gi|300072039|gb|ADJ61439.1| >> 30S ribosomal protein S18 [Lactococcus lactis subsp. 
cremoris NZ9000] > MAQQRRGGFKRRKKVDFIAANKIEVVDYKDTELLKRFISERGKILPRRVTGTSAKNQRKVVNAIKRARVMALLPFVAEDQ > N >>gi|66816243|ref|XP_642131.1| hypothetical protein DDB_G0277827 >> [Dictyostelium discoideum AX4] gi|1705556|sp|P54670.1|CAF1_DICDI RecName: >> Full=Calfumirin-1; Short=CAF-1 gi|793761|dbj|BAA06266.1| calfumirin-1 >> [Dictyostelium discoideum] gi|60470106|gb|EAL68086.1| hypothetical protein >> DDB_G0277827 [Dictyostelium discoideum AX4] > MASTQNIVEEVQKMLDTYDTNKDGEITKAEAVEYFKGKKAFNPERSAIYLFQVYDKDNDGKITIKELAGDIDFDKALKEY > KEKQAKSKQQEAEVEEDIEAFILRHNKDDNTDITKDELIQGFKETGAKDPEKSANFILTEMDTNKDGTITVKELRVYYQK > VQKLLNPDQ >>gi|66818355|ref|XP_642837.1| hypothetical protein DDB_G0276911 >> [Dictyostelium discoideum AX4] gi|60470987|gb|EAL68957.1| hypothetical >> protein DDB_G0276911 [Dictyostelium discoideum AX4] > MKTKSSNNIKKIYYISSILVGIYLCWQIIIQIIFLMDNSIAILEAIGMVVFISVYSLAVAINGWILVGRMKKSSKKAQYE > DFYKKMILKSKILLSTIIIVIIVVVVQDIVINFILPQNPQPYVYMIISNFIVGIADSFQMIMVIFVMGELSFKNYFKFKR > > ----------- > My problem is that i need to filter this file so as to extract the relevant > proteins that are of my interest based on some keywords to be applied on the > header line. As a preliminary step, i wrote the following code to calculate > the total number of lines in the file: > > f = open ('nr') > count = 0 > for i in f.readlines(): > ??? line = f.next().strip() > ??? count = count + 1 > f.close() > print count > > On running this program, i get the following error: > > Traceback (most recent call last): > ? File "C:\Users\K\Downloads\nr\nr.py", line 34, in > ??? for i in f.readlines(): > MemoryError > > A slightly modified version of the above program works fine for the first 10 > or 100 or 1000 lines of the file nr: > > > ---- > > Any suggestions on how i can work around this 'Memory Error' problem? > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From tungwaiyip at yahoo.com Wed Nov 17 21:54:42 2010 From: tungwaiyip at yahoo.com (Tung Wai Yip) Date: Wed, 17 Nov 2010 12:54:42 -0800 Subject: [Baypiggies] reading files quickly and efficiently In-Reply-To: References: Message-ID: readlines() will read the entire file in memory. Use f directly as a iterator # not tested! f = open ('nr') count = 0 for line in f: count = count + 1 f.close() print count Wai Yip > I need to work on a file whose size is around 6.5 GB. This file > consists of > a protein header information and then the corresponding protein sequence. > Here are a few samples lines of this file: > > ----------- >> gi|15674171|ref|NP_268346.1| 30S ribosomal protein S18 [Lactococcus >> lactis > subsp. lactis Il1403] gi|116513137|ref|YP_812044.1| 30S ribosomal protein > S18 [Lactococcus lactis subsp. cremoris SK11] > gi|125625229|ref|YP_001033712.1| 30S ribosomal protein S18 [Lactococcus > lactis subsp. cremoris MG1363] gi|281492845|ref|YP_003354825.1| 50S > ribosomal protein S18P [Lactococcus lactis subsp. lactis KF147] > gi|13878750|sp|Q9CDN0.1|RS18_LACLA RecName: Full=30S ribosomal protein > S18 > gi|122939895|sp|Q02VU1.1|RS18_LACLS RecName: Full=30S ribosomal protein > S18 > gi|166220956|sp|A2RNZ2.1|RS18_LACLM RecName: Full=30S ribosomal protein > S18 > gi|12725253|gb|AAK06287.1|AE006448_5 30S ribosomal protein S18 > [Lactococcus > lactis subsp. lactis Il1403] gi|116108791|gb|ABJ73931.1| SSU ribosomal > protein S18P [Lactococcus lactis subsp. 
cremoris SK11] > gi|124494037|emb|CAL99037.1| 30S ribosomal protein S18 [Lactococcus > lactis > subsp. cremoris MG1363] gi|281376497|gb|ADA65983.1| SSU ribosomal protein > S18P [Lactococcus lactis subsp. lactis KF147] gi|300072039|gb|ADJ61439.1| > 30S ribosomal protein S18 [Lactococcus lactis subsp. cremoris NZ9000] > MAQQRRGGFKRRKKVDFIAANKIEVVDYKDTELLKRFISERGKILPRRVTGTSAKNQRKVVNAIKRARVMALLPFVAEDQ > N >> gi|66816243|ref|XP_642131.1| hypothetical protein DDB_G0277827 > [Dictyostelium discoideum AX4] gi|1705556|sp|P54670.1|CAF1_DICDI RecName: > Full=Calfumirin-1; Short=CAF-1 gi|793761|dbj|BAA06266.1| calfumirin-1 > [Dictyostelium discoideum] gi|60470106|gb|EAL68086.1| hypothetical > protein > DDB_G0277827 [Dictyostelium discoideum AX4] > MASTQNIVEEVQKMLDTYDTNKDGEITKAEAVEYFKGKKAFNPERSAIYLFQVYDKDNDGKITIKELAGDIDFDKALKEY > KEKQAKSKQQEAEVEEDIEAFILRHNKDDNTDITKDELIQGFKETGAKDPEKSANFILTEMDTNKDGTITVKELRVYYQK > VQKLLNPDQ >> gi|66818355|ref|XP_642837.1| hypothetical protein DDB_G0276911 > [Dictyostelium discoideum AX4] gi|60470987|gb|EAL68957.1| hypothetical > protein DDB_G0276911 [Dictyostelium discoideum AX4] > MKTKSSNNIKKIYYISSILVGIYLCWQIIIQIIFLMDNSIAILEAIGMVVFISVYSLAVAINGWILVGRMKKSSKKAQYE > DFYKKMILKSKILLSTIIIVIIVVVVQDIVINFILPQNPQPYVYMIISNFIVGIADSFQMIMVIFVMGELSFKNYFKFKR > > ----------- > My problem is that i need to filter this file so as to extract the > relevant > proteins that are of my interest based on some keywords to be applied on > the > header line. As a preliminary step, i wrote the following code to > calculate > the total number of lines in the file: > > f = open ('nr') > count = 0 > for i in f.readlines(): > line = f.next().strip() > count = count + 1 > f.close() > print count > > On running this program, i get the following error: > > Traceback (most recent call last): > File "C:\Users\K\Downloads\nr\nr.py", line 34, in > for i in f.readlines(): > MemoryError > > A slightly modified version of the above program works fine for the > first 10 > or 100 or 1000 lines of the file nr: > > > ---- > > Any suggestions on how i can work around this 'Memory Error' problem? From alexandre.conrad at gmail.com Wed Nov 17 22:12:39 2010 From: alexandre.conrad at gmail.com (Alexandre Conrad) Date: Wed, 17 Nov 2010 13:12:39 -0800 Subject: [Baypiggies] reading files quickly and efficiently In-Reply-To: References: Message-ID: 2010/11/17 Tung Wai Yip : > readlines() will read the entire file in memory. Use f directly as a > iterator I was about to suggest that as well. It should work as long as not everything is on one line. :) -- Alex | twitter.com/alexconrad From glen at glenjarvis.com Wed Nov 17 22:13:00 2010 From: glen at glenjarvis.com (Glen Jarvis) Date: Wed, 17 Nov 2010 13:13:00 -0800 Subject: [Baypiggies] reading files quickly and efficiently In-Reply-To: References: Message-ID: BioPython also will do all of this for you -- too: >>> from Bio import SeqIO >>> record = SeqIO.read("NC_005816.fna", "fasta") >>> record SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG', SingleLetterAlphabet()), id='gi|45478711|ref|NC_005816.1|', name='gi|45478711|ref|NC_005816.1|', description='gi|45478711|ref|NC_005816.1| Yersinia pestis biovar Microtus ... sequence', dbxrefs=[]) You can also look for particular fields (record.id, record.description, and record.sequence): Look at this tutorial: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc16 Cheers, Glen On Wed, Nov 17, 2010 at 12:54 PM, Tung Wai Yip wrote: > readlines() will read the entire file in memory. 
Use f directly as a > iterator > > # not tested! > > f = open ('nr') > count = 0 > for line in f: > > count = count + 1 > f.close() > print count > > Wai Yip > > > > I need to work on a file whose size is around 6.5 GB. This file consists >> of >> a protein header information and then the corresponding protein sequence. >> Here are a few samples lines of this file: >> >> ----------- >> >>> gi|15674171|ref|NP_268346.1| 30S ribosomal protein S18 [Lactococcus >>> lactis >>> >> subsp. lactis Il1403] gi|116513137|ref|YP_812044.1| 30S ribosomal protein >> S18 [Lactococcus lactis subsp. cremoris SK11] >> gi|125625229|ref|YP_001033712.1| 30S ribosomal protein S18 [Lactococcus >> lactis subsp. cremoris MG1363] gi|281492845|ref|YP_003354825.1| 50S >> ribosomal protein S18P [Lactococcus lactis subsp. lactis KF147] >> gi|13878750|sp|Q9CDN0.1|RS18_LACLA RecName: Full=30S ribosomal protein S18 >> gi|122939895|sp|Q02VU1.1|RS18_LACLS RecName: Full=30S ribosomal protein >> S18 >> gi|166220956|sp|A2RNZ2.1|RS18_LACLM RecName: Full=30S ribosomal protein >> S18 >> gi|12725253|gb|AAK06287.1|AE006448_5 30S ribosomal protein S18 >> [Lactococcus >> lactis subsp. lactis Il1403] gi|116108791|gb|ABJ73931.1| SSU ribosomal >> protein S18P [Lactococcus lactis subsp. cremoris SK11] >> gi|124494037|emb|CAL99037.1| 30S ribosomal protein S18 [Lactococcus lactis >> subsp. cremoris MG1363] gi|281376497|gb|ADA65983.1| SSU ribosomal protein >> S18P [Lactococcus lactis subsp. lactis KF147] gi|300072039|gb|ADJ61439.1| >> 30S ribosomal protein S18 [Lactococcus lactis subsp. cremoris NZ9000] >> >> MAQQRRGGFKRRKKVDFIAANKIEVVDYKDTELLKRFISERGKILPRRVTGTSAKNQRKVVNAIKRARVMALLPFVAEDQ >> N >> >>> gi|66816243|ref|XP_642131.1| hypothetical protein DDB_G0277827 >>> >> [Dictyostelium discoideum AX4] gi|1705556|sp|P54670.1|CAF1_DICDI RecName: >> Full=Calfumirin-1; Short=CAF-1 gi|793761|dbj|BAA06266.1| calfumirin-1 >> [Dictyostelium discoideum] gi|60470106|gb|EAL68086.1| hypothetical protein >> DDB_G0277827 [Dictyostelium discoideum AX4] >> >> MASTQNIVEEVQKMLDTYDTNKDGEITKAEAVEYFKGKKAFNPERSAIYLFQVYDKDNDGKITIKELAGDIDFDKALKEY >> >> KEKQAKSKQQEAEVEEDIEAFILRHNKDDNTDITKDELIQGFKETGAKDPEKSANFILTEMDTNKDGTITVKELRVYYQK >> VQKLLNPDQ >> >>> gi|66818355|ref|XP_642837.1| hypothetical protein DDB_G0276911 >>> >> [Dictyostelium discoideum AX4] gi|60470987|gb|EAL68957.1| hypothetical >> protein DDB_G0276911 [Dictyostelium discoideum AX4] >> >> MKTKSSNNIKKIYYISSILVGIYLCWQIIIQIIFLMDNSIAILEAIGMVVFISVYSLAVAINGWILVGRMKKSSKKAQYE >> >> DFYKKMILKSKILLSTIIIVIIVVVVQDIVINFILPQNPQPYVYMIISNFIVGIADSFQMIMVIFVMGELSFKNYFKFKR >> >> ----------- >> My problem is that i need to filter this file so as to extract the >> relevant >> proteins that are of my interest based on some keywords to be applied on >> the >> header line. As a preliminary step, i wrote the following code to >> calculate >> the total number of lines in the file: >> >> f = open ('nr') >> count = 0 >> for i in f.readlines(): >> line = f.next().strip() >> count = count + 1 >> f.close() >> print count >> >> On running this program, i get the following error: >> >> Traceback (most recent call last): >> File "C:\Users\K\Downloads\nr\nr.py", line 34, in >> for i in f.readlines(): >> MemoryError >> >> A slightly modified version of the above program works fine for the first >> 10 >> or 100 or 1000 lines of the file nr: >> >> >> ---- >> >> Any suggestions on how i can work around this 'Memory Error' problem? 
>> > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > -- Whatever you can do or imagine, begin it; boldness has beauty, magic, and power in it. -- Goethe -------------- next part -------------- An HTML attachment was scrubbed... URL: From tony at tcapp.com Wed Nov 17 22:18:34 2010 From: tony at tcapp.com (Tony Cappellini) Date: Wed, 17 Nov 2010 13:18:34 -0800 Subject: [Baypiggies] reading files quickly and efficiently In-Reply-To: References: Message-ID: Don't read the entire file into memory. readlines() does that. Take a look at Dave Beazely's slides on generators and how he processes multi-GB sized files. http://www.dabeaz.com/generators/ On Wed, Nov 17, 2010 at 12:40 PM, Vikram K wrote: > I need to work on a file whose size is around 6.5 GB.? This file consists of > a protein header information and then the corresponding protein sequence. > Here are a few samples lines of this file: > > ----------- >>gi|15674171|ref|NP_268346.1| 30S ribosomal protein S18 [Lactococcus lactis >> subsp. lactis Il1403] gi|116513137|ref|YP_812044.1| 30S ribosomal protein >> S18 [Lactococcus lactis subsp. cremoris SK11] >> gi|125625229|ref|YP_001033712.1| 30S ribosomal protein S18 [Lactococcus >> lactis subsp. cremoris MG1363] gi|281492845|ref|YP_003354825.1| 50S >> ribosomal protein S18P [Lactococcus lactis subsp. lactis KF147] >> gi|13878750|sp|Q9CDN0.1|RS18_LACLA RecName: Full=30S ribosomal protein S18 >> gi|122939895|sp|Q02VU1.1|RS18_LACLS RecName: Full=30S ribosomal protein S18 >> gi|166220956|sp|A2RNZ2.1|RS18_LACLM RecName: Full=30S ribosomal protein S18 >> gi|12725253|gb|AAK06287.1|AE006448_5 30S ribosomal protein S18 [Lactococcus >> lactis subsp. lactis Il1403] gi|116108791|gb|ABJ73931.1| SSU ribosomal >> protein S18P [Lactococcus lactis subsp. cremoris SK11] >> gi|124494037|emb|CAL99037.1| 30S ribosomal protein S18 [Lactococcus lactis >> subsp. cremoris MG1363] gi|281376497|gb|ADA65983.1| SSU ribosomal protein >> S18P [Lactococcus lactis subsp. lactis KF147] gi|300072039|gb|ADJ61439.1| >> 30S ribosomal protein S18 [Lactococcus lactis subsp. cremoris NZ9000] > MAQQRRGGFKRRKKVDFIAANKIEVVDYKDTELLKRFISERGKILPRRVTGTSAKNQRKVVNAIKRARVMALLPFVAEDQ > N >>gi|66816243|ref|XP_642131.1| hypothetical protein DDB_G0277827 >> [Dictyostelium discoideum AX4] gi|1705556|sp|P54670.1|CAF1_DICDI RecName: >> Full=Calfumirin-1; Short=CAF-1 gi|793761|dbj|BAA06266.1| calfumirin-1 >> [Dictyostelium discoideum] gi|60470106|gb|EAL68086.1| hypothetical protein >> DDB_G0277827 [Dictyostelium discoideum AX4] > MASTQNIVEEVQKMLDTYDTNKDGEITKAEAVEYFKGKKAFNPERSAIYLFQVYDKDNDGKITIKELAGDIDFDKALKEY > KEKQAKSKQQEAEVEEDIEAFILRHNKDDNTDITKDELIQGFKETGAKDPEKSANFILTEMDTNKDGTITVKELRVYYQK > VQKLLNPDQ >>gi|66818355|ref|XP_642837.1| hypothetical protein DDB_G0276911 >> [Dictyostelium discoideum AX4] gi|60470987|gb|EAL68957.1| hypothetical >> protein DDB_G0276911 [Dictyostelium discoideum AX4] > MKTKSSNNIKKIYYISSILVGIYLCWQIIIQIIFLMDNSIAILEAIGMVVFISVYSLAVAINGWILVGRMKKSSKKAQYE > DFYKKMILKSKILLSTIIIVIIVVVVQDIVINFILPQNPQPYVYMIISNFIVGIADSFQMIMVIFVMGELSFKNYFKFKR > > ----------- > My problem is that i need to filter this file so as to extract the relevant > proteins that are of my interest based on some keywords to be applied on the > header line. 
As a preliminary step, i wrote the following code to calculate > the total number of lines in the file: > > f = open ('nr') > count = 0 > for i in f.readlines(): > ??? line = f.next().strip() > ??? count = count + 1 > f.close() > print count > > On running this program, i get the following error: > > Traceback (most recent call last): > ? File "C:\Users\K\Downloads\nr\nr.py", line 34, in > ??? for i in f.readlines(): > MemoryError > > A slightly modified version of the above program works fine for the first 10 > or 100 or 1000 lines of the file nr: > > > ---- > > Any suggestions on how i can work around this 'Memory Error' problem? > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > From bpederse at gmail.com Wed Nov 17 22:24:32 2010 From: bpederse at gmail.com (Brent Pedersen) Date: Wed, 17 Nov 2010 13:24:32 -0800 Subject: [Baypiggies] reading files quickly and efficiently In-Reply-To: References: Message-ID: On Wed, Nov 17, 2010 at 1:13 PM, Glen Jarvis wrote: > BioPython also will do all of this for you -- too: >>>> from Bio import SeqIO > >>>> record = SeqIO.read("NC_005816.fna", "fasta") >>>> record > SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG', > SingleLetterAlphabet()), id='gi|45478711|ref|NC_005816.1|', > name='gi|45478711|ref|NC_005816.1|', > description='gi|45478711|ref|NC_005816.1| Yersinia pestis biovar Microtus > ... sequence', > dbxrefs=[]) > > You can also look for particular fields (record.id, record.description, and > record.sequence): > > Look at this tutorial: > http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc16 > > Cheers, > > Glen i agree with glen that you should use a library. however, that example is for a single-entry fasta file. if you want random access to a multi-fasta, use the SeqIO.index in biopython: http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc56 if you just want an iterator, use SeqIO.parse http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc11 -brent From kpguy1975 at gmail.com Wed Nov 17 22:28:04 2010 From: kpguy1975 at gmail.com (Vikram K) Date: Wed, 17 Nov 2010 16:28:04 -0500 Subject: [Baypiggies] reading files quickly and efficiently In-Reply-To: References: Message-ID: My problem is solved thanks to you guys. Thank you so much. On Wed, Nov 17, 2010 at 3:40 PM, Vikram K wrote: > I need to work on a file whose size is around 6.5 GB. This file consists > of a protein header information and then the corresponding protein sequence. > Here are a few samples lines of this file: > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From glen at glenjarvis.com Wed Nov 17 22:35:02 2010 From: glen at glenjarvis.com (Glen Jarvis) Date: Wed, 17 Nov 2010 13:35:02 -0800 Subject: [Baypiggies] reading files quickly and efficiently In-Reply-To: References: Message-ID: Oops.. I meant SeqIO.. Thanks Brent!!! I was doing that quickly... 
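For looping over every record in a big multi-record file like nr, the shape would be something like this (a quick, untested sketch -- the keyword is only an example taken from the sample headers in the original post):

from Bio import SeqIO

count = 0
handle = open("nr")
for record in SeqIO.parse(handle, "fasta"):        # lazy: one record in memory at a time
    if "ribosomal protein" in record.description:  # keyword filter on the header line
        count += 1
handle.close()
print count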
Cheers, Glen On Wed, Nov 17, 2010 at 1:24 PM, Brent Pedersen wrote: > On Wed, Nov 17, 2010 at 1:13 PM, Glen Jarvis wrote: > > BioPython also will do all of this for you -- too: > >>>> from Bio import SeqIO > > > >>>> record = SeqIO.read("NC_005816.fna", "fasta") > >>>> record > > > SeqRecord(seq=Seq('TGTAACGAACGGTGCAATAGTGATCCACACCCAACGCCTGAAATCAGATCCAGG...CTG', > > SingleLetterAlphabet()), id='gi|45478711|ref|NC_005816.1|', > > name='gi|45478711|ref|NC_005816.1|', > > description='gi|45478711|ref|NC_005816.1| Yersinia pestis biovar Microtus > > ... sequence', > > dbxrefs=[]) > > > > You can also look for particular fields (record.id, record.description, > and > > record.sequence): > > > > Look at this tutorial: > > http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc16 > > > > Cheers, > > > > Glen > > i agree with glen that you should use a library. however, that example > is for a single-entry fasta file. if you want random access to a > multi-fasta, use the SeqIO.index in biopython: > http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc56 > > if you just want an iterator, use SeqIO.parse > http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc11 > > -brent > -- Whatever you can do or imagine, begin it; boldness has beauty, magic, and power in it. -- Goethe -------------- next part -------------- An HTML attachment was scrubbed... URL: From mvoorhie at yahoo.com Wed Nov 17 22:30:08 2010 From: mvoorhie at yahoo.com (Mark Voorhies) Date: Wed, 17 Nov 2010 13:30:08 -0800 Subject: [Baypiggies] reading files quickly and efficiently In-Reply-To: References: Message-ID: <201011171330.08869.mvoorhie@yahoo.com> On Wednesday, November 17, 2010 01:18:34 pm Tony Cappellini wrote: > Don't read the entire file into memory. > > readlines() does that. > > Take a look at Dave Beazely's slides on generators and how he > processes multi-GB sized files. > http://www.dabeaz.com/generators/ > For NR, it can also be convenient to convert the FASTA to BLAST database format (via formatdb or downloading the pre-generated databases from NCBI) and extract sequences with fastacmd (formatdb and fastacmd are both included in the NCBI BLAST package). --Mark From wescpy at gmail.com Thu Nov 18 03:56:30 2010 From: wescpy at gmail.com (wesley chun) Date: Wed, 17 Nov 2010 21:56:30 -0500 Subject: [Baypiggies] reading files quickly and efficiently In-Reply-To: <201011171330.08869.mvoorhie@yahoo.com> References: <201011171330.08869.mvoorhie@yahoo.com> Message-ID: ok, so i think you all got the thing working for the OP using other libraries. so as we don't lose this opportunity to learn about various ways to improve our Python code, let's take a look at the existing source and see if we can further improve on it and/or make it more "Pythonic:" ORIGINAL: f = open ('nr') count = 0 for i in f.readlines(): line = f.next().strip() count = count + 1 f.close() print count 1. i know i'm a stickler (hey, *some*one's gotta do it), but i like to put the 'r' in my open() statements. call it a habit, but it is what it is. 2. you don't need a counter when you can sum() up the total number of lines merely by iterating through it. i'll use a generator expression to save memory. 3. strip()ping the leading and trailing whitespace adds no value to your counting and makes things run slower and uses more CPU, so remove that. 4. file.next() contradicts f.readlines() but is the right idea however it's unnecessary as the for-loop automagically calls next() on your behalf 5. 
f.readlines() reads in all the lines of a file, so you don't want to do that as it will chew up all your memory. instead, iterate through the file. given the above, i think you can do everything you want in 3 lines of readable Python code: MODIFIED: f = open ('nr', 'r') print sum(1 for line in f) f.close() can anyone else improve on this? yeah, if you're wasteful and don't properly close the file, you can reduce this down to a single line (less desired/not recommended). since this is so bad, you might as well remove the 'r' also: print sum(1 for line in open('nr')) # very bad... don't try this at home (or work!) (this progression is pretty much what you would do if you were playing "code golf" where engineers at a former employer of mine would try to have the lowest character count.) hope this helps! -- wesley - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - "Core Python Programming", Prentice Hall, (c)2007,2001 "Python Fundamentals", Prentice Hall, (c)2009 ? ? http://corepython.com "Python Web Development with Django", Addison Wesley, (c) 2009 ? ? http://withdjango.com wesley.j.chun :: wescpy-at-gmail.com python training and technical consulting cyberweb.consulting : silicon valley, ca http://cyberwebconsulting.com From nad at acm.org Thu Nov 18 04:43:39 2010 From: nad at acm.org (Ned Deily) Date: Wed, 17 Nov 2010 19:43:39 -0800 Subject: [Baypiggies] reading files quickly and efficiently References: <201011171330.08869.mvoorhie@yahoo.com> Message-ID: In article , wesley chun wrote: > MODIFIED: > f = open ('nr', 'r') > print sum(1 for line in f) > f.close() > > can anyone else improve on this? with open('nr', 'r') as f: print(sum(1 for line in f)) should work on any Python from 2.6 to 3.2, and 2.5 with from __future__ import with_statement -- Ned Deily, nad at acm.org From kpguy1975 at gmail.com Thu Nov 18 20:06:37 2010 From: kpguy1975 at gmail.com (Vikram K) Date: Thu, 18 Nov 2010 14:06:37 -0500 Subject: [Baypiggies] urllib2 query Message-ID: consider the following ftp site: ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/ I wish to write a program using urllib2 which can automatically download all the contents of one of the folders in this site (say, for example, mRNA_Prot) into a folder in my computer. Subsequently i want my program to be able to automatically update the downloaded folder in my computer in accordance with any changes made in future in the folder in the ftp site. Any suggestions on how i should proceed? I am using a windows 7 computer. Will i also have to use the scheduled tasks option in system tools for solving this problem? -------------- next part -------------- An HTML attachment was scrubbed... URL: From stephen.cattaneo at gmail.com Thu Nov 18 20:09:28 2010 From: stephen.cattaneo at gmail.com (Stephen Cattaneo) Date: Thu, 18 Nov 2010 11:09:28 -0800 Subject: [Baypiggies] urllib2 query In-Reply-To: References: Message-ID: Does it need to be urllib2? I would suggest pexpect for doing this. -Steve On Thu, Nov 18, 2010 at 11:06 AM, Vikram K wrote: > consider the following ftp site: > ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/ > > I wish to write a program using urllib2 which can automatically download all > the contents of one of the folders in this site (say, for example, > mRNA_Prot) into a folder in my computer. Subsequently i want my program to > be able to automatically update the downloaded folder in my computer in > accordance with any changes made in future in the folder in the ftp site. > Any suggestions on how i should proceed? 
> I am using a windows 7 computer. Will i also have to use the scheduled tasks > option in system tools for solving this problem? > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > -- --- Failures are finger posts on the road to achievement. ?-- C.S. Lewis From hyperneato at gmail.com Thu Nov 18 20:18:54 2010 From: hyperneato at gmail.com (Isaac) Date: Thu, 18 Nov 2010 11:18:54 -0800 Subject: [Baypiggies] urllib2 query In-Reply-To: References: Message-ID: Setup a caching proxy and acquire the last-modified header from the http proxy server. Then, here is a demonstration of a conditional GET request: http://www.artima.com/forums/flat.jsp?forum=122&thread=15024 download the file from the proxy if the last-modified header is newer than yours on disk. HTH On Thu, Nov 18, 2010 at 11:06 AM, Vikram K wrote: > consider the following ftp site: > ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/ > > I wish to write a program using urllib2 which can automatically download > all the contents of one of the folders in this site (say, for example, > mRNA_Prot) into a folder in my computer. Subsequently i want my program to > be able to automatically update the downloaded folder in my computer in > accordance with any changes made in future in the folder in the ftp site. > Any suggestions on how i should proceed? > I am using a windows 7 computer. Will i also have to use the scheduled > tasks option in system tools for solving this problem? > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason at mischievous.org Thu Nov 18 20:21:24 2010 From: jason at mischievous.org (Jason Culverhouse) Date: Thu, 18 Nov 2010 11:21:24 -0800 Subject: [Baypiggies] urllib2 query In-Reply-To: References: Message-ID: Vikram, I'm sure you just want to download the files then process them later. You can just use wget http://www.gnu.org/software/wget/wget.html it already has the support for timestamp checking etc. wget --mirror ftp://ftp.ncbi.nih.gov/refseq/H_sapiens -o /my/local/mirror You can just schedule that to run weekly? Jason On Nov 18, 2010, at 11:18 AM, Isaac wrote: > Setup a caching proxy and acquire the last-modified header from the http proxy server. Then, here is a demonstration of a conditional GET request: > > http://www.artima.com/forums/flat.jsp?forum=122&thread=15024 > > download the file from the proxy if the last-modified header is newer than yours on disk. > > HTH > > > On Thu, Nov 18, 2010 at 11:06 AM, Vikram K wrote: > consider the following ftp site: > ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/ > > I wish to write a program using urllib2 which can automatically download all the contents of one of the folders in this site (say, for example, mRNA_Prot) into a folder in my computer. Subsequently i want my program to be able to automatically update the downloaded folder in my computer in accordance with any changes made in future in the folder in the ftp site. Any suggestions on how i should proceed? > I am using a windows 7 computer. Will i also have to use the scheduled tasks option in system tools for solving this problem? 
> > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies -------------- next part -------------- An HTML attachment was scrubbed... URL: From fred at kas-group.com Thu Nov 18 20:14:53 2010 From: fred at kas-group.com (Fred C) Date: Thu, 18 Nov 2010 11:14:53 -0800 Subject: [Baypiggies] urllib2 query In-Reply-To: References: Message-ID: <1EC31DF4-A6D3-4E24-A641-0C4654A2BC58@kas-group.com> Sorry Stephen, pycurl is way more appropriate for this than pexpect. http://pypi.python.org/pypi/pycurl/7.18.1 -fred- On Nov 18, 2010, at 11:09 AM, Stephen Cattaneo wrote: > Does it need to be urllib2? I would suggest pexpect for doing this. > > -Steve > > On Thu, Nov 18, 2010 at 11:06 AM, Vikram K wrote: >> consider the following ftp site: >> ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/ >> >> I wish to write a program using urllib2 which can automatically download all >> the contents of one of the folders in this site (say, for example, >> mRNA_Prot) into a folder in my computer. Subsequently i want my program to >> be able to automatically update the downloaded folder in my computer in >> accordance with any changes made in future in the folder in the ftp site. >> Any suggestions on how i should proceed? >> I am using a windows 7 computer. Will i also have to use the scheduled tasks >> option in system tools for solving this problem? >> >> _______________________________________________ >> Baypiggies mailing list >> Baypiggies at python.org >> To change your subscription options or unsubscribe: >> http://mail.python.org/mailman/listinfo/baypiggies >> > > > > -- > --- > Failures are finger posts on the road to achievement. > > -- C.S. Lewis > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies -- Writing Java code is like using a sledgehammer to hang a picture frame. Fred C. fred at bsdhost.net http://kiq.me/JP From cappy2112 at gmail.com Thu Nov 18 22:34:21 2010 From: cappy2112 at gmail.com (Tony Cappellini) Date: Thu, 18 Nov 2010 13:34:21 -0800 Subject: [Baypiggies] Does anyone want to review "MySQL for Python" Message-ID: The publisher has contacted me to find a reviewer for "MySQL for Python" https://www.packtpub.com/mysql-for-python-database-access-made-easy/book If you're interested in reviewing this book, please email me privately. Thanks From glen at glenjarvis.com Fri Nov 19 02:51:46 2010 From: glen at glenjarvis.com (Glen Jarvis) Date: Thu, 18 Nov 2010 17:51:46 -0800 Subject: [Baypiggies] urllib2 query In-Reply-To: References: Message-ID: Vikram, Although there's no shortage of alternatives (even some non-Python suggestions), I want to throw one more out there. We do this kind of automatic downloading and updating of bioinformatics data a lot. We started using rsync for PDB data from their FTP site and it has made a world of difference. We can update more often and each update takes less resources -- and finishes very quickly.
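To make that concrete, the whole update job ends up being little more than a scheduled call out to rsync. A minimal sketch -- the rsync URL below is a placeholder rather than a real endpoint, so substitute whatever rsync address the data provider actually publishes, and the local path is likewise just an example:

import subprocess

# placeholder source URL -- use the provider's published rsync endpoint
SRC = 'rsync://data.example.org/refseq/H_sapiens/mRNA_Prot/'
DEST = '/data/mirrors/mRNA_Prot/'

# -a preserves times and permissions, -z compresses on the wire, and
# --delete removes local files that disappeared upstream, so the copy
# stays an exact mirror
subprocess.check_call(['rsync', '-az', '--delete', SRC, DEST])

Run that from cron (or a scheduled task on Windows) and that is the entire update step; rsync only transfers the parts of files that actually changed.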
Cheers, Glen El Nov 18, 2010, a las 11:06 AM, Vikram K escribi?: > consider the following ftp site: > ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/ > > I wish to write a program using urllib2 which can automatically download all the contents of one of the folders in this site (say, for example, mRNA_Prot) into a folder in my computer. Subsequently i want my program to be able to automatically update the downloaded folder in my computer in accordance with any changes made in future in the folder in the ftp site. Any suggestions on how i should proceed? > I am using a windows 7 computer. Will i also have to use the scheduled tasks option in system tools for solving this problem? > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies -------------- next part -------------- An HTML attachment was scrubbed... URL: From jim at systemateka.com Fri Nov 19 03:41:51 2010 From: jim at systemateka.com (jim) Date: Thu, 18 Nov 2010 18:41:51 -0800 Subject: [Baypiggies] jim's in san francisco Message-ID: <1290134511.1969.1.camel@jim-laptop> every street i took was filled with cars, the radio said the freeways were clogged up. i went home. enjoy the talk. From damonmc at gmail.com Fri Nov 19 04:02:20 2010 From: damonmc at gmail.com (Damon McCormick) Date: Thu, 18 Nov 2010 19:02:20 -0800 Subject: [Baypiggies] urllib2 query In-Reply-To: References: Message-ID: Vikram, Glen's suggestion would definitely be the way to go. You wouldn't want to use ftp if they provide an rsync server that mirrors the contents of the ftp server. With rsync you get efficient, compressed transfers of only the files that have changed (and only the *parts* of those files that have changed), automatically, and without relying on time stamps. -Damon On Thu, Nov 18, 2010 at 5:51 PM, Glen Jarvis wrote: > Vikrim, > > Although there don't seem to be a shortage of alternative (even some > non-python suggestions), I want yo throw one more out there. > > We do this automatic downloading and updating on bioinformatics data a > lot. We started using rsync for pdb data from their FTP site and it has made > a world of difference. We can update more often and each update takes less > resources -- and finishes very quickly. > > Cheers, > > > > Glen > > El Nov 18, 2010, a las 11:06 AM, Vikram K escribi?: > > consider the following ftp site: > > ftp://ftp.ncbi.nih.gov/refseq/H_sapiens/ > > I wish to write a program using urllib2 which can automatically download > all the contents of one of the folders in this site (say, for example, > mRNA_Prot) into a folder in my computer. Subsequently i want my program to > be able to automatically update the downloaded folder in my computer in > accordance with any changes made in future in the folder in the ftp site. > Any suggestions on how i should proceed? > I am using a windows 7 computer. Will i also have to use the scheduled > tasks option in system tools for solving this problem? 
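(Following up on the conditional-GET idea suggested earlier in the thread: with plain urllib2 the check looks roughly like the sketch below. The URL is only an example, and If-Modified-Since is an HTTP header, so this approach needs an HTTP mirror of the data -- it will not do anything useful against the FTP site itself.)

import urllib2

url = 'http://www.example.com/refseq/H_sapiens/mRNA_Prot/somefile.gz'  # example only
last_seen = 'Thu, 18 Nov 2010 00:00:00 GMT'    # saved from the previous successful fetch

request = urllib2.Request(url)
request.add_header('If-Modified-Since', last_seen)
try:
    response = urllib2.urlopen(request)
    data = response.read()                     # 200 OK: the server sent a newer copy
    print 'updated copy, %d bytes' % len(data)
except urllib2.HTTPError, e:
    if e.code == 304:
        print 'not modified since', last_seen  # nothing to download this time
    else:
        raise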
> > _______________________________________________ > > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > > > _______________________________________________ > Baypiggies mailing list > Baypiggies at python.org > To change your subscription options or unsubscribe: > http://mail.python.org/mailman/listinfo/baypiggies > -------------- next part -------------- An HTML attachment was scrubbed... URL: From peteballotta at gmail.com Sat Nov 20 03:23:02 2010 From: peteballotta at gmail.com (Pete Ballotta) Date: Fri, 19 Nov 2010 18:23:02 -0800 Subject: [Baypiggies] Python jobs at MTV Networks in SF Message-ID: Calling all Python engineers, We're aggressively hiring for multiple full-time positions with our social applications group here at MTV Networks in San Francisco. Our offices include a diverse group of engineering teams including the creative minds behind Shockwave, AddictingGames, Nickelodeon's platform team, and our new social games team that develops exclusively in Python and AS3. We offer great benefits, paid vacation, 401k, company parties, stocked kitchens, and lightning quick Macbook Pro's. Web Applications Developer Are you an Engineer with solid skills in Python and experience building highly scalable web applications? Have you worked in a highly dynamic iterative development team to do this work? What you get to do every day: ? Develop new technology for high availability, high demand consumer facing web applications that are changing the face of entertainment and gaming ? Translate technical requirements into specifications by working closely with Project Managers as you create new features and functionality for our products ? Own and be accountable for multiple development projects in a dynamic iterative release cycle ? Define updates, issues and communicate same to key stakeholders in the process ? Manage the quality of your code using standard coding guidelines and version control to track changes ? Manage resource allocation to meet deadlines and shifting priorities What you bring to the role in your professional toolkit: ? BS in computer science or equivalent required and professional experience doing hands-on development with object-oriented methodologies ? 3+ years of work in a production environment with consumer facing web development using Python on Linux/Unix platform with Django and Pylon frameworks ? Solid understanding of Java 1.5 features to include Collections, Generics, MVC Frameworks ? Previous work with browser-based technology to include JavaScript, CSS, DOM, HTML and front-end J2EE technologies - JSP, JSTL and Taglibs ? Well-versed in the Semantic Web and metadata ? Experience developing applications that are applicable to mobile devices a plus ? Solid skills with dynamic scripting languages and cross browser compatibility testing ? Experience with source control tools ? Subversion, ideal code practices, optimizing for scale and performance and the latest in web software architecture ? Use of Hibernate, SQL, and with relational database architecture ? Experience writing thread-safe code ? Experience with gaming or social networking applications ? 
A creative approach to problem solving in a dynamic team-oriented environment with limited supervision To apply and/or learn more about our team, please send your resume to pete.ballotta at mtvnmix.com From lsivitz at vonchurch.com Mon Nov 22 22:39:28 2010 From: lsivitz at vonchurch.com (Lumen Sivitz) Date: Mon, 22 Nov 2010 13:39:28 -0800 Subject: [Baypiggies] Python in Back End development for GAMES Message-ID: Hello Baypiggies! I was referred to this list by one of your own -- seems like an ideal place to spread the word. I work for VonChurch (you can find more information @ www.vonchurch.com). I am currently working with a client looking for someone for the following role: Backend/Infrastructure Engineer: Primary Skill-Sets: Python, HTML5, Django, Linux Server (Ubuntu), Amazon EC2 (Amazon's Cloud Computing Service) w/autoscaling, Amazon S3 (Simple Storage Service), memcached, MySQL This person is: -Lead on Infrastructure & Scalability -Server side logic for game -50% Python Development -30% Linux/Shell Scripting -20% High Level Design The company is located in San Francisco. If you're interested, I'd love to hear from you to discuss the position in more depth! Thanks, -- *Lumen Sivitz* *Junior Partner* *VonChurch**, Inc.* *Phone: (415)229-7699 **LSivitz at VonChurch.com** * *www.VonChurch.com* *Work Hard | Play Harder* -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdbaddog at gmail.com Tue Nov 23 21:06:43 2010 From: bdbaddog at gmail.com (William Deegan) Date: Tue, 23 Nov 2010 12:06:43 -0800 Subject: [Baypiggies] Suggestions on some python client/server development. Message-ID: Greetings, I need to replace an existing C/C++ system which handles requests for constrained resources. Currently there's a server which reads a list of tokens and # available. token_a 1 token_b 55 token_c 9 There's a client which waits until a token of the requested token type is available and then runs the command line provided: wait_for_token -token token_a -maxwait 50 /bin/echo "Got the token" The user can specify the token name, the maximum amount of time to wait for the token, and the command to run. The token is not released until the command line specified completes. It is possible that the wait_for_token could be killed via control-c or a SIGKILL, in which case the token still must be released in some fashion. Currently the client holds a socket open to the server, so the token is released when the process is killed because the socket gets dropped on the client side. I'd like to add (among other things) a web-based interface, or minimally an HTML page written by the server, with the current list of who has each token, who's waiting, and what time they started waiting and obtained the token. I'm thinking that perhaps a server (and maybe client) based on Twisted might make sense? Or perhaps some web app framework? (and REST?) It needs to be able to handle up to 200 requests per second, but most likely < 10-20 requests per second. The commands typically run for minutes so sub-second response times are not necessary. I've not done anything like this in Python yet, so I figured I'd float it out to the group for some suggestions on where to start and/or which technologies/packages might be useful to avoid just replicating exactly the C++ code. Thanks, Bill From keith at dartworks.biz Tue Nov 23 21:44:29 2010 From: keith at dartworks.biz (Keith Dart) Date: Tue, 23 Nov 2010 12:44:29 -0800 Subject: [Baypiggies] Suggestions on some python client/server development.
In-Reply-To: References: Message-ID: <20101123124429.4ef35186@dartworks.biz> === On Tue, 11/23, William Deegan wrote: === > I've not done anything like this in python yet, so I figured I'd float > it out to the group for some suggestions on where to start and/or > which technologies/packages might be useful to avoid just replicating > exactly the c++ code.. === Well, there are many options. But the way I'd probably do it is use Javascript XHR on the browser side to fetch the data from the server. The server for that data just serves JSON serialized data so it's lightweight and fast. One example of this is here: http://mochi.github.com/mochikit/examples/ajax_tables/index.html The backend can be twisted, with simplejson. I also have a proxy object already written that makes the JSON serialization transparent. Server side: http://code.google.com/p/pycopia/source/browse/trunk/WWW/pycopia/WWW/json.py Client side: http://code.google.com/p/pycopia/source/browse/trunk/WWW/media/js/proxy.js It's part of the pycopia-WWW web application framework. -- Keith Dart -- -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Keith Dart public key: ID: 19017044 ===================================================================== From bdbaddog at gmail.com Tue Nov 23 21:59:50 2010 From: bdbaddog at gmail.com (William Deegan) Date: Tue, 23 Nov 2010 12:59:50 -0800 Subject: [Baypiggies] Suggestions on some python client/server development. In-Reply-To: <20101123124429.4ef35186@dartworks.biz> References: <20101123124429.4ef35186@dartworks.biz> Message-ID: Keith, I'm not sure you groked this from my original message. The client is not a browser, but a command line tool. -Bill On Tue, Nov 23, 2010 at 12:44 PM, Keith Dart wrote: > === On Tue, 11/23, William Deegan wrote: === >> I've not done anything like this in python yet, so I figured I'd float >> it out to the group for some suggestions on where to start and/or >> which technologies/packages might be useful to avoid just replicating >> exactly the c++ code.. > > === > > Well, there are many options. But the way I'd probably do it is use > Javascript XHR on the browser side to fetch the data from the server. > The server for that data just serves JSON serialized data so it's > lightweight and fast. > > One example of this is here: > > http://mochi.github.com/mochikit/examples/ajax_tables/index.html > > The backend can be twisted, with simplejson. > > I also have a proxy object already written that makes the JSON > serialization transparent. > > Server side: > http://code.google.com/p/pycopia/source/browse/trunk/WWW/pycopia/WWW/json.py > > Client side: > http://code.google.com/p/pycopia/source/browse/trunk/WWW/media/js/proxy.js > > It's part of the pycopia-WWW web application framework. > > > > -- Keith Dart > > -- > > -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > ? Keith Dart > ? public key: ID: 19017044 > ? > ? ===================================================================== > From keith at dartworks.biz Tue Nov 23 22:15:25 2010 From: keith at dartworks.biz (Keith Dart) Date: Tue, 23 Nov 2010 13:15:25 -0800 Subject: [Baypiggies] Suggestions on some python client/server development. In-Reply-To: References: <20101123124429.4ef35186@dartworks.biz> Message-ID: <20101123131525.4e5a1118@dartworks.biz> === On Tue, 11/23, William Deegan wrote: === > I'm not sure you groked this from my original message. > The client is not a browser, but a command line tool. 
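For what it's worth, the command-line side being described here is small enough to sketch with nothing but the standard library. The host, port and one-line wire protocol below are made up for illustration -- this is not the real wait_for_token format, it just shows the hold-the-socket-open-for-the-life-of-the-command idea:

import socket
import subprocess
import sys

HOST, PORT = 'tokenserver.example.com', 9999   # hypothetical server address

def wait_for_token(token, maxwait, command):
    sock = socket.create_connection((HOST, PORT), timeout=maxwait)
    sock.sendall('ACQUIRE %s\n' % token)
    reply = sock.recv(1024)           # blocks until granted; socket.timeout means maxwait expired
    if not reply.startswith('GRANTED'):
        sys.exit('no token: %r' % reply)
    sock.settimeout(None)             # no timeout while the command runs
    try:
        return subprocess.call(command)   # the token is held for the life of this command
    finally:
        sock.close()                  # a normal exit closes the socket explicitly; if the
                                      # process is killed, the OS drops it, which is what
                                      # lets the server reclaim the token either way

if __name__ == '__main__':
    sys.exit(wait_for_token('token_a', 50, ['/bin/echo', 'Got the token']))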
=== Oh, it seemed to me you wanted a browser interface to view server status. In that case, it's almost the same thing, just substitute Python for client side. You might also take a look at Pyro: http://www.xs4all.nl/~irmen/pyro3/ It's for python-python RPC. -- Keith Dart -- -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Keith Dart public key: ID: 19017044 ===================================================================== From rachel at chomp.com Mon Nov 29 22:16:53 2010 From: rachel at chomp.com (Rachel Morrissey) Date: Mon, 29 Nov 2010 13:16:53 -0800 Subject: [Baypiggies] Job Opportunity @ Chomp! Message-ID: Hi All, My name is Rachel and I am a team member at Chomp (not a recruiter! :) ). We are currently looking for a Python engineer to take over the ownership of our crawler development. I'd love to chat with you further about what we're working on if you're interested. I can be reached at rachel at chomp.com. Thanks! *Job Description* *Responsibilities:* - owning design and development of the Chomp crawler - ensuring timeliness of crawled information - extending the crawler to other app platforms - ability to work in areas outside of your usual comfort zone and get things done quickly *Requirements:* - lots of experience writing raw Python code - experience with task-queues and multi-processor applications - regex wizard - experience with building a web crawler highly desirable - will consider contract or full time, telecommute is ok A little more about Chomp: Chomp is an app discovery product that helps users find the best mobile apps. It combines a unique search engine that lets you search for apps based on what they do rather than what they're called, with intelligent recommendations based on your browsing and purchase history. We're solving some really challenging problems in information retrieval, natural language processing and machine learning. Apple is currently seeing more than 1 billion apps being downloaded on iPhone & iPod touch every 60 days. As a result, we think that what we are building is going to be an incredibly important product for the future of app discovery not just on iPhone but on all app platforms that matter. If you want to try Chomp out, please download it from the iPhone app store. Chomp is backed by top tier Silicon Valley VC's and has assembled a killer team coming from companies such as Google, Yahoo, Yelp and Admob. -- Rachel C. Morrissey Office Manager @ Chomp! 138 Tenth St, SF CA 94103 650 799 0598 facebook.com/chomp twitter.com/chomp -------------- next part -------------- An HTML attachment was scrubbed... URL: From damonmc at gmail.com Tue Nov 30 07:18:30 2010 From: damonmc at gmail.com (Damon McCormick) Date: Tue, 30 Nov 2010 07:18:30 +0100 Subject: [Baypiggies] Advancements in PyPy In-Reply-To: References: Message-ID: Looks like the PyPy team has addressed a large part of their memory consumption issue now: http://morepypy.blogspot.com/2010/11/improving-memory-behaviour-to-make-self.html -Damon On Thu, Nov 11, 2010 at 8:06 PM, Tony Cappellini wrote: > Other than the memory consumption issue, this is very encouraging. > http://tinyurl.com/2flcuk9 > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
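Coming back to the Pyro pointer a few messages up, for anyone who has not used it: the token-server conversation maps onto Pyro 3 roughly as below. This is adapted from the Pyro quickstart pattern; the class, method and object names are made up for illustration, and it assumes the default daemon port (7766) on localhost.

# broker.py -- server side
import Pyro.core

class TokenBroker(Pyro.core.ObjBase):
    def __init__(self):
        Pyro.core.ObjBase.__init__(self)
        self.free = {'token_a': 1, 'token_b': 55, 'token_c': 9}

    def status(self):
        # snapshot for a command-line client or a status page to render
        return dict(self.free)

Pyro.core.initServer()
daemon = Pyro.core.Daemon()
daemon.connect(TokenBroker(), 'tokens')
daemon.requestLoop()

# client.py -- run separately
import Pyro.core

broker = Pyro.core.getProxyForURI('PYROLOC://localhost:7766/tokens')
print broker.status()

The acquire/release-on-disconnect behaviour would still need the kind of connection tracking discussed above; this only shows how little plumbing the RPC part itself needs.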