From ryanroser at gmail.com  Fri Aug  5 19:42:33 2011
From: ryanroser at gmail.com (Ryan Roser)
Date: Fri, 5 Aug 2011 10:42:33 -0700
Subject: [portland] Python dictionary performance as memory usage increases

Hi,

I'm trying to improve the performance of some of my code.  I've noticed
that one of the bottlenecks involves making a large dictionary where the
values are lists.  Making a large dictionary is fast, repeatedly creating
lists is fast, but things slow down if I set the lists as values for the
dictionary.  Interestingly, this slowdown only occurs if there is already
data in memory in Python, and things get increasingly slow as the amount
of memory used increases.

I have a toy example demonstrating the behavior below.  Do you know why
this is happening?  Is there a problem with my test?  Does Python do
something special when storing lists as values in dictionaries?  Is there
a workaround or an alternative data structure that doesn't exhibit
slowdown as Python's memory usage increases?

Thanks for the help,

Ryan

####################################
##  A test script
####################################
import time
import random

x = range(100000)
def test():
    # Creating a dictionary with an entry for each element in x
    # is fast, and so is repeatedly creating a list
    start = time.time()
    d = dict()
    for i in x:
        tmp = []
        tmp.append('something')
        d[i] = 1
    print 'dict w/o lists:', time.time() - start

    # but assigning the list to the dictionary gets very slow
    # if memory is not empty
    start = time.time()
    d = dict()
    for i in x:
        tmp = []
        tmp.append('something')
        d[i] = tmp
    print 'dict w lists:  ', time.time() - start

print 'runtimes with memory empty'
test()
print 'loading data'
data = [random.random() for i in xrange(30000000)] # ~1gb of mem
print 'runtimes with memory occupied'
test()
####################################

Results:

$ python2.4 tester.py
runtimes with memory empty
dict w/o lists: 0.0506901741028
dict w lists:   0.0766770839691
loading data
runtimes with memory occupied
dict w/o lists: 0.0391671657562
dict w lists:   2.18966984749

$ python2.6 tester.py
runtimes with memory empty
dict w/o lists: 0.0479600429535
dict w lists:   0.0784649848938
loading data
runtimes with memory occupied
dict w/o lists: 0.0361380577087
dict w lists:   2.49754095078

$ python2.7 tester.py
runtimes with memory empty
dict w/o lists: 0.0464890003204
dict w lists:   0.0735650062561
loading data
runtimes with memory occupied
dict w/o lists: 0.0356121063232
dict w lists:   2.49307012558

######## Python versions and machine info #########
Machine has 32 gb of ram, 8 cores

Python 2.4.3 (#1, Sep  3 2009, 15:37:37)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2

ActivePython 2.6.5.14 (ActiveState Software Inc.) based on
Python 2.6.5 (r265:79063, Jul  5 2010, 10:31:13)
[GCC 4.0.0 20050519 (Red Hat 4.0.0-8)] on linux2

Python 2.7.1 (r271:86832, May 25 2011, 13:34:05)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2

$ uname -a
Linux research-team10 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009
x86_64 x86_64 x86_64 GNU/Linux

The integer "1" is immutable, so it is cached by the Python VM. You don't have 100K instances of the "1" object, just 1 instance with 100K references. However in the slow case, every list created is a new object kept for all 100K iterations of the loop. Compare the disassembly of the two lines that store to the dict: d[i] = 1 disassembles to: 53 LOAD_CONST 2 (1) 56 LOAD_FAST 1 (d) 59 LOAD_FAST 2 (i) 62 STORE_SUBSCR and d[i] = tmp disassembles to: 139 LOAD_FAST 3 (tmp) 142 LOAD_FAST 1 (d) 145 LOAD_FAST 2 (i) 148 STORE_SUBSCR In the first case you are storing a cached constant value and in the second you are storing a newly created object. Anyway, that's my best first guess. I don't have a system quite that beefy to test on at the moment to profile more deeply. On Fri, Aug 5, 2011 at 10:42 AM, Ryan Roser wrote: > Hi, > > I'm trying to improve the performance of some of my code. I've noticed > that > one of the bottlenecks involves making a large dictionary where the values > are lists. Making a large dictionary is fast, repeatedly creating lists is > fast, but things slow down if I set the lists as values for the dictionary. > Interestingly, this slowdown only occurs if there is already data in > memory > in Python, and things get increasingly slow as the amount of memory used > increases. > > I have a toy example demonstrating the behavior below. Do you know why this > is happening? Is there a problem with my test? Does Python do something > special when storing lists as values in dictionaries? Is there a > workaround > or an alternative data structure that doesn't exhibit slowdown as Python's > memory usage increases? > > Thanks for the help, > > Ryan > > > > #################################### > ## A test script > #################################### > import time > import random > > x = range(100000) > def test(): > # Creating a dictionary with an entry for each element in x > # is fast, and so is repeatedly creating a list > start = time.time() > d = dict() > for i in x: > tmp = [] > tmp.append('something') > d[i] = 1 > print 'dict w/o lists:', time.time() - start > > # but assigning the list to the dictionary gets very slow > # if memory is not empty > start = time.time() > d = dict() > for i in x: > tmp = [] > tmp.append('something') > d[i] = tmp > print 'dict w lists: ', time.time() - start > > print 'runtimes with memory empty' > test() > print 'loading data' > data = [random.random() for i in xrange(30000000)] # ~1gb of mem > print 'runtimes with memory occupied' > test() > #################################### > > > Results: > > $ python2.4 tester.py > runtimes with memory empty > dict w/o lists: 0.0506901741028 > dict w lists: 0.0766770839691 > loading data > runtimes with memory occupied > dict w/o lists: 0.0391671657562 > dict w lists: 2.18966984749 > > $ python2.6 tester.py > runtimes with memory empty > dict w/o lists: 0.0479600429535 > dict w lists: 0.0784649848938 > loading data > runtimes with memory occupied > dict w/o lists: 0.0361380577087 > dict w lists: 2.49754095078 > > $ python2.7 tester.py > runtimes with memory empty > dict w/o lists: 0.0464890003204 > dict w lists: 0.0735650062561 > loading data > runtimes with memory occupied > dict w/o lists: 0.0356121063232 > dict w lists: 2.49307012558 > > > ######## Python versions and machine info ######### > Machine has 32 gb of ram, 8 cores > > Python 2.4.3 (#1, Sep 3 2009, 15:37:37) > [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2 > > ActivePython 2.6.5.14 (ActiveState Software Inc.) 
From monk at netjunky.com  Fri Aug  5 20:34:37 2011
From: monk at netjunky.com (Jonathan Karon)
Date: Fri, 5 Aug 2011 11:34:37 -0700
Subject: [portland] Python dictionary performance as memory usage increases
Message-ID: <9F9F230E-2B64-4316-AC41-DC84463E50B0@netjunky.com>

Hi Ryan, a few thoughts:

On the surface your code seems reasonable.  I'm not a python internals
expert by any means, but it's quite possible that one or more compiler
optimizations are messing with you.  Since you don't retain a reference
to tmp outside of the first loop, the garbage collector could be caching
and re-using the allocated list on each iteration, which saves a massive
amount of allocation work.  (It could also be optimizing out the list
operations entirely...)

The way you allocate a chunk of memory to test under load creates
30,000,000 distinct blocks of memory.  That is going to slow down new
block allocation because of the way memory management works.  It's not a
python-specific thing; it applies to any general-purpose memory
allocation strategy -- the more blocks of memory you have allocated, the
more time it takes to allocate new ones.

~jonathan

On Aug 5, 2011, at 10:42 AM, Ryan Roser wrote:

> I have a toy example demonstrating the behavior below.  Do you know why
> this is happening?  Is there a problem with my test? [...]

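A related source of the slowdown is CPython's cyclic garbage collector,
whose collections are triggered by counts of container allocations
rather than by bytes.  A minimal sketch for inspecting those knobs (not
code from the thread; gc.get_count() requires Python 2.5+):

import gc

# A generation-0 collection fires once allocations minus deallocations
# of container objects exceed the first threshold -- (700, 10, 10) by
# default.
print gc.get_threshold()

# Live counters.  Building 100K lists pushes the first counter past the
# threshold over and over, and the occasional older-generation pass has
# to trace every live container -- including the huge `data` list.
print gc.get_count()
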
From georgedorn at gmail.com  Fri Aug  5 20:48:32 2011
From: georgedorn at gmail.com (Sam Thompson)
Date: Fri, 5 Aug 2011 11:48:32 -0700
Subject: [portland] Python dictionary performance as memory usage increases

This has everything to do with the garbage collector and the overhead of
allocating more memory to the python process.  I ran some more tests,
including running the 'dict w lists' code multiple times:

python tester.py
runtimes with memory empty
dict w/o lists: 0.0260519981384
dict w lists:   0.0418920516968
dict w lists:   0.0497941970825
loading data
runtimes with memory occupied
dict w/o lists: 0.0242760181427
dict w lists:   1.14558005333
dict w lists:   0.52930688858

It would appear that the first list-building run with memory occupied
incurs some extra overhead, probably due to memory allocation by the OS
to the python process.  Further runs don't have this problem, since they
can reuse the GC'd memory from the prior run.
Interestingly, pypy doesn't appear to exhibit this behavior to the same
degree, probably because its GC algorithm differs from cpython's:

pypy tester.py
runtimes with memory empty
dict w/o lists: 0.217184782028
dict w lists:   0.0508198738098
dict w lists:   0.0368840694427
loading data
runtimes with memory occupied
dict w/o lists: 0.0227448940277
dict w lists:   0.0329740047455
dict w lists:   0.0272088050842

In investigating this, I found yet another strange behavior.  Allocate a
bunch of memory, create a huge number of lists, delete them, and the
next list-storing run is somehow even faster than the constant-storing
one.  Code is here: http://pastebin.com/GdMNkM2f

And the results:

runtimes with memory empty
dict w/o lists: 0.0253469944
dict w lists:   0.0402500629425
dict w lists:   0.0478730201721
dict w lists:   0.0460600852966
loading data
freeing some memory
runtimes with memory occupied
dict w/o lists: 0.0804419517517
dict w lists:   0.0254349708557   <---- What!?
dict w lists:   0.523415803909
dict w lists:   0.418542861938

On Fri, Aug 5, 2011 at 10:42 AM, Ryan Roser wrote:

> Making a large dictionary is fast, repeatedly creating lists is fast,
> but things slow down if I set the lists as values for the dictionary.
> [...]

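One direct way to check Sam's diagnosis is to make the collector
announce itself while the loop runs.  A minimal sketch, assuming the
test() function from Ryan's script is already defined (not code from the
thread):

import gc

# Print a report to stderr for every collection, including object
# counts per generation.
gc.set_debug(gc.DEBUG_STATS)

test()           # the slow 'dict w lists' pass shows a burst of collections

gc.set_debug(0)  # turn the reporting back off
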
From ethan at stoneleaf.us  Fri Aug  5 21:11:19 2011
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 05 Aug 2011 12:11:19 -0700
Subject: [portland] Python dictionary performance as memory usage increases
Message-ID: <4E3C4057.2060201@stoneleaf.us>

Joseph Burks wrote:
> I don't have a system quite that beefy to test on at the moment to
> profile more deeply.

No kidding -- I tried the test code, and his slow case of 2.xxx took me
243.xxx!

~Ethan~


From ryanroser at gmail.com  Fri Aug  5 21:05:40 2011
From: ryanroser at gmail.com (Ryan Roser)
Date: Fri, 5 Aug 2011 12:05:40 -0700
Subject: [portland] Python dictionary performance as memory usage increases
In-Reply-To: <4E3C4057.2060201@stoneleaf.us>
References: <4E3C4057.2060201@stoneleaf.us>

Sam, I think you're right.  The garbage collector is causing the
slowdown.  If I disable the garbage collector for the "memory occupied"
test, the run time is very similar.

- Ryan

##### The edit:
...
import gc
print 'runtimes with memory occupied'
gc.disable()
test()
gc.enable()
...

##### Performance:
$ python2.6 tester2.py
runtimes with memory empty
dict w/o lists: 0.0598680973053
dict w lists:   0.079540014267
loading data
runtimes with memory occupied
dict w/o lists: 0.0467381477356
dict w lists:   0.0416531562805

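If the collector has to be paused routinely, the disable/enable pair can
be wrapped in a context manager so GC is restored even when the timed
code raises.  This is a sketch, not code from the thread; it needs
Python 2.6+ (or 2.5 with the with_statement future import), and raising
the thresholds with gc.set_threshold() is a milder alternative to
disabling collection outright.

from contextlib import contextmanager
import gc

@contextmanager
def gc_paused():
    was_enabled = gc.isenabled()  # remember state so nested use is safe
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

with gc_paused():
    test()  # runs without collector pauses; GC comes back on afterwards
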
From ethan at stoneleaf.us  Sat Aug  6 00:27:39 2011
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 05 Aug 2011 15:27:39 -0700
Subject: [portland] Python dictionary performance as memory usage increases
In-Reply-To: <4E3C4057.2060201@stoneleaf.us>
References: <4E3C4057.2060201@stoneleaf.us>
Message-ID: <4E3C6E5B.5060802@stoneleaf.us>

Ethan Furman wrote:
> No kidding -- I tried the test code, and his slow case of 2.xxx took me
> 243.xxx!

Hmmm -- well, relieved and somewhat embarrassed to say that something
else was bogging my system down -- my slow case is actually 3.xxx.

~Ethan~


From markgross at thegnar.org  Mon Aug  8 22:37:28 2011
From: markgross at thegnar.org (mark gross)
Date: Mon, 8 Aug 2011 13:37:28 -0700
Subject: [portland] Volunteer(s) for Django page.
Message-ID: <20110808203728.GA9265@gvim.org>

I'm trying to help https://sites.google.com/site/bikes4humanity/ get a
web app done, and even though it's a simple application I'm distracted
by other shiny things and can't give it the attention it needs by
myself.

The web app is envisioned as a knockoff of http://www.kickstarter.com/
or http://www.donorschoose.org/, where people who need a refurbished
bike can request one and track the progress of the donation and fix-up
process.  Boys and Girls Club members would likely be the requesters.

There is a google groups page and a bitbucket stub project with more
details... such as they are.

http://groups.google.com/group/team_web-b4hpdx
https://bitbucket.org/markgross/connect2b4h

I'll be at tomorrow night's PDXPython meeting if you have any questions
or interest in the project.

--mark


From michelle at pdxpython.org  Tue Aug  9 21:20:56 2011
From: michelle at pdxpython.org (Michelle Rowley)
Date: Tue, 9 Aug 2011 12:20:56 -0700
Subject: [portland] PDX Python meeting tonight @ 6:30pm
Message-ID: <18AB20CB-E56F-48A8-BA32-2486DB7DA23C@pdxpython.org>

Hey Pythoneers,

Just a friendly reminder that we're meeting tonight at the Urban Airship
HQ for another installment of PDX Python.  On deck tonight is Michel
Pelletier himself with Michel's Module of the Month: operator.  Next up,
Eric Holscher will debut/practice one of his DjangoCon 2011 talks:
Safely Deploying on the Cutting Edge.  We'll round out the evening with
lightning talks, so bring your 5-minute hacks, thoughts and rants to
share!  After the meeting we'll head over to Bailey's Taproom to grab a
beverage and continue the Pythonic parley.

Hope to see you there,
Michelle

---

Urban Airship is at 334 NW 11th Ave, in the Pearl District:
http://goo.gl/maps/U6mC

The main door will probably be locked, but the back door, which leads
directly to the event space, will be propped open.  The back door is
right around the corner on NW Flanders, next to the loading dock:
http://goo.gl/maps/Ikbh

Adam will put up signs, and if you get lost you can call him at
503-866-0663.


From rshepard at appl-ecosys.com  Thu Aug 11 23:31:34 2011
From: rshepard at appl-ecosys.com (Rich Shepard)
Date: Thu, 11 Aug 2011 14:31:34 -0700 (PDT)
Subject: [portland] List Indexing Confusion

Once again it's been too long since I've written a script; every time I
mean to finish a model I've been writing, a more critical business need
has pushed the coding back.

I'm now faced with translating a couple of dozen spreadsheets (saved as
.csv files) into the proper format for insertion into a database table.
My brain refuses to get the row indexing correct.

Here are the first few rows of a typical file:

CVS
Arsenic:Zinc:Nitrate Nitrogen:pH:Chloride:Sulfate:Total Dissolved Solids
1993-11-22:0.008:0.014:0.021:7.560:2.060:39.3:293.0

I want an output file that looks like

CVS|1993-11-22|Arsenic|0.008
CVS|1993-11-22|Zinc|0.014
etc.

The malfunctioning script is:
-------------------------
#!/usr/bin/env python

import sys,csv

filename = sys.argv[1]
try:
    infile = open(filename, 'r')
except:
    print "Can't open ", filename,"!"
    sys.exit(1)
indata = csv.reader(infile, delimiter=':')

loc = indata.next()       # only one field on first line
parmlist = indata.next()  # the list of chemicals

outfile = open('out.csv', 'w')
outdata = csv.writer(outfile, delimiter = '|', lineterminator = '\n')

i = 0
j = 0

for row in indata:
    outdata.writerow([loc, row[i][j], parmlist[i], row[i][j+1]])

    i += 1
    j += 1

infile.close()
outfile.close()
--------------------------

List indexing is not that difficult, so I am embarrassed to admit that I
don't see what I'm doing incorrectly.  A clue would be very helpful.

Rich

From ryanroser at gmail.com  Fri Aug 12 00:00:27 2011
From: ryanroser at gmail.com (Ryan Roser)
Date: Thu, 11 Aug 2011 15:00:27 -0700
Subject: [portland] List Indexing Confusion

I think the problem is with how you're referencing the row from indata.
I'm not quite sure what you're trying to do with i and j.  I'd get rid
of i and j and replace the for loop with the following:

for row in indata:
    for parm, rowval in zip(parmlist, row[1:]):
        outdata.writerow([loc, row[0], parm, rowval])

(I didn't try the code out, so there may be a typo or some other error.)

Ryan

On Thu, Aug 11, 2011 at 2:31 PM, Rich Shepard wrote:

> I'm now faced with translating a couple of dozen spreadsheets (saved as
> .csv files) into the proper format for insertion into a database table.
> My brain refuses to get the row indexing correct. [...]

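Folding Ryan's loop back into Rich's script, the whole thing might look
like the sketch below.  This is untested against Rich's real files, and
it adds one extra tweak: indata.next() returns a list of fields, so the
site code is its first element (writing the bare loc would emit a list
repr like "['CVS']" instead of "CVS").

#!/usr/bin/env python
import sys
import csv

filename = sys.argv[1]
try:
    infile = open(filename, 'r')
except IOError:
    print "Can't open", filename, "!"
    sys.exit(1)
indata = csv.reader(infile, delimiter=':')

loc = indata.next()[0]    # first line holds a single field, e.g. 'CVS'
parmlist = indata.next()  # the list of chemicals

outfile = open('out.csv', 'w')
outdata = csv.writer(outfile, delimiter='|', lineterminator='\n')

for row in indata:
    date = row[0]         # each data row leads with the sample date
    for parm, value in zip(parmlist, row[1:]):
        outdata.writerow([loc, date, parm, value])

infile.close()
outfile.close()

Run over the sample rows above, this yields lines like
CVS|1993-11-22|Arsenic|0.008, one per chemical per date.
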
From rshepard at appl-ecosys.com  Fri Aug 12 00:11:13 2011
From: rshepard at appl-ecosys.com (Rich Shepard)
Date: Thu, 11 Aug 2011 15:11:13 -0700 (PDT)
Subject: [portland] List Indexing Confusion [RESOLVED]

On Thu, 11 Aug 2011, Ryan Roser wrote:

> I think the problem is with how you're referencing the row from indata.

Ryan,

Yep.  That's what I needed to get straight.

> I'd get rid of i and j and replace the for loop with the following:
>
> for row in indata:
>     for parm, rowval in zip(parmlist, row[1:]):
>         outdata.writerow([loc, row[0], parm, rowval])

That does solve the problem.

Thanks very much,

Rich


From ethan at stoneleaf.us  Fri Aug 12 00:30:47 2011
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 11 Aug 2011 15:30:47 -0700
Subject: [portland] List Indexing Confusion
Message-ID: <4E445817.7010507@stoneleaf.us>

Rich Shepard wrote:
> for row in indata:
>     outdata.writerow([loc, row[i][j], parmlist[i], row[i][j+1]])

You're indexing into a string.

row = ['1993-11-22', '0.008', '0.014', '0.021',
       '7.560', '2.060', '39.3', '293.0']

row[i]    = '1993-11-22'  # when i == 0
row[i][j] = '1'           # when i == j == 0

You should be able to do most things in Python without resorting to
manual indexing:

for row in indata:
    date = row[0]
    for chemical, amount in zip(parmlist, row[1:]):
        outdata.writerow([loc, date, chemical, amount])

~Ethan~


From brian.curtin at gmail.com  Tue Aug 16 00:39:21 2011
From: brian.curtin at gmail.com (Brian Curtin)
Date: Mon, 15 Aug 2011 17:39:21 -0500
Subject: [portland] Looking for PyCon 2012 Speakers

With PyCon 2012 efforts off to a great start, we're looking for you, the
people of the Python community, so show us what you've got.  Our call
for proposals (http://us.pycon.org/2012/cfp/) just went out, and we want
to include you in our 2012 conference schedule, taking place March 7-15,
2012 in Santa Clara, CA.  The call covers tutorial, talk, and poster
applications, and we're expecting to blow the previous record of 250
applications out of the water.

Put together your best 3-hour class proposals for one of the tutorial
sessions on March 7 and 8.  Submit your best talks on any range of
topics for the conference days, March 9 through 11.  The poster session
will be in full swing on Sunday with a series of 4'x4' posters and an
open floor for attendees to interact with presenters.

Get your applications in early -- we want to help you put together the
best proposal possible, so we're going to work with submitters as
applications come in.  See more details and submit your talks here:
http://us.pycon.org/2012/speaker/

We're also looking for feedback from your past PyCon experiences, along
with what you're looking for in the future, by way of our 2012 Guidance
Survey at https://www.surveymonkey.com/s/pycon2012_launch_survey.  The
attendees make the conference, so every response we get from you makes a
difference in putting together the best conference we can.

If you or your company is interested in sponsoring PyCon, we'd love to
hear from you.  Join our growing list with Diamond sponsors Google and
Dropbox, and Platinum sponsors Microsoft, Nasuni, SurveyMonkey, and
Gondor by Eldarion.  CCP Games, Linode, Walt Disney Animation Studios,
Canonical, DotCloud, Loggly, Revolution Systems, ZeOmega, bitly,
ActiveState, JetBrains, Snoball, Caktus Consulting Group, and Disqus
make up our Gold sponsors.  The Silver sponsors so far are 10gen,
GitHub, Olark, Wingware, net-ng, Imaginary Landscape, BigDoor, Fwix, AG
Interactive, Bitbucket, The Open Bastion, Accense Technology, Cox Media
Group, and myYearbook.  See our sponsorship page at
http://us.pycon.org/2012/sponsors/ for more details.
The PyCon Organizers - http://us.pycon.org/2012
Jesse Noller - Chairman - jnoller at python.org
Brian Curtin - Publicity Coordinator - brian at python.org


From helm.shawn at gmail.com  Tue Aug 16 06:17:41 2011
From: helm.shawn at gmail.com (Shawn Helm)
Date: Mon, 15 Aug 2011 21:17:41 -0700
Subject: [portland] Python Scientific Computing Links

Hi Portland python programmers,

Here are some recently posted links on scientific computing with python.

cheers,
Shawn

The Python Papers Volume 6 Issue 2 is complete and ready for harvest at
http://ojs.pythonpapers.org/index.php/tpp/issue/view/24
TPP is an open access journal, so free for all to consume (and publish).

The talks from SciPy have recently been published on their website:
http://conference.scipy.org/scipy2011/talks.php

Also, here is a pretty comprehensive page listing python modules that
can be used in Operations Research:
https://software.sandia.gov/trac/coopr/wiki/Documentation/RelatedProjects

There is also a python supercomputing conference coming up in November
in Seattle:
http://bit.ly/pyhpc2011
http://www.dlr.de/sc/Portaldata/15/Resources/dokumente/python_bof/sc11/PyHPC2011-Call-for-Paper.pdf


From helm.shawn at gmail.com  Thu Aug 18 22:38:13 2011
From: helm.shawn at gmail.com (Shawn Helm)
Date: Thu, 18 Aug 2011 13:38:13 -0700
Subject: [portland] Stanford AI and Database courses

I just read about some free online classes that Stanford's offering this
Fall.  Here's one on Artificial Intelligence, which has already drawn
over 90,000 registrations:

http://www.ai-class.com/

And here is an intro database class too:

http://www.db-class.com/

I'm thinking about taking them -- I'd be interested to work with other
people in Portland who also decide to take the classes.

Thanks,
Shawn


From ethan at stoneleaf.us  Thu Aug 18 23:41:43 2011
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 18 Aug 2011 14:41:43 -0700
Subject: [portland] Stanford AI and Database courses
Message-ID: <4E4D8717.3050200@stoneleaf.us>

Shawn Helm wrote:
> I just read about some free online classes that Stanford's offering
> this Fall.

Sounds like fun!

~Ethan~


From notbot at gmail.com  Fri Aug 19 07:32:34 2011
From: notbot at gmail.com (Michael Bunsen)
Date: Thu, 18 Aug 2011 22:32:34 -0700
Subject: [portland] Stanford AI and Database courses
In-Reply-To: <4E4D8717.3050200@stoneleaf.us>
References: <4E4D8717.3050200@stoneleaf.us>

Yeah, sounds fun.  I'd be interested in having a study table or
whathaveyou as well.

2011/8/18 Ethan Furman:
> Shawn Helm wrote:
>> I just read about some free online classes that Stanford's offering
>> this Fall.
>
> Sounds like fun!


From igal at pragmaticraft.com  Sat Aug 20 19:18:33 2011
From: igal at pragmaticraft.com (Igal Koshevoy)
Date: Sat, 20 Aug 2011 10:18:33 -0700
Subject: [portland] OT: Summer Coder's Social, tomorrow, 1-7pm at Laurelhurst Park

Quick reminder that the 2011 Summer Coder's Social is tomorrow!

The Coder's Social is a popular event for local tech user group members
to get together and have a fun BBQ in the park.
This is a very casual event with food, socializing, outdoor activities
and games, making it perfect for bringing along your less-geeky
significant other and family.

When: 8/21 from 1-7pm
Where: Laurelhurst Park, Picnic Area A
Calagator link (with details): http://calagator.org/events/1250460828

This event is a BYOB potluck, so it'd be great if you could bring
something and label it (e.g. vegan, eggs, dairy, gluten-free, bacon,
etc.) to make it easier to share.  You can see what some others are
bringing (http://bit.ly/nxZXmr) and sign up to bring a dish of your own
(http://bit.ly/olpthd).  Please don't be discouraged if you don't see
many signups; the event usually draws 50-100 people and many don't list
what they're bringing on the spreadsheet.

See you there!

-igal


From rachelsakry at gmail.com  Sun Aug 21 18:22:45 2011
From: rachelsakry at gmail.com (Rachel Sakry)
Date: Sun, 21 Aug 2011 09:22:45 -0700
Subject: [portland] Stanford AI and Database courses

Count me in for the database class/study group.

On Thu, Aug 18, 2011 at 10:32 PM, Michael Bunsen wrote:

> Yeah, sounds fun.  I'd be interested in having a study table or
> whathaveyou as well.