From ryanroser at gmail.com  Fri Aug  5 19:42:33 2011
From: ryanroser at gmail.com (Ryan Roser)
Date: Fri, 5 Aug 2011 10:42:33 -0700
Subject: [portland] Python dictionary performance as memory usage increases

Hi,

I'm trying to improve the performance of some of my code.  I've noticed
that one of the bottlenecks involves making a large dictionary where the
values are lists.  Making a large dictionary is fast, repeatedly creating
lists is fast, but things slow down if I set the lists as values for the
dictionary.  Interestingly, this slowdown only occurs if there is already
data in memory in Python, and things get increasingly slow as the amount
of memory used increases.

I have a toy example demonstrating the behavior below.  Do you know why
this is happening?  Is there a problem with my test?  Does Python do
something special when storing lists as values in dictionaries?  Is there
a workaround or an alternative data structure that doesn't exhibit
slowdown as Python's memory usage increases?

Thanks for the help,

Ryan

####################################
##  A test script
####################################
import time
import random

x = range(100000)
def test():
    # Creating a dictionary with an entry for each element in x
    # is fast, and so is repeatedly creating a list
    start = time.time()
    d = dict()
    for i in x:
        tmp = []
        tmp.append('something')
        d[i] = 1
    print 'dict w/o lists:', time.time() - start

    # but assigning the list to the dictionary gets very slow
    # if memory is not empty
    start = time.time()
    d = dict()
    for i in x:
        tmp = []
        tmp.append('something')
        d[i] = tmp
    print 'dict w lists:  ', time.time() - start

print 'runtimes with memory empty'
test()
print 'loading data'
data = [random.random() for i in xrange(30000000)] # ~1gb of mem
print 'runtimes with memory occupied'
test()
####################################

Results:

$ python2.4 tester.py
runtimes with memory empty
dict w/o lists: 0.0506901741028
dict w lists:   0.0766770839691
loading data
runtimes with memory occupied
dict w/o lists: 0.0391671657562
dict w lists:   2.18966984749

$ python2.6 tester.py
runtimes with memory empty
dict w/o lists: 0.0479600429535
dict w lists:   0.0784649848938
loading data
runtimes with memory occupied
dict w/o lists: 0.0361380577087
dict w lists:   2.49754095078

$ python2.7 tester.py
runtimes with memory empty
dict w/o lists: 0.0464890003204
dict w lists:   0.0735650062561
loading data
runtimes with memory occupied
dict w/o lists: 0.0356121063232
dict w lists:   2.49307012558

######## Python versions and machine info #########
Machine has 32 gb of ram, 8 cores

Python 2.4.3 (#1, Sep  3 2009, 15:37:37)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2

ActivePython 2.6.5.14 (ActiveState Software Inc.) based on
Python 2.6.5 (r265:79063, Jul  5 2010, 10:31:13)
[GCC 4.0.0 20050519 (Red Hat 4.0.0-8)] on linux2

Python 2.7.1 (r271:86832, May 25 2011, 13:34:05)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2

$ uname -a
Linux research-team10 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009
x86_64 x86_64 x86_64 GNU/Linux

The integer "1" is immutable, so it is cached by the Python VM. You don't have 100K instances of the "1" object, just 1 instance with 100K references. However in the slow case, every list created is a new object kept for all 100K iterations of the loop. Compare the disassembly of the two lines that store to the dict: d[i] = 1 disassembles to: 53 LOAD_CONST 2 (1) 56 LOAD_FAST 1 (d) 59 LOAD_FAST 2 (i) 62 STORE_SUBSCR and d[i] = tmp disassembles to: 139 LOAD_FAST 3 (tmp) 142 LOAD_FAST 1 (d) 145 LOAD_FAST 2 (i) 148 STORE_SUBSCR In the first case you are storing a cached constant value and in the second you are storing a newly created object. Anyway, that's my best first guess. I don't have a system quite that beefy to test on at the moment to profile more deeply. On Fri, Aug 5, 2011 at 10:42 AM, Ryan Roser wrote: > Hi, > > I'm trying to improve the performance of some of my code. I've noticed > that > one of the bottlenecks involves making a large dictionary where the values > are lists. Making a large dictionary is fast, repeatedly creating lists is > fast, but things slow down if I set the lists as values for the dictionary. > Interestingly, this slowdown only occurs if there is already data in > memory > in Python, and things get increasingly slow as the amount of memory used > increases. > > I have a toy example demonstrating the behavior below. Do you know why this > is happening? Is there a problem with my test? Does Python do something > special when storing lists as values in dictionaries? Is there a > workaround > or an alternative data structure that doesn't exhibit slowdown as Python's > memory usage increases? > > Thanks for the help, > > Ryan > > > > #################################### > ## A test script > #################################### > import time > import random > > x = range(100000) > def test(): > # Creating a dictionary with an entry for each element in x > # is fast, and so is repeatedly creating a list > start = time.time() > d = dict() > for i in x: > tmp = [] > tmp.append('something') > d[i] = 1 > print 'dict w/o lists:', time.time() - start > > # but assigning the list to the dictionary gets very slow > # if memory is not empty > start = time.time() > d = dict() > for i in x: > tmp = [] > tmp.append('something') > d[i] = tmp > print 'dict w lists: ', time.time() - start > > print 'runtimes with memory empty' > test() > print 'loading data' > data = [random.random() for i in xrange(30000000)] # ~1gb of mem > print 'runtimes with memory occupied' > test() > #################################### > > > Results: > > $ python2.4 tester.py > runtimes with memory empty > dict w/o lists: 0.0506901741028 > dict w lists: 0.0766770839691 > loading data > runtimes with memory occupied > dict w/o lists: 0.0391671657562 > dict w lists: 2.18966984749 > > $ python2.6 tester.py > runtimes with memory empty > dict w/o lists: 0.0479600429535 > dict w lists: 0.0784649848938 > loading data > runtimes with memory occupied > dict w/o lists: 0.0361380577087 > dict w lists: 2.49754095078 > > $ python2.7 tester.py > runtimes with memory empty > dict w/o lists: 0.0464890003204 > dict w lists: 0.0735650062561 > loading data > runtimes with memory occupied > dict w/o lists: 0.0356121063232 > dict w lists: 2.49307012558 > > > ######## Python versions and machine info ######### > Machine has 32 gb of ram, 8 cores > > Python 2.4.3 (#1, Sep 3 2009, 15:37:37) > [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2 > > ActivePython 2.6.5.14 (ActiveState Software Inc.) 
From monk at netjunky.com  Fri Aug  5 20:34:37 2011
From: monk at netjunky.com (Jonathan Karon)
Date: Fri, 5 Aug 2011 11:34:37 -0700
Subject: [portland] Python dictionary performance as memory usage increases
Message-ID: <9F9F230E-2B64-4316-AC41-DC84463E50B0@netjunky.com>

Hi Ryan, a few thoughts:

On the surface your code seems reasonable.  I'm not a python internals
expert by any means, but it's quite possible that one or more compiler
optimizations are messing with you.  Since you don't retain a reference
to tmp outside of the first loop, the garbage collector could be caching
and re-using the allocated list on each iteration, which saves a massive
amount of allocation work.  (It could also be optimizing out the list
operations entirely...)

The way you allocate a chunk of memory to test under load creates
30,000,000 distinct blocks of memory.  That is going to slow down new
block allocation because of the way memory management works.  It's not a
python-specific thing; it applies to any general-purpose memory
allocation strategy -- the more blocks of memory you have allocated, the
more time it takes to allocate new ones.

~jonathan

On Aug 5, 2011, at 10:42 AM, Ryan Roser wrote:

> I have a toy example demonstrating the behavior below.  Do you know why
> this is happening?  Is there a problem with my test? [...]

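A related source of the slowdown is CPython's cyclic garbage collector,
whose collections are triggered by counts of container allocations
rather than by bytes.  A minimal sketch for inspecting those knobs (not
code from the thread; gc.get_count() requires Python 2.5+):

import gc

# A generation-0 collection fires once allocations minus deallocations
# of container objects exceed the first threshold -- (700, 10, 10) by
# default.
print gc.get_threshold()

# Live counters.  Building 100K lists pushes the first counter past the
# threshold over and over, and the occasional older-generation pass has
# to trace every live container -- including the huge `data` list.
print gc.get_count()
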
From georgedorn at gmail.com  Fri Aug  5 20:48:32 2011
From: georgedorn at gmail.com (Sam Thompson)
Date: Fri, 5 Aug 2011 11:48:32 -0700
Subject: [portland] Python dictionary performance as memory usage increases

This has everything to do with the garbage collector and the overhead of
allocating more memory to the python process.  I ran some more tests,
including running the 'dict w lists' code multiple times:

python tester.py
runtimes with memory empty
dict w/o lists: 0.0260519981384
dict w lists:   0.0418920516968
dict w lists:   0.0497941970825
loading data
runtimes with memory occupied
dict w/o lists: 0.0242760181427
dict w lists:   1.14558005333
dict w lists:   0.52930688858

It would appear that the first list-building run with memory occupied
incurs some extra overhead, probably due to memory allocation by the OS
to the python process.  Further runs don't have this problem, since they
can reuse the GC'd memory from the prior run.
Interestingly, pypy doesn't appear to exhibit this behavior to the same
degree, probably because its GC algorithm differs from cpython's:

pypy tester.py
runtimes with memory empty
dict w/o lists: 0.217184782028
dict w lists:   0.0508198738098
dict w lists:   0.0368840694427
loading data
runtimes with memory occupied
dict w/o lists: 0.0227448940277
dict w lists:   0.0329740047455
dict w lists:   0.0272088050842

In investigating this, I found yet another strange behavior.  Allocate a
bunch of memory, create a huge number of lists, delete them, and the
next list-storing run is somehow even faster than the constant-storing
one.  Code is here: http://pastebin.com/GdMNkM2f

And the results:

runtimes with memory empty
dict w/o lists: 0.0253469944
dict w lists:   0.0402500629425
dict w lists:   0.0478730201721
dict w lists:   0.0460600852966
loading data
freeing some memory
runtimes with memory occupied
dict w/o lists: 0.0804419517517
dict w lists:   0.0254349708557   <---- What!?
dict w lists:   0.523415803909
dict w lists:   0.418542861938

On Fri, Aug 5, 2011 at 10:42 AM, Ryan Roser wrote:

> Making a large dictionary is fast, repeatedly creating lists is fast,
> but things slow down if I set the lists as values for the dictionary.
> [...]

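One direct way to check Sam's diagnosis is to make the collector
announce itself while the loop runs.  A minimal sketch, assuming the
test() function from Ryan's script is already defined (not code from the
thread):

import gc

# Print a report to stderr for every collection, including object
# counts per generation.
gc.set_debug(gc.DEBUG_STATS)

test()           # the slow 'dict w lists' pass shows a burst of collections

gc.set_debug(0)  # turn the reporting back off
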
From ethan at stoneleaf.us  Fri Aug  5 21:11:19 2011
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 05 Aug 2011 12:11:19 -0700
Subject: [portland] Python dictionary performance as memory usage increases
Message-ID: <4E3C4057.2060201@stoneleaf.us>

Joseph Burks wrote:
> I don't have a system quite that beefy to test on at the moment to
> profile more deeply.

No kidding -- I tried the test code, and his slow case of 2.xxx took me
243.xxx!

~Ethan~


From ryanroser at gmail.com  Fri Aug  5 21:05:40 2011
From: ryanroser at gmail.com (Ryan Roser)
Date: Fri, 5 Aug 2011 12:05:40 -0700
Subject: [portland] Python dictionary performance as memory usage increases
In-Reply-To: <4E3C4057.2060201@stoneleaf.us>
References: <4E3C4057.2060201@stoneleaf.us>

Sam, I think you're right.  The garbage collector is causing the
slowdown.  If I disable the garbage collector for the "memory occupied"
test, the run time is very similar.

- Ryan

##### The edit:
...
import gc
print 'runtimes with memory occupied'
gc.disable()
test()
gc.enable()
...

##### Performance:
$ python2.6 tester2.py
runtimes with memory empty
dict w/o lists: 0.0598680973053
dict w lists:   0.079540014267
loading data
runtimes with memory occupied
dict w/o lists: 0.0467381477356
dict w lists:   0.0416531562805

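If the collector has to be paused routinely, the disable/enable pair can
be wrapped in a context manager so GC is restored even when the timed
code raises.  This is a sketch, not code from the thread; it needs
Python 2.6+ (or 2.5 with the with_statement future import), and raising
the thresholds with gc.set_threshold() is a milder alternative to
disabling collection outright.

from contextlib import contextmanager
import gc

@contextmanager
def gc_paused():
    was_enabled = gc.isenabled()  # remember state so nested use is safe
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()

with gc_paused():
    test()  # runs without collector pauses; GC comes back on afterwards
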
From ethan at stoneleaf.us  Sat Aug  6 00:27:39 2011
From: ethan at stoneleaf.us (Ethan Furman)
Date: Fri, 05 Aug 2011 15:27:39 -0700
Subject: [portland] Python dictionary performance as memory usage increases
In-Reply-To: <4E3C4057.2060201@stoneleaf.us>
References: <4E3C4057.2060201@stoneleaf.us>
Message-ID: <4E3C6E5B.5060802@stoneleaf.us>

Ethan Furman wrote:
> No kidding -- I tried the test code, and his slow case of 2.xxx took me
> 243.xxx!

Hmmm -- well, relieved and somewhat embarrassed to say that something
else was bogging my system down -- my slow case is actually 3.xxx.

~Ethan~


From markgross at thegnar.org  Mon Aug  8 22:37:28 2011
From: markgross at thegnar.org (mark gross)
Date: Mon, 8 Aug 2011 13:37:28 -0700
Subject: [portland] Volunteer(s) for Django page.
Message-ID: <20110808203728.GA9265@gvim.org>

I'm trying to help https://sites.google.com/site/bikes4humanity/ get a
web app done, and even though it's a simple application I'm distracted
by other shiny things and can't give it the attention it needs by
myself.

The web app is envisioned as a knockoff of http://www.kickstarter.com/
or http://www.donorschoose.org/, where people who need a refurbished
bike can request one and track the progress of the donation and fix-up
process.  Boys and Girls Club members would likely be the requesters.

There is a google groups page and a bitbucket stub project with more
details... such as they are.

http://groups.google.com/group/team_web-b4hpdx
https://bitbucket.org/markgross/connect2b4h

I'll be at tomorrow night's PDXPython meeting if you have any questions
or interest in the project.

--mark


From michelle at pdxpython.org  Tue Aug  9 21:20:56 2011
From: michelle at pdxpython.org (Michelle Rowley)
Date: Tue, 9 Aug 2011 12:20:56 -0700
Subject: [portland] PDX Python meeting tonight @ 6:30pm
Message-ID: <18AB20CB-E56F-48A8-BA32-2486DB7DA23C@pdxpython.org>

Hey Pythoneers,

Just a friendly reminder that we're meeting tonight at the Urban Airship
HQ for another installment of PDX Python.  On deck tonight is Michel
Pelletier himself with Michel's Module of the Month: operator.  Next up,
Eric Holscher will debut/practice one of his DjangoCon 2011 talks:
Safely Deploying on the Cutting Edge.  We'll round out the evening with
lightning talks, so bring your 5-minute hacks, thoughts and rants to
share!  After the meeting we'll head over to Bailey's Taproom to grab a
beverage and continue the Pythonic parley.

Hope to see you there,
Michelle

---

Urban Airship is at 334 NW 11th Ave, in the Pearl District:
http://goo.gl/maps/U6mC

The main door will probably be locked, but the back door, which leads
directly to the event space, will be propped open.  The back door is
right around the corner on NW Flanders, next to the loading dock:
http://goo.gl/maps/Ikbh

Adam will put up signs, and if you get lost you can call him at
503-866-0663.


From rshepard at appl-ecosys.com  Thu Aug 11 23:31:34 2011
From: rshepard at appl-ecosys.com (Rich Shepard)
Date: Thu, 11 Aug 2011 14:31:34 -0700 (PDT)
Subject: [portland] List Indexing Confusion

Once again it's been too long since I've written a script; every time I
mean to finish a model I've been writing, a more critical business need
has pushed the coding back.

I'm now faced with translating a couple of dozen spreadsheets (saved as
.csv files) into the proper format for insertion into a database table.
My brain refuses to get the row indexing correct.

Here are the first few rows of a typical file:

CVS
Arsenic:Zinc:Nitrate Nitrogen:pH:Chloride:Sulfate:Total Dissolved Solids
1993-11-22:0.008:0.014:0.021:7.560:2.060:39.3:293.0

I want an output file that looks like

CVS|1993-11-22|Arsenic|0.008
CVS|1993-11-22|Zinc|0.014
etc.

The malfunctioning script is:
-------------------------
#!/usr/bin/env python

import sys,csv

filename = sys.argv[1]
try:
    infile = open(filename, 'r')
except:
    print "Can't open ", filename,"!"
    sys.exit(1)
indata = csv.reader(infile, delimiter=':')

loc = indata.next()       # only one field on first line
parmlist = indata.next()  # the list of chemicals

outfile = open('out.csv', 'w')
outdata = csv.writer(outfile, delimiter = '|', lineterminator = '\n')

i = 0
j = 0

for row in indata:
    outdata.writerow([loc, row[i][j], parmlist[i], row[i][j+1]])

    i += 1
    j += 1

infile.close()
outfile.close()
--------------------------

List indexing is not that difficult, so I am embarrassed to admit that I
don't see what I'm doing incorrectly.  A clue would be very helpful.

Rich

From ryanroser at gmail.com  Fri Aug 12 00:00:27 2011
From: ryanroser at gmail.com (Ryan Roser)
Date: Thu, 11 Aug 2011 15:00:27 -0700
Subject: [portland] List Indexing Confusion

I think the problem is with how you're referencing the row from indata.
I'm not quite sure what you're trying to do with i and j.  I'd get rid
of i and j and replace the for loop with the following:

for row in indata:
    for parm, rowval in zip(parmlist, row[1:]):
        outdata.writerow([loc, row[0], parm, rowval])

(I didn't try the code out, so there may be a typo or some other error.)

Ryan

On Thu, Aug 11, 2011 at 2:31 PM, Rich Shepard wrote:

> I'm now faced with translating a couple of dozen spreadsheets (saved as
> .csv files) into the proper format for insertion into a database table.
> My brain refuses to get the row indexing correct. [...]

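Folding Ryan's loop back into Rich's script, the whole thing might look
like the sketch below.  This is untested against Rich's real files, and
it adds one extra tweak: indata.next() returns a list of fields, so the
site code is its first element (writing the bare loc would emit a list
repr like "['CVS']" instead of "CVS").

#!/usr/bin/env python
import sys
import csv

filename = sys.argv[1]
try:
    infile = open(filename, 'r')
except IOError:
    print "Can't open", filename, "!"
    sys.exit(1)
indata = csv.reader(infile, delimiter=':')

loc = indata.next()[0]    # first line holds a single field, e.g. 'CVS'
parmlist = indata.next()  # the list of chemicals

outfile = open('out.csv', 'w')
outdata = csv.writer(outfile, delimiter='|', lineterminator='\n')

for row in indata:
    date = row[0]         # each data row leads with the sample date
    for parm, value in zip(parmlist, row[1:]):
        outdata.writerow([loc, date, parm, value])

infile.close()
outfile.close()

Run over the sample rows above, this yields lines like
CVS|1993-11-22|Arsenic|0.008, one per chemical per date.
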
From rshepard at appl-ecosys.com  Fri Aug 12 00:11:13 2011
From: rshepard at appl-ecosys.com (Rich Shepard)
Date: Thu, 11 Aug 2011 15:11:13 -0700 (PDT)
Subject: [portland] List Indexing Confusion [RESOLVED]

On Thu, 11 Aug 2011, Ryan Roser wrote:

> I think the problem is with how you're referencing the row from indata.

Ryan,

Yep.  That's what I needed to get straight.

> I'd get rid of i and j and replace the for loop with the following:
>
> for row in indata:
>     for parm, rowval in zip(parmlist, row[1:]):
>         outdata.writerow([loc, row[0], parm, rowval])

That does solve the problem.

Thanks very much,

Rich


From ethan at stoneleaf.us  Fri Aug 12 00:30:47 2011
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 11 Aug 2011 15:30:47 -0700
Subject: [portland] List Indexing Confusion
Message-ID: <4E445817.7010507@stoneleaf.us>

Rich Shepard wrote:
> for row in indata:
>     outdata.writerow([loc, row[i][j], parmlist[i], row[i][j+1]])

You're indexing into a string.

row = ['1993-11-22', '0.008', '0.014', '0.021',
       '7.560', '2.060', '39.3', '293.0']

row[i]    = '1993-11-22'  # when i == 0
row[i][j] = '1'           # when i == j == 0

You should be able to do most things in Python without resorting to
manual indexing:

for row in indata:
    date = row[0]
    for chemical, amount in zip(parmlist, row[1:]):
        outdata.writerow([loc, date, chemical, amount])

~Ethan~


From brian.curtin at gmail.com  Tue Aug 16 00:39:21 2011
From: brian.curtin at gmail.com (Brian Curtin)
Date: Mon, 15 Aug 2011 17:39:21 -0500
Subject: [portland] Looking for PyCon 2012 Speakers

With PyCon 2012 efforts off to a great start, we're looking for you, the
people of the Python community, so show us what you've got.  Our call
for proposals (http://us.pycon.org/2012/cfp/) just went out, and we want
to include you in our 2012 conference schedule, taking place March 7-15,
2012 in Santa Clara, CA.  The call covers tutorial, talk, and poster
applications, and we're expecting to blow the previous record of 250
applications out of the water.

Put together your best 3-hour class proposals for one of the tutorial
sessions on March 7 and 8.  Submit your best talks on any range of
topics for the conference days, March 9 through 11.  The poster session
will be in full swing on Sunday with a series of 4'x4' posters and an
open floor for attendees to interact with presenters.

Get your applications in early -- we want to help you put together the
best proposal possible, so we're going to work with submitters as
applications come in.  See more details and submit your talks here:
http://us.pycon.org/2012/speaker/

We're also looking for feedback from your past PyCon experiences, along
with what you're looking for in the future, by way of our 2012 Guidance
Survey at https://www.surveymonkey.com/s/pycon2012_launch_survey.  The
attendees make the conference, so every response we get from you makes a
difference in putting together the best conference we can.

If you or your company is interested in sponsoring PyCon, we'd love to
hear from you.  Join our growing list with Diamond sponsors Google and
Dropbox, and Platinum sponsors Microsoft, Nasuni, SurveyMonkey, and
Gondor by Eldarion.  CCP Games, Linode, Walt Disney Animation Studios,
Canonical, DotCloud, Loggly, Revolution Systems, ZeOmega, bitly,
ActiveState, JetBrains, Snoball, Caktus Consulting Group, and Disqus
make up our Gold sponsors.  The Silver sponsors so far are 10gen,
GitHub, Olark, Wingware, net-ng, Imaginary Landscape, BigDoor, Fwix, AG
Interactive, Bitbucket, The Open Bastion, Accense Technology, Cox Media
Group, and myYearbook.  See our sponsorship page at
http://us.pycon.org/2012/sponsors/ for more details.
The PyCon Organizers - http://us.pycon.org/2012
Jesse Noller - Chairman - jnoller at python.org
Brian Curtin - Publicity Coordinator - brian at python.org


From helm.shawn at gmail.com  Tue Aug 16 06:17:41 2011
From: helm.shawn at gmail.com (Shawn Helm)
Date: Mon, 15 Aug 2011 21:17:41 -0700
Subject: [portland] Python Scientific Computing Links

Hi Portland python programmers,

Here are some recently posted links on scientific computing with python.

cheers,
Shawn

The Python Papers Volume 6 Issue 2 is complete and ready for harvest at
http://ojs.pythonpapers.org/index.php/tpp/issue/view/24
TPP is an open access journal, so free for all to consume (and publish).

The talks from SciPy have recently been published on their website:
http://conference.scipy.org/scipy2011/talks.php

Also, here is a pretty comprehensive page listing python modules that
can be used in Operations Research:
https://software.sandia.gov/trac/coopr/wiki/Documentation/RelatedProjects

There is also a python supercomputing conference coming up in November
in Seattle:
http://bit.ly/pyhpc2011
http://www.dlr.de/sc/Portaldata/15/Resources/dokumente/python_bof/sc11/PyHPC2011-Call-for-Paper.pdf


From helm.shawn at gmail.com  Thu Aug 18 22:38:13 2011
From: helm.shawn at gmail.com (Shawn Helm)
Date: Thu, 18 Aug 2011 13:38:13 -0700
Subject: [portland] Stanford AI and Database courses

I just read about some free online classes that Stanford's offering this
Fall.  Here's one on Artificial Intelligence, which has already drawn
over 90,000 registrations:

http://www.ai-class.com/

And here is an intro database class too:

http://www.db-class.com/

I'm thinking about taking them -- I'd be interested to work with other
people in Portland who also decide to take the classes.

Thanks,
Shawn


From ethan at stoneleaf.us  Thu Aug 18 23:41:43 2011
From: ethan at stoneleaf.us (Ethan Furman)
Date: Thu, 18 Aug 2011 14:41:43 -0700
Subject: [portland] Stanford AI and Database courses
Message-ID: <4E4D8717.3050200@stoneleaf.us>

Shawn Helm wrote:
> I just read about some free online classes that Stanford's offering
> this Fall.

Sounds like fun!

~Ethan~


From notbot at gmail.com  Fri Aug 19 07:32:34 2011
From: notbot at gmail.com (Michael Bunsen)
Date: Thu, 18 Aug 2011 22:32:34 -0700
Subject: [portland] Stanford AI and Database courses
In-Reply-To: <4E4D8717.3050200@stoneleaf.us>
References: <4E4D8717.3050200@stoneleaf.us>

Yeah, sounds fun.  I'd be interested in having a study table or
whathaveyou as well.

2011/8/18 Ethan Furman:
> Shawn Helm wrote:
>> I just read about some free online classes that Stanford's offering
>> this Fall.
>
> Sounds like fun!


From igal at pragmaticraft.com  Sat Aug 20 19:18:33 2011
From: igal at pragmaticraft.com (Igal Koshevoy)
Date: Sat, 20 Aug 2011 10:18:33 -0700
Subject: [portland] OT: Summer Coder's Social, tomorrow, 1-7pm at Laurelhurst Park

Quick reminder that the 2011 Summer Coder's Social is tomorrow!

The Coder's Social is a popular event for local tech user group members
to get together and have a fun BBQ in the park.
This is a very casual event with food, socializing, outdoor activities
and games, making it perfect for bringing along your less-geeky
significant other and family.

When: 8/21 from 1-7pm
Where: Laurelhurst Park, Picnic Area A
Calagator link (with details): http://calagator.org/events/1250460828

This event is a BYOB potluck, so it'd be great if you could bring
something and label it (e.g. vegan, eggs, dairy, gluten-free, bacon,
etc.) to make it easier to share.  You can see what some others are
bringing (http://bit.ly/nxZXmr) and sign up to bring a dish of your own
(http://bit.ly/olpthd).  Please don't be discouraged if you don't see
many signups; the event usually draws 50-100 people and many don't list
what they're bringing on the spreadsheet.

See you there!

-igal


From rachelsakry at gmail.com  Sun Aug 21 18:22:45 2011
From: rachelsakry at gmail.com (Rachel Sakry)
Date: Sun, 21 Aug 2011 09:22:45 -0700
Subject: [portland] Stanford AI and Database courses

Count me in for the database class/study group.

On Thu, Aug 18, 2011 at 10:32 PM, Michael Bunsen wrote:

> Yeah, sounds fun.  I'd be interested in having a study table or
> whathaveyou as well.