Python-list Digest, Vol 88, Issue 69

williem75 at gmail.com williem75 at gmail.com
Wed Jan 26 16:25:26 EST 2011


Sent from my LG phone

python-list-request at python.org wrote:

>Today's Topics:
>
>   1. Re: Python use growing fast (Alice Bevan–McGregor)
>   2. Re: order of importing modules (Chris Rebert)
>   3. Re: How to Buffer Serialized Objects to Disk (MRAB)
>   4. Re: How to Buffer Serialized Objects to Disk (Chris Rebert)
>   5. Re: How to Buffer Serialized Objects to Disk (Peter Otten)
>   6. Re: Best way to automatically copy out attachments from an
>      email (Chris Rebert)
>   7. Re: Parsing string for "<verb> <noun>" (Aahz)
>   8. Re: Nested structures question (Tim Harig)
>   9. Re: How to Buffer Serialized Objects to Disk (Scott McCarty)
>
>On 2011-01-10 19:49:47 -0800, Roy Smith said:
>
>> One of the surprising (to me, anyway) uses of JavaScript is as the 
>> scripting language for MongoDB (http://www.mongodb.org/).
>
>I just wish they'd drop spidermonkey and go with V8 or another, faster 
>and more modern engine.  :(
>
>	- Alice.
>
>
>
>
>> Dan Stromberg wrote:
>>> On Tue, Jan 11, 2011 at 4:30 PM, Catherine Moroney
>>> <Catherine.M.Moroney at jpl.nasa.gov> wrote:
>>>>
>>>> In what order does python import modules on a Linux system?  I have a
>>>> package that is both installed in /usr/lib64/python2.5/site-packages,
>>>> and a newer version of the same module in a working directory.
>>>>
>>>> I want to import the version from the working directory, but when I
>>>> print module.__file__ in the interpreter after importing the module,
>>>> I get the version that's in site-packages.
>>>>
>>>> I've played with the PYTHONPATH environment variable by setting it
>>>> to just the path of the working directory, but when I import the module
>>>> I still pick up the version in site-packages.
>>>>
>>>> /usr/lib64 is in my PATH variable, but doesn't appear anywhere else.  I
>>>> don't want to remove /usr/lib64 from my PATH because that will break
>>>> a lot of stuff.
>>>>
>>>> Can I force python to import from my PYTHONPATH first, before looking
>>>> in the system directory?
>>>>
>>> Please import sys and inspect sys.path; this defines the search path
>>> for imports.
>>>
>>> By looking at sys.path, you can see where in the search order your
>>> $PYTHONPATH is going.
>>>
>On Wed, Jan 12, 2011 at 11:07 AM, Catherine Moroney
><Catherine.M.Moroney at jpl.nasa.gov> wrote:
>> I've looked at my sys.path variable and I see that it has
>> a whole bunch of site-package directories, followed by the
>> contents of my $PYTHONPATH variable, followed by a list of
>> misc site-package variables (see below).
><snip>
>> But, I'm curious as to where the first bunch of 'site-package'
>> entries come from.  The
>> /usr/lib64/python2.5/site-packages/pyhdfeos-1.0_r57_58-py2.5-linux-x86_64.egg
>> is not present in any of my environment variables yet it shows up
>> as one of the first entries in sys.path.
>
>You probably have a .pth file somewhere that adds it (since it's an
>egg, probably site-packages/easy-install.pth).
>See http://docs.python.org/install/index.html#modifying-python-s-search-path
>
>Cheers,
>Chris
>--
>http://blog.rebertia.com
>
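A minimal sketch of the search-order point made in this exchange, assuming Python 3 (the thread itself uses Python 2.5). The module name shadowdemo and the temporary directory are invented for the illustration. The only claim is that directories earlier in sys.path are searched first, so putting the working directory at index 0 makes its copy win over anything a .pth file or site-packages contributes later in the list.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway "working directory" holding a module called shadowdemo
# (a hypothetical name used only for this illustration).
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "shadowdemo.py"), "w") as f:
    f.write("WHERE = 'working directory'\n")

# Entries earlier in sys.path are searched first, so insert at the front.
sys.path.insert(0, workdir)
importlib.invalidate_caches()

import shadowdemo

# __file__ shows which copy was actually imported.
print(shadowdemo.__file__)
print(shadowdemo.WHERE)
```

Inspecting sys.path before the import, as suggested above, shows where each entry sits in the search order.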
>
>On 12/01/2011 21:05, Scott McCarty wrote:
>> Sorry to ask this question. I have searched the list archives and googled,
>> but I don't even know what words to use to find what I am looking for. I am
>> just looking for a little kick in the right direction.
>>
>> I have a Python-based log analysis program called petit
>> (http://crunchtools.com/petit). I am trying to modify it to manage the
>> main object types to and from disk.
>>
>> Essentially, I have one object which is a list of a bunch of "Entry"
>> objects. The Entry objects have date, time, etc. fields which I use
>> for analysis techniques. At the very beginning I build up the list of
>> objects, then would like to start pickling it while building to save
>> memory. I want to be able to process more entries than I have memory for.
>> With a straight list it looks like I could build from xreadlines(), but
>> once you turn it into a more complex object, I don't quite know where to go.
>>
>> I understand how to pickle the entire data structure, but I need
>> something that will manage the memory/disk allocation. Any thoughts?
>>
>To me it sounds like you need to use a database.
>
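A rough sketch of what the database suggestion could look like with the standard-library sqlite3 module, assuming Python 3. The table layout, the column names, and the sample log lines are all invented for the illustration, since the thread does not show what an Entry really contains. Rows are fed in from a generator, so the full list of entries never has to sit in memory at once.

```python
import os
import sqlite3
import tempfile

# A database file on disk; the path is a throwaway chosen for this example.
db_path = os.path.join(tempfile.mkdtemp(), "entries.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE entry (day TEXT, clock TEXT, host TEXT, message TEXT)")

def parse(lines):
    """Yield one (day, clock, host, message) tuple per log line."""
    for line in lines:
        day, clock, host, message = line.rstrip("\n").split(" ", 3)
        yield day, clock, host, message

# Hypothetical log lines standing in for a real log file.
sample_lines = [
    "2011-01-12 21:05:01 web1 connection refused",
    "2011-01-12 21:05:07 db1 slow query",
    "2011-01-12 21:06:13 web1 connection refused",
]

# executemany() consumes the generator row by row inside one transaction.
with conn:
    conn.executemany("INSERT INTO entry VALUES (?, ?, ?, ?)", parse(sample_lines))

# Aggregation is done by the database rather than in Python.
per_host = conn.execute(
    "SELECT host, COUNT(*) FROM entry GROUP BY host ORDER BY host"
).fetchall()
print(per_host)
conn.close()
```

Since sqlite3 ships with Python, this adds no third-party dependency.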
>
>On Wed, Jan 12, 2011 at 1:05 PM, Scott McCarty <scott.mccarty at gmail.com> wrote:
>> Sorry to ask this question. I have searched the list archives and googled, but
>> I don't even know what words to use to find what I am looking for. I am just
>> looking for a little kick in the right direction.
>> I have a Python-based log analysis program called petit
>> (http://crunchtools.com/petit). I am trying to modify it to manage the main
>> object types to and from disk.
>> Essentially, I have one object which is a list of a bunch of "Entry"
>> objects. The Entry objects have date, time, etc. fields which I use for
>> analysis techniques. At the very beginning I build up the list of objects,
>> then would like to start pickling it while building to save memory. I want
>> to be able to process more entries than I have memory for. With a straight
>> list it looks like I could build from xreadlines(), but once you turn it into
>> a more complex object, I don't quite know where to go.
>> I understand how to pickle the entire data structure, but I need something
>> that will manage the memory/disk allocation. Any thoughts?
>
>You could subclass `list` and use sys.getsizeof()
>[http://docs.python.org/library/sys.html#sys.getsizeof ] to keep track
>of the size of the elements, and then start pickling them to disk once
>the total size reaches some preset limit.
>But like MRAB said, using a proper database, e.g. SQLite
>(http://docs.python.org/library/sqlite3.html ), wouldn't be a bad idea
>either.
>
>Cheers,
>Chris
>--
>http://blog.rebertia.com
>
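One possible reading of the subclass-list idea, sketched for Python 3. The class name SpillingList, its parameters, and the 200-byte limit are invented for the illustration. Only append() is overridden, so other ways of adding items (extend, slicing, and so on) bypass the accounting, and sys.getsizeof() measures an object shallowly, so the byte count is only a rough estimate.

```python
import pickle
import sys
import tempfile

class SpillingList(list):
    """List that pickles its items to a file once they grow past a limit."""

    def __init__(self, limit_bytes, spill_file):
        super().__init__()
        self.limit_bytes = limit_bytes
        self.spill_file = spill_file
        self.in_memory_bytes = 0

    def append(self, item):
        super().append(item)
        # Shallow size only: objects referenced by item are not counted.
        self.in_memory_bytes += sys.getsizeof(item)
        if self.in_memory_bytes >= self.limit_bytes:
            self.spill()

    def spill(self):
        """Write the buffered items to disk and drop them from memory."""
        for item in self:
            pickle.dump(item, self.spill_file)
        self.clear()
        self.in_memory_bytes = 0

    def iter_all(self):
        """Yield the spilled items first, then those still in memory."""
        self.spill_file.seek(0)
        while True:
            try:
                yield pickle.load(self.spill_file)
            except EOFError:
                break
        self.spill_file.seek(0, 2)  # return to the end for later appends
        yield from list(self)

spill_file = tempfile.TemporaryFile()
entries = SpillingList(limit_bytes=200, spill_file=spill_file)
for n in range(10):
    entries.append("log line %d" % n)

restored = list(entries.iter_all())
print(len(entries), "items still in memory;", len(restored), "items in total")
```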
>
>Scott McCarty wrote:
>
>> Sorry to ask this question. I have searched the list archives and googled,
>> but I don't even know what words to use to find what I am looking for. I am
>> just looking for a little kick in the right direction.
>> 
>> I have a Python-based log analysis program called petit (
>> http://crunchtools.com/petit). I am trying to modify it to manage the main
>> object types to and from disk.
>> 
>> Essentially, I have one object which is a list of a bunch of "Entry"
>> objects. The Entry objects have date, time, etc. fields which I use
>> for analysis techniques. At the very beginning I build up the list of
>> objects, then would like to start pickling it while building to save
>> memory. I want to be able to process more entries than I have memory for.
>> With a straight list it looks like I could build from xreadlines(), but once
>> you turn it into a more complex object, I don't quite know where to go.
>> 
>> I understand how to pickle the entire data structure, but I need something
>> that will manage the memory/disk allocation. Any thoughts?
>
>You can write multiple pickled objects into a single file:
>
>import cPickle as pickle
>
>def dump(filename, items):
>    with open(filename, "wb") as out:
>        dump = pickle.Pickler(out).dump
>        for item in items:
>            dump(item)
>
>def load(filename):
>    with open(filename, "rb") as instream:
>        load = pickle.Unpickler(instream).load
>        while True:
>            try:
>                item = load()
>            except EOFError:
>                break
>            yield item
>
>if __name__ == "__main__":
>    filename = "tmp.pickle"
>    from collections import namedtuple
>    T = namedtuple("T", "alpha beta")
>    dump(filename, (T(a, b) for a, b in zip("abc", [1,2,3])))
>    for item in load(filename):
>        print item
>
>To get random access you'd have to maintain a list containing the offsets of 
>the entries in the file.
>However, a simple database like SQLite is probably sufficient for the kind 
>of entries you have in mind, and it allows operations like aggregation, 
>sorting and grouping out of the box.
>
>Peter
>
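The remark about random access could be sketched as follows, assuming Python 3 (where cPickle is simply pickle). The helper names dump_with_index and load_at are invented for the illustration. The idea is to record the file position before each pickle is written, then seek to that position to load a single entry.

```python
import pickle
import tempfile

def dump_with_index(stream, items):
    """Pickle items one after another and return each one's byte offset."""
    offsets = []
    for item in items:
        offsets.append(stream.tell())
        pickle.dump(item, stream)
    return offsets

def load_at(stream, offsets, i):
    """Seek to the i-th record and unpickle only that one."""
    stream.seek(offsets[i])
    return pickle.load(stream)

stream = tempfile.TemporaryFile()
offsets = dump_with_index(stream, [("a", 1), ("b", 2), ("c", 3)])

second = load_at(stream, offsets, 1)
last = load_at(stream, offsets, 2)
print(second, last)
```

The list of offsets is itself small and could be pickled alongside the data if it needs to survive between runs.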
>
>
>On Wed, Jan 12, 2011 at 10:59 AM, Matty Sarro <msarro at gmail.com> wrote:
>> As of now here is my situation:
>> I am working on a system to aggregate IT data and logs. A number of
>> important data are gathered by a third party system. The only
>> immediate way I have to access the data is to have their system
>> automatically email me updates in CSV format every hour. If I set up a
>> mail client on the server, this shouldn't be a huge issue.
>>
>> However, is there a way to automatically open the emails, and copy the
>> attachments to a directory based on the filename? Kind of a weird
>> project, I know. Just looking for some ideas hence posting this on two
>> lists.
>
>Parsing out email attachments:
>http://docs.python.org/library/email.parser.html
>http://docs.python.org/library/email.message.html#module-email.message
>
>Parsing the extension from a filename:
>http://docs.python.org/library/os.path.html#os.path.splitext
>
>Retrieving email from a mail server:
>http://docs.python.org/library/poplib.html
>http://docs.python.org/library/imaplib.html
>
>You could poll for new messages via a cron job or the `sched` module
>(http://docs.python.org/library/sched.html ). Or if the messages are
>being delivered locally, you could use inotify bindings or similar to
>watch the appropriate directory for incoming mail. Integration with a
>mail server itself is also a possibility, but I don't know much about
>that.
>
>Cheers,
>Chris
>--
>http://blog.rebertia.com
>
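A small sketch of how the parsing and filename pieces listed above might fit together, assuming Python 3 and its email package. The function name save_attachments and the choice to sort files into one subdirectory per extension are assumptions, as the original question does not say how the directory should be derived from the filename. The sample message is built locally in place of one fetched with poplib or imaplib.

```python
import email
import os
import tempfile
from email import policy
from email.message import EmailMessage

def save_attachments(raw_bytes, base_dir):
    """Save each attachment under base_dir/<extension>/ and return the paths."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    saved = []
    for part in msg.iter_attachments():
        # Keep only the base name so a crafted filename cannot leave base_dir.
        name = os.path.basename(part.get_filename() or "unnamed")
        ext = os.path.splitext(name)[1].lstrip(".").lower() or "noext"
        target_dir = os.path.join(base_dir, ext)
        os.makedirs(target_dir, exist_ok=True)
        path = os.path.join(target_dir, name)
        with open(path, "wb") as out:
            out.write(part.get_payload(decode=True))
        saved.append(path)
    return saved

# Build a sample message with one CSV attachment (made-up content).
sample = EmailMessage()
sample["Subject"] = "Hourly report"
sample.set_content("See attached.")
sample.add_attachment(
    b"host,count\nweb1,2\n",
    maintype="text",
    subtype="csv",
    filename="report.csv",
)

paths = save_attachments(sample.as_bytes(), tempfile.mkdtemp())
print(paths)
```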
>
>In article <0d7143ca-45cf-44c3-9e8d-acb867c52037 at f30g2000yqa.googlegroups.com>,
>Daniel da Silva  <ddasilva at umd.edu> wrote:
>>
>>I have come across a task where I would like to scan a short 20-80
>>character line of text for instances of "<verb> <noun>". Ideally
>><verb> could be of any tense.
>
>In Soviet Russia, <noun> <verbs> you!
>-- 
>Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/
>
>"Think of it as evolution in action."  --Tony Rand
>
>
>In case you still need help:
>
>- import random
>-
>- # Set the initial values
>- the_number = random.randrange(100) + 1
>- tries = 0
>- guess = None
>-    
>- # Guessing loop
>- while guess != the_number and tries < 7:
>-     guess = int(raw_input("Take a guess: "))
>-     if guess > the_number:
>-         print "Lower..."
>-     elif guess < the_number:
>-         print "Higher..."
>-     tries += 1
>- 
>- # Did the user guess correctly or make too many guesses?
>- if guess == the_number:
>-     print "You guessed it! The number was", the_number
>-     print "And it only took you", tries, "tries!\n"
>- else:
>-     print "Wow you suck! It should only take at most 7 tries!"
>- 
>- raw_input("Press Enter to exit the program.")
>
>
>Been digging ever since I posted this. I suspected that the response might
>be to use a database. I am worried I am trying to reinvent the wheel. The
>problem is I don't want any dependencies and I also don't need persistence
>between program runs. I kind of wanted to keep the use of petit very similar
>to cat, head, awk, etc. But, that said, I have realized that if I provide the
>analysis features as an API, you very well might want persistence between
>runs.
>
>What about using an array inside a shelve?
>
>Just got done messing with this in python shell:
>
>import shelve
>
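># writeback=True is needed so that in-place changes to d["log"] are saved
>d = shelve.open(filename="/root/test.shelf", protocol=-1, writeback=True)
>
># Start with a list; a tuple has no append() method
>d["log"] = []
>d["log"].append("test1")
>d["log"].append("test2")
>d["log"].append("test3")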
>
>Then, always interacting with d["log"], for example:
>
>for i in d["log"]:
>    print i
>
>Thoughts?
>
>
>I know this won't manage memory, but it will keep the footprint down right?
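On the footprint question: a single list stored under d["log"] is unpickled as a whole each time that key is read, so it may not save much memory. One alternative, sketched here for Python 3, is to store every entry under its own key so that only the entries actually requested are loaded. The "log:%08d" key scheme and the temporary path are invented for the illustration.

```python
import os
import shelve
import tempfile

# Throwaway location for the shelf file(s).
path = os.path.join(tempfile.mkdtemp(), "test.shelf")

# Write each entry under its own zero-padded key instead of one big list.
with shelve.open(path, protocol=-1) as d:
    for n, line in enumerate(["test1", "test2", "test3"]):
        d["log:%08d" % n] = line
    count = len(d)

# Reading back touches one pickled entry at a time.
with shelve.open(path) as d:
    lines = [d["log:%08d" % n] for n in range(count)]

print(lines)
```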

