best way to handle this in Python

Rita rmorgan466 at gmail.com
Fri Jul 20 06:34:47 EDT 2012


That's an interesting data structure, Dennis. I will actually be running this
type of query many times, preferably in an ad-hoc environment. That makes it
tough for sqlite3, since there will be several hundred thousand tuples.
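
For reference, here is a rough sketch of how the list-of-tuples structure
quoted below might be queried by date range. It is only an illustration, not
anything Dennis posted: the sample records and the names `records`, `stamps`,
`query` and `series` are made up, and it assumes the list is kept sorted by
timestamp so that the standard `bisect` module can find the slice boundaries.

```python
import bisect
from datetime import datetime, timedelta

# Hypothetical sample records in the (datetime, {color: value}) form,
# kept sorted by timestamp.
records = [
    (datetime(2012, 6, 25, 0, 0), {"red": 34, "green": 44}),
    (datetime(2012, 7, 10, 12, 30), {"red": 30, "blue": 88}),
    (datetime(2012, 7, 19, 23, 59), {"red": 12, "orange": 4}),
]

# Parallel list of timestamps, so bisect compares datetimes only
# and never has to compare the dicts.
stamps = [stamp for stamp, _ in records]


def query(start, end):
    """Return the records with start <= timestamp <= end."""
    lo = bisect.bisect_left(stamps, start)
    hi = bisect.bisect_right(stamps, end)
    return records[lo:hi]


def series(color, start, end):
    """Return (timestamps, values) for one color, ready to hand to a plot call."""
    xs, ys = [], []
    for stamp, counts in query(start, end):
        if color in counts:
            xs.append(stamp)
            ys.append(counts[color])
    return xs, ys
```

With this, "3 weeks ago till today" would be something like
`series("red", now - timedelta(weeks=3), now)`, and the two returned lists
could be passed to matplotlib. Whether several hundred thousand small dicts
fit comfortably in memory would need to be measured.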



On Fri, Jul 20, 2012 at 12:18 AM, Dennis Lee Bieber
<wlfraed at ix.netcom.com> wrote:

> {NOTE: preferences for comp.lang.python are to follow the RFC on
> "netiquette" -- that is, post comments /under/ quoted material, trimming
> what is not relevant... I've restructured this reply to match}
>
> On Thu, 19 Jul 2012 21:28:12 -0400, Rita <rmorgan466 at gmail.com>
> declaimed the following in gmane.comp.python.general:
>
> >
> >
> > On Thu, Jul 19, 2012 at 8:52 PM, Dave Angel <d at davea.name> wrote:
> >
> > > On 07/19/2012 07:51 PM, Rita wrote:
> > > > Hello,
> > > >
> > > > I have data in many files (/data/year/month/day/) which are named
> > > > like YearMonthDayHourMinute.gz.
> > > >
> > > > I would like to build a data structure which can easily handle
> > > > querying the data. So for example, if I want to query data from
> > > > 3 weeks ago till today, I can do it rather quickly.
> > > >
> > > > Each YearMonthDayHourMinute.gz file looks like this, and they are
> > > > about 4 to 6 KB:
> > > > red 34
> > > > green 44
> > > > blue 88
> > > > orange 4
> > > > black 3
> > > > while 153
> > > >
> > > > I would like to query them so I can generate a plot rather quickly,
> > > > but I am not sure what the best way to do this is.
> > > >
> > > >
> > > >
> > >
> > > What part of your code is giving you difficulty?  You didn't post any
> > > code.  You don't specify the OS, nor version of your Python, nor what
> > > other programs you expect to use along with Python.
> > >
> > Using Linux 2.6.31; Python 2.7.3.
> > I am not necessarily looking for code, just a Pythonic way of doing it.
> > Eventually, I would like to graph the data using matplotlib.
> >
> >
>         Which doesn't really answer the question. After all, since the
> source data is already in date/time-stamped files, a simple, sorted,
> "glob" of files within a desired span would answer the requirement.
>
>         But -- it would mean that you reparse the files for each processing
> run.
>
>         An alternative would be to run a pre-processor that parses the
> files into, say, an SQLite3 database (and which can determine, from the
> highest datetime entry in the database, which /new/ files need to be
> parsed on subsequent runs). Then do the query/plotting from a second
> program which retrieves data from the database.
>
>         But if this is a process that only needs to be run once, or at rare
> intervals, maybe you only need to parse the files into an in-memory data
> structure... Say a list of tuples of the form:
>
>         [ (datetime, {color: value, color2: value2, ...}),
>           (datetime2, ...) ]
>
> --
>         Wulfraed                 Dennis Lee Bieber         AF6VN
>         wlfraed at ix.netcom.com    HTTP://wlfraed.home.netcom.com/
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
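
As a point of comparison, the pre-processor idea quoted above might look
roughly like the following with the standard `sqlite3` module. This is a
sketch under assumptions, not a tested design: the table name `samples`, its
columns, and the helper names are all invented here, timestamps are stored as
ISO-format text so that string order matches time order, and an in-memory
database stands in for a real file.

```python
import sqlite3
from datetime import datetime

# ":memory:" keeps the sketch self-contained; a file path would persist
# the parsed data between runs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples ("
    " stamp TEXT NOT NULL,"
    " color TEXT NOT NULL,"
    " value INTEGER NOT NULL,"
    " PRIMARY KEY (stamp, color))"
)


def add_sample(stamp, counts):
    """Store one parsed file: a datetime plus a {color: value} dict."""
    rows = [(stamp.isoformat(), color, value) for color, value in counts.items()]
    conn.executemany("INSERT OR REPLACE INTO samples VALUES (?, ?, ?)", rows)
    conn.commit()


def latest_stamp():
    """Highest timestamp stored so far, or None if the table is empty."""
    return conn.execute("SELECT MAX(stamp) FROM samples").fetchone()[0]


def color_series(color, start, end):
    """Return (stamp, value) rows for one color within [start, end]."""
    cursor = conn.execute(
        "SELECT stamp, value FROM samples"
        " WHERE color = ? AND stamp BETWEEN ? AND ?"
        " ORDER BY stamp",
        (color, start.isoformat(), end.isoformat()),
    )
    return cursor.fetchall()
```

`latest_stamp()` is what a later run could compare file names against to
decide which new files still need parsing. How this performs on several
hundred thousand rows is something to measure rather than assume.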



-- 
--- Get your facts first, then you can distort them as you please.--