efficient data loading with Python, is that possible?
igor.tatarinov at gmail.com
Wed Dec 12 19:44:01 EST 2007
On Dec 12, 4:03 pm, John Machin <sjmac... at lexicon.net> wrote:
> Inside your function
> [you are doing all this inside a function, not at global level in a
> script, aren't you?], do this:
> from time import mktime, strptime # do this ONCE
> ...
> blahblah = int(mktime(strptime(s, "%m%d%y%H%M%S")))
>
> It would help if you told us what platform, what version of Python,
> how much memory, how much swap space, ...
>
> Cheers,
> John
I am using a global 'from time import ...'. I will try to do that within
the function and see if it makes a difference.
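One way to check whether the import location matters is to time both variants on the same input. The sketch below is only an illustration: the function names, the sample timestamp, and the repeat counts are made up, and since strptime itself probably dominates the cost, the difference may turn out to be small.

```python
import timeit
from time import mktime, strptime  # global import, as in the current script

FMT = "%m%d%y%H%M%S"

def convert_all_global(values):
    # mktime and strptime are resolved as module globals on every iteration
    out = []
    for s in values:
        out.append(int(mktime(strptime(s, FMT))))
    return out

def convert_all_local(values):
    # imported once per call; inside the loop the names are fast locals
    from time import mktime, strptime
    out = []
    for s in values:
        out.append(int(mktime(strptime(s, FMT))))
    return out

if __name__ == "__main__":
    sample = ["121207194401"] * 200  # made-up input for the timing run
    for fn in (convert_all_global, convert_all_local):
        elapsed = timeit.timeit(lambda: fn(sample), number=20)
        print(fn.__name__, round(elapsed, 3))
```

Both functions return the same values, so only the timings differ.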
The computer I am using has 8 GB of RAM. It's a dual-core AMD machine (or
something like that) running Linux, with Python 2.4.
Here is some of my code. Tell me what's wrong with it :)
def loadFile(inputFile, loader):
    # .zip files don't work with zlib
    f = popen('zcat ' + inputFile)
    for line in f:
        loader.handleLine(line)
    ...
In Loader class:
def handleLine(self, line):
    # filter out 'wrong' lines
    if not self._dataFormat(line): return
    # add a new output record
    rec = self.result.addRecord()
    for col in self._dataFormat.colFormats:
        value = parseValue(line, col)
        rec[col.attr] = value
And here is parseValue (will using a hash-based dispatch make it much faster?):
def parseValue(line, col):
    s = line[col.start:col.end+1]
    # no switch in python
    if col.format == ColumnFormat.DATE:
        return Format.parseDate(s)
    if col.format == ColumnFormat.UNSIGNED:
        return Format.parseUnsigned(s)
    if col.format == ColumnFormat.STRING:
        # and-or trick (no x ? y:z in python 2.4)
        return not col.strip and s or rstrip(s)
    if col.format == ColumnFormat.BOOLEAN:
        return s == col.arg and 'Y' or 'N'
    if col.format == ColumnFormat.PRICE:
        return Format.parseUnsigned(s)/100.
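For reference, a hash-based dispatch could look roughly like the sketch below. The integer constants, the Column tuple, and the parse_* helpers are hypothetical stand-ins for ColumnFormat, the column objects, and Format in the code above, and it uses conditional expressions, which need Python 2.5 or later. Whether it is measurably faster than the if-chain would have to be timed on real data.

```python
from collections import namedtuple
from time import mktime, strptime

# Hypothetical stand-ins for ColumnFormat.* and the column objects.
DATE, UNSIGNED, STRING, BOOLEAN, PRICE = range(5)
Column = namedtuple("Column", "start end format strip arg")

def parse_date(s, col):
    # missing (infinite) value, as in parseDate
    if s.startswith('999999') or s.startswith('000000'):
        return -1
    return int(mktime(strptime(s, "%y%m%d")))

def parse_unsigned(s, col):
    return int(s)

def parse_string(s, col):
    return s.rstrip() if col.strip else s

def parse_boolean(s, col):
    return 'Y' if s == col.arg else 'N'

def parse_price(s, col):
    return int(s) / 100.0

# One dictionary lookup replaces the chain of if-tests.
PARSERS = {
    DATE: parse_date,
    UNSIGNED: parse_unsigned,
    STRING: parse_string,
    BOOLEAN: parse_boolean,
    PRICE: parse_price,
}

def parse_value(line, col):
    s = line[col.start:col.end + 1]
    return PARSERS[col.format](s, col)
```

A further variation would be to store the parser function on each column object once, when the format is defined, so that the per-value lookup disappears entirely.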
And here is Format.parseDate() as an example:
def parseDate(s):
    # missing (infinite) value ?
    if s.startswith('999999') or s.startswith('000000'): return -1
    return int(mktime(strptime(s, "%y%m%d")))
Hopefully, this should be enough to tell what's wrong with my code.
Thanks again,
igor