How Can I Increase the Speed of a Large Number of Date Conversions

Josiah Carlson josiah.carlson at sbcglobal.net
Fri Jun 8 01:29:04 EDT 2007


Some Other Guy wrote:
> vdicarlo wrote:
>> I am a programming amateur and a Python newbie who needs to convert
>> about 100,000,000 strings of the form "1999-12-30" into ordinal dates
>> for sorting, comparison, and calculations. Though my script does a ton
>> of heavy calculational lifting (for which numpy and psyco are a
>> blessing) besides converting dates, it still seems to like to linger
>> in the datetime and time libraries.  (Maybe there's a hot module in
>> there with a cute little function and an impressive set of
>> attributes.)
> ...
>> dateTuple = time.strptime("2005-12-19", '%Y-%m-%d')
>> dateTuple = dateTuple[:3]
>> date = datetime.date(dateTuple[0], dateTuple[1], dateTuple[2])
>> ratingDateOrd = date.toordinal()
> 
> There's nothing terribly wrong with that, although strptime() is overkill
> if you already know the date format.  You could get the date like this:
> 
>    date = datetime.date(*map(int, "2005-12-19".split('-')))
> 
> But, more importantly... 100,000,000 individual dates would cover 274000
> years!  Do you really need that much??  You could just precompute a
> dictionary that maps a date string to the ordinal for the last 50 years
> or so. That's only 18250 entries, and can be computed in less than a second.
> Lookups after that will be near instantaneous:
> 
> 
>  import datetime
> 
>  days = 365*50 # about 50 years worth
>  dateToOrd = {} # dict. of date string to ordinal
...

Then there's the argument of "why bother using real dates?"  I mean, all 
that is necessary is a mapping of date -> number for sorting.  Who needs 
accuracy?

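for date in inp:  # inp: any iterable of 'YYYY-MM-DD' strings
    y, m, d = map(int, date.split('-'))
    # 12 slots of 31 days per year: order-preserving, but not a real day count
    ordinal = (y-1990)*372 + (m-1)*31 + d-1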
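
To make the trade-off concrete, here is a minimal sketch that wraps the
formula above in a function and compares it against
datetime.date.toordinal().  The names pseudo_ordinal, true_ordinal and
base_year are illustrative, not from the thread.  Ordering is preserved
because every month gets 31 slots and every year 372, but differences
between two values are not real day counts, so this only suits sorting
and comparison.

```python
import datetime

def pseudo_ordinal(date_string, base_year=1990):
    """Map 'YYYY-MM-DD' to a sortable integer (not a true day count)."""
    y, m, d = map(int, date_string.split('-'))
    return (y - base_year) * 372 + (m - 1) * 31 + (d - 1)

def true_ordinal(date_string):
    """Proleptic Gregorian ordinal via the datetime module, for comparison."""
    y, m, d = map(int, date_string.split('-'))
    return datetime.date(y, m, d).toordinal()

dates = ["2005-12-19", "1999-12-30", "2000-03-01", "2000-02-29"]

# Both keys put the dates in the same order.
assert sorted(dates, key=pseudo_ordinal) == sorted(dates, key=true_ordinal)

# Differences are not day counts: Feb 29 -> Mar 1 is one real day, but the
# pseudo-ordinals differ by 3 (February has 31 slots, only 29 are used).
gap_pseudo = pseudo_ordinal("2000-03-01") - pseudo_ordinal("2000-02-29")
gap_true = true_ordinal("2000-03-01") - true_ordinal("2000-02-29")
print(gap_pseudo, gap_true)
```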

Depending on the allowable range of years, one could adjust the 1990 base
year upward and fit the date ordinals into about 12 bits (4096 values, or
11 years at 372 per year).  If one packs the Netflix data properly,
everything fits in memory.  With a bit of psyco, the above is pretty speedy.
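
As a rough check of the 12-bit claim, the sketch below picks a later base
year and stores the results compactly.  BASE_YEAR = 1998 is an assumption
about the earliest year in the data, not something stated in the thread,
and the standard-library array module with unsigned 16-bit items is just
one convenient container; true 12-bit packing would take extra bit
twiddling that is not shown here.

```python
from array import array

BASE_YEAR = 1998   # assumption: earliest year present in the data
BITS = 12

def pseudo_ordinal(date_string, base_year=BASE_YEAR):
    """Map 'YYYY-MM-DD' to a sortable integer (not a true day count)."""
    y, m, d = map(int, date_string.split('-'))
    return (y - base_year) * 372 + (m - 1) * 31 + (d - 1)

# 2**12 = 4096 values hold this many full 372-slot years.
years_covered = (1 << BITS) // 372

# The last date of the last covered year must still fit in 12 bits.
largest = pseudo_ordinal("%d-12-31" % (BASE_YEAR + years_covered - 1))
assert largest < (1 << BITS)

# Store the values as unsigned 16-bit integers ('H'), the smallest
# standard item size that can hold a 12-bit value.
packed = array('H', (pseudo_ordinal(s) for s in ["1999-12-30", "2005-12-19"]))
print(years_covered, largest, list(packed))
```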

  - Josiah


