Representing ambiguity in datetime?

Ron Adam rrr at ronadam.com
Tue May 17 22:12:11 EDT 2005


John Machin wrote:

> On Tue, 17 May 2005 17:38:30 -0500, Terry Hancock
> <hancock at anansispaceworks.com> wrote:
> 
> 
>>What do you do when a date or time is
>>incompletely specified?  ISTM, that as it is, there is no
>>formal way to store this --- you have to guess, and there's
>>no way to indicate that the guess is different from solid
>>information.  As a result, I have sometimes had to abandon
>>datetime, even though it seemed like the logical choice for
>>representing data.
>>
>>E.g. I might have information like "this paper was published
>>in May 1997".  There's no way to write that with datetime,
>>is there?  Even if I just use the "date" object instead of 
>>datetime, I still have to actually specify something like 
>>May 1, 1997 --- fabricating data, which is frequently
>>undesirable (later on, I might find information saying that
>>it was actually published May 23, 1997 and I might want
>>to update the earlier one, or simply evaluate them as 
>>"equal" since they are, to within the precision given --- 
>>for example, I might be trying to decide that two database
>>entries are really duplicate references to the same paper).
>>
>>I know that this is somewhat theoretically stated, but I 
>>have run into concrete problems along the lines of
>>the above.
>>
>>I'd say this is analogous to how you might use "None"
>>rather than "0" to represent an integer if you don't know
>>its value (rather than knowing that it is zero).  ISTM, you
>>ought to be able to specify a date as, e.g.:
>>
>>d = datetime.date(2005, 5, None)
>>
>>I realize there might be some complexity with deciding
>>how to handle datestamp math, but as this situation
>>occurs frequently in real life, it seems like it shouldn't
>>be avoided.
>>
>>How do other people deal with this kind of problem?
> 
> 
> Mostly, badly :-(
> 
> Real-life example: due to war-time disruption etc, in some countries
> it is common enough to find that the date of birth of someone born in
> the 1940s is not known precisely. E.g. on the Hong Kong identity card,
> it is possible to find only the year and month of birth, and sometimes
> even only the year. Depending on the purpose, legislation and
> convention will take the first day of the vague period or the last day
> when a calculation is required. Badly == entering into a database the
> "exact" date that was used for the purpose du jour, with no indication
> that the source was vague. Consequently a person can have DOB recorded
> as 1945-01-01 on one database and 1945-12-31 on another.
> 
> Suggested approach in Python (sketch): Don't try to get the datetime
> module to solve the problem. Define a fuzzydate class. Internal
> representation: I'd suggest earliest possible date and latest possible
> date. That way you have valid date instances for doing date
> arithmetic. May have different constructors depending on how the
> incoming vagueness is specified. 
> 
> HTH,
> John
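
A rough sketch of what John describes might look like the following.
The class name FuzzyDate, the from_parts constructor and the may_equal
method are all my own hypothetical names, not anything in the datetime
module, and treating "ranges overlap" as "possibly the same date" is
just one assumption about how Terry's duplicate check could work.

```python
import calendar
import datetime


class FuzzyDate:
    """Hypothetical class: a date known only to lie in [earliest, latest]."""

    def __init__(self, earliest, latest):
        if earliest > latest:
            raise ValueError("earliest must not be after latest")
        self.earliest = earliest
        self.latest = latest

    @classmethod
    def from_parts(cls, year, month=None, day=None):
        """Build from a year plus optional month and day (None = unknown)."""
        if month is None:
            if day is not None:
                raise ValueError("a day without a month is not handled here")
            return cls(datetime.date(year, 1, 1), datetime.date(year, 12, 31))
        if day is None:
            # monthrange returns (weekday of first day, number of days)
            last_day = calendar.monthrange(year, month)[1]
            return cls(datetime.date(year, month, 1),
                       datetime.date(year, month, last_day))
        exact = datetime.date(year, month, day)
        return cls(exact, exact)

    def is_exact(self):
        return self.earliest == self.latest

    def may_equal(self, other):
        """True if the two ranges overlap, so both could be the same day."""
        return self.earliest <= other.latest and other.earliest <= self.latest

    def __repr__(self):
        return "FuzzyDate(%s, %s)" % (self.earliest, self.latest)


paper = FuzzyDate.from_parts(1997, 5)            # "published in May 1997"
later_info = FuzzyDate.from_parts(1997, 5, 23)   # exact date found later
print(paper)                        # FuzzyDate(1997-05-01, 1997-05-31)
print(paper.may_equal(later_info))  # True
```

Both ends are ordinary date instances, so date arithmetic on either
bound works as usual, and the vagueness of the source is never lost.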


This is a very common problem in genealogy research as well as other 
sciences that deal with history, such as geology, geography, and archeology.

I agree that some standard way of dealing with fuzzy dates would be a 
good thing.  I think looking at how others do it would be the way to 
start...

A Google search found the following reference buried in a long reference 
page on MySQL.

http://www.dreamlink.net/mysql/manual_Functions.html

> The reason the ranges for the month and day specifiers begin
> with zero is that MySQL allows incomplete dates such as
> '2004-00-00' to be stored as of MySQL 3.23.


So it seems using zeros for a missing day or month may be one way to do it.
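
Note that datetime.date itself will not accept a zero month or day, so
the zeros would have to live outside it. As a minimal sketch, assuming
the 'YYYY-MM-DD' text form from the MySQL passage quoted above, a
hypothetical helper (parse_partial_date is my own name) could turn the
zeros into None:

```python
import datetime


def parse_partial_date(text):
    """Split 'YYYY-MM-DD' into (year, month, day), mapping 0 to None.

    Hypothetical helper; zero-means-unknown is the convention borrowed
    from the MySQL manual, not something datetime understands.
    """
    year, month, day = (int(part) for part in text.split("-"))
    return (year, month or None, day or None)


# datetime.date rejects a zero month or day with ValueError:
try:
    datetime.date(2004, 0, 0)
except ValueError as exc:
    print("datetime.date refuses it:", exc)

print(parse_partial_date("2004-00-00"))  # (2004, None, None)
print(parse_partial_date("1997-05-00"))  # (1997, 5, None)
```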

Cheers,
_Ron




