Representing ambiguity in datetime?
Ron Adam
rrr at ronadam.com
Tue May 17 22:12:11 EDT 2005
John Machin wrote:
> On Tue, 17 May 2005 17:38:30 -0500, Terry Hancock
> <hancock at anansispaceworks.com> wrote:
>
>
>>What do you do when a date or time is
>>incompletely specified? ISTM, that as it is, there is no
>>formal way to store this --- you have to guess, and there's
>>no way to indicate that the guess is different from solid
>>information. As a result, I have sometimes had to abandon
>>datetime, even though it seemed like the logical choice for
>>representing data.
>>
>>E.g. I might have information like "this paper was published
>>in May 1997". There's no way to write that with datetime,
>>is there? Even if I just use the "date" object instead of
>>datetime, I still have to actually specify something like
>>May 1, 1997 --- fabricating data, which is frequently
>>undesirable (later on, I might find information saying that
>>it was actually published May 23, 1997 and I might want
>>to update the earlier one, or simply evaluate them as
>>"equal" since they are, to within the precision given ---
>>for example, I might be trying to decide that two database
>>entries are really duplicate references to the same paper).
>>
>>I know that this is somewhat theoretically stated, but I
>>have run into concrete problems along the lines of
>>the above.
>>
>>I'd say this is analogous to how you might use "None"
>>rather than "0" to represent an integer if you don't know
>>its value (rather than knowing that it is zero). ISTM, you
>>ought to be able to specify a date as, e.g.:
>>
>>d = datetime.date(2005, 5, None)
>>
>>I realize there might be some complexity with deciding
>>how to handle datestamp math, but as this situation
>>occurs frequently in real life, it seems like it shouldn't
>>be avoided.
>>
>>How do other people deal with this kind of problem?
>
>
> Mostly, badly :-(
>
> Real-life example: due to war-time disruption etc, in some countries
> it is common enough to find that the date of birth of someone born in
> the 1940s is not known precisely. E.g. on the Hong Kong identity card,
> it is possible to find only the year and month of birth, and sometimes
> even only the year. Depending on the purpose, legislation and
> convention will take the first day of the vague period or the last day
> when a calculation is required. Badly == entering into a database the
> "exact" date that was used for the purpose du jour, with no indication
> that the source was vague. Consequently a person can have DOB recorded
> as 1945-01-01 on one database and 1945-12-31 on another.
>
> Suggested approach in Python (sketch): Don't try to get the datetime
> module to solve the problem. Define a fuzzydate class. Internal
> representation: I'd suggest earliest possible date and latest possible
> date. That way you have valid date instances for doing date
> arithmetic. May have different constructors depending on how the
> incoming vagueness is specified.
>
> HTH,
> John
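A minimal sketch of the earliest/latest representation John describes might look like the following. The class name FuzzyDate, the from_parts constructor, and the overlaps method are all hypothetical names chosen for illustration; nothing like this exists in the datetime module.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FuzzyDate:
    """A date known only to lie between earliest and latest, inclusive."""
    earliest: date
    latest: date

    @classmethod
    def from_parts(cls, year, month=None, day=None):
        """Build from a year and optional month/day; None means unknown."""
        if month is None:
            if day is not None:
                raise ValueError("day given without month")
            # Only the year is known: the range spans the whole year.
            return cls(date(year, 1, 1), date(year, 12, 31))
        if day is None:
            # Year and month known: the range spans the whole month.
            last_day = calendar.monthrange(year, month)[1]
            return cls(date(year, month, 1), date(year, month, last_day))
        exact = date(year, month, day)
        return cls(exact, exact)

    def overlaps(self, other):
        """True if the two fuzzy dates could refer to the same day."""
        return self.earliest <= other.latest and other.earliest <= self.latest


# Terry's example: "published in May 1997" versus "published May 23, 1997".
paper = FuzzyDate.from_parts(1997, 5)
print(paper.earliest, paper.latest)
print(paper.overlaps(FuzzyDate.from_parts(1997, 5, 23)))
```

Because both ends are ordinary date instances, date arithmetic still works on either bound, and the caller decides explicitly whether a calculation should use the first or the last day of the vague period.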
This is a very common problem in genealogy research as well as other
sciences that deal with history, such as geology, geography, and archeology.
I agree that some standard way of dealing with fuzzy dates would be a
good thing. I think looking at how others do it would be the way to
start...
A Google search found the following reference buried in a long reference
page on MySQL.
http://www.dreamlink.net/mysql/manual_Functions.html
> The reason the ranges for the month and day specifiers begin
> with zero is that MySQL allows incomplete dates such as
> '2004-00-00' to be stored as of MySQL 3.23.
So it seems that using zeros for a missing day or month may be one way to do it.
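As a rough illustration of the zero convention in Python, the sketch below parses a 'YYYY-MM-DD' string and treats a zero month or day as unknown. The function names parse_partial and compatible are hypothetical, and mapping 0 to None is my own assumption about how to represent the gap; this only handles strings and does not talk to MySQL at all.

```python
def parse_partial(text):
    """Split 'YYYY-MM-DD' into (year, month, day), mapping 0 to None."""
    year, month, day = (int(part) for part in text.split("-"))
    month = month or None   # 0 means the month is unknown
    day = day or None       # 0 means the day is unknown
    if month is None and day is not None:
        raise ValueError("day given without month: %r" % text)
    return year, month, day


def compatible(a, b):
    """True if two partial dates agree on every field both of them know."""
    return all(x is None or y is None or x == y for x, y in zip(a, b))


print(parse_partial("2004-00-00"))
print(compatible(parse_partial("1997-05-00"), parse_partial("1997-05-23")))
```

Note that datetime.date itself rejects a zero month or day, so a tuple (or a small class) is needed to hold the partial value.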
Cheers,
_Ron