Parsing text into dates?
George Sakkis
gsakkis at rutgers.edu
Mon May 16 20:51:31 EDT 2005
"Thomas W" wrote:
> I'm developing a web-application where the user sometimes has to enter
> dates in plain text, although a format may be provided to give clues.
> On the server side this piece of text has to be parsed into a datetime
> python-object. Does anybody have any pointers on this?
>
> Besides the actual parsing, my main concern is the different locale
> date formats and how to be able to parse those strange US-like
> "month/day/year" compared to the clever and intuitive European-style
> "day/month/year" etc.
>
> I've searched Google, but haven't found any good references that helped
> me solve this problem, especially with regards to the locale date
> format issues.
>
> Best regards,
> Thomas
Although it is not a solution to the general localization problem, you
may try the mx.DateTime.DateTimeFrom() factory function
(http://www.egenix.com/files/python/mxDateTime.html#DateTime) for the
parsing part. Some time ago I also wrote a more robust and customized
version of such a parser. Ambiguous US/European style dates are
disambiguated by the optional argument USA (False by default <wink>).
Below is the doctest and the documentation (with epydoc tags); mail me
offlist if you'd like to check it out.

George
#=======================================================
def parseDateTime(string, USA=False, implyCurrentDate=False,
                  yearHeuristic=_20thcenturyHeuristic):
    '''Tries to parse a string as a valid date and/or time.

    It recognizes most common (and less common) date and time formats.

    Examples:
        >>> # doctest was run successfully on...
        >>> str(datetime.date.today())
        '2005-05-16'
        >>> str(parseDateTime('21:23:39.91'))
        '21:23:39.910000'
        >>> str(parseDateTime('16:15'))
        '16:15:00'
        >>> str(parseDateTime('10am'))
        '10:00:00'
        >>> str(parseDateTime('2:7:18.'))
        '02:07:18'
        >>> str(parseDateTime('08:32:40 PM'))
        '20:32:40'
        >>> str(parseDateTime('11:59pm'))
        '23:59:00'
        >>> str(parseDateTime('12:32:9'))
        '12:32:09'
        >>> str(parseDateTime('12:32:9', implyCurrentDate=True))
        '2005-05-16 12:32:09'
        >>> str(parseDateTime('93/7/18'))
        '1993-07-18'
        >>> str(parseDateTime('15.6.2001'))
        '2001-06-15'
        >>> str(parseDateTime('6.15.2001'))
        '2001-06-15'
        >>> str(parseDateTime('1980, November 20'))
        '1980-11-20'
        >>> str(parseDateTime('4 Mar 79'))
        '1979-03-04'
        >>> str(parseDateTime('July 4'))
        '2005-07-04'
        >>> str(parseDateTime('15/08'))
        '2005-08-15'
        >>> str(parseDateTime('5 Mar 3:45pm'))
        '2005-03-05 15:45:00'
        >>> str(parseDateTime('01 02 2003'))
        '2003-02-01'
        >>> str(parseDateTime('01 02 2003', USA=True))
        '2003-01-02'
        >>> str(parseDateTime('3/4/92'))
        '1992-04-03'
        >>> str(parseDateTime('3/4/92', USA=True))
        '1992-03-04'
        >>> str(parseDateTime('12:32:09 1-2-2003'))
        '2003-02-01 12:32:09'
        >>> str(parseDateTime('12:32:09 1-2-2003', USA=True))
        '2003-01-02 12:32:09'
        >>> str(parseDateTime('3:45pm 5 12 2001'))
        '2001-12-05 15:45:00'
        >>> str(parseDateTime('3:45pm 5 12 2001', USA=True))
        '2001-05-12 15:45:00'

    @param USA: Disambiguates strings that are valid dates in both (month,
        day, year) and (day, month, year) order (e.g. 05/03/2002). If True,
        the first format is assumed.
    @param implyCurrentDate: If True and the date is not given, the current
        date is implied.
    @param yearHeuristic: If not None, a callable f(year) that transforms the
        value of the given year. The default heuristic transforms 2-digit
        years to 4-digit years assuming they are in the 20th century::

            lambda year: (year >= 100 and year
                          or year >= 10 and 1900 + year
                          or None)

        The heuristic should return None if the year is not considered valid.
        If yearHeuristic is None, no year transformation takes place.
    @return:
        - C{datetime.date} if only the date is recognized.
        - C{datetime.time} if only the time is recognized and
          implyCurrentDate is False.
        - C{datetime.datetime} if both date and time are recognized.
    @raise ValueError: If the string cannot be parsed successfully.
    '''
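
The docstring above only documents the interface, so here is a rough,
standard-library-only sketch of how the USA flag and the year heuristic
might interact for purely numeric dates. It is not the implementation of
parseDateTime; the names twentieth_century_heuristic and
parse_numeric_date are made up for illustration, and the sketch assumes
the year always comes last (so it would reject '93/7/18', which the real
function accepts).

```python
import datetime
import re


def twentieth_century_heuristic(year):
    """Hypothetical named version of the lambda quoted in the docstring.

    Years >= 100 pass through unchanged, 10..99 become 1910..1999,
    and anything below 10 is rejected by returning None.
    """
    if year >= 100:
        return year
    if year >= 10:
        return 1900 + year
    return None


def parse_numeric_date(text, usa=False,
                       year_heuristic=twentieth_century_heuristic):
    """Parse three numbers separated by '/', '.', '-' or whitespace.

    Assumption: the year is the last of the three numbers.
    """
    parts = re.split(r"[\s./-]+", text.strip())
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError("not a three-part numeric date: %r" % text)
    first, second, year = (int(p) for p in parts)

    if year_heuristic is not None:
        year = year_heuristic(year)
        if year is None:
            raise ValueError("year rejected by heuristic: %r" % text)

    # Candidate (month, day) pairs: the preferred order comes first, the
    # other order is only a fallback when the preferred one is invalid.
    if usa:
        candidates = [(first, second), (second, first)]
    else:
        candidates = [(second, first), (first, second)]

    for month, day in candidates:
        try:
            return datetime.date(year, month, day)
        except ValueError:
            continue
    raise ValueError("no valid date in %r" % text)


if __name__ == "__main__":
    print(parse_numeric_date("3/4/92"))            # day/month/year
    print(parse_numeric_date("3/4/92", usa=True))  # month/day/year
    print(parse_numeric_date("6.15.2001"))         # only one order is valid
```

Trying the preferred order first and falling back to the other one is
what lets an unambiguous string such as '6.15.2001' parse the same way
regardless of the flag, while '3/4/92' changes meaning with usa=True.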