Parsing text into dates?

George Sakkis gsakkis at rutgers.edu
Mon May 16 20:51:31 EDT 2005


"Thomas W" wrote:

> I'm developing a web-application where the user sometimes has to enter
> dates in plain text, although a format may be provided to give clues.
> On the server side this piece of text has to be parsed into a datetime
> Python object. Does anybody have any pointers on this?
>
> Besides the actual parsing, my main concern is the different locale
> date formats and how to be able to parse those strange US-like
> "month/day/year" compared to the clever and intuitive European-style
> "day/month/year" etc.
>
> I've searched Google, but haven't found any good references that helped
> me solve this problem, especially with regard to the locale date
> format issues.
>
> Best regards,
> Thomas

Although it is not a solution to the general localization problem, you
may try the mx.DateTime.DateTimeFrom() factory function
(http://www.egenix.com/files/python/mxDateTime.html#DateTime) for the
parsing part. Some time ago I also wrote a more robust and customized
version of such a parser. Ambiguous US/European style dates are
disambiguated by the optional argument USA (False by default <wink>).
Below are the doctests and the documentation (with epydoc tags); mail
me offlist if you'd like to check it out.
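
To illustrate the idea behind such a flag, here is a minimal sketch. It
is not the parser described above: the name parse_numeric_date and its
usa argument are hypothetical, and it only handles three numeric fields
with a 4-digit year in last position. It tries the preferred field
order first and falls back to the other one when the preferred order is
not a valid date.

```python
import datetime
import re


def parse_numeric_date(text, usa=False):
    """Parse 'a<sep>b<sep>yyyy' where <sep> is whitespace, '.', '/' or '-'.

    Hypothetical helper for illustration only. If usa is True the
    (month, day, year) reading is tried first, otherwise (day, month, year).
    """
    parts = [int(p) for p in re.split(r"[\s./-]+", text.strip())]
    if len(parts) != 3:
        raise ValueError("expected three numeric fields: %r" % (text,))
    first, second, year = parts

    # Each candidate is a (month, day) pair; the preferred reading goes first.
    month_day = (first, second)   # month/day/year reading
    day_month = (second, first)   # day/month/year reading
    candidates = [month_day, day_month] if usa else [day_month, month_day]

    for month, day in candidates:
        try:
            return datetime.date(year, month, day)
        except ValueError:
            # Not a valid date in this order (e.g. month 15); try the other.
            continue
    raise ValueError("not a valid date: %r" % (text,))


print(parse_numeric_date('01 02 2003'))            # day/month/year reading
print(parse_numeric_date('01 02 2003', usa=True))  # month/day/year reading
print(parse_numeric_date('6.15.2001'))             # only one order is valid
```

The flag only matters when both readings are valid dates; a string such
as '6.15.2001' has a single valid reading, so it parses the same way
with either setting.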

George

#=======================================================

def parseDateTime(string, USA=False, implyCurrentDate=False,
                  yearHeuristic=_20thcenturyHeuristic):
    '''Tries to parse a string as a valid date and/or time.

    It recognizes most common (and less common) date and time formats.

    Examples:
        >>> # doctest was run successfully on...
        >>> str(datetime.date.today())
        '2005-05-16'
        >>> str(parseDateTime('21:23:39.91'))
        '21:23:39.910000'
        >>> str(parseDateTime('16:15'))
        '16:15:00'
        >>> str(parseDateTime('10am'))
        '10:00:00'
        >>> str(parseDateTime('2:7:18.'))
        '02:07:18'
        >>> str(parseDateTime('08:32:40 PM'))
        '20:32:40'
        >>> str(parseDateTime('11:59pm'))
        '23:59:00'
        >>> str(parseDateTime('12:32:9'))
        '12:32:09'
        >>> str(parseDateTime('12:32:9', implyCurrentDate=True))
        '2005-05-16 12:32:09'
        >>> str(parseDateTime('93/7/18'))
        '1993-07-18'
        >>> str(parseDateTime('15.6.2001'))
        '2001-06-15'
        >>> str(parseDateTime('6.15.2001'))
        '2001-06-15'
        >>> str(parseDateTime('1980, November 20'))
        '1980-11-20'
        >>> str(parseDateTime('4 Mar 79'))
        '1979-03-04'
        >>> str(parseDateTime('July 4'))
        '2005-07-04'
        >>> str(parseDateTime('15/08'))
        '2005-08-15'
        >>> str(parseDateTime('5 Mar 3:45pm'))
        '2005-03-05 15:45:00'
        >>> str(parseDateTime('01 02 2003'))
        '2003-02-01'
        >>> str(parseDateTime('01 02 2003', USA=True))
        '2003-01-02'
        >>> str(parseDateTime('3/4/92'))
        '1992-04-03'
        >>> str(parseDateTime('3/4/92', USA=True))
        '1992-03-04'
        >>> str(parseDateTime('12:32:09 1-2-2003'))
        '2003-02-01 12:32:09'
        >>> str(parseDateTime('12:32:09 1-2-2003', USA=True))
        '2003-01-02 12:32:09'
        >>> str(parseDateTime('3:45pm 5 12 2001'))
        '2001-12-05 15:45:00'
        >>> str(parseDateTime('3:45pm 5 12 2001', USA=True))
        '2001-05-12 15:45:00'

    @param USA: Disambiguates strings that are valid dates in both
        (month, day, year) and (day, month, year) order (e.g. 05/03/2002).
        If True, the first format is assumed.
    @param implyCurrentDate: If True and the date is not given, the
        current date is implied.
    @param yearHeuristic: If not None, a callable f(year) that transforms
        the value of the given year. The default heuristic transforms
        2-digit years to 4-digit years assuming they are in the 20th
        century::
            lambda year: (year >= 100 and year
                          or year >= 10 and 1900 + year
                          or None)
        The heuristic should return None if the year is not considered
        valid. If yearHeuristic is None, no year transformation takes place.
    @return:
        - C{datetime.date} if only the date is recognized.
        - C{datetime.time} if only the time is recognized and
            implyCurrentDate is False.
        - C{datetime.datetime} if both date and time are recognized.
    @raise ValueError: If the string cannot be parsed successfully.
    '''
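
The yearHeuristic callable documented above can be spelled out without
the and/or idiom. The sketch below restates the default lambda from the
docstring as a plain function and adds one alternative a caller might
pass instead. Both names (twentieth_century_heuristic and
make_pivot_heuristic) and the pivot value are hypothetical; they are
not part of the parser.

```python
def twentieth_century_heuristic(year):
    """Same mapping as the lambda shown in the docstring."""
    if year >= 100:
        return year           # already a full year, keep as-is
    if year >= 10:
        return 1900 + year    # 2-digit year, assumed 20th century
    return None               # anything below 10 is rejected


def make_pivot_heuristic(pivot=70):
    """Build a heuristic that maps 2-digit years around a pivot.

    Hypothetical alternative: years >= pivot become 19xx, years below
    it become 20xx.
    """
    def heuristic(year):
        if year >= 100:
            return year
        if 0 <= year < 100:
            return (1900 if year >= pivot else 2000) + year
        return None           # negative years are rejected
    return heuristic


print(twentieth_century_heuristic(93))
print(twentieth_century_heuristic(5))
print(make_pivot_heuristic(70)(5))
```

Either callable follows the contract stated in the docstring: take the
year as given, return the transformed year, or return None if the year
is not considered valid.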



