[Baypiggies] custom date parser

Chris Clark Chris.Clark at ingres.com
Wed Sep 3 21:29:22 CEST 2008


Also consider limiting the number of splits to minimize processing, e.g.:

    From: ...raw_pubdate.split('-').....
    To: ...raw_pubdate.split('-', 3).....
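
A quick sketch of what the second argument does, in case it helps. It is the maximum number of splits, not the number of fields, so a cap of 3 yields at most four items and a cap of 2 would already be enough for year, month and day. The sample strings and variable names below are made up for illustration.

```python
# The second argument to str.split is the maximum number of splits,
# so the result has at most maxsplit + 1 items.
raw_pubdate = '2007-11-31'                  # made-up sample value
fields = raw_pubdate.split('-', 3)          # well-formed input: three fields

# With trailing junk, the remainder stays glued to the last item.
junk = '2007-11-31-extra-junk'              # made-up malformed value
capped_at_3 = junk.split('-', 3)            # four items, last is 'extra-junk'
capped_at_2 = junk.split('-', 2)            # three items, last is '31-extra-junk'
```

Either way, int() on the leftover item raises ValueError, so malformed input is still caught at the conversion step.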

I ended up using a regex for my apps:

    # yyyy-mm-dd (ISO 8601 style) but only the date (no time, and no week),
    # fairly strict, e.g. expect 2 digits for month and day
    ISO_regex_str = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
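
For illustration, a minimal sketch of how a pattern like this might be used. The helper name parse_iso_date and the choice of fullmatch (so trailing characters are rejected) are assumptions, not something taken from my apps.

```python
import datetime
import re

ISO_regex_str = r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})'
ISO_regex = re.compile(ISO_regex_str)


def parse_iso_date(text):
    """Hypothetical helper: return a datetime.date, or None on bad input."""
    # fullmatch (an assumption here) rejects leading or trailing characters.
    match = ISO_regex.fullmatch(text)
    if match is None:
        return None
    try:
        return datetime.date(int(match.group('year')),
                             int(match.group('month')),
                             int(match.group('day')))
    except ValueError:
        # Pattern matched, but the values are not a real calendar date.
        return None
```

The regex only checks the shape of the string; datetime.date still does the range checking, so something like 2007-11-31 matches the pattern but is rejected by the constructor.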

Chris

On 9/3/2008 11:22 AM, Alex Martelli wrote:
> Assuming you want exactly the functionality you've coded (as opposed
> to a more general fuzzy parse as Anna suggests), the key to making it
> more compact is seeing that what you're doing in your code is an
> unrolled loop -- and rolling it up again.  E.g., after the initial
> assignment to parts, the rest of the code could be:
>
> for i in range(1, 4):
>   try: return datetime.date(*parts)
>   except ValueError: parts[-i] = 1
> return None
>
> On Wed, Sep 3, 2008 at 10:12 AM, Aaron Maxwell <amax at redsymbol.net> wrote:
>   
>> Hi all,
>>
>> Below is a function that parses a date string in the form "YYYY-MM-DD"
>> and returns a datetime.date object, or None if it's bad input and
>> cannot be converted.  However, it does a couple of special tricks.
>> The data in certain cases is known to have a value for the day or
>> month that is not in the valid range; e.g., it may be 2007-11-31
>> (November traditionally only has 30 days), or 2002-14-23.  In this
>> situation, I want to keep the most significant good field(s) and set
>> the lesser ones to 1, then return the date object from that - so the
>> results of the above would be date(2007, 11, 1) or date(2002, 1, 1)
>> respectively.
>>
>> The function below does this.  It uses a triply-nested try/except
>> block, and I can't shake the feeling that there is a shorter and
>> clearer implementation.  Any thoughts?
>>
>> Of course, one approach would be to manually check the month and
>> day fields before passing them to datetime.date.  I would rather reuse
>> the validation code in the date class, though, for obvious reasons.
>>
>> Thanks in advance,
>> Aaron
>>
>> {{{
>> import datetime
>> def parse_datefield(raw_pubdate):
>>    '''
>>    Parse a datefield
>>    Takes in a date string in the format YYYY-MM-DD.
>>    Returns a datetime.date object, or None if it cannot be converted.
>>    '''
>>    # ... imagine validation/error checking code here ...
>>    # list() so that item assignment below also works on Python 3
>>    parts = list(map(int, raw_pubdate.split('-')))
>>    try:
>>        d = datetime.date(*parts)
>>    except ValueError:
>>        # day out of range?
>>        parts[-1] = 1
>>        try:
>>            d = datetime.date(*parts)
>>        except ValueError:
>>            # month out of range?
>>            parts[-2] = 1
>>            try:
>>                d = datetime.date(*parts)
>>            except ValueError:
>>                # give up
>>                d = None
>>    return d
>> }}}
>>
>> --
>> Aaron Maxwell
>> http://redsymbol.net
>> _______________________________________________
>> Baypiggies mailing list
>> Baypiggies at python.org
>> To change your subscription options or unsubscribe:
>> http://mail.python.org/mailman/listinfo/baypiggies
>>
>>     

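Putting the rolled-up loop from the quoted reply together with the original function, a minimal sketch might look like the following. The input checks at the top (catching non-numeric fields and requiring exactly three of them) are assumptions standing in for the "imagine validation" comment, and the loop is written over explicit indexes rather than range().

```python
import datetime


def parse_datefield(raw_pubdate):
    """Parse 'YYYY-MM-DD' into a datetime.date.

    If the date is invalid, retry with the day set to 1, then with the
    month also set to 1. Returns None if nothing works.
    """
    # Assumed validation: non-numeric fields or a wrong field count give None.
    try:
        parts = [int(field) for field in raw_pubdate.split('-')]
    except ValueError:
        return None
    if len(parts) != 3:
        return None

    # First attempt uses the fields as given (index None), then the day
    # (index -1) and the month (index -2) are reset to 1 in turn.
    for index in (None, -1, -2):
        if index is not None:
            parts[index] = 1
        try:
            return datetime.date(*parts)
        except ValueError:
            continue
    return None
```

With the examples from the original post, '2007-11-31' comes back as date(2007, 11, 1) and '2002-14-23' as date(2002, 1, 1); a year that datetime.date rejects gives None.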