pattern matching

Thu Feb 24 11:04:22 EST 2011

On Feb 24, 2:11 am, monkeys paw <mon... at joemoney.net> wrote:
> if I have a string such as '<td>01/12/2011</td>' and i want
> to reformat it as '20110112', how do i pull out the components
> of the string and reformat them into a YYYYDDMM format?
>
> I have:
>
> import re
>
> test = re.compile('\d\d\/')
> f = open('test.html')  # This file contains the html dates
> for line in f:
>      if test.search(line):
>          # I need to pull the date components here

I second using an HTML parser to extract the contents of the TDs, but I
would also go one step further when reformatting and do something such as:

>>> from time import strptime, strftime
>>> d = '01/12/2011'
>>> strftime('%Y%m%d', strptime(d, '%m/%d/%Y'))
'20110112'

That way you get some validation of the data, i.e., if you get
'13/12/2011', strptime raises a ValueError (13 is not a valid month),
which probably means you've got mixed date formats.
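
Putting the two suggestions together, a rough sketch might look like the
following. It assumes Python 3's html.parser module (in Python 2 the module
is named HTMLParser); the class name TDText, the helper reformat_date and the
sample HTML string are made up for illustration and are not from the
original post.

```python
from html.parser import HTMLParser
from time import strptime, strftime


class TDText(HTMLParser):
    """Collect the text found inside each <td> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_td = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self.in_td = True
            self.cells.append('')  # start a new cell

    def handle_endtag(self, tag):
        if tag == 'td':
            self.in_td = False

    def handle_data(self, data):
        if self.in_td:
            self.cells[-1] += data


def reformat_date(text):
    """Return MM/DD/YYYY text as YYYYMMDD, or None if it does not parse."""
    try:
        return strftime('%Y%m%d', strptime(text.strip(), '%m/%d/%Y'))
    except ValueError:
        # Not a valid MM/DD/YYYY date, e.g. a month of 13
        return None


# Made-up sample input: one good date and one that fails validation
parser = TDText()
parser.feed('<tr><td>01/12/2011</td><td>13/12/2011</td></tr>')
parser.close()

results = [reformat_date(cell) for cell in parser.cells]
print(results)
```

Returning None for a bad date is just one choice; you could equally let the
ValueError propagate so that mixed formats stop the run straight away.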

hth

Jon.