pattern matching

Dr Vangel 470e8b8c35950 at poster.grepler.com
Thu Feb 24 01:26:21 EST 2011


>
>if I have a string such as '<td>01/12/2011</td>' and i want
>to reformat it as '20110112', how do i pull out the components
>of the string and reformat them into a YYYYDDMM format?
>
>I have:
>
>import re
>
>test = re.compile('dd/')
>f = open('test.html')  # This file contains the html dates
>for line in f:
>     if test.search(line):
>         # I need to pull the date components here

I am no Python guru, but you could use BeautifulSoup to parse the HTML, 
as it is much easier than matching tags by hand.

Some untested pseudocode is below. Adapt it to your needs.

from BeautifulSoup import BeautifulSoup

#read html data or whatever source
html_data = open('/yourwebsite/page.html','r').read() 

#Create the soup object from the HTML data
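soup = BeautifulSoup(html_data)  # Python has no 'new' keyword
#Find the proper tag, see the BeautifulSoup docs
someData = soup.find('td', attrs={'name': 'someTable'})
#The value of the 3rd attribute of the tag, just an example
#(BeautifulSoup 3 stores attrs as a list of (name, value) pairs)
value = someData.attrs[2][1]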

##end

Now that you have the date as a string, the next step is the date 
conversion. For this, refer to the dateutil parser: 
http://labix.org/python-dateutil
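
For the conversion step itself, here is a minimal sketch that uses only the 
standard library (datetime.strptime swapped in for dateutil, and a plain 
regex instead of BeautifulSoup). The names DATE_CELL and reformat_dates are 
made up for illustration, and it assumes the source dates are DD/MM/YYYY, 
which is what the requested YYYYDDMM output implies.

```python
import re
from datetime import datetime

# Matches a date such as 01/12/2011 inside a <td> cell.
DATE_CELL = re.compile(r"<td>\s*(\d{2}/\d{2}/\d{4})\s*</td>")


def reformat_dates(line):
    """Return every date cell found in line, rewritten as YYYYDDMM."""
    results = []
    for raw in DATE_CELL.findall(line):
        # Assumption: the input is DD/MM/YYYY. Swap the format codes
        # if your file actually stores MM/DD/YYYY.
        parsed = datetime.strptime(raw, "%d/%m/%Y")
        results.append(parsed.strftime("%Y%d%m"))
    return results


print(reformat_dates("<td>01/12/2011</td>"))  # ['20110112']
```

Going through strptime rather than just reshuffling the regex groups means 
an impossible date (say 31/02/2011) raises ValueError instead of being 
passed along silently.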

Hope it helps.




----------------------------
posted via Grepler.com -- poster is authenticated.




More information about the Python-list mailing list