question on regular expressions
Robert Brewer
fumanchu at amor.org
Fri Dec 3 13:16:08 EST 2004
Darren Dale wrote:
> I'm stuck. I'm trying to make this:
>
> file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C
> %5Cfolderx%5Cfoldery%5Cmydoc2.pdf
>
> (no linebreaks) look like this:
>
> ./mydoc1.pdf,./mydoc2.pdf
Regular expressions are much easier to write when you only have to worry
about single characters. So the first step might be to replace each %5C
(the URL-encoded form of a backslash) with a literal \:
>>> a
'file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf'
>>> a = a.replace("%5C", "\\")
>>> a
'file://C:\\folder1\\folder2\\mydoc1.pdf,file://C\\folderx\\foldery\\mydoc2.pdf'
Then you can use something like:
>>> import re
>>> re.findall(r"([^\\]*\.[^,]*)(?:,|$)", a)
['mydoc1.pdf', 'mydoc2.pdf']
...or Sean Ross' suggestion about urllib.
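
To get all the way to the ./mydoc1.pdf,./mydoc2.pdf form asked for above, the
pieces could be put together roughly as follows. This is only a sketch: it
assumes Python 3, where unquote lives in urllib.parse (in the Python 2 of this
thread it was urllib.unquote), and the unquote variant is just one guess at
what the urllib suggestion might look like, not a copy of it. The function
names are made up for illustration.

```python
import re
from urllib.parse import unquote  # Python 3 location; an assumption, see above

a = ("file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,"
     "file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf")


def basenames_with_regex(s):
    """Replace %5C with a backslash, then pull out each file name."""
    s = s.replace("%5C", "\\")
    # Same pattern as in the session above: a run of non-backslash
    # characters containing a dot, ending at a comma or end of string.
    return re.findall(r"([^\\]*\.[^,]*)(?:,|$)", s)


def basenames_with_unquote(s):
    """Hypothetical urllib variant: decode each entry, keep the last part."""
    # unquote decodes every percent-escape, not only %5C.
    return [unquote(part).rsplit("\\", 1)[-1] for part in s.split(",")]


def to_relative(names):
    """Prefix each name with ./ and join with commas."""
    return ",".join("./" + name for name in names)


result_regex = to_relative(basenames_with_regex(a))
result_unquote = to_relative(basenames_with_unquote(a))
print(result_regex)    # ./mydoc1.pdf,./mydoc2.pdf
print(result_unquote)  # ./mydoc1.pdf,./mydoc2.pdf
```

Both variants assume the entries are comma-separated and that no file name
itself contains a comma or a backslash.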
Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org