question on regular expressions

Robert Brewer fumanchu at amor.org
Fri Dec 3 13:16:08 EST 2004


Darren Dale wrote:
> I'm stuck. I'm trying to make this:
> 
> file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C
> %5Cfolderx%5Cfoldery%5Cmydoc2.pdf
> 
> (no linebreaks) look like this:
> 
> ./mydoc1.pdf,./mydoc2.pdf

Regular expressions are much easier to write when you only have to worry
about single characters. So the first step might be to replace all of
the %5C's with \:

>>> a
'file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf'
>>> a = a.replace("%5C", "\\")
>>> a
'file://C:\\folder1\\folder2\\mydoc1.pdf,file://C\\folderx\\foldery\\mydoc2.pdf'


Then you can use something like:

>>> import re
>>> re.findall(r"([^\\]*\.[^,]*)(?:,|$)", a)
['mydoc1.pdf', 'mydoc2.pdf']
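
To get all the way to the string you asked for, the matches still need a "./" prefix and a comma between them. A minimal sketch, assuming the input really is one string with no line breaks (the names `names` and `result` are only for illustration):

```python
import re

# The original string from the question, as a single line.
a = "file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf"

# Step 1: turn every %5C into a single backslash.
a = a.replace("%5C", "\\")

# Step 2: grab "name.ext" runs that end at a comma or at the end of the string.
names = re.findall(r"([^\\]*\.[^,]*)(?:,|$)", a)

# Step 3: prefix each name with "./" and rejoin with commas.
result = ",".join("./" + name for name in names)
print(result)  # ./mydoc1.pdf,./mydoc2.pdf
```

Note that the pattern relies on each file name containing a dot and no comma, so it would need adjusting for names that break either assumption.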

...or Sean Ross' suggestion about urllib.
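
I don't have Sean's exact code in front of me, so the following is only a guess at what a urllib-based version might look like. It assumes Python 3, where the unquote function lives in urllib.parse (in older releases it was urllib.unquote), and it uses ntpath.basename so that backslashes are treated as separators on any platform. The function name `to_relative` is made up for this example.

```python
import ntpath
from urllib.parse import unquote


def to_relative(urls):
    """Turn a comma-separated list of file:// URLs into './name' entries."""
    parts = []
    for url in urls.split(","):
        decoded = unquote(url)              # "%5C" becomes a backslash
        name = ntpath.basename(decoded)     # text after the last "\" or "/"
        parts.append("./" + name)
    return ",".join(parts)


a = "file://C:%5Cfolder1%5Cfolder2%5Cmydoc1.pdf,file://C%5Cfolderx%5Cfoldery%5Cmydoc2.pdf"
print(to_relative(a))  # ./mydoc1.pdf,./mydoc2.pdf
```

This avoids the regular expression entirely, and it also decodes any other percent-escapes (such as %20) that might turn up in the file names.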


Robert Brewer
MIS
Amor Ministries
fumanchu at amor.org