[Tutor] Regex help please.

SA sarmstrong13@mac.com
Tue, 13 Aug 2002 07:59:48 -0500


On 8/12/02 11:30 PM, "Sean 'Shaleh' Perry" <shalehperry@attbi.com> wrote:

> 
> On 12-Aug-2002 SA wrote:
>> Hi Everyone-
>> 
>> I am trying to match a string pattern like the following:
>> 
>> ?1234.htm
>> 
>> I would then like to extract the 1234 from the pattern and sub 1234.html for
>> the pattern.
>> 
>> 
>> Anyone know how to do this? Is there a simpler way than re?
>> 
> 
> so 1234.htm becomes 1234.html.htm?
> 
I'm sorry. That is not what I meant.

Let's say I have an html document.
In this document is a bunch of links.
Each of these links follows a similar pattern, with only the number varying:

?1234.htm
?56.htm
?0154398.htm

Notice that the only difference between these strings is the value and length
of the number. As you may have guessed, these links were generated
dynamically with a ? search pattern (hence the ? in each). What I would like
to do is search the whole html document for any link that begins with ?, has
a number of variable length in the middle, and ends with .htm. I would then
like to replace the whole string (i.e. ?1234.htm) with the number followed by
.html (i.e. 1234.html).

For example:
?1234.htm would become 1234.html
?56.htm would become 56.html
?0154398.htm would become 0154398.html
And so on ...

So what I have done so far is the following:

import re
import os

list = os.listdir('.') #lists all html documents in this directory
input = open(list[0], "rb") #this will be changed to iterate over the list
text = input.read()
p = re.compile("\?(\d+).htm", re.M)
result = p.match(text)


Now the last two lines were written to test the search pattern "\?(\d+).htm".
This will be changed to something like re.sub("\?(\d+).htm","\\1.html",text)
later to do one-step swapping.
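
As a rough sketch of that one-step swap, assuming the document text is
already in memory as a string. The sample text and the helper name
fix_links are made up for illustration, and the dot is escaped here so it
only matches a literal period:

```python
import re

# Raw string so the backslashes reach the regex engine untouched.
# \? matches a literal question mark, (\d+) captures the number,
# and \. matches a literal dot (an unescaped . matches any character).
LINK_PATTERN = re.compile(r"\?(\d+)\.htm")


def fix_links(text):
    """Replace every ?NNN.htm with NNN.html (hypothetical helper name)."""
    return LINK_PATTERN.sub(r"\1.html", text)


# Made-up sample input, not taken from a real document.
sample = '<a href="?1234.htm">one</a> <a href="?56.htm">two</a>'
print(fix_links(sample))
```

Note that this sketch would also touch a link that already ends in .html,
so it assumes no such links are present.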

But my problem is that I get the following output:
>>> print result
None


So it seems like it is not traversing the file text and matching the
pattern. 
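
One thing that might explain this, though I have not confirmed it against
the real file: match() only tries the pattern at the very start of the
string (even with re.M), whereas search() scans through the whole string.
A small sketch with made-up text, where the names pattern, text and found
are only for illustration:

```python
import re

pattern = re.compile(r"\?(\d+)\.htm")

# Made-up sample text; the link does not sit at the very start.
text = '<html><a href="?1234.htm">link</a></html>'

# match() only tries the pattern at position 0, so this is None.
print(pattern.match(text))

# search() scans forward until it finds the first occurrence.
found = pattern.search(text)
print(found.group(1))

# findall() returns every captured number in the text.
print(pattern.findall(text + ' <a href="?56.htm">x</a>'))
```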

So with this said, any ideas what I'm doing wrong?

Thanks in advance.
SA


-- 
"I can do everything on my Mac I used to on my PC. Plus a lot more ..."
-Me