How to grab a number from inside a .html file using regex

MRAB python at mrabarnett.plus.com
Sat Aug 7 15:17:59 EDT 2010


Νίκος wrote:
> On 7 Aug, 21:24, MRAB <pyt... at mrabarnett.plus.com> wrote:
> 
>> Use group capture:
>>
>>      page_id = re.match(r'<!-- (\d+) -->', firstline).group(1)
>>      print(page_id)
> 
> Worked like a charm! Thanks a lot!
> 
> So the match method here not only searched for the string representation
> of the number but also converted it to an integer as well?
> 
> Does r stand for "retrieve the string" here?
> 
> And group?
> 
> When a regex searches a .txt file and retrieves something from it, does it
> always retrieve it as a string, or can it get it as a number as well?

The 'r' prefix makes it a 'raw string literal'. That means that the
string literal won't treat backslashes as special. Before raw string
literals were added to the Python language I would have needed to write:

     '<!-- (\\d+) -->'

instead.

(Actually, that's not strictly true in this case, because \d doesn't
have a special meaning in Python strings, but it's a good idea to use raw
string literals habitually when writing regexes in order to reduce the
chance of forgetting them when they _are_ necessary. Well, that's what I
think, anyway. :-))
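
A small sketch may make the difference concrete. It is only an
illustration: the sample first line '<!-- 1234 -->' and the name
page_number are assumptions made up for the example, not taken from the
original script. It shows that the raw and the doubled-backslash spellings
give the same pattern, that \b is a case where the raw prefix really
matters, and that group(1) hands back a string which has to be converted
with int() if a number is wanted.

```python
import re

# The raw literal and the doubled-backslash literal are the same string.
assert r'<!-- (\d+) -->' == '<!-- (\\d+) -->'

# \b is where it matters: in a normal string literal it is a single
# backspace character, in a raw string it is backslash + 'b', which the
# regex engine reads as a word boundary.
assert '\b' == '\x08'
assert len(r'\b') == 2
assert re.search(r'\bcat\b', 'a cat sat') is not None
assert re.search('\bcat\b', 'a cat sat') is None

# Hypothetical first line of the .html file (assumed for illustration).
firstline = '<!-- 1234 -->'

match = re.match(r'<!-- (\d+) -->', firstline)
page_id = match.group(1)      # group(1) is the text matched by (\d+): a str
page_number = int(page_id)    # the conversion to a number is explicit

print(type(page_id).__name__, page_id)
print(type(page_number).__name__, page_number)
```

Note that re.match returns None when the line does not match, so real code
would probably want to check for that before calling .group(1).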
