[Tutor] superscripts in a regex
Albert-Jan Roskam
fomcl at yahoo.com
Wed Jul 31 12:15:00 CEST 2013
Hi,
In the script below I want to keep only the digits and drop the digit grouping symbol (thousands separator), if there is one. The weird thing is that re.findall returns the expected result (group 1 with the digits, and optionally group 2 too), but re.sub does not (it just returns the entire string). I tried the flags re.LOCALE, re.UNICODE and re.DEBUG, looking for solutions or clues, but no luck.
# -*- coding: utf-8 -*-
#Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)] on win32
import re
regex = r"(^\d+)[.,]?(\d*)[ \w]+"
surfaces = ["79 m\xb2", "1.000 m\xb2", "2,000 m\xb2"]
for i, surface in enumerate(surfaces):
    #surface = surface.replace("\xb2", "2") # solves it, but maybe some day there will be (tm), (c), etc. symbols!
    print str(i).center(79, "-")
    print re.sub(regex, r"\1\2", surface) # huh?!
    print re.findall(regex, surface) # works as expected
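A minimal sketch that reproduces the difference, assuming Python 3 (the post itself uses Python 2.7). re.sub only rewrites the part of the string the pattern actually matched, while re.findall only reports the captured groups, so an unmatched trailing "\xb2" survives in what re.sub returns but never appears in what re.findall returns. The re.ASCII flag is used here as an assumption to approximate Python 2's default handling of byte strings, where \w means [a-zA-Z0-9_]; the names ascii_re and unicode_re are illustrative only.

```python
import re

# The pattern from the post, as a raw string so "\d" and "\w" are regex escapes.
PATTERN = r"(^\d+)[.,]?(\d*)[ \w]+"

# re.ASCII limits \w to [a-zA-Z0-9_], roughly like Python 2 byte strings.
ascii_re = re.compile(PATTERN, re.ASCII)
# Without it, Python 3's Unicode-aware \w also matches "\xb2" (superscript two).
unicode_re = re.compile(PATTERN)

surfaces = ["79 m\xb2", "1.000 m\xb2", "2,000 m\xb2"]

for surface in surfaces:
    # findall returns only the captured groups; the unmatched tail is invisible.
    print(ascii_re.findall(surface))
    # sub replaces only the matched span, so the unmatched "\xb2" is kept.
    print(ascii(ascii_re.sub(r"\1\2", surface)))
    # Here the superscript is part of the match, so it is replaced away too.
    print(ascii(unicode_re.sub(r"\1\2", surface)))
```

With the ASCII pattern, "79 m\xb2" should become '79\xb2', while the Unicode-aware pattern gives '79': the superscript is simply outside the match in the first case.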
I know it's a no-no to ask this (especially because it concerns the standard library), but: is this a b-u-g?
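If the goal is only to keep the digits, one way to avoid caring which symbols follow the number (the (tm) and (c) worry from the code comment) is to match just the leading number and join its groups instead of substituting. This is a sketch, not the poster's method: it assumes Python 3 and at most one grouping symbol per number, and the helper name leading_number is hypothetical.

```python
import re

# Leading digits, an optional single "." or "," separator, then more digits.
NUMBER_RE = re.compile(r"(\d+)[.,]?(\d*)")


def leading_number(surface):
    """Return the leading digits of `surface` with one separator removed, or None."""
    # match() anchors at the start of the string, like ^ in the original pattern.
    m = NUMBER_RE.match(surface)
    if m is None:
        return None
    # Join the captured groups rather than calling re.sub, so whatever comes
    # after the number ("m\xb2", "\u2122", "\xa9", ...) never ends up in the result.
    return "".join(m.groups())
```

For example, leading_number("2,000 m\xb2") should give "2000", and a string that does not start with a digit gives None.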
Regards,
Albert-Jan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~