[Tutor] re question

Kent Johnson kent37 at tds.net
Sat Mar 26 12:51:28 CET 2005


I don't know why this isn't working for you, but this worked for me at a DOS console:
  >>> s='850hPa±'
  >>> s
'850hPa\xf1'
  >>> import re
  >>> re.sub('\xf1', '*', s)
'850hPa*'
  >>> import sys
  >>> sys.stdout.encoding
'cp437'

and also in IDLE with a different encoding:
  >>> s='850hPa±'
  >>> s
'850hPa\xb1'
  >>> import re
  >>> re.sub('\xb1', '*', s)
'850hPa*'
  >>> import sys
  >>> sys.stdout.encoding
'cp1252'
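
The only difference between the two sessions is the byte that stands for ±. As a rough sketch (assuming Python 3, where text and bytes are separate types; the sessions above are Python 2 byte strings), the same character encodes to a different byte under each code page:

```python
# Sketch only: assumes Python 3, where str is Unicode text and bytes are raw data.
PLUS_MINUS = '\u00b1'  # the plus-minus sign

cp437_byte = PLUS_MINUS.encode('cp437')    # code page reported by the DOS console
cp1252_byte = PLUS_MINUS.encode('cp1252')  # code page reported by IDLE

print(cp437_byte, cp1252_byte)

# Decoding a byte with the wrong code page yields a different character,
# so a pattern written for one encoding will not match data in the other.
misread = cp437_byte.decode('cp1252')
print(repr(misread))
```

So a pattern containing '\xb1' can only match if the data really uses an encoding where ± is the byte 0xb1.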

So one guess is that the data is in a different encoding than you expect. When you print the
string and get '\xb1', is that in the same program that is doing the regex?
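
Separately from the encoding, the order of the alternatives in the pattern you posted is worth checking. The re module tries alternatives left to right and takes the first one that matches, so in '(hPa|hPa\xb1|m|C|kts)' the plain 'hPa' wins and the ± is left behind. I have not seen your full program, so treat this as a sketch; it assumes Python 3 strings and that the data really contains the character '\xb1':

```python
import re

line = 'SigWind:  850hPa\xb1,         ,       ,       , 205 @ 11kts'

# Pattern as posted: 'hPa' is tried first and succeeds, so '\xb1' is never consumed.
original = re.compile('(hPa|hPa\xb1|m|C|kts)')

# One possible fix: make the plus-minus sign an optional part of the hPa unit.
fixed = re.compile('(hPa\xb1?|m|C|kts)')

after_original = original.sub('', line)
after_fixed = fixed.sub('', line)

print(repr(after_original))  # the plus-minus sign is still present
print(repr(after_fixed))     # the plus-minus sign is gone
```

Putting the longer alternative first, as in '(hPa\xb1|hPa|m|C|kts)', should have the same effect.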

Another approach would be to just pull out the numbers and ignore everything else:
  >>> s='Std Lvl:  850hPa,     1503m,  16.8C,  15.7C, 205 @ 11kts'
  >>> l=s.split(',')
  >>> l
['Std Lvl:  850hPa', '     1503m', '  16.8C', '  15.7C', ' 205 @ 11kts']
  >>> [ re.search(r'[\d\.]+', i).group() for i in l]
['850', '1503', '16.8', '15.7', '205']
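
One caveat: some of your lines have empty fields, and re.search() returns None for a field with no digits, so calling .group() on it raises AttributeError. A minimal sketch that tolerates empty fields follows; the helper name extract_numbers and the choice of None as a placeholder are my own assumptions, not anything from your program:

```python
import re

# Matches an unsigned integer or decimal such as '850' or '16.8'.
# Negative values are not handled; that would need an optional leading '-'.
NUMBER = re.compile(r'\d+(?:\.\d+)?')

def extract_numbers(line):
    """Return the first number in each comma-separated field, or None if a field has none."""
    results = []
    for field in line.split(','):
        match = NUMBER.search(field)
        results.append(match.group() if match else None)
    return results

values = extract_numbers('SigWind:  857hPa,          ,  21.0C,  20.1C, 210 @  9kts')
print(values)
```

Like the list comprehension above, this keeps only the first number in each field, so the wind speed after the '@' is dropped.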

Kent

Ertl, John wrote:
> All
> 
> I have a string that has a bunch of numbers with the units attached to them.
> I want to strip off the units.  I am using a regular expression and sub to
> do this.  This works great for almost all of the cases.  
> 
> These are the type of lines:
> 
> SigWind:  857hPa,          ,  21.0C,  20.1C, 210 @  9kts
> SigWind:  850hPa±,         ,       ,       , 205 @ 11kts
> Std Lvl:  850hPa,     1503m,  16.8C,  15.7C, 205 @ 11kts
> 
> I am using the following: cleanstring = re.compile('(hPa|hPa\xb1|m|C|kts)'),
> and then cleanstring.sub("", line).  I have tried using numerous backslashes
> to escape the \xb1.
> 
> I also tried replacing all non-numeric characters that are part of a
> number-character string, but I could not make that work. The idea was to
> replace all non-number characters in a "word" that is made up of numbers
> followed by letters.
> 
> I then split the line at the commas, so in the current thinking I need the
> commas for the split.  How do I deal with the hPa±?  When I print it out it
> looks like it is a hexadecimal escape character (\xb1), but I am not sure
> how to deal with this.
> 
> Any ideas?
> 
> Thanks
> _______________________________________________
> Tutor maillist  -  Tutor at python.org
> http://mail.python.org/mailman/listinfo/tutor
> 


