[Tutor] Stuck: unicode in regular expressions
Kent Johnson
kent37 at tds.net
Tue Aug 9 17:01:04 CEST 2005
Ron Phillips wrote:
> I am expecting users to cut-and-paste DMS data into an application —
> like: +40 30 15 E40 15 34.56, -81 0 0, 81 57 34.27E, W 40° 13’
> 27.343”, 40° 13’ 27.343” S, 140° 13’ 27.343”S, S40° 13’ 27.34454,
> 81:57:34.27E
>
> I've been able to write a regex that seems to work in redemo.py, but it
> doesn't do at all what I want when I try to code it using the re module.
> The problem seems to be the way I am using unicode — specifically all
> those punctuation marks that might get pasted in. I anticipate the
> program getting its input from a browser; maybe that will narrow down
> the range somewhat.
I'm guessing a bit here, but you need to know what encoding the browser is sending. If the input comes from a form, I think the data comes back in the same encoding as the page containing the form. Once you know the encoding, I think you can either
- convert the form data to unicode and use unicode in the regex, or
- use the same encoding for the regex as the form data
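
Here is a minimal sketch of both options, written for current Python 3, where str is already unicode (in 2005-era Python you would use u'' literals and unicode() instead). The name form_bytes and the cp1252 encoding are assumptions for illustration only; the real encoding depends on the page that served the form.

```python
import re

# Hypothetical raw form data. cp1252 is an assumed encoding, not a known one.
form_bytes = b"W 40\xb0 13\x92 27.343\x94"

# Option 1: decode to unicode, then match with a unicode pattern.
text = form_bytes.decode("cp1252")
# U+2019 is the right single quote, U+201D the right double quote.
quote_re = re.compile("[\u2019\u201d]")
found = quote_re.findall(text)
print(found)

# Option 2: leave the data as bytes and write the pattern in the same encoding.
byte_quote_re = re.compile(b"[\x92\x94]")
found_bytes = byte_quote_re.findall(form_bytes)
print(found_bytes)
```

This also suggests why the escapes in the question found nothing. \x takes exactly two hex digits, so \x2019 means a space (\x20) followed by the characters 19, and \x02BC means \x02 followed by BC. A four-digit code point needs \u2019. \x92 is the cp1252 byte for the same character, so it only matches data that is still cp1252-encoded bytes.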
A good way to start would be to
    print(repr(formdata))
which will show you exactly what is in the data.
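
As a rough illustration of what repr() can reveal, the sketch below encodes one of the sample coordinates in two candidate encodings (Python 3 syntax; utf-8 and cp1252 are assumptions about what a browser might send, and sample is a hypothetical stand-in for the real form data). The escapes that show up tell you which encoding you are looking at.

```python
# Assumed sample: one of the pasted coordinates from the question.
sample = "40\u00b0 13\u2019 27.343\u201d S"

# The same text becomes different bytes under different encodings;
# repr() makes the difference visible as escape sequences.
as_utf8 = sample.encode("utf-8")
as_cp1252 = sample.encode("cp1252")

print(repr(as_utf8))    # multi-byte sequences such as \xe2\x80\x99
print(repr(as_cp1252))  # single bytes such as \x92
```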
Kent
>
> Anyway, given the string above, what regex will match the ” and ’
> characters, please? I have tried \x02BC and \x92 and \x2019 for the ’ ,
> but no result. I am sure it's simple; I am sure some other newbie has
> asked it, but I have Googled my brains out, and can't find it.
>
> Ron