a simple unicode question

beSTEfar jens.andersen.privat at gmail.com
Mon Oct 19 15:46:45 EDT 2009


On 19 Oct, 21:07, George Trojan <george.tro... at noaa.gov> wrote:
> A trivial one, this is the first time I have to deal with Unicode. I am
> trying to parse a string s='''48° 13' 16.80" N'''. I know the charset is
> "iso-8859-1". To get the degrees I did
>  >>> encoding='iso-8859-1'
>  >>> q=s.decode(encoding)
>  >>> q.split()
> [u'48\xc2\xb0', u"13'", u'16.80"', u'N']
>  >>> r=q.split()[0]
>  >>> int(r[:r.find(unichr(ord('\xc2')))])
> 48
>
> Is there a better way of getting the degrees?
>
> George

When parsing strings like this, use regular expressions. If you don't
know how, spend some time teaching yourself; it is time well spent! A
great tool for experimenting with REs is KODOS.

For the problem at hand you can, e.g., do:

  import re
  # take the first run of consecutive digits and convert it to an int
  degrees = int(re.findall(r'\d+', s)[0])

In essence, findall() collects every run of consecutive digits; the
code then takes the first one and converts it with int(). There is no
need to know that the string is Unicode or which charset it was
encoded in, since the digits are plain ASCII in iso-8859-1 and any
other ASCII-compatible charset.
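A self-contained sketch of the above, written for Python 3 (an
assumption; the thread is Python 2, where s would be a unicode
object). It also hints at why the quoted output contains u'48\xc2\xb0':
C2 B0 is the UTF-8 encoding of the degree sign, so the input bytes were
most likely UTF-8 rather than iso-8859-1. The names raw, DMS_RE and
parse_dms are hypothetical, and the stricter pattern is only an
illustration, not part of the suggestion above.

```python
import re

# Assumed input: the coordinate string as UTF-8 bytes (hypothetical).
raw = '48° 13\' 16.80" N'.encode('utf-8')

# Decoding UTF-8 bytes as iso-8859-1 splits the degree sign (U+00B0,
# bytes C2 B0) into two characters, reproducing the quoted output.
wrong = raw.decode('iso-8859-1')
print(wrong.split()[0] == '48\xc2\xb0')  # True

# Decoding with the right codec gives a single degree-sign character.
s = raw.decode('utf-8')

# The approach from the post: the first run of consecutive digits.
degrees = int(re.findall(r'\d+', s)[0])

# A stricter, illustrative pattern capturing degrees, minutes,
# seconds and hemisphere in one match.
DMS_RE = re.compile(r"""(\d+)\s*°\s*(\d+)\s*'\s*(\d+(?:\.\d+)?)\s*"\s*([NSEW])""")

def parse_dms(text):
    """Return (degrees, minutes, seconds, hemisphere) or raise ValueError."""
    m = DMS_RE.search(text)
    if m is None:
        raise ValueError('no degrees/minutes/seconds coordinate in %r' % (text,))
    d, mnt, sec, hemi = m.groups()
    return int(d), int(mnt), float(sec), hemi

print(degrees)       # 48
print(parse_dms(s))  # (48, 13, 16.8, 'N')
```

The findall() one-liner is enough when only the degrees are wanted;
the anchored pattern is stricter and rejects strings that are not in
the D° M' S" H shape instead of silently returning the first number.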



More information about the Python-list mailing list