Validate string as UTF-8?

Tony Nelson *firstname*nlsnews at georgea*lastname*.com
Sun Nov 6 13:58:50 EST 2005


I'd like to have a fast way to validate large amounts of string data as 
being UTF-8.

I don't see a fast way to do it in Python, though:

    unicode(s, 'utf-8').encode('utf-8')

seems to catch invalid data at least some of the time (the unicode() 
call succeeds but the encode() call raises an error).  I don't consider 
a regular-expression-based solution to be fast.  GLib provides a routine 
to do this, and since I am using GTK it's included in there somewhere, 
but I don't see a way to call GLib routines from Python.  I don't want 
to write another extension module.
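
For reference, the check above can be reduced to a strict decode wrapped 
in a small helper. This is only a sketch: is_valid_utf8 is a hypothetical 
name, and how strictly the codec rejects things like surrogates or 
overlong forms may depend on the Python version in use.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes as UTF-8 with strict error handling."""
    try:
        # 'strict' is the default error handler; it is spelled out for clarity.
        data.decode('utf-8', 'strict')
    except UnicodeDecodeError:
        return False
    return True
```

The decoding itself runs in C inside the codec, so the per-byte work does 
not happen in Python code, though a full decoded copy is still built and 
thrown away.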

Is there a (fast) Python function to validate UTF-8 data?

Is there some other fast way to validate UTF-8 data?

Is there a general way to call GLib functions?
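
One possible route that avoids a new extension module, assuming the 
ctypes package is installed and the GLib shared library can be found 
under the name 'glib-2.0' (both are assumptions, and the name may differ 
by platform), might look like the sketch below. glib_utf8_validate is a 
hypothetical wrapper name; g_utf8_validate is the GLib routine.

```python
import ctypes
import ctypes.util


def glib_utf8_validate(data):
    """Validate a byte string with GLib's g_utf8_validate().

    Returns True or False, or None if the GLib library cannot be loaded.
    """
    # Assumption: the library is discoverable as 'glib-2.0' on this system.
    path = ctypes.util.find_library('glib-2.0')
    if path is None:
        return None
    try:
        glib = ctypes.CDLL(path)
    except OSError:
        return None
    # C prototype:
    #   gboolean g_utf8_validate(const gchar *str, gssize max_len,
    #                            const gchar **end);
    glib.g_utf8_validate.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t,
                                     ctypes.c_void_p]
    glib.g_utf8_validate.restype = ctypes.c_int
    # Pass the length explicitly and NULL for the optional 'end' pointer.
    return bool(glib.g_utf8_validate(data, len(data), None))
```

With an explicit length, g_utf8_validate treats an embedded NUL byte as 
invalid, so data containing NULs would need separate handling.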
________________________________________________________________________
TonyN.:'                        *firstname*nlsnews at georgea*lastname*.com
      '                                  <http://www.georgeanelson.com/>


