character sets? unicode?

Michael mogmios at mlug.missouri.edu
Thu Feb 3 12:22:22 EST 2005


I'm trying to import text from email I've received, run some regular
expressions on it, and save the text into a database. I'm not sure how
to handle character sets. My regular expressions have had problems
with email that uses unusual character sets. Korean text seems to be
full of sequences like '=3D=21'. This doesn't look like Unicode (or am
I wrong?), so does anyone know how I should handle it? Do I need to do
anything special when passing text with non-ASCII characters to re,
MySQLdb, or any other libraries? Is it better to store the text as-is
in my database along with its character set, or should I convert all
text to a single encoding such as UTF-8? Any advice? Thanks.
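For reference, sequences like '=3D' and '=21' are usually the quoted-printable Content-Transfer-Encoding (=3D is '=', =21 is '!'), which sits on top of the declared charset. Below is a minimal sketch of one common approach. It assumes Python 3's email package, where str is already Unicode. It first undoes the transfer encoding, then decodes the bytes with the part's declared charset, runs regular expressions on the resulting str, and encodes everything to UTF-8 once before storage. The helper name body_as_text and the UTF-8 fallback for a missing or unknown charset are illustrative assumptions, not a fixed recipe.

```python
import email
import quopri
import re
from email.message import Message


def body_as_text(msg: Message) -> str:
    """Hypothetical helper: return the first text/plain part as a str."""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            # decode=True undoes the Content-Transfer-Encoding
            # (quoted-printable or base64) and returns raw bytes.
            raw = part.get_payload(decode=True) or b""
            # Assumption: treat a missing charset declaration as UTF-8.
            charset = part.get_content_charset() or "utf-8"
            try:
                return raw.decode(charset, errors="replace")
            except LookupError:
                # Charset name Python does not recognise: fall back to UTF-8.
                return raw.decode("utf-8", errors="replace")
    return ""


# Build a sample message: Korean text in EUC-KR, sent as quoted-printable.
korean = "안녕하세요 =!"
payload = quopri.encodestring(korean.encode("euc_kr"))
raw_mail = (
    b"MIME-Version: 1.0\n"
    b'Content-Type: text/plain; charset="euc-kr"\n'
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n" + payload
)
print(raw_mail.decode("ascii"))  # the body appears as =XX escapes

msg = email.message_from_bytes(raw_mail)
text = body_as_text(msg)

# Regular expressions operate on the decoded str; \w matches Hangul here.
words = re.findall(r"\w+", text)
print(words)

# Encode once, to a single storage encoding, before handing off to the DB.
stored = text.encode("utf-8")
```

The design choice here is to decode at the boundary (when the mail comes in) and keep everything as str internally, so re never sees raw EUC-KR or quoted-printable bytes. With one storage encoding, the original charset does not need to be stored alongside the text, though the database connection and columns would still need to be configured for UTF-8 in whatever way the driver supports.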

-- 
Michael <mogmios at mlug.missouri.edu>
http://kavlon.org




More information about the Python-list mailing list