Encoding troubles

JB jbravado at gmail.com
Mon May 17 18:10:25 EDT 2010


I'm working on the webapp of our company intranet and I had a question
about proper handling of user input that's causing encoding issues.

Some of the users take notes in Microsoft Office and copy/paste these
into textareas of the webapp. Some of the characters from Word, such
as hyphens (–) and apostrophes (’), are in an odd encoding. When passed
to the database using SQLAlchemy they appear as – and other
characters.
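
To make the symptom concrete, here is a minimal sketch of what may be
going on. It is only a guess: it assumes the browser submits the text
as UTF-8 and that some layer between the form and the database then
decodes those bytes as Windows-1252 (cp1252). The variable names are
made up for illustration and nothing here is specific to SQLAlchemy.

```python
# Assumption: the pasted Word characters arrive as UTF-8 bytes, and a
# later step decodes them with the wrong codec (cp1252).
pasted = "\u2013 and \u2019"       # en dash and right single quote from Word

raw = pasted.encode("utf-8")       # bytes as the browser would send them
garbled = raw.decode("cp1252")     # wrong decoding: each character becomes three

# If this guess is right, the damage is reversible by undoing the two steps.
repaired = garbled.encode("cp1252").decode("utf-8")

print(repr(garbled))
print(repaired == pasted)
```

If that is the cause, the characters themselves are fine and the fix
belongs in whichever layer picks the wrong codec, not in a list of
replacements.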

What's the proper handling (conversion?) of user input before it gets
to my database? Do I need to start making a list of the offending
characters and .replace them? Or is there a way to decode/encode the
user input to something more generic? Thanks for your time.


