String character encoding when converting data from one type/format to another

Jacob Kruger jacob at blindza.co.za
Wed Jan 7 06:04:16 EST 2015


I'm busy using pyodbc (or something like it) to pull data out of MS Access .mdb files, and then generating .sql script files to execute against MySQL databases using the MySQLdb module. The issue is string values containing characters that don't fit inside the 0-127 range. The current one seems to be \xa3, and if I pass it through the ord() function, it comes out as character number 163.
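A byte of 0xA3 (decimal 163) is not ASCII, so what it means depends on the code page that produced it. The sketch below is Python 3 rather than the Python 2 used in the post. It assumes the Access text came from a Western Windows code page such as cp1252; the post does not confirm this. It decodes the byte under a few candidate code pages:

```python
import unicodedata

raw = b"\xa3"  # the byte from the post; raw[0] == 163, matching ord()

# Decode the byte under a few candidate code pages. Treating the Access data
# as cp1252 (Western Windows) is an assumption, not something the post states.
decoded = {codec: raw.decode(codec) for codec in ("cp1252", "latin-1", "latin7")}

for codec, ch in decoded.items():
    print(codec, repr(ch), unicodedata.name(ch))
```

For this particular byte all three code pages agree on the pound sign (POUND SIGN). For bytes in the 0x80-0x9F range, cp1252 and Latin-1 differ, so knowing the real source code page matters.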

Now, yes, I could just run through the hundreds of thousands of characters in these resulting strings and strip out any that are not within the basic 0-127 range, but I think that could corrupt the data.
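The concern about corruption is justified. Here is a small Python 3 sketch on a made-up sample string. It shows that stripping non-ASCII characters, or encoding with `errors="ignore"`, silently discards information, while encoding to a codec that can represent the character keeps it:

```python
text = "Price: \u00a35"  # "Price: £5", already-decoded sample text (made up)

# Stripping everything outside 0-127 silently drops the pound sign.
stripped = "".join(ch for ch in text if ord(ch) < 128)

# encode(..., errors="ignore") loses exactly the same information.
ignored = text.encode("ascii", errors="ignore")

# Encoding to a codec that can represent the character keeps it intact.
kept = text.encode("utf-8")

print(stripped)  # Price: 5
print(ignored)   # b'Price: 5'
print(kept)      # b'Price: \xc2\xa35'
```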

For example, if I try something like str('\xa3').encode('utf-8'), str('\xa3').encode('ascii'), or str('\xa3').encode('latin7') (that last one is actually our preferred encoding for the MySQL database), they all just tell me they can't work with a character out of range.
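In Python 2, calling `.encode()` on a byte string (`str`) first decodes it implicitly with the ASCII codec. That is why all three calls fail with the same "ordinal not in range(128)" error, whichever target encoding is named. The usual remedy is to decode the bytes explicitly with their source encoding, then encode the resulting text to the target. A minimal Python 3 sketch, where bytes and text are separate types, again assuming cp1252 as the source code page:

```python
raw = b"\xa3"

# Decoding with the ASCII codec reproduces the "not in range(128)" failure.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error_message = str(exc)
else:
    error_message = None
print(error_message)

# Decode with the (assumed) source code page first, then encode to the target.
text = raw.decode("cp1252")        # the pound sign as a text character
as_utf8 = text.encode("utf-8")
as_latin7 = text.encode("latin7")  # Python's alias for ISO-8859-13
print(as_utf8, as_latin7)          # b'\xc2\xa3' b'\xa3'
```

Python's `latin7` codec is ISO-8859-13. As far as I know, that is also what MySQL's `latin7` character set refers to (ISO 8859-13 Baltic).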

Any thoughts on a generic method for handling any characters that might be out of range after pulling them out of MS Access databases like these?
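One generic approach is to normalise every value to text once, then encode it once to the target character set. The sketch below is Python 3, and `to_target` is a hypothetical helper name. Its default `source_codec="cp1252"` is an assumption about the Access data; `target_codec="latin7"` mirrors the MySQL encoding mentioned above. By default it fails loudly on characters the target cannot hold, instead of silently dropping them:

```python
def to_target(value, source_codec="cp1252", target_codec="latin7", errors="strict"):
    """Hypothetical helper: return `value` as bytes in `target_codec`.

    source_codec="cp1252" is an assumption about how the Access text was
    stored; target_codec="latin7" mirrors the MySQL character set in the post.
    """
    if value is None:
        return None  # leave NULLs for the SQL generator to handle
    if isinstance(value, bytes):
        # Raw bytes from the driver: interpret them with the source code page.
        value = value.decode(source_codec)
    elif not isinstance(value, str):
        # Numbers, dates and similar values: use their text form.
        value = str(value)
    # errors="strict" raises UnicodeEncodeError for characters latin7 cannot
    # represent, so problems surface; errors="replace" substitutes b"?".
    return value.encode(target_codec, errors)


print(to_target(b"\xa3"))                         # b'\xa3'
print(to_target("\u00a3", target_codec="utf-8"))  # b'\xc2\xa3'
print(to_target("\u4e2d", errors="replace"))      # b'?'
```

If the MySQL columns can use a Unicode character set such as utf8mb4, passing `target_codec="utf-8"` avoids unrepresentable characters altogether. With latin7, some loss or an explicit error is unavoidable for text outside ISO-8859-13.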

A side note: for columns that store binary values, I use something like the following to generate hex-based strings, which work alright when inserting those same binary values into longblob fields. I don't think this would really help here, though, since these are most likely just badly chosen strings copy/pasted from documents with strange encodings, or something:
#sample code line for binary encoding into string output
s_values += "0x" + str(l_data[J][I]).encode("hex").replace("\\", "\\\\") + ", "
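The line above is Python 2; `str.encode("hex")` does not work in Python 3. A minimal Python 3 equivalent, using a hypothetical helper name `blob_literal`, renders bytes as a `0x...` hex literal. Hex output contains only the characters 0-9 and a-f, so the `.replace("\\", "\\\\")` in the original line never changes anything and is left out here:

```python
import binascii


def blob_literal(data):
    """Hypothetical helper: render bytes as a hexadecimal literal (0x...)."""
    # hexlify emits only 0-9 and a-f, so no backslash escaping is needed.
    return "0x" + binascii.hexlify(data).decode("ascii")


print(blob_literal(b"\x00\xa3\xff"))  # 0x00a3ff
```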

TIA

Jacob Kruger
Blind Biker
Skype: BlindZA
"Roger Wilco wants to welcome you...to the space janitor's closet..."


More information about the Python-list mailing list