Micro Python -- a lean and efficient implementation of Python 3

Roy Smith roy at panix.com
Thu Jun 5 09:59:03 EDT 2014


In article <f935e85f-f86a-4821-86ab-3ab7e5e216d7 at googlegroups.com>,
 Rustom Mody <rustompmody at gmail.com> wrote:

> On Thursday, June 5, 2014 12:12:06 AM UTC+5:30, Roy Smith wrote:
> > Yup.  I wrote a while(*) back about the pain I was having importing some 
> > data into a MySQL(**) database

> Here's my interpretation of that situation; I'd like to hear yours:
> 
> Basic problem was that MySQL handled a strict subset of what the rest
> of the system (Python 2.7?)  could handle.

Yes.  This was not a Python issue.  I was just responding to ChrisA's 
statement:

>>> Binding your program to BMP-only is nearly as dangerous as binding 
>>> it to ASCII-only; potentially worse, because you can run an awful 
>>> lot of artificial tests without remembering to stick in some astral 
>>> characters.


> Of course switching to postgres may be a sound choice on other fronts.
> But if that were not an option, and you only had these choices:
> 
> - significantly complexify your MySQL data structures to handle 4 in
>   20 million cases
> - just detect and throw such cases out at the outset
> 
> which would you take?

It turns out we could have upgraded to a newer version of MySQL, which 
did handle astral characters correctly.  What we actually did was discard 
the records containing non-BMP data.  Of course, that's a decision that 
can only be made when you understand the business requirements.  In our 
case, discarding those four records had no impact on our business, so it 
made sense.  For other people, not having the full dataset might have 
been a fatal problem.
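
For anyone facing the same choice, here is a minimal sketch of the 
"detect and throw out" approach.  It is not the code we ran; the names 
has_astral and split_records are made up for illustration, and it 
assumes Python 3.3 or later, where every str element is a full code 
point (on a narrow Python 2 build you would have to look for surrogate 
pairs instead).

```python
def has_astral(text):
    """Return True if text contains any code point outside the BMP."""
    # The Basic Multilingual Plane ends at U+FFFF; anything above is astral.
    return any(ord(ch) > 0xFFFF for ch in text)


def split_records(records):
    """Partition records (dicts) into (kept, discarded).

    A record is discarded if any of its string values contains an
    astral character; non-string values are ignored.
    """
    kept, discarded = [], []
    for record in records:
        strings = (v for v in record.values() if isinstance(v, str))
        if any(has_astral(s) for s in strings):
            discarded.append(record)
        else:
            kept.append(record)
    return kept, discarded
```

Keeping the discarded list around, rather than silently dropping rows, 
lets somebody who knows the business requirements look at what was 
lost before it is thrown away for good.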

This was just one of many MySQL problems we ran into.  Eventually, we 
decided it wasn't worth fighting with what was obviously a brain-dead 
system, and switched databases.


