'Lite' Databases

Chris Angelico rosuav at gmail.com
Sat Feb 21 00:39:03 EST 2015


On Sat, Feb 21, 2015 at 2:13 PM, Ben Finney <ben+python at benfinney.id.au> wrote:
> In addition, MySQL silently [0] loses data in many common situations.
>
>     <URL:https://dev.mysql.com/doc/refman/5.6/en/silent-column-changes.html>
>     <URL:http://www.davidpashley.com/2009/02/15/silently-truncated/>
>     <URL:http://effectivemysql.com/downloads/MySQLIdiosyncrasiesThatBITE.pdf>
>

Right, I'd forgotten about silent truncations. Though to be fair, the
first link refers to some changes which aren't a problem:

> Columns that are part of a PRIMARY KEY are made NOT NULL even if not declared that way.

The requirement that a PK consist of non-nullable columns is part of
other databases, too, and having the words "PRIMARY KEY" imply "NOT
NULL PRIMARY KEY" is fine IMO. Back when I used DB2 version 5, I used
to use "ID SMALLINT NOT NULL PRIMARY KEY" at the beginning of most of
my table definitions, and all this "silent change" does is allow you
to abbreviate that.

> Trailing spaces are automatically deleted from ENUM and SET member values when the table is created.

Which means they're built on top of CHAR columns, not VARCHAR. Not a
huge deal IMO, though definitely something to be aware of.

Some of the other issues are more concerning, but these two don't
bother me, anyway.

The other two links, though? Definitely problems. Silent truncation of
data is a pest, and it gets even worse than that. I tried to put
together a test-case to see if I could do the classic "break two
UTF-16 strings and construct a brand new character out of them" trick,
but found something even worse.

# -*- encoding: UTF-8 -*-
from __future__ import print_function
import MySQLdb
# If you don't explicitly say charset="UTF8", you get some default that you may
# not be able to trust. On my test box, it gave me Latin-1. Plus, you can't say
# charset="UTF-8" - you have to omit the hyphen. Not good, but not a gotcha as
# you get an immediate exception.
con = MySQLdb.connect("localhost","demo","demo","demodb",charset="UTF8")
cur = con.cursor()
# Minor nastiness: This produces a warning if the table doesn't exist.
# So automatically displaying warnings will produce annoying noise.
cur.execute("drop table if exists nasty")
# Don't forget, "utf8" doesn't mean UTF-8... no, you have to say "utf8mb4"!
cur.execute("create table nasty (id smallint primary key, payload
char(8) charset utf8mb4 not null)")
strings = (u"English: safe", u"ελληνικά: safe", u"\U0001F4A9: accurate")
cur.execute("insert into nasty values (1, %s), (2, %s), (3, %s)", strings)
cur.execute("select payload from nasty order by id")
print("Input\t\tOutput")
for input, output in zip(strings, cur):
    print(input,output[0],sep="\t")

Your terminal may or may not be able to display U+1F4A9, but it's an
accurate description of MySQL's handling of astral characters in this
demo. I don't even know what's going on here. Tested on MySQL 5.5 on
Debian Wheezy.

ChrisA



More information about the Python-list mailing list