Python 3.x stuffing utf-8 into SQLite db

mm0fmf none at mailinator.com
Mon Feb 9 14:41:20 EST 2015


On 09/02/2015 03:44, Skip Montanaro wrote:
> I am trying to process a CSV file using Python 3.5 (CPython tip as of a
> week or so ago). According to chardet[1], the file is encoded as utf-8:
>
>  >>> s = open("data/meets-usms.csv", "rb").read()
>  >>> len(s)
> 562272
>  >>> import chardet
>  >>> chardet.detect(s)
> {'encoding': 'utf-8', 'confidence': 0.99}
>
> so I created the reader like so:
>
>          rdr = csv.DictReader(open(csvfile, encoding="utf-8"))
>
> This seems to work. The rows are read and records added to a SQLite3
> database. When I go into sqlite3, I get what looks to be raw utf-8 on
> output:
>
> % LANG=en_US.UTF-8 sqlite3 topten.db
> SQLite version 3.8.5 2014-08-15 22:37:57
> Enter ".help" for usage hints.
> sqlite> select * from swimmeet where meetname like '%Barracuda%';
> sqlite> select count(*) from swimmeet;
> 0
> sqlite> select count(*) from swimmeet;
> 4171
> sqlite> select meetname from swimmeet where meetname like '%Barracuda%Patrick%';
> Anderson Barracudas St. Patrick's Day Swim Meet
> Anderson Barracuda Masters - 2010 St. Patrick’s Day Swim Meet
> Anderson Barracuda Masters 2011 St. Patrick’s Day Swim Meet
> Anderson Barracuda Masters St. Patrick's Day Meet
> Anderson Barracuda Masters St. Patrick's Day Meet 2014
> Anderson Barracuda Masters 2015 St. Patrick’s Day Swim Meet
>
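
The terminal alone can't show whether the rows were stored as correct UTF-8. Looking at the stored bytes with SQLite's hex() function can. The sketch below is a minimal example that assumes a one-column swimmeet table and a hypothetical sample row; the real schema and CSV file aren't shown in the thread. It also opens the file with newline="", which the csv module documentation recommends for files handed to a csv reader.

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical sample row standing in for data/meets-usms.csv.
SAMPLE = "meetname\nAnderson Barracuda Masters 2015 St. Patrick\u2019s Day Swim Meet\n"


def load_csv(csvfile, conn):
    """Read a UTF-8 CSV with DictReader and insert each row into swimmeet."""
    # The csv docs recommend newline="" when opening files for csv readers.
    with open(csvfile, encoding="utf-8", newline="") as f:
        rdr = csv.DictReader(f)
        # Each row is a dict of str; sqlite3 stores str values as TEXT.
        conn.executemany("INSERT INTO swimmeet (meetname) VALUES (:meetname)", rdr)
    conn.commit()


def stored_name_and_hex(csvfile):
    """Load csvfile into a fresh in-memory DB; return (text, hex of stored bytes)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE swimmeet (meetname TEXT)")
    load_csv(csvfile, conn)
    # hex() exposes the bytes SQLite stored: U+2019 should appear as E28099,
    # whereas a value double-encoded via cp1252 would show C3A2E282ACE284A2.
    row = conn.execute("SELECT meetname, hex(meetname) FROM swimmeet").fetchone()
    conn.close()
    return row


def demo():
    """Write SAMPLE to a temporary CSV file and run it through the loader."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "meets.csv")
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(SAMPLE)
        return stored_name_and_hex(path)
```

Running the same kind of query in the sqlite3 shell against topten.db, for example `select hex(meetname) from swimmeet where meetname like '%Patrick%';`, would show which byte pattern the real rows contain.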

How is meetname defined? Is it a varchar or nvarchar?
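
In the sqlite3 shell, `.schema swimmeet` prints the CREATE statement. From Python, PRAGMA table_info reports each column's declared type. Below is a minimal sketch run against a hypothetical in-memory swimmeet table, because the real definition isn't given in the thread.

```python
import sqlite3


def column_decl_types(conn, table):
    """Map each column of `table` to the type it was declared with."""
    # PRAGMA table_info returns one row per column:
    # (cid, name, type, notnull, dflt_value, pk). The table name cannot be
    # bound as a ? parameter here, so only pass trusted names.
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: the real swimmeet definition isn't shown in the thread.
    conn.execute("CREATE TABLE swimmeet (meetname VARCHAR(200), meetdate TEXT)")
    decl = column_decl_types(conn, "swimmeet")
    conn.close()
    return decl
```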

My only experience is with MS-SQL and C#, but reading from a UTF-8-encoded
file with a StreamReader set to UTF-8 and inserting the text into varchar
fields results in issues similar to the ones you are showing. I changed to
using nvarchar and it all started working as expected.
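
MS-SQL has separate VARCHAR and NVARCHAR storage. SQLite uses type affinity instead: any declared type containing "CHAR", "CLOB" or "TEXT" gets TEXT affinity, and a length such as (100) is not enforced. The minimal sketch below stores the same string in a VARCHAR and an NVARCHAR column and then compares the storage class and value that come back. The table t and its columns are hypothetical.

```python
import sqlite3


def storage_classes():
    """Store one str in VARCHAR and NVARCHAR columns and read both back."""
    conn = sqlite3.connect(":memory:")
    # Both declared types contain "CHAR", so both columns get TEXT affinity.
    conn.execute("CREATE TABLE t (a VARCHAR(100), b NVARCHAR(100))")
    name = "Anderson Barracuda Masters 2015 St. Patrick\u2019s Day Swim Meet"
    conn.execute("INSERT INTO t (a, b) VALUES (?, ?)", (name, name))
    # typeof() reports the storage class actually used for each value.
    row = conn.execute("SELECT a, b, typeof(a), typeof(b), a = b FROM t").fetchone()
    conn.close()
    return name, row
```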





