[melbourne-pug] Unicode for windows dummies

Mike Dewhirst miked at dewhirst.com.au
Mon Aug 15 21:01:30 EDT 2016


If anyone can point me to the appropriate advice for resolving the error
below, I would be most appreciative. Really very appreciative.

I think I understand Unicode in theory, and I have reread a lot of articles,
including ...

* https://docs.python.org/3/library/codecs.html#encodings-and-unicode
* https://pythonconquerstheuniverse.wordpress.com/2010/05/30/unicode-beginners-introduction-for-dummies-made-simple/
* https://pythonconquerstheuniverse.wordpress.com/2010/06/04/unicode-for-dummies-just-use-utf-8/
* https://en.wikipedia.org/wiki/UTF-8

This is the error which has stumped me ...

(xxex3) C:\Users\mike\env\xxex3\ssds>python substance/data_imports/map_csv.py

Traceback (most recent call last):
  File "substance/data_imports/map_csv.py", line 139, in <module>
    csvdata = CsvImport(csvfile, company, start, finish)
  File "substance/data_imports/map_csv.py", line 127, in __init__
    print("%s" % cells)
  File "C:\Users\mike\env\xxex3\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2030' in position 7452: character maps to <undefined>
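The traceback shows the failure happens inside `print`, not while reading the file: the console's stdout encoding is cp850, and U+2030 (PER MILLE SIGN) has no slot in that code page. A minimal sketch that reproduces this in isolation, assuming Python 3; `try_encode` is a hypothetical helper, and cp1252 and UTF-8 are included only for comparison.

```python
# U+2030 is the PER MILLE SIGN, the character named in the traceback.
ch = "\u2030"


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# cp850 (the console code page in the traceback) cannot encode U+2030.
print("cp850: ", try_encode(ch, "cp850"))
# cp1252 (the usual Western European Windows "ANSI" page) and UTF-8 can.
print("cp1252:", try_encode(ch, "cp1252"))
print("utf-8: ", try_encode(ch, "utf-8"))
```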


I saved the CSV file as UTF-8 from the original Microsoft Excel
spreadsheet, using LibreOffice 5 on Windows 8.1.

This is in Python 3.5 on Windows, but it also needs to run in Python 2.7
on an Ubuntu 14.04 server (no GUI).
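The `open(csvfile, "r")` call in CsvImport does not say which encoding to use. In Python 3 it falls back to the locale's preferred encoding (on Windows usually a legacy code page such as cp1252, not UTF-8), and in Python 2.7 it returns undecoded byte strings, so the same code behaves differently on the two platforms. One way to get unicode lines from a UTF-8 file on both Python 3.5 and 2.7 is `io.open` with an explicit encoding. A minimal sketch; `read_utf8_lines` and the sample file are hypothetical, and "utf-8-sig" is chosen on the assumption that the file may or may not start with a byte-order mark.

```python
from __future__ import print_function, unicode_literals

import io
import os
import tempfile


def read_utf8_lines(path):
    """Return the lines of a UTF-8 text file as unicode strings.

    io.open behaves the same on Python 2.7 and 3.x; "utf-8-sig" also
    strips a leading byte-order mark if the file has one.
    """
    with io.open(path, "r", encoding="utf-8-sig") as f:
        return f.readlines()


if __name__ == "__main__":
    # Write a small hypothetical sample file, then read it back.
    path = os.path.join(tempfile.mkdtemp(), "sample.csv")
    with io.open(path, "w", encoding="utf-8") as f:
        f.write("name,rate\nwidget,5\u2030\n")
    lines = read_utf8_lines(path)
    # Print only the count: printing the text itself would still hit the
    # cp850 console limitation shown in the traceback.
    print(len(lines))  # 2
```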

map_csv.py [1] is the beginning of a module I want to develop into a
generic data import facility. I'm starting with a specific CSV file I
need to import (it is not mine and its contents are private). At the
moment all the module does is read in the file and print the lines to
stdout.

I have tried UTF-8 encoding each line. That gets past the error but just
produces a list of integers, a snippet of which is below [2]. Decoding
that as UTF-8 reproduces the error, as might be expected. I have also
tried decoding as UTF-16 and then encoding as UTF-8, but that didn't
work either.
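The integers in the footnoted snippet are consistent with how Python 3 treats bytes: `line.encode("utf-8")` returns a `bytes` object, and `list()` over `bytes` yields one integer per byte instead of one character. Decoding simply returns the original string, which still contains U+2030, so printing it to the cp850 console fails again. A minimal sketch, assuming Python 3; the sample line is hypothetical, while the byte fragment is copied from the snippet.

```python
line = "Acute Hazard,5\u2030\n"  # hypothetical line; the real data is private

encoded = line.encode("utf-8")   # a bytes object, not str
print(type(encoded).__name__)    # bytes
print(list(encoded)[:5])         # [65, 99, 117, 116, 101]: one int per byte

# A fragment copied from the snippet decodes back to plain ASCII text.
fragment = bytes([65, 99, 117, 116, 101, 32, 72, 97, 122, 97, 114, 100])
print(fragment.decode("utf-8"))  # Acute Hazard

# The round trip is lossless, so decoding just restores the U+2030 that
# the console could not print in the first place.
assert encoded.decode("utf-8") == line
```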

Thanks for reading this far

Mike

[1] ...

from __future__ import unicode_literals

import os


class CsvImport(object):
    """ Imports a csv file and converts it into a list of lists """

    def __init__(self, csvfile, company, start, finish):
        self.company = company
        self.rows = list()
        with open(csvfile, "r") as csv:
            i = 0
            self.rows = csv.readlines()
            for line in self.rows:
                i += 1
                cells = list(line)
                if i >= start:
                    print("%s" % cells)
                if i > finish:
                    break


if __name__ == "__main__":
    company = "Calia Pty Ltd"
    dirname = "{0}/csv".format(company.split()[0].lower())
    filename = "{0}1.csv".format(company.split()[0].lower())
    start = 105
    finish = 404
    currdir = os.path.realpath(os.path.dirname(__file__)).replace('\\', '/')
    csvfile = os.path.join(currdir, dirname, filename)
    csvdata = CsvImport(csvfile, company, start, finish)
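Since the file is read without error and the exception is raised by `print("%s" % cells)`, another option is to make the output step tolerate characters the console cannot show. A minimal sketch, assuming it is acceptable to display such characters as backslash escapes; `escape_for` and `safe_print` are hypothetical helpers, not part of map_csv.py. Setting the `PYTHONIOENCODING` environment variable (for example `PYTHONIOENCODING=cp850:backslashreplace`) is a standard way to get a similar effect without code changes.

```python
from __future__ import print_function, unicode_literals

import sys


def escape_for(text, encoding):
    """Return text with characters that `encoding` cannot represent escaped.

    The "backslashreplace" error handler turns each unmappable character
    into a backslash escape, so the result always encodes cleanly.
    """
    return text.encode(encoding, "backslashreplace").decode(encoding)


def safe_print(text, stream=None):
    """Print text to stream without raising UnicodeEncodeError."""
    stream = stream or sys.stdout
    # stream.encoding can be None (e.g. Python 2 writing to a pipe), so fall
    # back to ASCII, which escapes everything outside the ASCII range.
    encoding = getattr(stream, "encoding", None) or "ascii"
    print(escape_for(text, encoding), file=stream)


if __name__ == "__main__":
    safe_print("5\u2030 of the mixture")
```

Printing `repr()` output would not help on Python 3, because `repr()` leaves printable non-ASCII characters such as the per mille sign unescaped; that is also why `"%s" % cells` fails, since formatting a list uses the `repr()` of its items. The built-in `ascii()` does escape them.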

[2] ... , 48, 48, 48, 48, 37, 44, 34, 34, 44, 44, 34, 34, 44, 34, 34,
44, 34, 65, 99, 117, 116, 101, 32, 72, 97, 122, 97, 114, 100, 32, 84, 
111, 32, 84, 104, 101, 32, 65, 113, 117, 97, 116, 105, 99, 32, 69, 110, 
118, 105, 114, 111, 110, 109, 101, 110, 116, 46, 34, 44, 44, 44, 44, 44, 
44, 44, 48, 46, 48, 48, 48, 48, 48, 37, 44, 34, 34, 44, 44, 34, 34, 44, 
34, 34, 44, 34, 34, 44, 34, 34, 44, 34, 67, 104, 114, 111, 110, 105, 99, 
32, 72, 97, 122, 97, 114, 100, 32, 84, 111, 32, 84, 104, 101, 32, 65, 
113, 117, 97, 116, 105, 99, 32, 69, 110, 118, 105, 114, 111, 110, 109,
101, 110, 116, 46, 34, 44, 50, 44, 34, 78, 47, 65, 34, 44, 34, 71, 72, 
83, 48, 57, 34, 44, 34, 72, 52, 49, 49, 34, 44, 44, 44, 48, 46, 48, 48, 
48, 48, 48, 37, 44, 34, 34, 44, 44, 34, 34, 44, 34, 34, 44, 34, 34, 44, 
34, 34, 44, 34, 34, 44, 34, 72, 97, 122, 97, 114, 100, 111, 117, 115, 
32, 84, 111, 32, 84, 104, 101, 32, 79, 122, 111, 110, 101, 32, 76, 97, 
121, 101, 114, 46, 34, 44, 44, 44, 44, 44, 48, 46, 48, 48, 48, 48, 48, 
37, 44, 34, 34, 44, 34, 34, 44, 34, 65, 100, 100, 105, 116, 105, 111, 
110, 97, 108, 32, 78, 111, 110, 45, 71, 72, 83, 32, 72, 97, 122, 97, 
114, 100, 32, 83, 116, 97, 116, 101, 109, 101, 110, 116, 34, 44, 34, 65, 
85, 72, 48, 54, 54, 34, 44, 48, 46, 48, 48, 48, 48, 48, 37, 44, 34, 34, 10]



