help I'm getting delimited

aka alexoplocatie at gmail.com
Thu Dec 18 07:00:27 EST 2008


On 18 Dec, 00:06, John Machin <sjmac... at lexicon.net> wrote:



> On Dec 18, 3:15 am, aka <alexoploca... at gmail.com> wrote:

> Do you mean that this file was created by whatever.UnicodeWriter? If
> so, did you just now discover this information?


> How do you know that "the UnicodeWriter is functioning perfectly"?
> What does "functioning perfectly" mean to you? In particular, what
> encoding is it using?


> Which do you mean:
> (a) you typed those lines into Notepad yourself
> (b) you took a copy of a file created by whatever.UnicodeWriter,
> opened it with Notepad, trimmed off some rows and columns, and saved
> it again
> ?
> Here's a likely hypothesis: the file was written in utf16. In that
> case:
> either (i) you really want utf16 (why?), so:


> (1) the csv module will not cope with it, and is not expected to cope
> with it


> (2) the whatever.UnicodeReader should (in order of preference):
>    (a) be allowed to find out for itself that 'utf16' is the go
>    (b) be told explicitly that 'utf16' is the go
>    (c) be served with a bug report


> OR (ii) you really want utf8, so:


> (1) the csv module should be happy
> (2) the whatever.UnicodeWriter should be told to use 'utf8'
> (3) the whatever.UnicodeReader should (in order of preference):
>     [as above but s/16/8/]



The csv file was originally created by the UnicodeWriter class and
was used for a mail merge with Microsoft Word, all of which worked
perfectly.
The reverse did not: reading the output file back in failed. As a
last resort I edited it in Notepad, cutting off columns, but I didn't
know that the encoding would be preserved even after that, so the
file still caused problems.
Now, after testing from the Python command line with a csv file
generated from Excel, I could get it working, so it had to be the
encoding.
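
One way to check a suspicion like this is to look at the first bytes
of the file for a byte order mark (BOM). The following is only a
minimal sketch, assuming Python 3 (newer than the Python used in this
thread); the function name sniff_bom and the sample text are made up
for illustration and are not part of the UnicodeWriter/UnicodeReader
code discussed here.

```python
import codecs


def sniff_bom(raw):
    """Guess a codec name from a leading byte order mark, or return None.

    Hypothetical helper: it only recognises UTF-8 and UTF-16 BOMs and
    does not try to tell UTF-32 apart from UTF-16.
    """
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None  # no BOM: the encoding has to be known some other way


# Made-up sample data containing Dutch-style special characters.
sample = "naam,plaats\r\nZoë,Malmö\r\n"

print(sniff_bom(sample.encode("utf-16")))     # utf-16
print(sniff_bom(sample.encode("utf-8-sig")))  # utf-8-sig
print(sniff_bom(sample.encode("utf-8")))      # None
```

In real use the bytes would come from something like
open(path, "rb").read(4). A missing BOM does not prove the file is
UTF-8; it only means this check cannot tell.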
Because the write side of my code, which uses the UnicodeWriter, was
OK, I didn't pay attention to the fact that I had changed the UW class
from UTF-8 to UTF-16 because of difficulties with Dutch characters
like ë and ö.
Finally I tried changing back to UTF-8 and noticed that both output
and input were working, including those special characters. So it was
my unjustified conclusion, that I couldn't handle these special
characters on the write side without UTF-16, which ultimately got me
into trouble on the read side.
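
To illustrate that UTF-8 alone is enough for characters like ë and ö,
here is a small round trip through the csv module. It is a sketch
under assumptions: it uses Python 3's text-mode csv API rather than
the UnicodeWriter/UnicodeReader classes from this thread, works in
memory instead of on a file, and the rows are invented sample data.

```python
import csv
import io

# Invented sample rows with Dutch-style special characters.
rows = [["naam", "plaats"], ["Zoë", "Malmö"], ["Daniël", "Köln"]]

# Write side: build the csv text, then encode it as UTF-8 bytes.
text_out = io.StringIO(newline="")
csv.writer(text_out).writerows(rows)
data = text_out.getvalue().encode("utf-8")

# Read side: decode with the same encoding and parse again.
text_in = io.StringIO(data.decode("utf-8"), newline="")
read_back = list(csv.reader(text_in))

print(read_back == rows)  # True

# UTF-8 output contains no NUL bytes, while UTF-16 pads ASCII
# characters with them, which a byte-oriented csv reader cannot parse.
print(b"\x00" in data)                             # False
print(b"\x00" in "naam,plaats".encode("utf-16"))   # True
```

With files, the equivalent is to pass the same encoding to both the
writing and the reading side, for example
open(path, "w", encoding="utf-8", newline="") and
open(path, encoding="utf-8", newline="").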
With your help I got it straight. Once again, reducing the problem to
its bare basics and avoiding big steps is the key.
Thanks a lot for your help, John.
BTW, the TurboGears code is not very different from Python,
it just uses some extra identifiers around the Python code.


