[Tutor] Unicode trouble
Michael Lange
klappnase at freenet.de
Wed Nov 30 20:46:30 CET 2005
On Wed, 30 Nov 2005 13:41:54 -0500
Kent Johnson <kent37 at tds.net> wrote:
> >>>This is the full error:
> >>>Traceback (most recent call last):
> >>> File
> >>>"C:\Python23\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py",
> >>>line 310, in RunScript
> >>> exec codeObject in __main__.__dict__
> >>> File "C:\Python\BA\Oversett.py", line 47, in ?
> >>> File "C:\Python\BA\Oversett.py", line 23, in kjor
> >>> en = i.split('\t')[0]
> >>> File "C:\Python23\lib\codecs.py", line 388, in readlines
> >>> return self.reader.readlines(sizehint)
> >>> File "C:\Python23\lib\codecs.py", line 314, in readlines
> >>> return self.decode(data, self.errors)[0].splitlines(1)
> >>>UnicodeDecodeError: 'utf8' codec can't decode bytes in position 168-170:
> >>>invalid data
> >
> >
> >>This is fairly strange as the line
> >> en = i.split('\t')[0]
> >>should not call any method in codecs. I don't know how you can get such a
> >>stack trace.
> >
> > The file f where en comes from contains lots of lines with one English
> > word followed by a tab and a Norwegian one. (Approximately 25000 lines) It
> > can look like this: core\tkjærne
>
> Yes, I understand that.
>
> > So en is supposed to be the English word that the program needs to find in
> > MS Word, and to is the replacement word. So wouldn't that be a string that
> > should be handled by codecs?
> >
> > for i in self.f.readlines():
> > en = i.split('\t')[0]
>
> The thing is, it's the line
> for i in self.f.readlines():
> that is calling the codecs module, not the line
> en = i.split('\t')[0]
> but it is the latter line that is in the stack trace.
>
> Can any of the other tutors make any sense of this stack trace?
As far as I can see, isn't it the line
return self.decode(data, self.errors)[0].splitlines(1)
that raises the exception?
I haven't read the whole thread, but maybe you are passing data that is not
valid UTF-8 to the utf8 codec?
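
A minimal sketch of what I mean, not your actual code. I am assuming here
that the word list was saved as Latin-1; the real encoding of your file may
well be something else (cp1252, for instance). The names raw and read_words
are made up for the illustration. It wraps some bytes in a codecs reader,
roughly as codecs.open() would do for a file, and shows that the decoding
error comes out of readlines(), before split() ever runs:

```python
import codecs
import io

# Hypothetical sample: one line of the word list, stored as Latin-1 bytes
# (assumption; the real file's encoding is not known from this thread).
raw = u"core\tkj\u00e6rne\n".encode("latin-1")

def read_words(data, encoding):
    # Wrap the bytes in a codecs StreamReader for the given encoding.
    reader = codecs.getreader(encoding)(io.BytesIO(data))
    # The bytes are decoded inside readlines(); split() only sees unicode.
    return [i.split(u"\t") for i in reader.readlines()]

try:
    read_words(raw, "utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # The Latin-1 byte for the letter ae is not a valid UTF-8 sequence here.
    utf8_ok = False

# With the encoding the data was really written in, the read succeeds.
words = read_words(raw, "latin-1")
```

If that matches your case, opening the file with the encoding it was really
saved in should make the error go away.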
Michael