Python 2.1 / 2.3: xreadlines not working with codecs.open

Eric Brunel eric_brunel at despammed.com
Tue Jun 28 08:42:18 EDT 2005


On Thu, 23 Jun 2005 14:23:34 +0200, Eric Brunel <eric_brunel at despammed.com> wrote:

> Hi all,
>
> I just found a problem with the xreadlines method/module when used with codecs.open: xreadlines does not seem to take the codec specified in the open call into account, and it returns byte strings instead of unicode strings.
>
> For example, if a file foo.txt contains some text encoded in latin1:
>
> >>> import codecs
> >>> f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
> >>> [l for l in f.xreadlines()]
> ['\xe9\xe0\xe7\xf9\n']
>
> But:
>
> >>> import codecs
> >>> f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
> >>> f.readlines()
> [u'\ufffd\ufffd']
>
> With readlines, the latin1 characters are correctly replaced by U+FFFD, since they are not valid utf-8. With xreadlines, they come back untouched, as byte strings still in latin1 encoding.

Replying to myself. One more funny thing:

>>> import codecs, xreadlines
>>> f = codecs.open('foo.txt', 'r', 'utf-8', 'replace')
>>> [l for l in xreadlines.xreadlines(f)]
[u'\ufffd\ufffd']

So f.xreadlines() does not work, but xreadlines.xreadlines(f) does. This happens in Python 2.3, and also in Python 2.1, where the implementation of f.xreadlines() calls xreadlines.xreadlines(f) (?!?). Something's escaping me here... Reading the source didn't help.
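
One possible explanation, offered as an assumption and not something confirmed in this message: if the object returned by codecs.open forwards every attribute it does not define itself to the underlying byte stream, then f.xreadlines would resolve to the raw file's method and yield undecoded bytes, while xreadlines.xreadlines(f) would receive the wrapper and go through its decoding methods. The sketch below only illustrates that delegation pattern. RawStream and DecodingWrapper are hypothetical stand-ins, not the real codecs classes, and the code is written for a current Python, not for 2.1 or 2.3.

```python
import io


class RawStream(io.BytesIO):
    """Hypothetical byte stream with its own xreadlines-style method."""

    def xreadlines(self):
        # Yields undecoded byte lines, as a plain file object would.
        return iter(self.readline, b'')


class DecodingWrapper(object):
    """Hypothetical stand-in for a codec wrapper around a byte stream."""

    def __init__(self, stream, encoding, errors):
        self.stream = stream
        self.encoding = encoding
        self.errors = errors

    def readlines(self):
        # Defined on the wrapper itself, so the result is decoded.
        return [raw.decode(self.encoding, self.errors)
                for raw in self.stream.readlines()]

    def __getattr__(self, name):
        # Only called for attributes the wrapper lacks: forward them
        # to the raw stream, bypassing any decoding.
        return getattr(self.stream, name)


data = b'\xe9\xe0\xe7\xf9\n'  # latin-1 bytes, not valid utf-8

# xreadlines is not defined on the wrapper, so this reaches RawStream.
via_delegation = list(
    DecodingWrapper(RawStream(data), 'utf-8', 'replace').xreadlines())

# readlines is defined on the wrapper, so this is decoded text.
via_wrapper = DecodingWrapper(RawStream(data), 'utf-8', 'replace').readlines()

print(repr(via_delegation))
print(repr(via_wrapper))
```

If the real wrapper behaves this way, the fix on the library side would be to define xreadlines on the wrapper, so that the call is not forwarded to the raw stream.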

At least, it does provide a workaround...
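
A minimal sketch of the same idea for a current Python, where the assumption is that iterating directly over the object returned by codecs.open plays the role of xreadlines.xreadlines(f): the lines are read lazily and go through the codec. The helper name iter_decoded_lines and the temporary sample file are made up for the illustration.

```python
import codecs
import os
import tempfile


def iter_decoded_lines(path, encoding='utf-8', errors='replace'):
    """Yield decoded lines one at a time (hypothetical helper)."""
    f = codecs.open(path, 'r', encoding, errors)
    try:
        # Iterate over the codec wrapper itself, not the raw file,
        # so every line is decoded with the requested error handling.
        for line in f:
            yield line
    finally:
        f.close()


# Build a small latin-1 sample similar to the foo.txt used above.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, 'foo.txt')
with open(sample_path, 'wb') as out:
    out.write(b'\xe9\xe0\xe7\xf9\n')

lines = list(iter_decoded_lines(sample_path))
print(repr(lines))
```

The invalid bytes should come back as U+FFFD replacement characters in text strings, not as raw latin-1 bytes.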
-- 
python -c "print ''.join([chr(154 - ord(c)) for c in 'U(17zX(%,5.zmz5(17;8(%,5.Z65\'*9--56l7+-'])"


