[I18n-sig] codecs module, readlines and xreadlines

M.-A. Lemburg mal@lemburg.com
Thu, 16 Jan 2003 10:15:37 +0100


Poor Yorick wrote:
> The following code shows an inconsistency between open.readlines and
> codecs.open.readlines, and also between open.xreadlines and
> codecs.open.xreadlines.  The call to open.readlines returns '\n' as the
> line terminator, whereas codecs.open.readlines returns '\r\n'.  Any plans
> to fix this?

On Windows, mode 'r' opens the file in text mode, which translates
'\r\n' to '\n' and so mangles the line-end information. You should
try opening the file in 'rb' (binary) mode for comparison.

codecs.open() automatically appends the 'b' to the 'r' for you, so
the file is read in binary mode and the '\r\n' is left in place.
This is probably the cause of the difference you are seeing.
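
A minimal sketch of that comparison, for anyone who wants to try it.
It assumes a current Python 3 interpreter, where text mode translates
'\r\n' to '\n' on every platform rather than only on Windows. The
sample data and the temporary file paths are made up for illustration
and are not the files from the original report.

```python
import codecs
import os
import tempfile

# Hypothetical sample data using Windows-style '\r\n' line ends.
text = u'1120, "Serial Number"\r\n1122, "msconfig.exe"\r\n1127, "Component"'

tmpdir = tempfile.mkdtemp()
utf16_path = os.path.join(tmpdir, 'test1.txt')
ascii_path = os.path.join(tmpdir, 'test2.txt')

# Write raw bytes so that nothing is translated on the way out.
with open(utf16_path, 'wb') as fh:
    fh.write(text.encode('utf-16'))
with open(ascii_path, 'wb') as fh:
    fh.write(text.encode('ascii'))

# Text mode: '\r\n' is translated to '\n' while reading.
with open(ascii_path, 'r', encoding='ascii') as fh:
    text_lines = fh.readlines()

# Binary mode: the bytes, including '\r\n', come back untouched.
with open(ascii_path, 'rb') as fh:
    binary_lines = fh.readlines()

# codecs.open() is given 'r' but opens the underlying file in binary mode.
with codecs.open(utf16_path, 'r', 'utf-16') as fh:
    underlying_mode = fh.mode  # attribute looked up on the wrapped file
    codec_lines = fh.readlines()

print(text_lines)
print(binary_lines)
print(underlying_mode)
print(codec_lines)
```

The text-mode lines end in '\n', while both the 'rb' lines and the
codecs.open() lines keep their '\r\n', which matches the difference
reported below.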

>  >>> fh = open('test2.txt', 'r')
>  >>> lines = fh.readlines()
>  >>> print lines
> ['1120, "Serial Number", 1016993947\n', '1122, "msconfig.exe",
> 1016994129\n', '1123, "Microsoft Windows XP", 1016994141\n', '1124,
> "Version", 1016994143\n', '1125, "XP", 1016994156\n', '1126, "Microsoft
> Windows", 1016994169\n', '1127, "Component", 1016994468']
> 
>  >>> fh = codecs.open('test1.txt', 'r', 'utf-16')
>  >>> lines = fh.readlines()
>  >>> print lines
> [u'1120, "Serial Number", 1016993947\r\n', u'1122, "msconfig.exe",
> 1016994129\r\n', u'1123, "Microsoft Windows XP", 1016994141\r\n',
> u'1124, "Version", 1016994143\r\n', u'1125, "XP", 1016994156\r\n',
> u'1126, "Microsoft Windows", 1016994169\r\n', u'1127, "Component",
> 1016994468']
> 
>  >>> fh = open('test2.txt', 'r')
>  >>> lines = fh.xreadlines()
>  >>> lines.next()
> '1120, "Serial Number", 1016993947\n'
>  >>> lines.next()
> '1122, "msconfig.exe", 1016994129\n'
> 
>  >>> fh = codecs.open('test1.txt', 'r', 'utf-16')
>  >>> lines = fh.xreadlines()
>  >>> lines.next()
> '\xff\xfe1\x001\x002\x000\x00,\x00
> \x00"\x00S\x00e\x00r\x00i\x00a\x00l\x00
> \x00N\x00u\x00m\x00b\x00e\x00r\x00"\x00,\x00
> \x001\x000\x001\x006\x009\x009\x003\x009\x004\x007\x00\r\x00\n'
>  >>> lines.next()
> '\x001\x001\x002\x002\x00,\x00
> \x00"\x00m\x00s\x00c\x00o\x00n\x00f\x00i\x00g\x00.\x00e\x00x\x00e\x00"\x00,\x00
> \x001\x000\x001\x006\x009\x009\x004\x001\x002\x009\x00\r\x00\n'
>  >>>
> 
> Poor Yorick
> gp@pooryorick.com

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/