[issue7643] What is an ASCII linebreak?
Marc-Andre Lemburg
report at bugs.python.org
Wed Jan 6 10:14:10 CET 2010
Marc-Andre Lemburg <mal at egenix.com> added the comment:
Florent Xicluna wrote:
>
> New submission from Florent Xicluna <laxyf at yahoo.fr>:
>
> Bytes objects and Unicode objects do not agree on ASCII linebreaks.
>
> ## Python 2
>
> for s in '\x0a\x0d\x1c\x1d\x1e':
>     print u'a{}b'.format(s).splitlines(1), 'a{}b'.format(s).splitlines(1)
>
> # [u'a\n', u'b'] ['a\n', 'b']
> # [u'a\r', u'b'] ['a\r', 'b']
> # [u'a\x1c', u'b'] ['a\x1cb']
> # [u'a\x1d', u'b'] ['a\x1db']
> # [u'a\x1e', u'b'] ['a\x1eb']
>
>
> ## Python 3
>
> for s in '\x0a\x0d\x1c\x1d\x1e':
>     print('a{}b'.format(s).splitlines(1),
>           bytes('a{}b'.format(s), 'utf-8').splitlines(1))
>
> ['a\n', 'b'] [b'a\n', b'b']
> ['a\r', 'b'] [b'a\r', b'b']
> ['a\x1c', 'b'] [b'a\x1cb']
> ['a\x1d', 'b'] [b'a\x1db']
> ['a\x1e', 'b'] [b'a\x1eb']
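A minimal Python 3 sketch that extends the quoted comparison to every
ASCII code point. It reports the code points where str.splitlines()
splits but bytes.splitlines() does not. The helper names splits_at and
ascii_disagreements are hypothetical. The result depends on the Python
version: since Python 3.2 the str.splitlines documentation also lists
\v (\x0b) and \f (\x0c) as line boundaries, so a current interpreter
should report those two in addition to \x1c, \x1d and \x1e.

```python
from typing import List, Tuple


def splits_at(sep: str) -> Tuple[bool, bool]:
    """Report whether str and bytes splitlines() break 'a' + sep + 'b' in two."""
    text = "a" + sep + "b"
    str_split = len(text.splitlines()) == 2
    # Every code point below 128 encodes to a single byte in ASCII.
    bytes_split = len(text.encode("ascii").splitlines()) == 2
    return str_split, bytes_split


def ascii_disagreements() -> List[int]:
    """ASCII code points where str.splitlines() and bytes.splitlines() disagree."""
    result = []
    for cp in range(128):
        str_split, bytes_split = splits_at(chr(cp))
        if str_split != bytes_split:
            result.append(cp)
    return result


if __name__ == "__main__":
    for cp in ascii_disagreements():
        print(hex(cp), repr(chr(cp)))
```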
Unicode defines more line break characters than ASCII. ASCII has
only a single line break character, \n (LF); by convention, \r (CR)
and \r\n (CR LF) are also used to mean "start a new line, go to
position 1".
See e.g. http://en.wikipedia.org/wiki/Ascii#ASCII_control_characters
The three extra code points in the ASCII range that Unicode treats
as line breaks (\x1c, \x1d and \x1e, i.e. the FS, GS and RS
information separators) are not in common use.
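If text should be split only on the ASCII conventions described
above (\n, \r and \r\n, the same boundaries bytes.splitlines() uses),
a small splitter can be written with the re module. This is a sketch;
ascii_splitlines is a hypothetical helper, not a standard library
function.

```python
import re
from typing import List

# ASCII line endings only. CR LF is listed first so that a CR LF pair
# counts as one line break rather than two.
_ASCII_EOL = re.compile(r"\r\n|\r|\n")


def ascii_splitlines(text: str, keepends: bool = False) -> List[str]:
    """Split text on \\n, \\r and \\r\\n only, like bytes.splitlines()."""
    lines = []
    start = 0
    for m in _ASCII_EOL.finditer(text):
        # Include the line ending itself only when keepends is true.
        end = m.end() if keepends else m.start()
        lines.append(text[start:end])
        start = m.end()
    # A trailing line without a line ending is still a line; an empty
    # remainder is not, matching splitlines() semantics.
    if start < len(text):
        lines.append(text[start:])
    return lines
```

With this helper, "a\x1cb" stays a single line, as it does for
bytes.splitlines(), while str.splitlines() would split it in two.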
----------
nosy: +lemburg
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue7643>
_______________________________________
More information about the Python-bugs-list mailing list