[Python-Dev] unicode regex quickie: should a newline be the same thing as a linebreak?

Fredrik Lundh <effbot@telia.com>
Tue, 30 May 2000 12:26:29 +0200


I wrote:

> what's the best way to deal with this?  I see three
> alternatives:
>
> a) stick to the old definition, and use chr(10) also for
>    unicode strings
>
> b) use different definitions for 8-bit strings and unicode
>    strings; if given an 8-bit string, use chr(10); if given
>    a 16-bit string, use the LINEBREAK predicate.
>
> c) use LINEBREAK in either case.
>
> I think (c) is the "right thing", but it's the only one that
> may break existing code...
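one rough way to compare (a), (b) and (c) is to ask which characters each of them treats as the end of a line.  the sketch below is hypothetical: `ends_line` and the "a"/"b"/"c" labels are made up for illustration, both kinds of subject string are modelled as Python 3 `str`, and LINEBREAK is approximated by the set of characters that today's `str.splitlines()` treats as line boundaries, which may not match sre's predicate exactly.

```python
# Stand-in for the LINEBREAK predicate (assumption): the characters that
# str.splitlines() treats as line boundaries in current CPython.
LINEBREAK = frozenset("\n\r\x0b\x0c\x1c\x1d\x1e\x85\u2028\u2029")


def ends_line(ch: str, subject_is_unicode: bool, alternative: str) -> bool:
    """Hypothetical: does `ch` end a line under alternative 'a', 'b' or 'c'?"""
    if alternative == "a":  # old definition, whatever the subject is
        return ch == chr(10)
    if alternative == "b":  # depends on the kind of string being searched
        return ch in LINEBREAK if subject_is_unicode else ch == chr(10)
    if alternative == "c":  # LINEBREAK for every subject
        return ch in LINEBREAK
    raise ValueError("unknown alternative: %r" % alternative)
```

the compatibility risk of (c) shows up with an 8-bit subject containing, for example, "\r" (common in CRLF data): under (a) and (b) it is an ordinary character, under (c) it ends a line, so `.` and `$` would behave differently on existing input.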

I'm probably getting old, but I don't remember if anyone followed
up on this, and I don't have time to check the archives right now.

so for the upcoming "feature complete" release, I've decided to
stick to (a).

...

for the next release, I suggest implementing a fourth alternative:

d) add a new unicode flag.  if set, use LINEBREAK.  otherwise,
   use chr(10).

background: in the current implementation, this decision has to
be made at compile time, and a compiled expression can be used
with either 8-bit strings or 16-bit strings, so the choice cannot
depend on which kind of string is being searched.
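a minimal sketch of (d), assuming the flag is simply a boolean passed when the pattern is compiled.  `compile_newline_test` and `dot_matches` are hypothetical names, not part of sre, and LINEBREAK is again approximated by the `str.splitlines()` boundary set.  the point it illustrates is that the newline test is fixed once, at compile time, and then applied unchanged to every subject string.

```python
# Stand-in for LINEBREAK (assumption): str.splitlines() boundary characters.
LINEBREAK = frozenset("\n\r\x0b\x0c\x1c\x1d\x1e\x85\u2028\u2029")


def compile_newline_test(unicode_flag: bool):
    """Hypothetical: choose the newline test once, when the pattern is compiled.

    The returned predicate is reused for every subject string, whether it
    holds 8-bit or 16-bit characters.
    """
    if unicode_flag:
        return lambda ch: ch in LINEBREAK
    return lambda ch: ch == chr(10)


def dot_matches(newline_test, ch: str) -> bool:
    """What '.' (without DOTALL) accepts: any character that is not a newline."""
    return not newline_test(ch)


default = compile_newline_test(False)  # old behaviour, the default
uni = compile_newline_test(True)       # opt-in LINEBREAK behaviour
```

with this design, existing patterns keep the chr(10) behaviour unless they explicitly ask for the new flag.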

a fifth alternative would be to use the locale flag to tell the
difference between unicode and 8-bit characters:

e) if locale is not set, use LINEBREAK.  otherwise, use chr(10).
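the same kind of sketch for (e), assuming the existing locale flag is reused as the "this is 8-bit text" signal.  `newline_test_for` is a hypothetical name, and LINEBREAK is approximated as above.

```python
# Stand-in for LINEBREAK (assumption): str.splitlines() boundary characters.
LINEBREAK = frozenset("\n\r\x0b\x0c\x1c\x1d\x1e\x85\u2028\u2029")


def newline_test_for(locale_flag: bool):
    """Hypothetical sketch of alternative (e): the locale flag picks the test."""
    if locale_flag:
        return lambda ch: ch == chr(10)  # locale set: old 8-bit definition
    return lambda ch: ch in LINEBREAK    # no locale: Unicode line breaks
```

unlike (d), this needs no new flag, but it seems that patterns compiled without the locale flag (presumably most of them) would switch to LINEBREAK for 8-bit subjects too, so it carries some of the same compatibility risk as (c).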

comments?

</F>

<project name="sre" phase=" complete="97.1%" />