Blog "about python 3"

Fri Jan 3 06:14:41 EST 2014

On 02/01/2014 18:37, Terry Reedy wrote:
> On 1/2/2014 12:36 PM, Robin Becker wrote:
>
>> I just spent a large amount of effort porting reportlab to a version
>> which works with both python2.7 and python3.3. I have a large number of
>> functions etc which handle the conversions that differ between the two
>> pythons.
>
> I imagine that this was not fun.

indeed :)
>
>> For fairly sensible reasons we changed the internal default to use
>> unicode rather than bytes.
>
> Do you mean 'from __future__ import unicode_literals'?

No, previously we had a default of utf8-encoded strings in the lower levels of
the code, and we accepted either unicode or utf8 string literals as inputs to
text functions. As part of the port we decided to change the default from utf8
str (bytes) to unicode.
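
A minimal sketch of the kind of boundary helper such a switch implies. Text functions keep accepting utf8 bytes for backward compatibility but convert them to unicode on entry, so everything further down works on a single type. The names as_text and text_type are hypothetical, not reportlab's actual helpers, and incoming bytes are assumed to be utf8.

```python
# Runs on both python2.7 and python3.3+.
try:
    text_type = unicode  # python2.7: the unicode type
except NameError:
    text_type = str      # python3.x: str is already unicode

def as_text(value, encoding="utf-8"):
    """Return value as unicode text (hypothetical helper).

    Bytes are assumed to be encoded with `encoding` (utf8, the old
    internal default); unicode input is returned unchanged.
    """
    if isinstance(value, bytes):  # on 2.7, bytes is an alias of str
        return value.decode(encoding)
    if isinstance(value, text_type):
        return value
    raise TypeError("expected text or bytes, got %s" % type(value).__name__)

# Both spellings of the same word end up as the same unicode string.
print(as_text(b"caf\xc3\xa9") == as_text(u"caf\u00e9"))  # True
```

Converting once at the API boundary keeps the bytes/unicode conditionals out of the lower levels, which is where a 2.7/3.3 port otherwise accumulates them.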

> Am I correct in thinking that this change increases the capabilities of
> reportlab? For instance, easily producing an article with abstracts in English,
> Arabic, Russian, and Chinese?
>
It's made no real difference to what we are able to produce or accept, since
utf8 or unicode can encode anything in the input; what can be produced depends
mainly on fonts.
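
This small check illustrates the "utf8 or unicode can encode anything" point. Every Unicode scalar value, astral (non-BMP) ones included, survives a utf8 encode/decode round trip and takes one to four bytes. It assumes Python 3.3+, because it calls chr on code points above U+FFFF. The sample text and the helper name utf8_round_trips are illustrative only.

```python
def utf8_round_trips(text):
    """True if text survives a utf8 encode/decode round trip unchanged."""
    return text.encode("utf-8").decode("utf-8") == text

# Latin, Cyrillic, Arabic, CJK and one astral CJK ideograph (U+20000).
sample = u"caf\u00e9 \u0410\u0431 \u0645\u0631\u062d\u0628\u0627 \u4e2d\u6587 \U00020000"
print(utf8_round_trips(sample))  # True

# utf8 uses 1 to 4 bytes per code point.
for ch in [u"A", u"\u00e9", u"\u4e2d", u"\U00020000"]:
    print("U+%04X -> %d byte(s)" % (ord(ch), len(ch.encode("utf-8"))))

# Exhaustive check over all scalar values. Surrogates (U+D800..U+DFFF)
# are excluded because they are not characters and utf8 rejects them.
print(all(utf8_round_trips(chr(cp))
          for cp in range(0x110000)
          if not 0xD800 <= cp <= 0xDFFF))  # True
```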

>> After doing all that and making the tests
...........
>> I know some of these tests are fairly variable, but even for simple
>> things like paragraph parsing 3.3 seems to be slower. Since both use
>> unicode internally it can't be that, can it? Or is python 2.7's unicode
>> faster?
>
> The new unicode implementation in 3.3 is faster for some operations and slower
> for others. It is definitely more space efficient, especially compared to a wide
> build system. It is definitely less buggy, especially compared to a narrow build
> system.
>
> Do your tests use any astral (non-BMP) chars? If so, do they pass on narrow 2.7
> builds (like on Windows)?

I'm not sure whether we have any non-BMP characters in the tests; for the most
part it's simple CJK and the like. I'm fairly certain we don't have any ability
to handle composed glyphs (multi-codepoint sequences).
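
Non-BMP characters and combining sequences matter because they are where "the length of a string" stops having a single answer. The sketch below assumes Python 3.3+ and compares three counts. len() counts code points. The UTF-16 unit count is what len() would report on a narrow 2.7 build, where each astral character becomes a surrogate pair. NFC normalization from the standard unicodedata module folds a base letter plus a combining accent into one precomposed character, but only when such a character exists. The helper name utf16_units is hypothetical.

```python
import unicodedata

def utf16_units(s):
    """Number of UTF-16 code units; what len() gives on a narrow 2.7 build."""
    return len(s.encode("utf-16-le")) // 2  # -le variant: no byte order mark

samples = [
    ("simple CJK", u"\u4e2d\u6587"),          # 2 code points, 2 units
    ("astral CJK U+20000", u"\U00020000"),    # 1 code point, 2 units
    ("e + combining acute", u"e\u0301"),      # 2 code points, 1 after NFC
]

for label, s in samples:
    nfc = unicodedata.normalize("NFC", s)
    print("%-20s code points=%d  utf16 units=%d  after NFC=%d"
          % (label, len(s), utf16_units(s), len(nfc)))
```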

....
> For one thing, indexing and slicing just works on all machines for all unicode
> strings. Code for 2.7 and 3.3 either a) does not index or slice, b) does not
> work for all text on 2.7 narrow builds, or c) has extra conditional code only
> for 2.7.
>

probably
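
The extra conditional code in option (c) looks roughly like this. On a narrow 2.7 build an astral character such as U+20000 is stored as the surrogate pair u"\ud840\udc00", so a plain slice can cut it in half. This is a minimal sketch with hypothetical helper names (split_code_points, cp_slice). The test string spells out the surrogate pair explicitly to simulate narrow-build storage, so the sketch also runs on Python 3.3+, where such pairs normally never occur.

```python
def split_code_points(s):
    """Split s into a list of characters, keeping surrogate pairs together."""
    out = []
    i, n = 0, len(s)
    while i < n:
        if (i + 1 < n and 0xD800 <= ord(s[i]) <= 0xDBFF
                and 0xDC00 <= ord(s[i + 1]) <= 0xDFFF):
            out.append(s[i:i + 2])  # high + low surrogate = one character
            i += 2
        else:
            out.append(s[i])
            i += 1
    return out

def cp_slice(s, start=None, stop=None):
    """Slice s by character index without splitting a surrogate pair."""
    return u"".join(split_code_points(s)[start:stop])

# "a", then U+20000 as a narrow build stores it, then "b".
narrow = u"a\ud840\udc00b"
print(len(narrow))                     # 4 units, though only 3 characters
print(repr(narrow[:2]))                # naive slice ends in a lone high surrogate
print(len(split_code_points(narrow)))  # 3
print(cp_slice(narrow, 0, 2) == u"a\ud840\udc00")  # True
```

On a string without surrogate pairs, which is every normal string on 3.3+, cp_slice gives the same result as an ordinary slice. The conditional path only earns its keep on 2.7 narrow builds.
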
-- 
Robin Becker