Should writing Unicode files be so slow?

Sun Mar 21 10:29:58 EDT 2010

Antoine Pitrou wrote:
> On Fri, 19 Mar 2010 17:18:17 +0000, djc wrote:
>> changing
>>     with open(filename, 'rU') as tabfile:
>> to
>>     with codecs.open(filename, 'rU', 'utf-8', 'backslashreplace') as tabfile:
>>
>> and
>>     with open(outfile, 'wt') as out_part:
>> to
>>     with codecs.open(outfile, 'w', 'utf-8') as out_part:
>>
>> causes a program that runs in 43 seconds to take 4 minutes to process
>> the same data.
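One rough way to reproduce this kind of comparison is to time the same write loop through both APIs. This is a minimal sketch assuming Python 3; the sample data, file names, and the helpers `write_with_codecs`, `write_with_io`, and `compare` are hypothetical, and the absolute timings will depend on the Python version, platform, and how many `write()` calls the real program makes.

```python
import codecs
import io
import os
import tempfile
import timeit

# Hypothetical sample data: many short lines containing non-ASCII text.
LINES = ["r\u00e9sum\u00e9 %d\tcaf\u00e9\n" % i for i in range(50000)]


def write_with_codecs(path):
    # codecs.open returns a StreamReaderWriter; each write() goes through
    # its Python-level wrapper before reaching the underlying binary file.
    with codecs.open(path, "w", "utf-8") as out:
        for line in LINES:
            out.write(line)


def write_with_io(path):
    # io.open returns a TextIOWrapper, implemented in C in CPython 2.7+
    # and 3.1+. newline="" disables newline translation so the output
    # matches the codecs version byte for byte on every platform.
    with io.open(path, "w", encoding="utf-8", newline="") as out:
        for line in LINES:
            out.write(line)


def compare(number=3):
    """Time both writers; return (codecs_seconds, io_seconds, same_bytes)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path_codecs = os.path.join(tmpdir, "codecs.txt")
        path_io = os.path.join(tmpdir, "io.txt")
        t_codecs = timeit.timeit(lambda: write_with_codecs(path_codecs),
                                 number=number)
        t_io = timeit.timeit(lambda: write_with_io(path_io), number=number)
        with open(path_codecs, "rb") as fa, open(path_io, "rb") as fb:
            same = fa.read() == fb.read()
    return t_codecs, t_io, same


if __name__ == "__main__":
    t_codecs, t_io, same = compare()
    print("codecs.open: %.3fs  io.open: %.3fs  identical output: %s"
          % (t_codecs, t_io, same))
```

Checking that both files are byte-identical guards against comparing two writers that do different work. Expect the ratio to differ from the 43 s vs 4 min reported above, since the overhead being measured is per call.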
> 
> codecs.open() (and the object it returns) is slow because it is written in
> pure Python.
> 
> Accelerated reading and writing of Unicode files is available in Python
> 2.7 and 3.1, through the new `io` module.
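For the two quoted `with` lines, the `io`-based equivalents might look like the sketch below. It assumes Python 3, where `io.open` is the same object as the built-in `open`; the helper name `copy_tab_file` and the file names are hypothetical. The `'U'` flag is dropped because text mode already applies universal newlines (the flag was removed entirely in Python 3.11), and decoding with `errors="backslashreplace"` needs Python 3.5 or later.

```python
import io
import os
import tempfile


def copy_tab_file(filename, outfile):
    # Hypothetical helper mirroring the two quoted lines, using io.open.
    # Text mode already gives universal newlines, so no 'U' flag is needed.
    # errors="backslashreplace" turns undecodable bytes into \xNN escapes
    # (decoding support for this handler needs Python 3.5+).
    with io.open(filename, "r", encoding="utf-8",
                 errors="backslashreplace") as tabfile:
        with io.open(outfile, "w", encoding="utf-8") as out_part:
            for line in tabfile:
                # Placeholder for the program's real per-line processing.
                out_part.write(line)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, "in.tsv")
        dst = os.path.join(tmpdir, "out.tsv")
        # Valid UTF-8 "café", a tab, then a stray 0xFF byte.
        with open(src, "wb") as f:
            f.write(b"caf\xc3\xa9\t\xff\n")
        copy_tab_file(src, dst)
        with io.open(dst, encoding="utf-8") as f:
            print(repr(f.read()))
```

The stray byte survives as the literal text `\xff` rather than aborting the run, which matches what the original `codecs.open(..., 'backslashreplace')` call was asking for.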

Thank you for a clear and to-the-point explanation. I shall concentrate on
finding an optimal time to upgrade from Python 2.6.

-- 
David Clark, MSc, PhD.              UCL Centre for Publishing
                                    Gower Str London WC1E 6BT