RE Module Performance

Chris Angelico rosuav at gmail.com
Tue Jul 30 16:04:18 EDT 2013


On Tue, Jul 30, 2013 at 8:09 PM,  <wxjmfauth at gmail.com> wrote:
> Mutable, immutable, copying + xxx, buffering, O(n) ....
> Yes, but conceptually the reencoding happens sometime, somewhere.
> The internal "ucs-2" will never automagically be transformed
> into "ucs-4" (eg).

But probably not on the entire document. Even with a brainless scheme
like the one I posted code for, no more than 1024 bytes will need to
be recoded at a time (except in some odd edge cases, and even then, no
more than once for any given file).
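
As a rough illustration of that point (this is NOT the code referred to
above, only a hypothetical sketch): if the text is kept as a list of
small str chunks, inserting a wide character only rebuilds the one chunk
it lands in. The class name ChunkedText, its methods, and the chunk size
counted in characters rather than bytes are all assumptions made for the
example.

```python
class ChunkedText:
    """Text stored as a list of small str chunks (hypothetical sketch)."""

    def __init__(self, text="", chunk_size=1024):
        # chunk_size is counted in characters here, an assumption
        # made to keep the example short.
        self.chunk_size = chunk_size
        self.chunks = [text[i:i + chunk_size]
                       for i in range(0, len(text), chunk_size)] or [""]

    def insert(self, pos, s):
        """Insert s at character offset pos, rebuilding only one chunk."""
        for i, chunk in enumerate(self.chunks):
            if pos <= len(chunk):
                merged = chunk[:pos] + s + chunk[pos:]
                # Re-split so that no chunk exceeds chunk_size.
                self.chunks[i:i + 1] = [
                    merged[j:j + self.chunk_size]
                    for j in range(0, len(merged), self.chunk_size)
                ] or [""]
                return
            pos -= len(chunk)
        raise IndexError("insert position past end of text")

    def __str__(self):
        return "".join(self.chunks)


doc = ChunkedText("a" * 4096)
untouched = list(doc.chunks[1:])     # chunks after the insertion point
doc.insert(10, "\U00010100")         # one non-BMP character
```

After the insert, every chunk other than the first is still the very
same str object as before, so whatever widening the new character
forces is confined to the chunk that received it.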

> And do not forget, in a pure utf coding scheme, your
> char or a char will *never* be larger than 4 bytes.
>
>>>> sys.getsizeof('a')
> 26
>>>> sys.getsizeof('\U000101000')
> 48

Yeah, you have a few odd issues like, oh, I dunno, GC overhead,
reference count, object class, and string length, all stored somewhere
there. Honestly jmf, if you want raw assembly you know where to get
it.
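
To separate the fixed per-object overhead from the per-character cost,
one can compare strings that differ by a single character. The sketch
below assumes CPython 3.3 or later with the flexible string
representation (PEP 393); the helper names per_char_cost and
fixed_overhead are made up for the example, and the absolute numbers
depend on the build, so none are quoted here.

```python
import sys


def per_char_cost(ch, n=1000):
    """Bytes added by one more copy of ch to a string of n copies."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


def fixed_overhead(ch):
    """Size of a one-character string minus its character data.

    CPython also stores a terminating NUL of the same width as the
    characters, hence the factor of 2 (an implementation detail, and
    an assumption of this sketch).
    """
    return sys.getsizeof(ch) - 2 * per_char_cost(ch)


for ch in ("a", "\u20ac", "\U00010100"):
    print(ascii(ch), per_char_cost(ch), fixed_overhead(ch))
```

On such a build the per-character cost comes out as 1, 2 and 4 bytes
for the three sample characters; everything else in the getsizeof
figure is object header.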

ChrisA


