RE Module Performance

Michael Torrie torriem at gmail.com
Thu Jul 25 23:06:21 EDT 2013


On 07/25/2013 01:07 PM, wxjmfauth at gmail.com wrote:
> Let's start with a simple string: an em dash or an en dash.
> 
>>>> sys.getsizeof('–')
> 40
>>>> sys.getsizeof('a')
> 26

That's meaningless.  You're comparing the overhead of a string object
itself (a one-time cost anyway), not the overhead of storing the actual
characters.  This is the only meaningful comparison:

>>> sys.getsizeof('––') - sys.getsizeof('–')

>>> sys.getsizeof('aa') - sys.getsizeof('a')
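
For what it's worth, those two subtractions should come out to 2 and 1
on any CPython 3.3+ build: the fixed object header cancels out, leaving
only the per-character storage the FSR actually chose. A minimal sketch
(the per_char helper is my own name; exact header sizes vary by
platform, but the differences are stable):

import sys

def per_char(ch):
    # Marginal cost of one more copy of ch: subtracting cancels the
    # fixed string-object header, leaving only per-character storage.
    return sys.getsizeof(ch * 2) - sys.getsizeof(ch)

print(per_char('a'))           # 1: Latin-1 range, 1 byte/char
print(per_char('\u2013'))      # 2: BMP char above U+00FF, 2 bytes/char
print(per_char('\U0001F40D'))  # 4: astral char, 4 bytes/char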

Actually, I'm not even sure what your point is after all this time of
railing against the FSR (the flexible string representation, PEP 393).
You have said in the past that Python penalizes users of character sets
that require wider byte encodings, but what would you have us do?  Use
4-byte characters and penalize everyone equally?  Use 2-byte characters
that incorrectly expose surrogate pairs for some characters?  Use UTF-8
in memory and accept O(n) indexing?  Are your programs (actual
programs, not contrived benchmarks) actually slower because of the FSR?
Is the FSR incorrect?  If so, according to what part of the Unicode
standard?  I'm not trying to troll, or feed the troll.  I'm actually
curious.
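
To make the surrogate-pair objection concrete: on a pre-3.3 narrow
build, a character above U+FFFF was stored as two UTF-16 code units,
and len() and indexing exposed them; under the FSR the same string
behaves correctly with O(1) indexing. A small sketch (the narrow-build
results shown in comments are 3.2 behavior, not output of this script):

s = '\U0001F40D'    # one astral character, above U+FFFF

# On a 3.2 narrow build:
#   len(s)  -> 2
#   s[0]    -> '\ud83d'  (a lone surrogate, not a character)

# On 3.3+ with the FSR:
print(len(s))       # 1
print(s[0] == s)    # True: indexing is still O(1), and correct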

I think perhaps you feel that those of us who don't use Unicode much
don't understand it, simply because some of us don't understand you.
If so, I'm not sure that's actually true.


