[XML-SIG] Weirdness (bug?) with smart_len (was Re: Issues with Unicode type)

Thomas B. Passin tpassin@comcast.net
Wed, 25 Sep 2002 09:15:58 -0400


I can confirm this behavior - tested on Win2000 with Python 2.2.  But if the
same code is all in one module it does not fail on repeated attempts,
only when the function def is imported from another module, as Eric has it.

Eric is also right about the .pyc (and .pyo, which I also tried) - if you
delete it, the next execution succeeds.

Furthermore, it is specifically the regular expression that does it, not its
being called in the function.  This is easy to show - if you comment out
both the pattern and its use, the failure does not happen:

import re

#SP_PAT = re.compile(u"[\uD800-\uDBFF][\uDC00-\uDFFF]")

def smart_len(u):
    #sp_count = len(SP_PAT.findall(u))
    return 0
    # return len(u) - sp_count

Now just uncomment the SP_PAT line - leaving it still unused - and presto!
The failure returns.
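
Since the trigger appears to be the compiled pattern constant rather than the
call, one possible workaround is to count surrogate pairs by hand, so that no
surrogate literals appear in the module at all.  This is only a sketch: the
name smart_len_noregex is made up for illustration, and whether it actually
avoids the .pyc failure is an assumption I have not verified.

```python
def smart_len_noregex(u):
    """Count characters, treating a high+low surrogate pair as one.

    Hypothetical regex-free variant of smart_len; the surrogate ranges
    are written as integers, so the source holds no surrogate literals.
    """
    count = 0
    i = 0
    n = len(u)
    while i < n:
        c = ord(u[i])
        # A high surrogate (D800-DBFF) immediately followed by a low
        # surrogate (DC00-DFFF) is one character stored in two code units.
        if (0xD800 <= c <= 0xDBFF and i + 1 < n
                and 0xDC00 <= ord(u[i + 1]) <= 0xDFFF):
            i += 2
        else:
            i += 1
        count += 1
    return count
```

It is meant to return the same value as len(u) - sp_count in Eric's version,
and unpaired surrogates are still counted as one character each.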

Now that is really strange!  Hope someone who knows Python really well can
explain this and help get it fixed.

Cheers,

Tom P

[Eric van der Vlist]
[[
I am trying to use this when python is compiled with ucs2, but I am
seeing a weird behavior when using this function: it seems that it can't
stand being compiled as a .pyc!

I have:

test.py:
#!/usr/bin/env python
import Smart_len

print Smart_len.smart_len(u'\U00010800')

and Smart_len.py:

import re

SP_PAT = re.compile(u"[\uD800-\uDBFF][\uDC00-\uDFFF]")

def smart_len(u):
    sp_count = len(SP_PAT.findall(u))
    return len(u) - sp_count

It works the first time (or whenever I remove Smart_len.pyc) but fails
from the second execution on:

]]