byte compiled files

David Bolen db3l at fitlinxx.com
Fri Jul 27 17:31:56 EDT 2001


Mark Robinson <m.1.robinson at herts.ac.uk> writes:

> Thanks for the help guys, sorry I didn't include the code (what can I 
> say, duh). Anyway I did manage to fix the problem but I am still kinda 
> interested as to why I was getting that previous behaviour. I am running 
> python 2.1 under red hat 7, and python 2.1.1 under windows NT and had 
> the same problem under both. Here is the code:
> 
> def checkValidData(bsite):
> 	repeats = ['AAAA', 'CCCC', 'GGGG', 'TTTT']
> 	for x in bsite:	#check new bsite contains valid charactors
> 		if(x is not 'A' and x is not 'G' and
> 			x is not 'C' and x is not 'T' ):
> 			return 0
> 	for x in repeats:
> 		if (string.find(bsite, x) != -1):
> 			return 0
> 	return 1
> 
> Changing where I test if x is A, C, G or T by using the != test rather 
> than 'is not' solved the problem, and I guess I may have been using it 
> inappropriately, but I still don't see why it would have worked 
> correctly on the first run and then incorrectly when working from the .pyc file.

Ah - seeing this makes it much clearer what was probably going on.
It's because of what "is" does.  It's an object identity test, which
isn't what you want here.
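
As a minimal sketch of the difference between the two tests (the names
below are made up for illustration, and I'm using lists rather than
strings because two separately built lists are always distinct objects,
whereas strings may or may not be shared):

```python
# Two lists built separately: equal in value, but two distinct objects.
first = ['A', 'C', 'G', 'T']
second = ['A', 'C', 'G', 'T']
alias = first                  # a second name for the very same object

print(first == second)         # value comparison: contents match
print(first is second)         # identity comparison: two different objects
print(first is alias)          # identity comparison: one object, two names
```

The "==" test asks whether the contents match; the "is" test asks
whether both names refer to one and the same object.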

In fact, you're probably just lucky that it worked at all, even the
first time, thanks to an implementation optimization in Python.  Python
will "intern" some small strings automatically - placing them in a
global table of such strings for improved performance.  As a result,
references to such strings are actually references to a single object.

A similar thing happens for small integers (0-100 I think), which
makes checks such as "is 1" possibly work too.

For example:

    >>> a='A'
    >>> b='A'
    >>> print id(a),id(b),id('A')
    8262784 8262784 8262784

This means that (luckily) if you pass the string 'A' into your
function and then do a "<value> is 'A'" it'll work, but there's a
lot happening under the covers.  When you pass 'A' into the function
the literal string 'A' produces an object reference to the interned
copy, which is then passed to the function.  Internal to the function
the use of 'A' does the same thing, and thus both references to 'A'
are to the same object and "is" works (or "is not" fails).  E.g.:

    >>> def check(val):
    ...   print val is 'A'
    ...
    >>> check('A')
    1
    >>>

But that's by no means guaranteed behavior - there could just as
easily be two discrete objects involved.  Python only interns some
string constants (currently those made up entirely of alphanumerics
and underscores), so if you happened to use a string containing
anything else (a space, say) the same code wouldn't work the way you
thought, e.g.:

    >>> a='no intern'
    >>> b='no intern'
    >>> id(a),id(b),id('no intern')
    (8363408, 8361360, 8361968)

    >>> def check(val):
    ...   print val is 'no intern'
    ...
    >>> check('no intern')
    0

And I had to peek at the source to see how the current constant
interning is handled - it's not guaranteed behavior so certainly not
something to depend on.
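
If you really do want identity tests on strings to be meaningful, the
only dependable route is to intern them explicitly.  Here is a minimal
sketch, assuming a current Python 3 where the function is spelled
sys.intern (in the 2.x interpreters discussed here it is the builtin
intern()); the variable names are just for illustration:

```python
import sys

# Build two equal strings at run time so they do not come from one literal.
x = ''.join(['no ', 'intern'])
y = ''.join(['no ', 'intern'])

print(x == y)          # equal by value
print(x is y)          # implementation detail; do not rely on either answer

# Explicit interning maps equal strings onto one shared object.
xi = sys.intern(x)
yi = sys.intern(y)
print(xi is yi)        # identity holds because both went through intern
```

Even then, "==" remains the right test unless every string involved is
known to have gone through the interning call.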

As for your .pyc behavior - I haven't reviewed the source, but my best
guess is that when the interpreter loaded the pre-compiled function
code from the compiled module, the string literals inside it came
back as their own objects, and thus were no longer the same objects
as the strings supplied to the function by the caller.  But since two
independent string literals yielding references to the same object is
not guaranteed behavior, the implementation is certainly free to do
that.
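
One way to poke at this guess yourself is sketched below.  It is only a
sketch, and it makes assumptions: that round-tripping a code object
through the marshal module in memory is a fair stand-in for writing and
re-reading a .pyc file (the module body of a .pyc is stored in marshal
format), and that you run it on a current Python 3.  The helper name
string_consts is made up for the example.  Whether the identity check
comes out true depends on the interpreter version, so the sketch only
reports it:

```python
import marshal

source = "def check(val):\n    return val == 'no intern'\n"
code = compile(source, "<sketch>", "exec")

# Serialize and reload the code object, roughly what a .pyc load does.
reloaded = marshal.loads(marshal.dumps(code))

def string_consts(c):
    """Collect string constants from a code object and nested code objects."""
    found = []
    for const in c.co_consts:
        if isinstance(const, str):
            found.append(const)
        elif hasattr(const, "co_consts"):
            found.extend(string_consts(const))
    return found

before = string_consts(code)
after = string_consts(reloaded)

same_value = before == after
same_object = all(a is b for a, b in zip(before, after))

print(same_value)      # the constants always survive by value
print(same_object)     # identity is up to the implementation
```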

So definitely stick with "==" or "!=/<>" if you are trying to compare
the actual value of a string, as opposed to the identity of the string
object itself.  True, an identity check can sometimes be faster, but
as you can see it is _not_ the same as a value check (and I believe
strings already have an optimization of checking object identity
before bothering to actually compare contents).
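
For reference, here is a sketch of your function with value
comparisons throughout.  The name check_valid_data and the use of the
string method find() in place of string.find() are my own choices, not
something from your code; the logic is otherwise meant to match what
you posted:

```python
def check_valid_data(bsite):
    """Return 1 if bsite uses only A/C/G/T and has no 4-base repeat, else 0."""
    repeats = ['AAAA', 'CCCC', 'GGGG', 'TTTT']
    for x in bsite:
        # Compare by value, not identity.
        if x != 'A' and x != 'G' and x != 'C' and x != 'T':
            return 0
    for r in repeats:
        # find() returns -1 when the substring is absent.
        if bsite.find(r) != -1:
            return 0
    return 1

print(check_valid_data('ACGTAC'))
print(check_valid_data('ACGTTTTA'))
```

Because only values are compared, the result no longer depends on
whether the caller's strings and the function's literals happen to be
the same objects.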

--
-- David
-- 
/-----------------------------------------------------------------------\
 \               David Bolen            \   E-mail: db3l at fitlinxx.com  /
  |             FitLinxx, Inc.            \  Phone: (203) 708-5192    |
 /  860 Canal Street, Stamford, CT  06902   \  Fax: (203) 316-5150     \
\-----------------------------------------------------------------------/


