[Patches] [ python-Patches-1658799 ] Handle requests to intern string subtype instances
SourceForge.net
noreply at sourceforge.net
Tue Mar 20 15:25:59 CET 2007
Patches item #1658799, was opened at 2007-02-13 11:34
Message generated for change (Comment added) made by hniksic
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1658799&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core (C code)
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Hrvoje Nikšić (hniksic)
Assigned to: Nobody/Anonymous (nobody)
Summary: Handle requests to intern string subtype instances
Initial Comment:
This patch implements a small modification of PyString_InternInPlace that allows for safe interning of string subtype instances. The change should be fully backward compatible -- for a rationale and discussion, see:
http://mail.python.org/pipermail/python-dev/2007-February/070973.html
----------------------------------------------------------------------
>Comment By: Hrvoje Nikšić (hniksic)
Date: 2007-03-20 15:25
Message:
Logged In: YES
user_id=1718107
Originator: YES
Thanks for the detailed analysis. I missed this case and I retract the
patch until I think of a way to resolve this problem. The obvious
possibility is to always copy subtype instances before attempting to intern
them, but right now I don't have the time to investigate if this slows
things down.
As for using s = intern(str(s)) in Python, it's a start, but it does
somewhat more than I'd like -- for example, it "interns" all kinds of
objects, which is not desirable. My patch was written with the intention
of making PyString_InternInPlace more robust wrt string subtype instances,
so that all the code in core and extensions that simply calls
PyString_InternInPlace keeps working without modification.
In the long run, the interface of PyString_InternInPlace is a bit too
undeterministic for my taste. It has no error reporting, so it silently
ignores some kinds of errors (not enough memory), throws fatal error on
others (non-string being passed), and also completely ignores string
subtypes. In my code I use this utility function:
int
intern_force(PyObject **s)
{
if (PyString_CheckExact(*s))
/* Most likely case: we're passed an exact string. */
PyString_InternInPlace(s);
else if (PyString_Check(*s))
{
/* The case we're covering with this function: we got a string
subtype. Intern a copy. */
PyObject *copy;
copy = PyString_FromStringAndSize(PyString_AS_STRING(*s),
PyString_GET_SIZE(*s));
if (!copy)
return -1;
Py_DECREF(*s);
*s = copy;
PyString_InternInPlace(s);
}
else
{
PyErr_SetString(PyExc_TypeError, "intern_force passed a
non-string");
return -1;
}
if (!PyString_CHECK_INTERNED(s))
{
/* PyString_InternInPlace failed and cleared the exception, most
likely due to insufficient memory. */
PyErr_Format(PyExc_RuntimeError, "failed to intern string '%s'",
PyString_AS_STRING(*s));
return -1;
}
return 0;
}
I don't expect a function like this one to become a part of Python, but
PyString_InternInPlace could be usefully improved even without breaking
compatibility.
----------------------------------------------------------------------
Comment By: Armin Rigo (arigo)
Date: 2007-02-14 13:21
Message:
Logged In: YES
user_id=4771
Originator: NO
Btw, any reason why you cannot simply say in
your Python program: intern(str(s)) ?
----------------------------------------------------------------------
Comment By: Armin Rigo (arigo)
Date: 2007-02-14 13:13
Message:
Logged In: YES
user_id=4771
Originator: NO
Ah, the code was the wrong way around. The following
causes an Fatal Python error in a debug build:
s1 = "hel"
s1 = intern(s1 + "lo")
class S(str):
def __hash__(self):
return 0
def __eq__(self, other):
return False
s = S(s1)
s2 = intern(s)
del s1
----------------------------------------------------------------------
Comment By: Hrvoje Nikšić (hniksic)
Date: 2007-02-14 09:31
Message:
Logged In: YES
user_id=1718107
Originator: YES
I don't think an attack is possible. This patch retains the property that
only exact strings are interned. If you create a pathological string
subtype that hashes like a different string instance (one that has already
been interned), all you'll achieve is that "interning" will return the
other instance. As far as I can tell, no string is actually removed from
the interned dictionary (until it becomes unreachable, that is.)
What is the expected result of your test code? I tried it and it ran
without error.
----------------------------------------------------------------------
Comment By: Armin Rigo (arigo)
Date: 2007-02-13 23:36
Message:
Logged In: YES
user_id=4771
Originator: NO
I think that this opens an attack (untested, though): it allows a
previously-interned string to be removed from the dictionary. This might
lead to a crash because the old string still thinks it is interned. Try
something along the lines of:
s1 = "hel"
s1 = intern(s1 + "lo")
class S(str):
def __hash__(self):
return hash(s1)
def __eq__(self, other):
return other == s1
s = S("world")
intern(s)
del s1
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1658799&group_id=5470
More information about the Patches
mailing list