[IronPython] Ironclad problem, with which someone here may be able to help

William Reade william at resolversystems.com
Fri Nov 7 13:02:37 CET 2008


So far as I can tell, CPython ctypes and installers (neither of which 
are currently relevant) use COM, and numpy doesn't use it at all. 
However, my search wasn't very sophisticated -- I just looked for 
CoInitialize, CoCreateInstance, and lpVtbl, which is all the COM I can 
remember*. Was that actually a sensible way of checking?

Ironclad doesn't use any of the COM-related methods on Marshal, and only 
uses a handful of unmanaged functions (from kernel32 and msvcr71), so if 
it does use COM it's doing so secretly and without my knowledge :).

Running ipy on its own does appear to load ole32, but I haven't worked 
out what, if anything, it's doing with it.

* I had to work with faked-up COM interfaces while working on mac game 
ports a few years ago  -- that's the full extent of my experience.

William Reade wrote:
> I'll look into that along with everything else :).
>
> Dino Viehland wrote:
>> Do you know if numpy is using COM anywhere?  Or does Ironclad use COM 
>> for any of its interop?
>>
>> -----Original Message-----
>> From: users-bounces at lists.ironpython.com 
>> [mailto:users-bounces at lists.ironpython.com] On Behalf Of William Reade
>> Sent: Thursday, November 06, 2008 6:01 AM
>> To: Discussion of IronPython
>> Subject: Re: [IronPython] Ironclad problem, with which someone here 
>> may be able to help
>>
>> Thanks Dino -- it seems that if I use ipy with -X:MTA, I can no longer
>> reproduce the problem. However, this bothers me a bit: I'm not competent
>> to follow all the consequences here, but this situation seems to imply
>> that Ironclad won't be usable safely from any STA thread. Is this an
>> intended restriction or a bug?
>>
>> Incidentally, tracking down the call stacks proved to be hard work: the
>> timing changed enough that I only got a single failure over dozens of
>> runs, and it turned out I'd got the logging wrong, so I didn't find out
>> anything useful :(.
>>
>> Dino Viehland wrote:
>>  
>>> I would suggest getting a snap shot of the call stacks when this is 
>>> happening if that's possible.  I can't pin anything down but I 
>>> wonder if you could have an STA object or something that otherwise 
>>> requires message pumping.  That message pumping could happen while 
>>> you're doing a Monitor.Enter call.  If that was happening maybe 
>>> there is some subtle CLR bug or a surprise feature?  It is 
>>> surprising that Monitor.Enter can go re-entrant but it can...
>>>
>>> So it'd be interesting to get thread snapshots and see if
>>>
>>>
>>>    
>>>>                EnsureGIL (443) 2
>>>>                EnsureGIL (443) 1      <- omg, wtf, bbq, etc.
>>>>
>>>>       
>>> Could be happening because Thread 1 experiences contention, blocks 
>>> and pumps messages, causing the finalizer thread (2) to run, the 
>>> lock is acquired and ... ?
>>>
>>> Only other thing I could think of is does it repro on other 
>>> machines?  Maybe it's a hardware bug as unlikely as that seems?
>>>
>>> -----Original Message-----
>>> From: users-bounces at lists.ironpython.com 
>>> [mailto:users-bounces at lists.ironpython.com] On Behalf Of William Reade
>>> Sent: Wednesday, November 05, 2008 10:01 AM
>>> To: Discussion of IronPython
>>> Subject: Re: [IronPython] Ironclad problem, with which someone here 
>>> may be able to help
>>>
>>> The log starts in the middle (after many lock/unlocks, some from each
>>> thread); I'm running on x86; and I have no additional AppDomains.
>>>
>>> I don't think it would be safe for me to entirely avoid locking during
>>> finalization, but I could probably cut it down to a quick lock, on a
>>> separate object, to enqueue pointers for cleanup and deallocation on 
>>> the
>>> main thread. However, I'm reluctant to do that until I'm sure that the
>>> problem is specifically related to GC, otherwise it'll just come 
>>> back as
>>> soon as anyone tries any serious multithreading :).
>>>
>>> Curt Hagenlocher wrote:
>>>
>>>    
>>>> Locking during finalization is often considered to be a bad idea.  In
>>>> particular, locking without a timeout introduces the possibility that
>>>> you will hang the finalization thread, preventing further objects from
>>>> being finalized.  But clearly, that's not what's happening here.
>>>>
>>>> Other questions that probably don't matter but might be interesting to
>>>> know:
>>>>
>>>> Can we assume that the finalization thread isn't the first place where
>>>> this lock is required?  That your log starts somewhere in the middle?
>>>>
>>>> Is this under x86 or x64 or both?
>>>>
>>>> Are you creating any additional AppDomains in the process?
>>>>
>>>>
>>>> On Wed, Nov 5, 2008 at 10:15 AM, William Reade
>>>> <william at resolversystems.com <mailto:william at resolversystems.com>> 
>>>> wrote:
>>>>
>>>>     Hi Curt
>>>>
>>>>     I am indeed; that's how I know thread 2 is the GC thread. Is
>>>>     locking during GC forbidden?
>>>>
>>>>     William
>>>>
>>>>     Curt Hagenlocher wrote:
>>>>
>>>>         ...or, for that matter, any __del__ methods from within Python
>>>>         -- which ultimately are handled by finalization.
>>>>
>>>>         On Wed, Nov 5, 2008 at 9:37 AM, Curt Hagenlocher
>>>>         <curt at hagenlocher.org <mailto:curt at hagenlocher.org>
>>>>         <mailto:curt at hagenlocher.org <mailto:curt at hagenlocher.org>>>
>>>>         wrote:
>>>>
>>>>            So, the obvious question for me is whether or not you're
>>>>         using any
>>>>            finalizers.
>>>>
>>>>
>>>>            On Wed, Nov 5, 2008 at 5:57 AM, William Reade
>>>>            <william at resolversystems.com
>>>>         <mailto:william at resolversystems.com>
>>>>         <mailto:william at resolversystems.com
>>>>         <mailto:william at resolversystems.com>>>
>>>>
>>>>            wrote:
>>>>
>>>>                Hi all
>>>>
>>>>                While running the numpy tests, I've come across a 
>>>> situation
>>>>                which, to the best of my knowledge, is simply
>>>>         impossible. I'm
>>>>                hoping that one of the local .NET gurus will be able to
>>>>         tell
>>>>                me what I'm missing, or point me somewhere I can get
>>>>         more insight.
>>>>
>>>>                The 4 methods involved are as follows:
>>>>                -----------------------
>>>>                      public int GetThreadId()
>>>>                      {
>>>>                          return Thread.CurrentThread.ManagedThreadId;
>>>>                      }
>>>>
>>>>                      public void WriteFlush(string info)
>>>>                      {
>>>>                          Console.WriteLine(info);
>>>>                          Console.Out.Flush();
>>>>                      }
>>>>
>>>>                      public void EnsureGIL()
>>>>                      {
>>>>                          Monitor.Enter(this.dispatcherLock);
>>>>                          this.WriteFlush(String.Format(
>>>>                              "EnsureGIL ({1}) {0}", 
>>>> this.GetThreadId(),
>>>>                Builtin.id(this.dispatcherLock)));
>>>>                      }
>>>>
>>>>                      public void ReleaseGIL()
>>>>                      {
>>>>                          this.WriteFlush(String.Format(
>>>>                              "ReleaseGIL ({1}) {0}\n", 
>>>> this.GetThreadId(),
>>>>                Builtin.id(this.dispatcherLock)));
>>>>                          Monitor.Exit(this.dispatcherLock);
>>>>                      }
>>>>                -----------------------
>>>>                ...and they can, and do, occasionally produce output as
>>>>         follows:
>>>>                -----------------------
>>>>                EnsureGIL (443) 2
>>>>                EnsureGIL (443) 1      <- omg, wtf, bbq, etc.
>>>>                ReleaseGIL (443) 2
>>>>
>>>>                EnsureGIL (443) 2
>>>>                ReleaseGIL (443) 1
>>>>
>>>>                ReleaseGIL (443) 2
>>>>                -----------------------
>>>>                When this happens, the process continues happily for a
>>>>         short
>>>>                time and then falls over in a later call to ReleaseGIL
>>>>         (after
>>>>                successfully calling it several times). The error is "
>>>>         Object
>>>>                synchronization method was called from an
>>>>         unsynchronized block
>>>>                of code", which I understand to mean "you can't release
>>>>         this
>>>>                lock because you don't hold it".
>>>>
>>>>                It doesn't happen very often, but I can usually
>>>>         reproduce it
>>>>                by running
>>>>         test_multiarray.TestFromToFile.test_malformed a few
>>>>                hundred times. It may be relevant to note that thread 2
>>>>         is the
>>>>                GC thread, and thread 1 is the main thread. I have
>>>>         considered
>>>>                the following possibilities:
>>>>
>>>>                (1) That I'm locking on the wrong object. I believe
>>>>         that isn't
>>>>                the case, because it's constructed only once, as a "new
>>>>                Object()" (ie, a reference type), and is only 
>>>> subsequently
>>>>                used for locking; and, because it keeps the same ipy id
>>>>                throughout.
>>>>
>>>>                (2) That Monitor.Enter occasionally allows two 
>>>> different
>>>>                threads to acquire the same lock. I consider this 
>>>> extremely
>>>>                unlikely, because... well, how many multithreaded 
>>>> .NET apps
>>>>                already exist? If Monitor really were broken, I 
>>>> think we'd
>>>>                probably know about it by now.
>>>>
>>>>                (3) That calling Flush() on a SyncTextWriter (the 
>>>> type of
>>>>                Console.Out) doesn't actually do anything, and the
>>>>         output is
>>>>                somehow wrongly ordered (although I can't imagine 
>>>> how this
>>>>                could actually be: if the locking is really working,
>>>>         then my
>>>>                console writes are strictly sequential). I don't have
>>>>         access
>>>>                to the code, so I have no idea how it's implemented,
>>>>         but even
>>>>                if this is the case it doesn't help much with the
>>>>         fundamental
>>>>                problem (the synchronisation error which follows).
>>>>
>>>>                Apart from the above, I'm out of ideas. Can anyone 
>>>> suggest
>>>>                what I've missed?
>>>>
>>>>                William
>>>>                _______________________________________________
>>>>                Users mailing list
>>>>                Users at lists.ironpython.com
>>>>         <mailto:Users at lists.ironpython.com>
>>>>         <mailto:Users at lists.ironpython.com
>>>>         <mailto:Users at lists.ironpython.com>>
>>>>
>>>>
>>>>          http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>>>>
>>>>
>>>>
>>>>         
>>>> ------------------------------------------------------------------------ 
>>>>
>>>>
>>>>
>>>>
>>>>         _______________________________________________
>>>>         Users mailing list
>>>>         Users at lists.ironpython.com <mailto:Users at lists.ironpython.com>
>>>>         http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>>>>
>>>>
>>>>
>>>>     _______________________________________________
>>>>     Users mailing list
>>>>     Users at lists.ironpython.com <mailto:Users at lists.ironpython.com>
>>>>     http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>>>>
>>>>
>>>> ------------------------------------------------------------------------ 
>>>>
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at lists.ironpython.com
>>>> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>>>>
>>>>
>>>>       
>>> _______________________________________________
>>> Users mailing list
>>> Users at lists.ironpython.com
>>> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at lists.ironpython.com
>>> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>>>
>>>
>>>     
>>
>> _______________________________________________
>> Users mailing list
>> Users at lists.ironpython.com
>> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>> _______________________________________________
>> Users mailing list
>> Users at lists.ironpython.com
>> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>>
>>   
>
> _______________________________________________
> Users mailing list
> Users at lists.ironpython.com
> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>




More information about the Ironpython-users mailing list