[IronPython] Slow Performance of CPython libs?

Birsch birsch at gmail.com
Fri Feb 22 18:40:05 CET 2008


Dino - do you want me to register a workitem? Also, if I wanted to patch
v1.1.1, where should I start?

On Thu, Feb 21, 2008 at 8:48 PM, Dino Viehland <dinov at exchange.microsoft.com>
wrote:

> After Birsh's last comment that they're calling compile I was thinking we
> might be doing something really stupid.  Turns out that we are.  For our
> supposedly compiled regex's we're not passing the Compiled option.  So
> they're all getting interpretted and of course that's going to suck big
> time.
>
> So the fix for this is probably to just pass RegexOptions.Compiled to the
> Regex object being contructed into RE_Pattern.  Internally the Regex class
> will compile to dynamic methods (much better than compiling to an assembly
> because it'll be collectible).
>
> It's actually a little more complex than that because we create RE_Pattern
> objects for the non-compiled so we need to flow the option in, but it's
> still a simple fix.
>
> -----Original Message-----
> From: users-bounces at lists.ironpython.com [mailto:
> users-bounces at lists.ironpython.com] On Behalf Of Dan Shechter
> Sent: Thursday, February 21, 2008 10:19 AM
> To: Discussion of IronPython
> Subject: Re: [IronPython] Slow Performance of CPython libs?
>
> Just out of curiosity...
> I'm guessing that there are two levels of caching that could be used.
> One would be to hold a global dictionary of some sort of pattern ->
> Regex that will save the lengthy parsing of the re.
>
> The other would be to actually call Regex.CompileToAssembly to get a
> more efficient reperesentation (in runtime that is) of the re.
>
> Would you (the IP team) favor the former or the latter?
>
>        Shechter.
>
>
> Birsch wrote:
> > I checked both sgmllib.py and BeautifulSoup.py - and it seems both are
> > reusing the same regexps (searched for re.compile).
> >
> > I think your suggestion is very relevant in this case. It makes sense to
> > replicate the "compile once use many" behavior that is commonly used
> > with regexp.
> >
> > -Birsch
> >
> > On Thu, Feb 21, 2008 at 7:30 PM, Dino Viehland
> > <dinov at exchange.microsoft.com <mailto:dinov at exchange.microsoft.com>>
> wrote:
> >
> >     Do you know if the same reg ex is being used repeatedly?  If so
> >     maybe we can cache & compile the regex instead of interpretting it
> >     all the time.
> >
> >
> >
> >     *From:* users-bounces at lists.ironpython.com
> >     <mailto:users-bounces at lists.ironpython.com>
> >     [mailto:users-bounces at lists.ironpython.com
> >     <mailto:users-bounces at lists.ironpython.com>] *On Behalf Of *Birsch
> >     *Sent:* Thursday, February 21, 2008 8:30 AM
> >
> >     *To:* Discussion of IronPython
> >     *Subject:* Re: [IronPython] Slow Performance of CPython libs?
> >
> >
> >
> >     I took on Cooper's advice and profiled with ANTS. Here are the top
> >     methods:
> >
> >     *Namespace*
> >
> >
> >
> >     *Method name*
> >
> >
> >
> >     *Time (sec.)*
> >
> >
> >
> >     *Time with children (sec.)*
> >
> >
> >
> >     *Hit count*
> >
> >
> >
> >     *Source file*
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.Go()
> >
> >
> >
> >     37.0189
> >
> >
> >
> >     94.4676
> >
> >
> >
> >     13689612
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.Operator()
> >
> >
> >
> >     6.2411
> >
> >
> >
> >     6.2411
> >
> >
> >
> >     131146274
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.Advance(int i)
> >
> >
> >
> >     5.9264
> >
> >
> >
> >     8.7202
> >
> >
> >
> >     66000263
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.SetOperator(int op)
> >
> >
> >
> >     5.5750
> >
> >
> >
> >     5.5750
> >
> >
> >
> >     131146274
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.Backtrack()
> >
> >
> >
> >     5.5692
> >
> >
> >
> >     9.4895
> >
> >
> >
> >     37781343
> >
> >
> >
> >     IronPython.Runtime.Operations
> >
> >
> >
> >     Ops.CallWithContext(ICallerContext context, object func, object
> >     arg0, object arg1)
> >
> >
> >
> >     5.5572
> >
> >
> >
> >     114.5245
> >
> >
> >
> >     79754
> >
> >
> >
> >     IronPython.Runtime.Calls
> >
> >
> >
> >     Method.Call(ICallerContext context, object arg0)
> >
> >
> >
> >     4.9052
> >
> >
> >
> >     114.8251
> >
> >
> >
> >     50886
> >
> >
> >
> >     IronPython.Runtime.Calls
> >
> >
> >
> >     PythonFunction.CallInstance(ICallerContext context, object arg0,
> >     object arg1)
> >
> >
> >
> >     4.8876
> >
> >
> >
> >     114.8059
> >
> >
> >
> >     50886
> >
> >
> >
> >     IronPython.Runtime.Calls
> >
> >
> >
> >     Function2.Call(ICallerContext context, object arg0, object arg1)
> >
> >
> >
> >     4.6400
> >
> >
> >
> >     114.5471
> >
> >
> >
> >     47486
> >
> >
> >
> >     IronPython.Runtime.Operations
> >
> >
> >
> >     Ops.CallWithContext(ICallerContext context, object func, object
> arg0)
> >
> >
> >
> >     4.2344
> >
> >
> >
> >     114.1604
> >
> >
> >
> >     146658
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexBoyerMoore.Scan(string text, int index, int beglimit, int
> endlimit)
> >
> >
> >
> >     3.6465
> >
> >
> >
> >     3.6465
> >
> >
> >
> >     13678131
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexCharClass.CharInClassRecursive(char ch, string set, int start)
> >
> >
> >
> >     3.6288
> >
> >
> >
> >     5.7113
> >
> >
> >
> >     31508162
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.Goto(int newpos)
> >
> >
> >
> >     3.2058
> >
> >
> >
> >     5.1470
> >
> >
> >
> >     27364668
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.Operand(int i)
> >
> >
> >
> >     3.1923
> >
> >
> >
> >     3.1923
> >
> >
> >
> >     73230687
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexRunner.EnsureStorage()
> >
> >
> >
> >     3.0803
> >
> >
> >
> >     3.0803
> >
> >
> >
> >     51474823
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexCharClass.CharInClass(char ch, string set)
> >
> >
> >
> >     3.0713
> >
> >
> >
> >     8.7827
> >
> >
> >
> >     31508162
> >
> >
> >
> >     IronPython.Runtime.Calls
> >
> >
> >
> >     Method.Call(ICallerContext context, object arg0, object arg1)
> >
> >
> >
> >     2.9821
> >
> >
> >
> >     7.8675
> >
> >
> >
> >     15012
> >
> >
> >
> >     IronPython.Runtime.Calls
> >
> >
> >
> >     PythonFunction.CallInstance(ICallerContext context, object arg0,
> >     object arg1, object arg2)
> >
> >
> >
> >     2.9794
> >
> >
> >
> >     7.8639
> >
> >
> >
> >     15012
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.Forwardcharnext()
> >
> >
> >
> >     2.8852
> >
> >
> >
> >     2.8852
> >
> >
> >
> >     62865185
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.Forwardchars()
> >
> >
> >
> >     2.8279
> >
> >
> >
> >     2.8279
> >
> >
> >
> >     59436277
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexCharClass.CharInClassInternal(char ch, string set, int start,
> >     int mySetLength, int myCategoryLength)
> >
> >
> >
> >     2.0632
> >
> >
> >
> >     2.0826
> >
> >
> >
> >     31508162
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexRunner.Scan(Regex regex, string text, int textbeg, int textend,
> >     int textstart, int prevlen, bool quick)
> >
> >
> >
> >     1.8376
> >
> >
> >
> >     101.7226
> >
> >
> >
> >     43009
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.FindFirstChar()
> >
> >
> >
> >     1.6405
> >
> >
> >
> >     5.3456
> >
> >
> >
> >     13701755
> >
> >
> >
> >     IronPython.Runtime.Types
> >
> >
> >
> >     OldClass.TryLookupSlot(SymbolId name, out object ret)
> >
> >
> >
> >     1.5573
> >
> >
> >
> >     2.8124
> >
> >
> >
> >     389516
> >
> >
> >
> >     IronPython.Runtime.Operations
> >
> >
> >
> >     Ops.GetAttr(ICallerContext context, object o, SymbolId name)
> >
> >
> >
> >     1.5365
> >
> >
> >
> >     5.3456
> >
> >
> >
> >     558524
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.Textpos()
> >
> >
> >
> >     1.4020
> >
> >
> >
> >     1.4020
> >
> >
> >
> >     32648926
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.Advance()
> >
> >
> >
> >     1.1916
> >
> >
> >
> >     2.9526
> >
> >
> >
> >     13703950
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.Textto(int newpos)
> >
> >
> >
> >     1.1218
> >
> >
> >
> >     1.1218
> >
> >
> >
> >     24120890
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.TrackPeek()
> >
> >
> >
> >     1.0579
> >
> >
> >
> >     1.0579
> >
> >
> >
> >     24120894
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     Regex.Run(bool quick, int prevlen, string input, int beginning, int
> >     length, int startat)
> >
> >
> >
> >     0.7280
> >
> >
> >
> >     102.4644
> >
> >
> >
> >     43009
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.TrackPush(int I1)
> >
> >
> >
> >     0.6834
> >
> >
> >
> >     0.6834
> >
> >
> >
> >     13745149
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.StackPush(int I1)
> >
> >
> >
> >     0.6542
> >
> >
> >
> >     0.6542
> >
> >
> >
> >     13703955
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.TrackPop()
> >
> >
> >
> >     0.6068
> >
> >
> >
> >     0.6068
> >
> >
> >
> >     13663035
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.TrackPush()
> >
> >
> >
> >     0.6049
> >
> >
> >
> >     0.6049
> >
> >
> >
> >     13708230
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.StackPop()
> >
> >
> >
> >     0.5836
> >
> >
> >
> >     0.5836
> >
> >
> >
> >     13703956
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.Bump()
> >
> >
> >
> >     0.4987
> >
> >
> >
> >     0.4987
> >
> >
> >
> >     10472790
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.TrackPush(int I1, int I2)
> >
> >
> >
> >     0.4864
> >
> >
> >
> >     0.4864
> >
> >
> >
> >     10472790
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.TrackPeek(int i)
> >
> >
> >
> >     0.4663
> >
> >
> >
> >     0.4663
> >
> >
> >
> >     10457859
> >
> >
> >
> >     System.Text.RegularExpressions
> >
> >
> >
> >     RegexInterpreter.TrackPop(int framesize)
> >
> >
> >
> >     0.4396
> >
> >
> >
> >     0.4396
> >
> >
> >
> >     10457859
> >
> >
> >
> >
> >     Moving up the stack of regex.Go(), most calls originate from
> >     sgmllib's parse_starttag.
> >
> >     HTH,
> >     -Birsch
> >
> >     On Thu, Feb 21, 2008 at 2:35 PM, Birsch <birsch at gmail.com
> >     <mailto:birsch at gmail.com>> wrote:
> >
> >     Thanks Michael and Dino.
> >
> >     I'll prof and send update. Got a good profiler recommendation for
> .Net?
> >     Meanwhile I noticed the sample site below causes BeautifulSoup to
> >     generate quite a few [python] exceptions during __init__. Does
> >     IronPython handle exceptions significantly slower than CPtyhon?
> >
> >     Repro code is simple (just build a BeautifulSoup obj with mininova's
> >     home page).
> >     Here are the .py and .cs I used to time the diffs:
> >
> >     *bstest.py:*
> >     #Bypass CPython default socket implementation with IPCE/FePy
> >     import imp, os, sys
> >     sys.modules['socket'] = module = imp.new_module('socket')
> >     execfile('socket.py', module.__dict__)
> >
> >     from BeautifulSoup import BeautifulSoup
> >     from urllib import urlopen
> >     import datetime
> >
> >     def getContent(url):
> >         #Download html data
> >         startTime = datetime.datetime.now()
> >         print "Getting url", url
> >         html = urlopen(url).read()
> >         print "Time taken:", datetime.datetime.now() - startTime
> >
> >         #Make soup
> >         startTime = datetime.datetime.now()
> >         print "Making soup..."
> >         soup = BeautifulSoup(markup=html)
> >         print "Time taken:", datetime.datetime.now() - startTime
> >
> >     if __name__ == "__main__":
> >         print getContent("www.mininova.org <http://www.mininova.org>")
> >
> >
> >     *C#:*
> >     using System;
> >     using System.Collections.Generic;
> >     using System.Text;
> >     using IronPython.Hosting;
> >
> >     namespace IronPythonBeautifulSoupTest
> >     {
> >         public class Program
> >         {
> >             public static void Main(string[] args)
> >             {
> >                 //Init
> >                 System.Console.WriteLine("Starting...");
> >                 DateTime start = DateTime.Now;
> >                 PythonEngine engine = new PythonEngine();
> >
> >                 //Add paths:
> >                 //BeautifulSoup.py, socket.py, bstest.py located on exe
> dir
> >                 engine.AddToPath(@".");
> >                 //CPython Lib (replace with your own)
> >                 engine.AddToPath(@"D:\Dev\Python\Lib");
> >
> >                 //Import and load
> >                 TimeSpan span = DateTime.Now - start;
> >                 System.Console.WriteLine("[1] Import: " +
> >     span.TotalSeconds);
> >                 DateTime d = DateTime.Now;
> >                 engine.ExecuteFile(@"bstest.py");
> >                 span = DateTime.Now - d;
> >                 System.Console.WriteLine("[2] Load: " +
> span.TotalSeconds);
> >
> >                 //Execute
> >                 d = DateTime.Now;
> >                 engine.Execute("getContent(\"http://www.mininova.org\<http://www.mininova.org%5C>
> ")");
> >                 span = DateTime.Now - d;
> >                 System.Console.WriteLine("[3] Execute: " +
> >     span.TotalSeconds);
> >                 span = DateTime.Now - start;
> >                 System.Console.WriteLine("Total: " + span.TotalSeconds);
> >
> >
> >             }
> >         }
> >     }
> >
> >
> >     On Wed, Feb 20, 2008 at 6:57 PM, Dino Viehland
> >     <dinov at exchange.microsoft.com <mailto:dinov at exchange.microsoft.com>>
> >     wrote:
> >
> >     We've actually had this issue reported once before a long time ago -
> >     it's a very low CodePlex ID -
> >     http://www.codeplex.com/IronPython/WorkItem/View.aspx?WorkItemId=651
> >
> >     We haven't had a chance to investigate the end-to-end scenario.  If
> >     someone could come up with a smaller simpler repro that'd be great.
> >      Otherwise we haven't forgotten about it we've just had more
> >     immediately pressing issues to work on :(.
> >
> >
> >     -----Original Message-----
> >     From: users-bounces at lists.ironpython.com
> >     <mailto:users-bounces at lists.ironpython.com>
> >     [mailto:users-bounces at lists.ironpython.com
> >     <mailto:users-bounces at lists.ironpython.com>] On Behalf Of Michael
> Foord
> >     Sent: Wednesday, February 20, 2008 5:20 AM
> >     To: Discussion of IronPython
> >     Subject: Re: [IronPython] Slow Performance of CPython libs?
> >
> >     Birsch wrote:
> >      > Hi - We've been using IronPython successfully to allow
> extensibility
> >      > of our application.
> >      >
> >      > Overall we are happy with the performance, with the exception of
> >      > BeautifulSoup which seems to run very slowly: x5 or more time to
> >      > execute compared to CPython.
> >      >
> >      > Most of the time seems to be spent during __init__() of BS, where
> the
> >      > markup is parsed.
> >      >
> >      > We suspect this has to do with the fact that our CPython env is
> >      > executing .pyc files and can precompile its libs, while the
> >     IronPython
> >      > environment compiles each iteration. We couldn't find a way to
> >      > pre-compile the libs and then introduce them into the code, but
> >     in any
> >      > case this will result in a large management overhead since the
> amount
> >      > of CPython libs we expose to our users contains 100's of modules.
> >      >
> >      > Any ideas on how to optimize?
> >
> >     I think it is worth doing real profiling to find out where the time
> is
> >     being spent during parsing.
> >
> >     If it is spending most of the time in '__init__' then the time is
> >     probably not spent in importing - so compilation isn't relevant and
> it
> >     is a runtime performance issue. (Importing is much slower with
> >     IronPython and at Resolver Systems we do use precompiled binaries -
> but
> >     strangely enough it doesn't provide much of a performance gain.)
> >
> >     Michael
> >     http://www.manning.com/foord
> >
> >      >
> >      > Thanks,
> >      > -Birsch
> >      >
> >      > Note: we're using FePy/IPCE libs with regular IP v1.1.1 runtime
> DLLs
> >      > (this was done to overcome library incompatibilities and network
> >      > errors). However, the relevant slow .py code (mainly SGMLParser
> and
> >      > BeautifulSoup) is the same.
> >      >
> >
> ------------------------------------------------------------------------
> >      >
> >      > _______________________________________________
> >      > Users mailing list
> >      > Users at lists.ironpython.com <mailto:Users at lists.ironpython.com>
> >      > http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
> >      >
> >
> >     _______________________________________________
> >     Users mailing list
> >     Users at lists.ironpython.com <mailto:Users at lists.ironpython.com>
> >     http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
> >     _______________________________________________
> >     Users mailing list
> >     Users at lists.ironpython.com <mailto:Users at lists.ironpython.com>
> >     http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
> >
> >
> >
> >
> >
> >
> >     _______________________________________________
> >     Users mailing list
> >     Users at lists.ironpython.com <mailto:Users at lists.ironpython.com>
> >     http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
> >
> >
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > Users mailing list
> > Users at lists.ironpython.com
> > http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
> _______________________________________________
> Users mailing list
> Users at lists.ironpython.com
> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
> _______________________________________________
> Users mailing list
> Users at lists.ironpython.com
> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/ironpython-users/attachments/20080222/232837a9/attachment.html>


More information about the Ironpython-users mailing list