[IronPython] Slow Performance of CPython libs?

Birsch birsch at gmail.com
Thu Feb 21 19:04:46 CET 2008


I checked both sgmllib.py and BeautifulSoup.py - and it seems both are
reusing the same regexps (searched for re.compile).

I think your suggestion is very relevant in this case. It makes sense to
replicate the "compile once use many" behavior that is commonly used with
regexp.

-Birsch

On Thu, Feb 21, 2008 at 7:30 PM, Dino Viehland <dinov at exchange.microsoft.com>
wrote:

>  Do you know if the same reg ex is being used repeatedly?  If so maybe we
> can cache & compile the regex instead of interpretting it all the time.
>
>
>
> *From:* users-bounces at lists.ironpython.com [mailto:
> users-bounces at lists.ironpython.com] *On Behalf Of *Birsch
> *Sent:* Thursday, February 21, 2008 8:30 AM
>
> *To:* Discussion of IronPython
> *Subject:* Re: [IronPython] Slow Performance of CPython libs?
>
>
>
> I took on Cooper's advice and profiled with ANTS. Here are the top
> methods:
>
> *Namespace*
>
> *Method name*
>
> *Time (sec.)*
>
> *Time with children (sec.)*
>
> *Hit count*
>
> *Source file*
>
> System.Text.RegularExpressions
>
> RegexInterpreter.Go()
>
> 37.0189
>
> 94.4676
>
> 13689612
>
> System.Text.RegularExpressions
>
> RegexInterpreter.Operator()
>
> 6.2411
>
> 6.2411
>
> 131146274
>
> System.Text.RegularExpressions
>
> RegexInterpreter.Advance(int i)
>
> 5.9264
>
> 8.7202
>
> 66000263
>
> System.Text.RegularExpressions
>
> RegexInterpreter.SetOperator(int op)
>
> 5.5750
>
> 5.5750
>
> 131146274
>
> System.Text.RegularExpressions
>
> RegexInterpreter.Backtrack()
>
> 5.5692
>
> 9.4895
>
> 37781343
>
> IronPython.Runtime.Operations
>
> Ops.CallWithContext(ICallerContext context, object func, object arg0,
> object arg1)
>
> 5.5572
>
> 114.5245
>
> 79754
>
> IronPython.Runtime.Calls
>
> Method.Call(ICallerContext context, object arg0)
>
> 4.9052
>
> 114.8251
>
> 50886
>
> IronPython.Runtime.Calls
>
> PythonFunction.CallInstance(ICallerContext context, object arg0, object
> arg1)
>
> 4.8876
>
> 114.8059
>
> 50886
>
> IronPython.Runtime.Calls
>
> Function2.Call(ICallerContext context, object arg0, object arg1)
>
> 4.6400
>
> 114.5471
>
> 47486
>
> IronPython.Runtime.Operations
>
> Ops.CallWithContext(ICallerContext context, object func, object arg0)
>
> 4.2344
>
> 114.1604
>
> 146658
>
> System.Text.RegularExpressions
>
> RegexBoyerMoore.Scan(string text, int index, int beglimit, int endlimit)
>
> 3.6465
>
> 3.6465
>
> 13678131
>
> System.Text.RegularExpressions
>
> RegexCharClass.CharInClassRecursive(char ch, string set, int start)
>
> 3.6288
>
> 5.7113
>
> 31508162
>
> System.Text.RegularExpressions
>
> RegexInterpreter.Goto(int newpos)
>
> 3.2058
>
> 5.1470
>
> 27364668
>
> System.Text.RegularExpressions
>
> RegexInterpreter.Operand(int i)
>
> 3.1923
>
> 3.1923
>
> 73230687
>
> System.Text.RegularExpressions
>
> RegexRunner.EnsureStorage()
>
> 3.0803
>
> 3.0803
>
> 51474823
>
> System.Text.RegularExpressions
>
> RegexCharClass.CharInClass(char ch, string set)
>
> 3.0713
>
> 8.7827
>
> 31508162
>
> IronPython.Runtime.Calls
>
> Method.Call(ICallerContext context, object arg0, object arg1)
>
> 2.9821
>
> 7.8675
>
> 15012
>
> IronPython.Runtime.Calls
>
> PythonFunction.CallInstance(ICallerContext context, object arg0, object
> arg1, object arg2)
>
> 2.9794
>
> 7.8639
>
> 15012
>
> System.Text.RegularExpressions
>
> RegexInterpreter.Forwardcharnext()
>
> 2.8852
>
> 2.8852
>
> 62865185
>
> System.Text.RegularExpressions
>
> RegexInterpreter.Forwardchars()
>
> 2.8279
>
> 2.8279
>
> 59436277
>
> System.Text.RegularExpressions
>
> RegexCharClass.CharInClassInternal(char ch, string set, int start, int
> mySetLength, int myCategoryLength)
>
> 2.0632
>
> 2.0826
>
> 31508162
>
> System.Text.RegularExpressions
>
> RegexRunner.Scan(Regex regex, string text, int textbeg, int textend, int
> textstart, int prevlen, bool quick)
>
> 1.8376
>
> 101.7226
>
> 43009
>
> System.Text.RegularExpressions
>
> RegexInterpreter.FindFirstChar()
>
> 1.6405
>
> 5.3456
>
> 13701755
>
> IronPython.Runtime.Types
>
> OldClass.TryLookupSlot(SymbolId name, out object ret)
>
> 1.5573
>
> 2.8124
>
> 389516
>
> IronPython.Runtime.Operations
>
> Ops.GetAttr(ICallerContext context, object o, SymbolId name)
>
> 1.5365
>
> 5.3456
>
> 558524
>
> System.Text.RegularExpressions
>
> RegexInterpreter.Textpos()
>
> 1.4020
>
> 1.4020
>
> 32648926
>
> System.Text.RegularExpressions
>
> RegexInterpreter.Advance()
>
> 1.1916
>
> 2.9526
>
> 13703950
>
> System.Text.RegularExpressions
>
> RegexInterpreter.Textto(int newpos)
>
> 1.1218
>
> 1.1218
>
> 24120890
>
> System.Text.RegularExpressions
>
> RegexInterpreter.TrackPeek()
>
> 1.0579
>
> 1.0579
>
> 24120894
>
> System.Text.RegularExpressions
>
> Regex.Run(bool quick, int prevlen, string input, int beginning, int
> length, int startat)
>
> 0.7280
>
> 102.4644
>
> 43009
>
> System.Text.RegularExpressions
>
> RegexInterpreter.TrackPush(int I1)
>
> 0.6834
>
> 0.6834
>
> 13745149
>
> System.Text.RegularExpressions
>
> RegexInterpreter.StackPush(int I1)
>
> 0.6542
>
> 0.6542
>
> 13703955
>
> System.Text.RegularExpressions
>
> RegexInterpreter.TrackPop()
>
> 0.6068
>
> 0.6068
>
> 13663035
>
> System.Text.RegularExpressions
>
> RegexInterpreter.TrackPush()
>
> 0.6049
>
> 0.6049
>
> 13708230
>
> System.Text.RegularExpressions
>
> RegexInterpreter.StackPop()
>
> 0.5836
>
> 0.5836
>
> 13703956
>
> System.Text.RegularExpressions
>
> RegexInterpreter.Bump()
>
> 0.4987
>
> 0.4987
>
> 10472790
>
> System.Text.RegularExpressions
>
> RegexInterpreter.TrackPush(int I1, int I2)
>
> 0.4864
>
> 0.4864
>
> 10472790
>
> System.Text.RegularExpressions
>
> RegexInterpreter.TrackPeek(int i)
>
> 0.4663
>
> 0.4663
>
> 10457859
>
> System.Text.RegularExpressions
>
> RegexInterpreter.TrackPop(int framesize)
>
> 0.4396
>
> 0.4396
>
> 10457859
>
>
> Moving up the stack of regex.Go(), most calls originate from sgmllib's
> parse_starttag.
>
> HTH,
> -Birsch
>
> On Thu, Feb 21, 2008 at 2:35 PM, Birsch <birsch at gmail.com> wrote:
>
> Thanks Michael and Dino.
>
> I'll prof and send update. Got a good profiler recommendation for .Net?
> Meanwhile I noticed the sample site below causes BeautifulSoup to generate
> quite a few [python] exceptions during __init__. Does IronPython handle
> exceptions significantly slower than CPtyhon?
>
> Repro code is simple (just build a BeautifulSoup obj with mininova's home
> page).
> Here are the .py and .cs I used to time the diffs:
>
> *bstest.py:*
> #Bypass CPython default socket implementation with IPCE/FePy
> import imp, os, sys
> sys.modules['socket'] = module = imp.new_module('socket')
> execfile('socket.py', module.__dict__)
>
> from BeautifulSoup import BeautifulSoup
> from urllib import urlopen
> import datetime
>
> def getContent(url):
>     #Download html data
>     startTime = datetime.datetime.now()
>     print "Getting url", url
>     html = urlopen(url).read()
>     print "Time taken:", datetime.datetime.now() - startTime
>
>     #Make soup
>     startTime = datetime.datetime.now()
>     print "Making soup..."
>     soup = BeautifulSoup(markup=html)
>     print "Time taken:", datetime.datetime.now() - startTime
>
> if __name__ == "__main__":
>     print getContent("www.mininova.org")
>
>
> *C#:*
> using System;
> using System.Collections.Generic;
> using System.Text;
> using IronPython.Hosting;
>
> namespace IronPythonBeautifulSoupTest
> {
>     public class Program
>     {
>         public static void Main(string[] args)
>         {
>             //Init
>             System.Console.WriteLine("Starting...");
>             DateTime start = DateTime.Now;
>             PythonEngine engine = new PythonEngine();
>
>             //Add paths:
>             //BeautifulSoup.py, socket.py, bstest.py located on exe dir
>             engine.AddToPath(@".");
>             //CPython Lib (replace with your own)
>             engine.AddToPath(@"D:\Dev\Python\Lib");
>
>             //Import and load
>             TimeSpan span = DateTime.Now - start;
>             System.Console.WriteLine("[1] Import: " + span.TotalSeconds);
>             DateTime d = DateTime.Now;
>             engine.ExecuteFile(@"bstest.py");
>             span = DateTime.Now - d;
>             System.Console.WriteLine("[2] Load: " + span.TotalSeconds);
>
>             //Execute
>             d = DateTime.Now;
>             engine.Execute("getContent(\"http://www.mininova.org\")");
>             span = DateTime.Now - d;
>             System.Console.WriteLine("[3] Execute: " + span.TotalSeconds);
>             span = DateTime.Now - start;
>             System.Console.WriteLine("Total: " + span.TotalSeconds);
>
>
>         }
>     }
> }
>
>
>  On Wed, Feb 20, 2008 at 6:57 PM, Dino Viehland <
> dinov at exchange.microsoft.com> wrote:
>
> We've actually had this issue reported once before a long time ago - it's
> a very low CodePlex ID -
> http://www.codeplex.com/IronPython/WorkItem/View.aspx?WorkItemId=651
>
> We haven't had a chance to investigate the end-to-end scenario.  If
> someone could come up with a smaller simpler repro that'd be great.
>  Otherwise we haven't forgotten about it we've just had more immediately
> pressing issues to work on :(.
>
>
> -----Original Message-----
> From: users-bounces at lists.ironpython.com [mailto:
> users-bounces at lists.ironpython.com] On Behalf Of Michael Foord
> Sent: Wednesday, February 20, 2008 5:20 AM
> To: Discussion of IronPython
> Subject: Re: [IronPython] Slow Performance of CPython libs?
>
> Birsch wrote:
> > Hi - We've been using IronPython successfully to allow extensibility
> > of our application.
> >
> > Overall we are happy with the performance, with the exception of
> > BeautifulSoup which seems to run very slowly: x5 or more time to
> > execute compared to CPython.
> >
> > Most of the time seems to be spent during __init__() of BS, where the
> > markup is parsed.
> >
> > We suspect this has to do with the fact that our CPython env is
> > executing .pyc files and can precompile its libs, while the IronPython
> > environment compiles each iteration. We couldn't find a way to
> > pre-compile the libs and then introduce them into the code, but in any
> > case this will result in a large management overhead since the amount
> > of CPython libs we expose to our users contains 100's of modules.
> >
> > Any ideas on how to optimize?
>
> I think it is worth doing real profiling to find out where the time is
> being spent during parsing.
>
> If it is spending most of the time in '__init__' then the time is
> probably not spent in importing - so compilation isn't relevant and it
> is a runtime performance issue. (Importing is much slower with
> IronPython and at Resolver Systems we do use precompiled binaries - but
> strangely enough it doesn't provide much of a performance gain.)
>
> Michael
> http://www.manning.com/foord
>
> >
> > Thanks,
> > -Birsch
> >
> > Note: we're using FePy/IPCE libs with regular IP v1.1.1 runtime DLLs
> > (this was done to overcome library incompatibilities and network
> > errors). However, the relevant slow .py code (mainly SGMLParser and
> > BeautifulSoup) is the same.
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > Users mailing list
> > Users at lists.ironpython.com
> > http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
> >
>
> _______________________________________________
> Users mailing list
> Users at lists.ironpython.com
> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
> _______________________________________________
> Users mailing list
> Users at lists.ironpython.com
> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>
>
>
>
>
> _______________________________________________
> Users mailing list
> Users at lists.ironpython.com
> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/ironpython-users/attachments/20080221/337b598a/attachment.html>


More information about the Ironpython-users mailing list