[IronPython] Slow Performance of CPython libs?
Birsch
birsch at gmail.com
Fri Feb 22 18:40:05 CET 2008
Dino - do you want me to register a workitem? Also, if I wanted to patch
v1.1.1, where should I start?
On Thu, Feb 21, 2008 at 8:48 PM, Dino Viehland <dinov at exchange.microsoft.com>
wrote:
> After Birsh's last comment that they're calling compile I was thinking we
> might be doing something really stupid. Turns out that we are. For our
> supposedly compiled regex's we're not passing the Compiled option. So
> they're all getting interpretted and of course that's going to suck big
> time.
>
> So the fix for this is probably to just pass RegexOptions.Compiled to the
> Regex object being contructed into RE_Pattern. Internally the Regex class
> will compile to dynamic methods (much better than compiling to an assembly
> because it'll be collectible).
>
> It's actually a little more complex than that because we create RE_Pattern
> objects for the non-compiled so we need to flow the option in, but it's
> still a simple fix.
>
> -----Original Message-----
> From: users-bounces at lists.ironpython.com [mailto:
> users-bounces at lists.ironpython.com] On Behalf Of Dan Shechter
> Sent: Thursday, February 21, 2008 10:19 AM
> To: Discussion of IronPython
> Subject: Re: [IronPython] Slow Performance of CPython libs?
>
> Just out of curiosity...
> I'm guessing that there are two levels of caching that could be used.
> One would be to hold a global dictionary of some sort of pattern ->
> Regex that will save the lengthy parsing of the re.
>
> The other would be to actually call Regex.CompileToAssembly to get a
> more efficient reperesentation (in runtime that is) of the re.
>
> Would you (the IP team) favor the former or the latter?
>
> Shechter.
>
>
> Birsch wrote:
> > I checked both sgmllib.py and BeautifulSoup.py - and it seems both are
> > reusing the same regexps (searched for re.compile).
> >
> > I think your suggestion is very relevant in this case. It makes sense to
> > replicate the "compile once use many" behavior that is commonly used
> > with regexp.
> >
> > -Birsch
> >
> > On Thu, Feb 21, 2008 at 7:30 PM, Dino Viehland
> > <dinov at exchange.microsoft.com <mailto:dinov at exchange.microsoft.com>>
> wrote:
> >
> > Do you know if the same reg ex is being used repeatedly? If so
> > maybe we can cache & compile the regex instead of interpretting it
> > all the time.
> >
> >
> >
> > *From:* users-bounces at lists.ironpython.com
> > <mailto:users-bounces at lists.ironpython.com>
> > [mailto:users-bounces at lists.ironpython.com
> > <mailto:users-bounces at lists.ironpython.com>] *On Behalf Of *Birsch
> > *Sent:* Thursday, February 21, 2008 8:30 AM
> >
> > *To:* Discussion of IronPython
> > *Subject:* Re: [IronPython] Slow Performance of CPython libs?
> >
> >
> >
> > I took on Cooper's advice and profiled with ANTS. Here are the top
> > methods:
> >
> > *Namespace*
> >
> >
> >
> > *Method name*
> >
> >
> >
> > *Time (sec.)*
> >
> >
> >
> > *Time with children (sec.)*
> >
> >
> >
> > *Hit count*
> >
> >
> >
> > *Source file*
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.Go()
> >
> >
> >
> > 37.0189
> >
> >
> >
> > 94.4676
> >
> >
> >
> > 13689612
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.Operator()
> >
> >
> >
> > 6.2411
> >
> >
> >
> > 6.2411
> >
> >
> >
> > 131146274
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.Advance(int i)
> >
> >
> >
> > 5.9264
> >
> >
> >
> > 8.7202
> >
> >
> >
> > 66000263
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.SetOperator(int op)
> >
> >
> >
> > 5.5750
> >
> >
> >
> > 5.5750
> >
> >
> >
> > 131146274
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.Backtrack()
> >
> >
> >
> > 5.5692
> >
> >
> >
> > 9.4895
> >
> >
> >
> > 37781343
> >
> >
> >
> > IronPython.Runtime.Operations
> >
> >
> >
> > Ops.CallWithContext(ICallerContext context, object func, object
> > arg0, object arg1)
> >
> >
> >
> > 5.5572
> >
> >
> >
> > 114.5245
> >
> >
> >
> > 79754
> >
> >
> >
> > IronPython.Runtime.Calls
> >
> >
> >
> > Method.Call(ICallerContext context, object arg0)
> >
> >
> >
> > 4.9052
> >
> >
> >
> > 114.8251
> >
> >
> >
> > 50886
> >
> >
> >
> > IronPython.Runtime.Calls
> >
> >
> >
> > PythonFunction.CallInstance(ICallerContext context, object arg0,
> > object arg1)
> >
> >
> >
> > 4.8876
> >
> >
> >
> > 114.8059
> >
> >
> >
> > 50886
> >
> >
> >
> > IronPython.Runtime.Calls
> >
> >
> >
> > Function2.Call(ICallerContext context, object arg0, object arg1)
> >
> >
> >
> > 4.6400
> >
> >
> >
> > 114.5471
> >
> >
> >
> > 47486
> >
> >
> >
> > IronPython.Runtime.Operations
> >
> >
> >
> > Ops.CallWithContext(ICallerContext context, object func, object
> arg0)
> >
> >
> >
> > 4.2344
> >
> >
> >
> > 114.1604
> >
> >
> >
> > 146658
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexBoyerMoore.Scan(string text, int index, int beglimit, int
> endlimit)
> >
> >
> >
> > 3.6465
> >
> >
> >
> > 3.6465
> >
> >
> >
> > 13678131
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexCharClass.CharInClassRecursive(char ch, string set, int start)
> >
> >
> >
> > 3.6288
> >
> >
> >
> > 5.7113
> >
> >
> >
> > 31508162
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.Goto(int newpos)
> >
> >
> >
> > 3.2058
> >
> >
> >
> > 5.1470
> >
> >
> >
> > 27364668
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.Operand(int i)
> >
> >
> >
> > 3.1923
> >
> >
> >
> > 3.1923
> >
> >
> >
> > 73230687
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexRunner.EnsureStorage()
> >
> >
> >
> > 3.0803
> >
> >
> >
> > 3.0803
> >
> >
> >
> > 51474823
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexCharClass.CharInClass(char ch, string set)
> >
> >
> >
> > 3.0713
> >
> >
> >
> > 8.7827
> >
> >
> >
> > 31508162
> >
> >
> >
> > IronPython.Runtime.Calls
> >
> >
> >
> > Method.Call(ICallerContext context, object arg0, object arg1)
> >
> >
> >
> > 2.9821
> >
> >
> >
> > 7.8675
> >
> >
> >
> > 15012
> >
> >
> >
> > IronPython.Runtime.Calls
> >
> >
> >
> > PythonFunction.CallInstance(ICallerContext context, object arg0,
> > object arg1, object arg2)
> >
> >
> >
> > 2.9794
> >
> >
> >
> > 7.8639
> >
> >
> >
> > 15012
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.Forwardcharnext()
> >
> >
> >
> > 2.8852
> >
> >
> >
> > 2.8852
> >
> >
> >
> > 62865185
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.Forwardchars()
> >
> >
> >
> > 2.8279
> >
> >
> >
> > 2.8279
> >
> >
> >
> > 59436277
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexCharClass.CharInClassInternal(char ch, string set, int start,
> > int mySetLength, int myCategoryLength)
> >
> >
> >
> > 2.0632
> >
> >
> >
> > 2.0826
> >
> >
> >
> > 31508162
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexRunner.Scan(Regex regex, string text, int textbeg, int textend,
> > int textstart, int prevlen, bool quick)
> >
> >
> >
> > 1.8376
> >
> >
> >
> > 101.7226
> >
> >
> >
> > 43009
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.FindFirstChar()
> >
> >
> >
> > 1.6405
> >
> >
> >
> > 5.3456
> >
> >
> >
> > 13701755
> >
> >
> >
> > IronPython.Runtime.Types
> >
> >
> >
> > OldClass.TryLookupSlot(SymbolId name, out object ret)
> >
> >
> >
> > 1.5573
> >
> >
> >
> > 2.8124
> >
> >
> >
> > 389516
> >
> >
> >
> > IronPython.Runtime.Operations
> >
> >
> >
> > Ops.GetAttr(ICallerContext context, object o, SymbolId name)
> >
> >
> >
> > 1.5365
> >
> >
> >
> > 5.3456
> >
> >
> >
> > 558524
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.Textpos()
> >
> >
> >
> > 1.4020
> >
> >
> >
> > 1.4020
> >
> >
> >
> > 32648926
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.Advance()
> >
> >
> >
> > 1.1916
> >
> >
> >
> > 2.9526
> >
> >
> >
> > 13703950
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.Textto(int newpos)
> >
> >
> >
> > 1.1218
> >
> >
> >
> > 1.1218
> >
> >
> >
> > 24120890
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.TrackPeek()
> >
> >
> >
> > 1.0579
> >
> >
> >
> > 1.0579
> >
> >
> >
> > 24120894
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > Regex.Run(bool quick, int prevlen, string input, int beginning, int
> > length, int startat)
> >
> >
> >
> > 0.7280
> >
> >
> >
> > 102.4644
> >
> >
> >
> > 43009
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.TrackPush(int I1)
> >
> >
> >
> > 0.6834
> >
> >
> >
> > 0.6834
> >
> >
> >
> > 13745149
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.StackPush(int I1)
> >
> >
> >
> > 0.6542
> >
> >
> >
> > 0.6542
> >
> >
> >
> > 13703955
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.TrackPop()
> >
> >
> >
> > 0.6068
> >
> >
> >
> > 0.6068
> >
> >
> >
> > 13663035
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.TrackPush()
> >
> >
> >
> > 0.6049
> >
> >
> >
> > 0.6049
> >
> >
> >
> > 13708230
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.StackPop()
> >
> >
> >
> > 0.5836
> >
> >
> >
> > 0.5836
> >
> >
> >
> > 13703956
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.Bump()
> >
> >
> >
> > 0.4987
> >
> >
> >
> > 0.4987
> >
> >
> >
> > 10472790
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.TrackPush(int I1, int I2)
> >
> >
> >
> > 0.4864
> >
> >
> >
> > 0.4864
> >
> >
> >
> > 10472790
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.TrackPeek(int i)
> >
> >
> >
> > 0.4663
> >
> >
> >
> > 0.4663
> >
> >
> >
> > 10457859
> >
> >
> >
> > System.Text.RegularExpressions
> >
> >
> >
> > RegexInterpreter.TrackPop(int framesize)
> >
> >
> >
> > 0.4396
> >
> >
> >
> > 0.4396
> >
> >
> >
> > 10457859
> >
> >
> >
> >
> > Moving up the stack of regex.Go(), most calls originate from
> > sgmllib's parse_starttag.
> >
> > HTH,
> > -Birsch
> >
> > On Thu, Feb 21, 2008 at 2:35 PM, Birsch <birsch at gmail.com
> > <mailto:birsch at gmail.com>> wrote:
> >
> > Thanks Michael and Dino.
> >
> > I'll prof and send update. Got a good profiler recommendation for
> .Net?
> > Meanwhile I noticed the sample site below causes BeautifulSoup to
> > generate quite a few [python] exceptions during __init__. Does
> > IronPython handle exceptions significantly slower than CPtyhon?
> >
> > Repro code is simple (just build a BeautifulSoup obj with mininova's
> > home page).
> > Here are the .py and .cs I used to time the diffs:
> >
> > *bstest.py:*
> > #Bypass CPython default socket implementation with IPCE/FePy
> > import imp, os, sys
> > sys.modules['socket'] = module = imp.new_module('socket')
> > execfile('socket.py', module.__dict__)
> >
> > from BeautifulSoup import BeautifulSoup
> > from urllib import urlopen
> > import datetime
> >
> > def getContent(url):
> > #Download html data
> > startTime = datetime.datetime.now()
> > print "Getting url", url
> > html = urlopen(url).read()
> > print "Time taken:", datetime.datetime.now() - startTime
> >
> > #Make soup
> > startTime = datetime.datetime.now()
> > print "Making soup..."
> > soup = BeautifulSoup(markup=html)
> > print "Time taken:", datetime.datetime.now() - startTime
> >
> > if __name__ == "__main__":
> > print getContent("www.mininova.org <http://www.mininova.org>")
> >
> >
> > *C#:*
> > using System;
> > using System.Collections.Generic;
> > using System.Text;
> > using IronPython.Hosting;
> >
> > namespace IronPythonBeautifulSoupTest
> > {
> > public class Program
> > {
> > public static void Main(string[] args)
> > {
> > //Init
> > System.Console.WriteLine("Starting...");
> > DateTime start = DateTime.Now;
> > PythonEngine engine = new PythonEngine();
> >
> > //Add paths:
> > //BeautifulSoup.py, socket.py, bstest.py located on exe
> dir
> > engine.AddToPath(@".");
> > //CPython Lib (replace with your own)
> > engine.AddToPath(@"D:\Dev\Python\Lib");
> >
> > //Import and load
> > TimeSpan span = DateTime.Now - start;
> > System.Console.WriteLine("[1] Import: " +
> > span.TotalSeconds);
> > DateTime d = DateTime.Now;
> > engine.ExecuteFile(@"bstest.py");
> > span = DateTime.Now - d;
> > System.Console.WriteLine("[2] Load: " +
> span.TotalSeconds);
> >
> > //Execute
> > d = DateTime.Now;
> > engine.Execute("getContent(\"http://www.mininova.org\<http://www.mininova.org%5C>
> ")");
> > span = DateTime.Now - d;
> > System.Console.WriteLine("[3] Execute: " +
> > span.TotalSeconds);
> > span = DateTime.Now - start;
> > System.Console.WriteLine("Total: " + span.TotalSeconds);
> >
> >
> > }
> > }
> > }
> >
> >
> > On Wed, Feb 20, 2008 at 6:57 PM, Dino Viehland
> > <dinov at exchange.microsoft.com <mailto:dinov at exchange.microsoft.com>>
> > wrote:
> >
> > We've actually had this issue reported once before a long time ago -
> > it's a very low CodePlex ID -
> > http://www.codeplex.com/IronPython/WorkItem/View.aspx?WorkItemId=651
> >
> > We haven't had a chance to investigate the end-to-end scenario. If
> > someone could come up with a smaller simpler repro that'd be great.
> > Otherwise we haven't forgotten about it we've just had more
> > immediately pressing issues to work on :(.
> >
> >
> > -----Original Message-----
> > From: users-bounces at lists.ironpython.com
> > <mailto:users-bounces at lists.ironpython.com>
> > [mailto:users-bounces at lists.ironpython.com
> > <mailto:users-bounces at lists.ironpython.com>] On Behalf Of Michael
> Foord
> > Sent: Wednesday, February 20, 2008 5:20 AM
> > To: Discussion of IronPython
> > Subject: Re: [IronPython] Slow Performance of CPython libs?
> >
> > Birsch wrote:
> > > Hi - We've been using IronPython successfully to allow
> extensibility
> > > of our application.
> > >
> > > Overall we are happy with the performance, with the exception of
> > > BeautifulSoup which seems to run very slowly: x5 or more time to
> > > execute compared to CPython.
> > >
> > > Most of the time seems to be spent during __init__() of BS, where
> the
> > > markup is parsed.
> > >
> > > We suspect this has to do with the fact that our CPython env is
> > > executing .pyc files and can precompile its libs, while the
> > IronPython
> > > environment compiles each iteration. We couldn't find a way to
> > > pre-compile the libs and then introduce them into the code, but
> > in any
> > > case this will result in a large management overhead since the
> amount
> > > of CPython libs we expose to our users contains 100's of modules.
> > >
> > > Any ideas on how to optimize?
> >
> > I think it is worth doing real profiling to find out where the time
> is
> > being spent during parsing.
> >
> > If it is spending most of the time in '__init__' then the time is
> > probably not spent in importing - so compilation isn't relevant and
> it
> > is a runtime performance issue. (Importing is much slower with
> > IronPython and at Resolver Systems we do use precompiled binaries -
> but
> > strangely enough it doesn't provide much of a performance gain.)
> >
> > Michael
> > http://www.manning.com/foord
> >
> > >
> > > Thanks,
> > > -Birsch
> > >
> > > Note: we're using FePy/IPCE libs with regular IP v1.1.1 runtime
> DLLs
> > > (this was done to overcome library incompatibilities and network
> > > errors). However, the relevant slow .py code (mainly SGMLParser
> and
> > > BeautifulSoup) is the same.
> > >
> >
> ------------------------------------------------------------------------
> > >
> > > _______________________________________________
> > > Users mailing list
> > > Users at lists.ironpython.com <mailto:Users at lists.ironpython.com>
> > > http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
> > >
> >
> > _______________________________________________
> > Users mailing list
> > Users at lists.ironpython.com <mailto:Users at lists.ironpython.com>
> > http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
> > _______________________________________________
> > Users mailing list
> > Users at lists.ironpython.com <mailto:Users at lists.ironpython.com>
> > http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > Users mailing list
> > Users at lists.ironpython.com <mailto:Users at lists.ironpython.com>
> > http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
> >
> >
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > Users mailing list
> > Users at lists.ironpython.com
> > http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
> _______________________________________________
> Users mailing list
> Users at lists.ironpython.com
> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
> _______________________________________________
> Users mailing list
> Users at lists.ironpython.com
> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/ironpython-users/attachments/20080222/232837a9/attachment.html>
More information about the Ironpython-users
mailing list