From dkuhlman at rexx.com Sat Feb 3 00:39:42 2007
From: dkuhlman at rexx.com (Dave Kuhlman)
Date: Fri, 2 Feb 2007 15:39:42 -0800
Subject: [Web-SIG] Support tools for analyzing pages on the Web
Message-ID: <20070202233942.GA38964@cutter.rexx.com>

I'd like to implement and explore tools for analyzing Web pages. I
have in mind things like:

- Tracing links from a Web page. Building a tree structure of
  links to a specified depth.

- Tracing links to a Web page. Showing incoming links to a
  specified depth.

- Word count, word frequency analysis, words in context, etc.

- Etc.

Basically, I'm interested in looking at the structure of the Web
and trying to help make it useful.

So, my question: Are there existing tools (in Python) of course for
this kind of thing. I'd like (1) not to reinvent what is already
there and (2) to make use of what already exists.

I've done a few Web searches, but have not found that much of
interest.

I plan to start with BeautifulSoup.py at a minimum.

Thanks for help. And, I'd be interested in any ideas and
suggestions.

Dave

--
Dave Kuhlman
http://www.rexx.com/~dkuhlman

From christian at dowski.com Sat Feb 3 04:41:38 2007
From: christian at dowski.com (Christian Wyglendowski)
Date: Fri, 2 Feb 2007 22:41:38 -0500
Subject: [Web-SIG] Support tools for analyzing pages on the Web
In-Reply-To: <20070202233942.GA38964@cutter.rexx.com>
References: <20070202233942.GA38964@cutter.rexx.com>
Message-ID:

On 2/2/07, Dave Kuhlman wrote:
> I'd like to implement and explore tools for analyzing Web pages. I
> have in mind things like:
>
> - Tracing links from a Web page. Building a tree structure of
>   links to a specified depth.
>
> - Tracing links to a Web page. Showing incoming links to a
>   specified depth.
>
> - Word count, word frequency analysis, words in context, etc.
>
> - Etc.
>
> Basically, I'm interested in looking at the structure of the Web
> and trying to help make it useful.

Sounds like an interesting project.
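Two of the items on Dave's list, a tree of outgoing links to a fixed depth and a word-frequency count, can be sketched with only the standard library. This is an editorial sketch in modern Python 3, not code from the thread: it uses html.parser in place of BeautifulSoup, a regular expression in place of NLTK, and the names `PageParser`, `parse`, `link_tree`, `word_frequencies` and the caller-supplied `fetch` hook are all hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collect <a href> targets and visible text (skipping <script>/<style>)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            # attrs is a list of (name, value); value is None for bare attributes
            self.links.extend(v for n, v in attrs if n == "href" and v)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text.append(data)


def parse(html):
    parser = PageParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser


def link_tree(root, fetch, depth):
    """Breadth-first tree of outgoing links, `depth` levels below `root`.

    `fetch(url) -> str` is supplied by the caller (hypothetical hook).
    Each URL appears once, at the shallowest level where it was found.
    """
    tree = {root: {}}
    seen = {root}
    frontier = [(root, tree[root])]
    for _ in range(depth):
        next_frontier = []
        for url, node in frontier:
            for href in parse(fetch(url)).links:
                child = urljoin(url, href)  # resolve relative links
                if child not in seen:
                    seen.add(child)
                    node[child] = {}
                    next_frontier.append((child, node[child]))
        frontier = next_frontier
    return tree


def word_frequencies(html, top=10):
    """Most common lower-cased words in the page's visible text."""
    words = re.findall(r"[a-z']+", " ".join(parse(html).text).lower())
    return Counter(words).most_common(top)
```

For real pages, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode("utf-8", "replace")` (urllib.request is the Python 3 successor of urllib2). Incoming links, the second item, cannot be read off a page itself; they would have to be collected by inverting the edges gathered during a crawl like this one.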
> So, my question: Are there existing tools (in Python) of course for
> this kind of thing. I'd like (1) not to reinvent what is already
> there and (2) to make use of what already exists.

Well, for your analysis phase, I would look at the Natural Language
Tool Kit (NLTK) [1]. I haven't used it personally, but I have always
wanted to try it out. The documentation is great.

> I've done a few Web searches, but have not found that much of
> interest.
>
> I plan to start with BeautifulSoup.py at a minimum.

Maybe urllib2.urlopen + BeautifulSoup + nltk will be enough to get you
going. Post back with any cool results.

Christian

From titus at caltech.edu Fri Feb 9 08:54:01 2007
From: titus at caltech.edu (Titus Brown)
Date: Thu, 8 Feb 2007 23:54:01 -0800
Subject: [Web-SIG] wsgiref and wsgi.multithread/wsgi.multiprocess
Message-ID: <20070209075401.GA9697@caltech.edu>

Hi folks,

I just ran into an interesting sanity check problem, and I was hoping
you could all cross-check *my* sanity.

Should the WSGI environ variables 'wsgi.multithread' and
'wsgi.multiprocess' be set to 'True' in
wsgiref.simple_server.WSGIServer?

They are, currently, but I see no indication in WSGIServer
(inheriting from BaseHTTPServer.HTTPServer) of multithreadedness
or multiprocessedness.

thanks,
--titus

From pje at telecommunity.com Fri Feb 9 18:10:00 2007
From: pje at telecommunity.com (Phillip J. Eby)
Date: Fri, 09 Feb 2007 12:10:00 -0500
Subject: [Web-SIG] wsgiref and wsgi.multithread/wsgi.multiprocess
In-Reply-To: <20070209075401.GA9697@caltech.edu>
Message-ID: <5.1.1.6.0.20070209120902.038b7e20@sparrow.telecommunity.com>

At 11:54 PM 2/8/2007 -0800, Titus Brown wrote:
>Hi folks,
>
>I just ran into an interesting sanity check problem, and I was hoping
>you could all cross-check *my* sanity.
>
>Should the WSGI environ variables 'wsgi.multithread' and
>'wsgi.multiprocess' be set to 'True' in
>wsgiref.simple_server.WSGIServer?
>
>They are, currently, but I see no indication in WSGIServer
>(inheriting from BaseHTTPServer.HTTPServer) of multithreadedness
>or multiprocessedness.

Yeah, multiprocess should probably be set false there, and
multithreadedness should depend on whether the ThreadingTCPServer or
whatever it's called is mixed in. (HTTPServer does in fact support this,
but it's not tested in a WSGI context as far as I know.)

From titus at caltech.edu Fri Feb 9 18:56:49 2007
From: titus at caltech.edu (Titus Brown)
Date: Fri, 9 Feb 2007 09:56:49 -0800
Subject: [Web-SIG] wsgiref and wsgi.multithread/wsgi.multiprocess
In-Reply-To: <5.1.1.6.0.20070209120902.038b7e20@sparrow.telecommunity.com>
References: <20070209075401.GA9697@caltech.edu> <5.1.1.6.0.20070209120902.038b7e20@sparrow.telecommunity.com>
Message-ID: <20070209175649.GA21915@caltech.edu>

On Fri, Feb 09, 2007 at 12:10:00PM -0500, Phillip J. Eby wrote:
-> At 11:54 PM 2/8/2007 -0800, Titus Brown wrote:
-> >Hi folks,
-> >
-> >I just ran into an interesting sanity check problem, and I was hoping
-> >you could all cross-check *my* sanity.
-> >
-> >Should the WSGI environ variables 'wsgi.multithread' and
-> >'wsgi.multiprocess' be set to 'True' in
-> >wsgiref.simple_server.WSGIServer?
-> >
-> >They are, currently, but I see no indication in WSGIServer
-> >(inheriting from BaseHTTPServer.HTTPServer) of multithreadedness
-> >or multiprocessedness.
->
-> Yeah, multiprocess should probably be set false there, and
-> multithreadedness should depend on whether the ThreadingTCPServer or
-> whatever it's called is mixed in. (HTTPServer does in fact support this,
-> but it's not tested in a WSGI context as far as I know.)

OK. Err, do you want a patch? ;)

The problem I'm running into is that our (Mike Orr & I) WSGI interface
for Quixote does a check to make sure that the Quixote application is
explicitly marked as threadsafe before allowing a multithreaded WSGI
server to run it.
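PJE's suggestion (report wsgi.multiprocess as false, and report wsgi.multithread only when a threading mix-in is used) and a guard of the kind Titus describes can be sketched roughly as follows. This is an editorial sketch in Python 3 (where the 2007 SocketServer/BaseHTTPServer modules became socketserver/http.server), not the wsgiref patch and not the actual Quixote WSGI interface; `ThreadingWSGIServer`, `concurrency_flags`, `with_flags` and `require_threadsafe` are hypothetical names. Because wsgiref fills in these environ keys inside its own request handler rather than from the server class, the sketch applies the flags with a small middleware.

```python
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIServer


class ThreadingWSGIServer(ThreadingMixIn, WSGIServer):
    """WSGIServer that handles each request in its own thread."""
    daemon_threads = True


def concurrency_flags(server_class):
    """Flags per PJE's suggestion: threaded only if ThreadingMixIn is mixed in."""
    return {
        "wsgi.multithread": issubclass(server_class, ThreadingMixIn),
        # assumes no ForkingMixIn: everything runs in a single process
        "wsgi.multiprocess": False,
    }


def with_flags(app, flags):
    """Middleware that overwrites the concurrency flags in environ."""
    def wrapped(environ, start_response):
        environ.update(flags)
        return app(environ, start_response)
    return wrapped


def require_threadsafe(app, threadsafe=False):
    """Middleware that refuses to run an app not marked threadsafe
    when the environ reports wsgi.multithread."""
    def wrapped(environ, start_response):
        if environ.get("wsgi.multithread") and not threadsafe:
            raise RuntimeError("application is not marked threadsafe")
        return app(environ, start_response)
    return wrapped
```

Wiring it up might look like `make_server("localhost", 8000, with_flags(require_threadsafe(app, threadsafe=True), concurrency_flags(ThreadingWSGIServer)), server_class=ThreadingWSGIServer)`, with `with_flags` as the outermost wrapper so that the guard sees the corrected value.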
I can't bring myself to remove this sanity check, because it does
seem like a good idea, but it makes the example code a bit more
complicated...

--titus

From t.koutsovassilis at innoscript.org Wed Feb 14 23:30:31 2007
From: t.koutsovassilis at innoscript.org (Tassos Koutsovassilis)
Date: Thu, 15 Feb 2007 00:30:31 +0200
Subject: [Web-SIG] ANN: Porcupine Web Application Server v0.0.9 released
Message-ID: <45D38D87.40503@innoscript.org>

The inno:script team announces the new release of Porcupine server.

This release introduces remarkable new features on the server side including a configurable in-memory object cache and a new post-processing filter for easy output i18n. Due to the method decorators used, Porcupine is no longer compatible with Python 2.3. We also recommend sub-classing the new type of QuiX servlet (XULSimpleTemplateServlet) instead of the primitive XULServlet class. The new type takes advantage of the new Python "string.Template" module, resulting in simpler and more readable QuiX templates.

By default, the object cache is configured for keeping up to 500 objects. You can change this setting by editing the main Porcupine configuration file. Also keep in mind that each post processing filter is now declared as a child node of its registration node. See the store registrations file "store.xml" as a usage guideline.

On the browser side, QuiX adds minor improvements to better support Internet Explorer 7 but also includes many minor bug fixes. Last but not least, the rendering performance is greatly improved by minimizing the number of redraws required when drawing new interfaces from XML. As a side effect of this optimization, you might need an extra call to the "redraw" method of some of your dynamically added widgets in order to have them displayed correctly.

Enjoy.

Resources
=========
What is Porcupine?
http://www.innoscript.org/content/view/30/42/

Porcupine Downloads:
http://www.innoscript.org/component/option,com_remository/Itemid,33/func,selectcat/cat,1/

Porcupine online demo:
http://www.innoscript.org/content/view/21/43/

Porcupine Wiki:
http://wiki.innoscript.org

From clifford_ilkay at dinamis.com Mon Feb 19 15:50:06 2007
From: clifford_ilkay at dinamis.com (CLIFFORD ILKAY)
Date: Mon, 19 Feb 2007 09:50:06 -0500
Subject: [Web-SIG] Django Presentation at PyGTA Meeting on Feb. 20
Message-ID: <200702190950.06989.clifford_ilkay@dinamis.com>

Hello,

I will be presenting an overview of the Django web framework at the monthly PyGTA (Greater Toronto Area Python user group) meeting on Feb. 20.

Django is represented as being "The Web framework for perfectionists with deadlines." In my experience, that is an apt description. I have found it to be coherent, powerful, well-documented, and very approachable. The support one can get on the Django IRC channel (irc.freenode.net, #django) and the Google Group is very good. There is an on-line book at , which fleshes out the documentation on the main site and the Wiki .

When
----
Feb. 20, 2007

6:30 p.m. - informal part of the meeting where we can get (non-alcoholic) drinks and socialize
7:00 p.m. - formal part of the meeting starts (formal wear not required)
Between 8:30 p.m. to 9:00 p.m. - wrap up and go to a nearby restaurant for beer, ice cream, hot chocolate, nibbles, sparkling conversation, etc.

Where
-----
LinuxCaffe (yes, that is how it is spelled)
326 Harbord Street, Toronto, ON, M6G 1H1
416-534-2116

It is one block south of the Christie subway station for those who will be taking TTC. There is plenty of parking in the area for those who will be driving.

Synopsis
--------
I will provide an overview of how to set up the development environment and then show the code behind a site that I am working on.
The site is simple enough that it will not require a deep understanding of the business rules to understand how things work. I will flip back and forth between the public face of the site, the auto-generated admin interface, and the code that I have written in order to show the relationship between them.

If you do not know Python at all but have some programming experience, you should still be able to follow along. I meet many people on #django, especially PHP refugees, who are learning Python while they are learning Django so it is quite feasible.

If you plan to attend, please let me know so that I can let David at LinuxCaffe know how many people to expect.

--
Regards,

Clifford Ilkay
Dinamis Corporation
3266 Yonge Street, Suite 1419
Toronto, ON
Canada M4N 3P6

+1 416-410-3326

From fumanchu at amor.org Fri Feb 23 04:07:15 2007
From: fumanchu at amor.org (Robert Brewer)
Date: Thu, 22 Feb 2007 19:07:15 -0800
Subject: [Web-SIG] The web dudes pad is open for business
Message-ID: <435DF58A933BA74397B42CDEB8145A86224D41@ex9.hostedexchange.local>

Chad Whitacre (of Aspen fame) and I got a nice suite across the street (from the PyCon hotel) at the Residence Inn, room 121. All web dudes welcome. We've got a kitchen, fireplace, sofa, and 3 TV's. I just stocked the fridge with Heineken and Diet Coke, plus mudslide and blue margarita fixin's.

The jacuzzi's open, too, if you brought trunks.

Feel free to drop by anytime; but call me first: 619 846-5585 (they lock everything around here). I'm here 'til Monday morning.

Robert Brewer
CherryPy Team

From chad at zetaweb.com Fri Feb 23 14:06:48 2007
From: chad at zetaweb.com (Chad Whitacre)
Date: Fri, 23 Feb 2007 08:06:48 -0500
Subject: [Web-SIG] The web dudes pad is open for business
In-Reply-To: <435DF58A933BA74397B42CDEB8145A86224D41@ex9.hostedexchange.local>
References: <435DF58A933BA74397B42CDEB8145A86224D41@ex9.hostedexchange.local>
Message-ID: <45DEE6E8.7000405@zetaweb.com>

> The jacuzzi's open, too, if you brought trunks.

We have a jacuzzi!? Please tell me it's not heart-shaped ...

You guys have fun today, I don't get in until tonight. :^(

chad

From titus at caltech.edu Fri Feb 23 16:30:07 2007
From: titus at caltech.edu (Titus Brown)
Date: Fri, 23 Feb 2007 07:30:07 -0800
Subject: [Web-SIG] The web dudes pad is open for business
In-Reply-To: <45DEE6E8.7000405@zetaweb.com>
References: <435DF58A933BA74397B42CDEB8145A86224D41@ex9.hostedexchange.local> <45DEE6E8.7000405@zetaweb.com>
Message-ID: <20070223153007.GA26903@caltech.edu>

now doesn't everyone wish they were at PyCon, too? ;)

On Fri, Feb 23, 2007 at 08:06:48AM -0500, Chad Whitacre wrote:
-> > The jacuzzi's open, too, if you brought trunks.
->
-> We have a jacuzzi!? Please tell me it's not heart-shaped ...
->
-> You guys have fun today, I don't get in until tonight. :^(
->
-> chad
-> _______________________________________________
-> Web-SIG mailing list
-> Web-SIG at python.org
-> Web SIG: http://www.python.org/sigs/web-sig
-> Unsubscribe: http://mail.python.org/mailman/options/web-sig/titus%40caltech.edu
->

From sh at defuze.org Fri Feb 23 16:45:53 2007
From: sh at defuze.org (Sylvain Hellegouarch)
Date: Fri, 23 Feb 2007 15:45:53 +0000
Subject: [Web-SIG] The web dudes pad is open for business
In-Reply-To: <20070223153007.GA26903@caltech.edu>
References: <435DF58A933BA74397B42CDEB8145A86224D41@ex9.hostedexchange.local> <45DEE6E8.7000405@zetaweb.com> <20070223153007.GA26903@caltech.edu>
Message-ID: <45DF0C31.9090909@defuze.org>

Titus Brown wrote:
> now doesn't everyone wish they were at PyCon, too? ;)
>
Precisely what I told myself :)