From debmidya at yahoo.com Mon Feb 21 05:21:21 2011
From: debmidya at yahoo.com (Deb Midya)
Date: Sun, 20 Feb 2011 20:21:21 -0800 (PST)
Subject: [Web-SIG] Extracting web data
Message-ID: <735780.8711.qm@web161412.mail.bf1.yahoo.com>

Hi Python web-sig users,

Thanks in advance; I am new to web-sig.

I am using Python 2.6 on Windows XP.

May I ask you to assist me with the following, please?

I would like to extract web data from a site (http://finance.yahoo.com, for example).

The data may include Historical Prices, Key Statistics, News & Info, Headlines, etc. for a list of codes (such as WOW, ...; these are codes for company IDs).

I am trying to automate the extraction of data.

Is there any Python module for this, or any other assistance, please?

Once again, thank you very much for the time you have given.

Regards,

Deb

From debmidya at yahoo.com Tue Feb 22 00:59:58 2011
From: debmidya at yahoo.com (Deb Midya)
Date: Mon, 21 Feb 2011 15:59:58 -0800 (PST)
Subject: [Web-SIG] Extracting web data
In-Reply-To:
Message-ID: <478984.64846.qm@web161411.mail.bf1.yahoo.com>

Joost,

Thank you very much for your response.

I have found that there is no binary file of lxml in the package index at python.org.

I am using Python 2.6 on Windows XP.

Is there any alternative solution?

Once again, thank you very much for the time you have given.

Regards,

Deb

--- On Mon, 21/2/11, Joost Molenaar wrote:

From: Joost Molenaar
Subject: Re: [Web-SIG] Extracting web data
To: "Deb Midya"
Received: Monday, 21 February, 2011, 5:19 PM

You should look at lxml; it knows how to parse HTML and XML and lets you use XPath to find the data you need.

Joost Molenaar

On 21 Feb 2011 05:28, "Deb Midya" wrote:
> Hi Python web-sig users,
> [...]
> _______________________________________________
> Web-SIG mailing list
> Web-SIG at python.org
> Web SIG: http://www.python.org/sigs/web-sig
> Unsubscribe: http://mail.python.org/mailman/options/web-sig/j.j.molenaar%40gmail.com

From prologic at shortcircuit.net.au Tue Feb 22 01:07:10 2011
From: prologic at shortcircuit.net.au (James Mills)
Date: Tue, 22 Feb 2011 10:07:10 +1000
Subject: [Web-SIG] Extracting web data
In-Reply-To: <735780.8711.qm@web161412.mail.bf1.yahoo.com>
References: <735780.8711.qm@web161412.mail.bf1.yahoo.com>
Message-ID:

On Mon, Feb 21, 2011 at 2:21 PM, Deb Midya wrote:
> Hi Python web-sig users,
> [...]
> I am trying to automate the extraction of data.
>
> Is there any Python module for this, or any other assistance, please?

You might want to look into using either the lxml or BeautifulSoup modules.

cheers
James

--
-- James Mills
--
-- "Problems are solved by method"
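Since Deb could not find an lxml binary for Python 2.6 on Windows, a standard-library fallback may be useful while that is sorted out. The sketch below is an illustration under stated assumptions, not lxml or BeautifulSoup: it uses Python 3's `html.parser` (the Python 2.6 module is `HTMLParser`, and the code would need small changes there) to collect the text of table cells row by row, tolerating unclosed `<td>`/`<tr>` tags. It offers no XPath or CSS selectors, and the sample markup is made up rather than taken from finance.yahoo.com.

```python
from html.parser import HTMLParser


class TableCellCollector(HTMLParser):
    """Collect the text of <td>/<th> cells, grouped into one list per <tr>.

    html.parser does not insist on well-formed markup, so cells and rows
    that are never explicitly closed are still picked up.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per row
        self._cell = None   # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_cell()
            self.rows.append([])
        elif tag in ("td", "th"):
            self._close_cell()      # an unclosed previous cell ends here
            if not self.rows:       # cell outside any <tr>: open an implicit row
                self.rows.append([])
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th", "tr", "table"):
            self._close_cell()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()     # flush any buffered text first
        self._close_cell()  # then finish a cell left open at end of input

    def _close_cell(self):
        if self._cell is not None:
            # Collapse runs of whitespace, roughly as a browser would.
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None


def extract_rows(html):
    """Return the non-empty rows of every table in `html`."""
    parser = TableCellCollector()
    parser.feed(html)
    parser.close()
    return [row for row in parser.rows if row]


# Hypothetical markup in the loose style real pages often use.
sample = """
<table>
  <tr><th>Date<th>Close
  <tr><td>18-Feb-11<td>24.10
  <tr><td>17-Feb-11</td><td>23.95</td></tr>
</table>
"""

print(extract_rows(sample))
# [['Date', 'Close'], ['18-Feb-11', '24.10'], ['17-Feb-11', '23.95']]
```

Collecting whole rows keeps the code independent of any particular page layout; picking out a specific table (by position or attribute) would be the next step, and that is where lxml's XPath support earns its keep.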
From arw1961 at yahoo.com Tue Feb 22 01:52:06 2011
From: arw1961 at yahoo.com (Aaron Watters)
Date: Mon, 21 Feb 2011 16:52:06 -0800 (PST)
Subject: [Web-SIG] Extracting web data
In-Reply-To:
Message-ID: <410445.67306.qm@web120116.mail.ne1.yahoo.com>

BeautifulSoup is the standard response. I think lxml will not work very well unless the HTML is extremely nicely formatted, but I could be wrong.

For what you describe, I would suggest developing seat-of-the-pants heuristics: just get the page using httplib and then use string.find liberally. I've had at least three consulting gigs solving this problem using various techniques. The general problem is quite difficult, but if you are trying to parse just a few pages in simple ways, developing special-purpose heuristics is pretty easy (until they redesign the pages, which they will do every so often).

Best of luck,
-- Aaron Watters

btw: If you have lots of money to spend on this, my former client connotate.com does this sort of scraping (and I developed some of the code).

--- On Mon, 2/21/11, James Mills wrote:

From: James Mills
Subject: Re: [Web-SIG] Extracting web data
To: "web-sig"
Date: Monday, February 21, 2011, 7:07 PM

You might want to look into using either the lxml or BeautifulSoup modules.
cheers
James

From j.j.molenaar at gmail.com Tue Feb 22 02:07:18 2011
From: j.j.molenaar at gmail.com (Joost Molenaar)
Date: Tue, 22 Feb 2011 02:07:18 +0100
Subject: [Web-SIG] Extracting web data
In-Reply-To: <478984.64846.qm@web161411.mail.bf1.yahoo.com>
References: <478984.64846.qm@web161411.mail.bf1.yahoo.com>
Message-ID:

Hi Deb,

Sorry for sending this directly to you instead of to the list; Gmail makes it very easy to click the wrong reply button. :)

It seems you will have to install a slightly older (five-month-old) version of lxml if you need a binary release, so try version 2.2.8 at http://pypi.python.org/pypi/lxml/2.2.8 instead of the newest, 2.3.

Joost

On 22 February 2011 00:59, Deb Midya wrote:
> Joost,
>
> Thank you very much for your response.
>
> I have found that there is no binary file of lxml in the package index of
> python.org.
>
> [...]
From foom at fuhm.net Tue Feb 22 02:27:55 2011
From: foom at fuhm.net (James Y Knight)
Date: Mon, 21 Feb 2011 20:27:55 -0500
Subject: [Web-SIG] Extracting web data
In-Reply-To:
References: <735780.8711.qm@web161412.mail.bf1.yahoo.com>
Message-ID: <7F560AEF-AD47-4B58-A873-33B9D44E2B7F@fuhm.net>

On Feb 21, 2011, at 7:07 PM, James Mills wrote:
> You might want to look into using either
> the lxml or BeautifulSoup modules.

For parsing random HTML, the html5lib module works much better than either of those.

From rsyring at inteli-com.com Tue Feb 22 02:41:48 2011
From: rsyring at inteli-com.com (Randy Syring)
Date: Mon, 21 Feb 2011 20:41:48 -0500
Subject: [Web-SIG] Extracting web data
In-Reply-To: <735780.8711.qm@web161412.mail.bf1.yahoo.com>
References: <735780.8711.qm@web161412.mail.bf1.yahoo.com>
Message-ID: <4D63145C.7060104@inteli-com.com>

Also, if you are familiar with jQuery selector syntax, pyquery is very helpful!
--------------------------------------
Randy Syring
Intelicom
Direct: 502-276-0459
Office: 502-212-9913

For the wages of sin is death, but the free gift of God is eternal life in Christ Jesus our Lord (Rom 6:23)

From regebro at gmail.com Tue Feb 22 08:38:10 2011
From: regebro at gmail.com (Lennart Regebro)
Date: Tue, 22 Feb 2011 08:38:10 +0100
Subject: [Web-SIG] Extracting web data
In-Reply-To: <410445.67306.qm@web120116.mail.ne1.yahoo.com>
References: <410445.67306.qm@web120116.mail.ne1.yahoo.com>
Message-ID:

On Tue, Feb 22, 2011 at 01:52, Aaron Watters wrote:
> BeautifulSoup is the standard response.
> I think lxml will not work very well unless the
> html is extremely nicely formatted, but I could
> be wrong.

lxml handles broken HTML pretty well. There are Windows binaries here:
http://pypi.python.org/pypi/lxml/2.2.8

//Lennart
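Aaron's stdlib-only suggestion (fetch the page, then use string.find liberally) can be sketched as below. This is a minimal illustration under stated assumptions: the marker strings and the sample page are hypothetical, and the download step is left out so the code runs offline (in Python 3, `urllib.request.urlopen(url).read()` followed by `.decode(...)`, or `http.client`, the successor of `httplib`, would supply the page text). As Aaron warns, markers like these break whenever the site redesigns its pages.

```python
def text_between(page, start_marker, end_marker, start=0):
    """Return (text, next_pos) for the first span between the markers
    found at or after `start`, or (None, -1) if either marker is missing."""
    i = page.find(start_marker, start)
    if i < 0:
        return None, -1
    i += len(start_marker)
    j = page.find(end_marker, i)
    if j < 0:
        return None, -1
    return page[i:j].strip(), j + len(end_marker)


def all_between(page, start_marker, end_marker):
    """Collect every non-overlapping span, scanning left to right."""
    found = []
    pos = 0
    while True:
        text, pos = text_between(page, start_marker, end_marker, pos)
        if text is None:
            return found
        found.append(text)


# Hypothetical page fragment; a real page would be downloaded and decoded first.
page = '<span class="price">24.10</span> ... <span class="price">23.95</span>'
print(all_between(page, '<span class="price">', '</span>'))
# ['24.10', '23.95']
```

Returning the position after each match lets one call pick up where the previous one stopped, which is the usual shape of a find-based scraper; for a list of company codes, the same calls would simply run once per downloaded page.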
From brunovianarezende at gmail.com Tue Feb 22 11:16:05 2011
From: brunovianarezende at gmail.com (Bruno Rezende)
Date: Tue, 22 Feb 2011 07:16:05 -0300
Subject: [Web-SIG] Extracting web data
In-Reply-To: <735780.8711.qm@web161412.mail.bf1.yahoo.com>
References: <735780.8711.qm@web161412.mail.bf1.yahoo.com>
Message-ID:

Hi Deb,

On Mon, Feb 21, 2011 at 1:21 AM, Deb Midya wrote:
> I would like to extract web data from a site (http://finance.yahoo.com, for example).
>
> The data may include Historical Prices, Key Statistics, News & Info, Headlines, etc. for a list of codes (such as WOW, ...; these are codes for company IDs).
>
> I am trying to automate the extraction of data.

Take a look at scrapy: http://doc.scrapy.org/intro/overview.html

-- Bruno

From jdmain at comcast.net Sun Feb 27 00:49:27 2011
From: jdmain at comcast.net (J.D. Main)
Date: Sat, 26 Feb 2011 16:49:27 -0700
Subject: [Web-SIG] Python ASP in IIS - Error 500
Message-ID: <4D699187.32271.16F7AC4@jdmain.comcast.net>

Hello,

I have been hunting for a solution to this one for some time. I've already tried all the easy stuff.

I have IIS 5 on a Windows XP Pro machine. CGI works, but ASP does not. All I get is:

HTTP/1.1 500 Server Error

(nothing else: no other errors or explanation)

I have tried:

1. Reinstalling Python (v2.7 from ActiveState)
2. Reinstalling IIS
3. Running pyscript.py from "site-packages\win32comext\axscript\client"
4. Turning off "friendly errors" in IE

Here is the most basic ASP page I'm trying to run:

<%@LANGUAGE=Python%>

Python Test

<%
Response.Write('Python Test
')
Response.write('

Smaller heading')
%>

Does anybody have any ideas? This is really getting annoying, and I can't find a solution. Other people have complained of this, and I've seen no solution.

Thanks.

J.D.

From merwok at netwok.org Sun Feb 27 00:56:37 2011
From: merwok at netwok.org (Éric Araujo)
Date: Sun, 27 Feb 2011 00:56:37 +0100
Subject: [Web-SIG] Python ASP in IIS - Error 500
In-Reply-To: <4D699187.32271.16F7AC4@jdmain.comcast.net>
References: <4D699187.32271.16F7AC4@jdmain.comcast.net>
Message-ID: <4D699335.4060409@netwok.org>

> Response.Write('Python Test
> ')
> Response.write('

> Smaller heading')

Is the difference in case okay? (write vs. Write)

Regards

From jdmain at comcast.net Sun Feb 27 01:05:09 2011
From: jdmain at comcast.net (J.D. Main)
Date: Sat, 26 Feb 2011 17:05:09 -0700
Subject: [Web-SIG] Python ASP in IIS - Error 500
In-Reply-To: <4D699335.4060409@netwok.org>
References: <4D699187.32271.16F7AC4@jdmain.comcast.net>, <4D699335.4060409@netwok.org>
Message-ID: <4D699535.20788.17DDC59@jdmain.comcast.net>

An HTML attachment was scrubbed...

From prologic at shortcircuit.net.au Sun Feb 27 01:21:49 2011
From: prologic at shortcircuit.net.au (James Mills)
Date: Sun, 27 Feb 2011 10:21:49 +1000
Subject: [Web-SIG] Python WebSockets Server with circuits
Message-ID:

Hi all,

Just wanted to share my implementation of a Python WebSockets server (using the circuits framework):

http://prologic.shortcircuit.net.au/Blog/2011-02-27-09.50

Enjoy!

cheers
James

From rsyring at inteli-com.com Sun Feb 27 02:50:31 2011
From: rsyring at inteli-com.com (Randy Syring)
Date: Sat, 26 Feb 2011 20:50:31 -0500
Subject: [Web-SIG] Python ASP in IIS - Error 500
In-Reply-To: <4D699535.20788.17DDC59@jdmain.comcast.net>
References: <4D699187.32271.16F7AC4@jdmain.comcast.net> <4D699335.4060409@netwok.org> <4D699535.20788.17DDC59@jdmain.comcast.net>
Message-ID:

You may get more traction on a question like this on the win32 mailing list. You might even want to check its archives; I have seen a few questions about Python in ASP come through on that list.

On Sat, Feb 26, 2011 at 7:05 PM, J.D. Main wrote:
>>> Is the difference in case okay? (write vs. Write)
>
> Nice catch, but unfortunately, that's not the problem.
> I have a new test.asp that looks like this:
>
> <%
> response.write("Hello World!")
> %>
>
> Still the same problem...