Parsing/Crawler Questions - solution

Lie Ryan lie.1296 at gmail.com
Sat Mar 7 04:14:43 EST 2009


bruce wrote:
> john...
> 
> again.... the problem i'm facing really has nothing to do with a specific
> url... the app i have for the usc site works...
> 
> but for any number of reasons... you might get different results when
> running the app..
> -the server could be screwed up..
> -data might be cached
> -data might be changed, and not updated..
> -actual app problems...
> -networking issues...
> -memory corruption issues...
> -process constraint issues..
> -web server overload..
> -etc...
> 
> the assumption that most people appear to make is that if you create a
> parser, and run and test it once.. then if it gets you the data, it's
> working.. when you run the same app.. 100s of times, and you're slamming the
> webserver... then you realize that that's a vastly different animal than
> simply running a single query a few times...

The assumption is that most websites edit and remove data from time to time,
so taking the union of the data collected across several runs can fill your
program with redundant (but slightly different) or outdated records. And the
assumption is that such redundant or outdated data isn't useful to most people.
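A minimal sketch of the difference, assuming each run produces a dict that maps a record id to its scraped value. The names union_of_runs, newest_values and max_missed_runs are hypothetical, made up for illustration; they are not from any library. The union keeps every version ever seen, while the "newest value wins" merge keeps only current data and can optionally tolerate a record going missing for a run or two:

```python
from typing import Dict, List, Set, Tuple

# Hypothetical snapshot format: record id -> scraped value from one run.
Snapshot = Dict[str, str]


def union_of_runs(runs: List[Snapshot]) -> Set[Tuple[str, str]]:
    """Naive merge: every (id, value) pair ever seen is kept, so edited
    records show up more than once and removed records never go away."""
    merged: Set[Tuple[str, str]] = set()
    for run in runs:
        merged.update(run.items())
    return merged


def newest_values(runs: List[Snapshot], max_missed_runs: int = 0) -> Snapshot:
    """Newest value wins for each id. A record is dropped once it has been
    absent from more than `max_missed_runs` of the most recent runs
    (max_missed_runs is a hypothetical tuning knob)."""
    latest: Snapshot = {}
    last_seen: Dict[str, int] = {}
    for i, run in enumerate(runs):
        for key, value in run.items():
            latest[key] = value   # later runs overwrite earlier ones
            last_seen[key] = i    # remember the last run that saw this id
    newest_run = len(runs) - 1
    return {
        key: value
        for key, value in latest.items()
        if newest_run - last_seen[key] <= max_missed_runs
    }


runs = [
    {"a": "Room 101", "b": "Mon 9am"},
    {"a": "Room 101", "b": "Tue 9am"},  # "b" was edited on the site
    {"b": "Tue 9am"},                   # "a" was removed from the site
]
print(union_of_runs(runs))   # "b" appears twice, "a" is still there
print(newest_values(runs))   # only the current version of "b"
```

With max_missed_runs=0 only the last run counts; raising it trades freshness for tolerance of the flaky runs (server trouble, network issues) mentioned in the quoted message.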


