[Tutor] filtering a webpage for plucking to a Palm

Brian van den Broek bvande at po-box.mcgill.ca
Sun Jun 26 10:32:43 CEST 2005


Hi all,

I have a Palm handheld, and use the excellent (and written in Python) 
Plucker <http://www.plkr.org/> to spider webpages and format the 
results for viewing on the Palm.

One site I 'pluck' is the Daily Python URL 
<http://www.pythonware.com/daily/>. From the point of view of a daily 
custom 'newspaper' everything but the last day or two of URLs is so 
much cruft. (The cruft would be the total history of the last 
seven-ish days, the navigation links for www.pythonware.com, etc.)

Today, I wrote a script to parse the Daily URL, and create a minimal 
local HTML page including nothing but the last n items, the last n 
links, or the last n days' worth of links. (Which of these is used is 
a user option.) 
Then, I pluck that, rather than the actual Daily URL site. Works 
great. :-)  (If anyone on the list is a fellow Plucker user and would be 
interested in my script, I'm happy to share.)
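
For readers who want something concrete, here is a minimal sketch of 
the "keep only n links and write a small local page" idea. It is not 
the script mentioned above. The names LinkCollector and minimal_page 
are made up for illustration, and it assumes the newest links come 
first in the page, so "the last n" means the first n in document order.

```python
from html import escape
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def minimal_page(html, n, title="Digest"):
    """Return a bare HTML page listing only the first n links of html.

    Assumption: the source page lists its newest links first.
    """
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    items = "\n".join(
        '<li><a href="%s">%s</a></li>' % (escape(href, quote=True), escape(text))
        for href, text in collector.links[:n]
    )
    return (
        "<html><head><title>%s</title></head><body>\n"
        "<ul>\n%s\n</ul>\n</body></html>" % (escape(title), items)
    )
```

The resulting string can be written to a local file, which is then 
given to Plucker in place of the live site.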

In anticipation of wanting to do the same thing to other sites, I've 
spent a bit of time abstracting it. I've made some real progress. But, 
before I finish up, I've a voice in the back of my head asking if 
maybe I'm re-inventing the wheel.

To my shame, I've not spent very much time at all exploring available 
frameworks and modules for any domain, and almost none for web-related 
tasks. So, does anyone know of any modules or frameworks which would 
make the sort of task I am describing easier?

The difficulty in making my routine general is that pretty much each 
site will need its own code for identifying what counts as a distinct 
item (such as a URL and its description in the Daily URL) and what 
counts as a distinct block of items (such as a day's worth of Daily URL 
items). I can't imagine there's a way around that, but if someone else 
has done much of the work in setting up the general structure to be 
tweaked for each site, that'd be good to know. (This doesn't feel like 
a question that is easy to google.)
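
One possible shape for that general structure, as a sketch only: a 
base class holds the site-independent selection logic, and each site 
supplies two hooks, one that splits a page into blocks and one that 
splits a block into items. SiteFilter, HeadingDelimitedSite and the 
method names are hypothetical. The example subclass assumes a made-up 
markup in which every block starts with an <h2> heading and every item 
is a <p> paragraph; a real site would need its own rules. It needs 
Python 3.7 or later, where re.split accepts a zero-width pattern.

```python
import re


class SiteFilter:
    """Site-independent skeleton; subclasses supply the parsing hooks."""

    def split_blocks(self, html):
        """Return the page's blocks (e.g. one per day), newest first."""
        raise NotImplementedError

    def split_items(self, block):
        """Return the items (e.g. a link plus its description) in a block."""
        raise NotImplementedError

    def last_blocks(self, html, n):
        """Keep the n newest blocks."""
        return self.split_blocks(html)[:n]

    def last_items(self, html, n):
        """Keep the n newest items, regardless of which block they are in."""
        items = [
            item
            for block in self.split_blocks(html)
            for item in self.split_items(block)
        ]
        return items[:n]


class HeadingDelimitedSite(SiteFilter):
    """Hypothetical site: blocks start at <h2>, items are <p> elements."""

    def split_blocks(self, html):
        # Split just before each <h2>; drop whatever precedes the first one.
        parts = re.split(r"(?=<h2\b)", html, flags=re.IGNORECASE)
        return [part for part in parts if part.lower().startswith("<h2")]

    def split_items(self, block):
        return re.findall(r"<p\b.*?</p>", block, flags=re.IGNORECASE | re.DOTALL)
```

With this layout, supporting a new site means writing only the two 
hook methods; choosing between "n items" and "n blocks" stays in the 
base class.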

Thanks for any suggestions, and best to all,

Brian vdB


