Podcast catcher in Python

Chris Rebert clp2 at rebertia.com
Fri Sep 11 21:32:33 EDT 2009


On Fri, Sep 11, 2009 at 11:09 AM, Chuck <galois271 at gmail.com> wrote:
> On Sep 11, 12:56 pm, Chuck <galois... at gmail.com> wrote:
>> On Sep 11, 10:30 am, Falcolas <garri... at gmail.com> wrote:
>> > On Sep 11, 8:20 am, Chuck <galois... at gmail.com> wrote:
>>
>> > > Hi all,
>>
>> > > I would like to code a simple podcast catcher in Python merely as an
>> > > exercise in internet programming.  I am a CS student and new to
>> > > Python, but understand Java fairly well.  I understand how to connect
>> > > to a server with urlopen, but then I don't understand how to download
>> > > the mp3, or whatever, podcast?  Do I need to somehow parse the XML
>> > > document?  I really don't know.  Any ideas?
>>
>> > > Thanks!
>>
>> > > Chuck
>>
>> > You will first have to download the RSS XML file, then parse that file
>> > for the URL for the audio file itself. Something like eTree will help
>> > immensely in this part. You'll also have to keep track of what you've
>> > already downloaded.
>>
>> > I'd recommend taking a look at the RSS XML yourself, so you know what
>> > it is you have to parse out, and where to find it. From there, it
>> > should be fairly easy to come up with the proper query to pull it
>> > automatically out of the XML.
>>
>> > As a kindness to the provider, I would recommend a fairly lengthy
>> > sleep between GETs, particularly if you want to scrape their back
>> > catalog.
>>
>> > Unfortunately, I no longer have the script I created to do just such a
>> > thing in the past, but the process is rather straightforward, once you
>> > know where to look.
>
> I am not sure how eTree fits in.  Is that eTree.org?

No, he's referring to the `xml.etree.ElementTree` standard library module:
http://docs.python.org/library/xml.etree.elementtree.html#module-xml.etree.ElementTree
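
As a rough sketch of the steps Falcolas describes (parse the feed, pull out
the audio URLs, skip what you already have, pause between GETs), something
like the following might work. It assumes the feed is RSS 2.0 and puts each
audio file in an <enclosure url="..."> element, which is the usual podcast
convention but worth checking against the actual feed. The sample feed, the
example.com URLs, and the names enclosure_urls and download_new are made up
for illustration.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET

# Made-up feed, used here instead of a real download so the parsing
# step can be tried offline.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 2</title>
      <enclosure url="http://example.com/audio/ep2.mp3"
                 length="2345678" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 1</title>
      <enclosure url="http://example.com/audio/ep1.mp3"
                 length="1234567" type="audio/mpeg"/>
    </item>
    <item>
      <title>Text-only post</title>
    </item>
  </channel>
</rss>
"""


def enclosure_urls(rss_text):
    """Return the url attribute of every <enclosure> element, in feed order."""
    root = ET.fromstring(rss_text)
    urls = []
    for enclosure in root.iter("enclosure"):
        url = enclosure.get("url")
        if url:  # skip enclosures that have no url attribute
            urls.append(url)
    return urls


def download_new(urls, seen, delay_seconds=30):
    """Fetch each URL not already in `seen`, sleeping between requests.

    `seen` is a set of URLs already downloaded; persisting it between
    runs (for example in a text file) is left out of this sketch.
    """
    for url in urls:
        if url in seen:
            continue
        filename = url.rsplit("/", 1)[-1]  # last path segment as file name
        urllib.request.urlretrieve(url, filename)
        seen.add(url)
        time.sleep(delay_seconds)  # be kind to the provider


print(enclosure_urls(SAMPLE_RSS))
```

For a real feed you would read the XML with urllib.request.urlopen(feed_url)
and hand the result to enclosure_urls, then pass the list to download_new.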

That said, since you're dealing with feeds, you might be able to use
Universal Feed Parser, which is built specifically for RSS/Atom:
http://www.feedparser.org/

Cheers,
Chris
--
http://blog.rebertia.com


