How do I Extract Attachment from Newsgroup Message

kyosohma at gmail.com
Thu May 31 10:14:04 EDT 2007


On May 31, 8:54 am, "snewma... at gmail.com" <snewma... at gmail.com> wrote:
> I'm parsing NNTP messages that have XML file attachments. How can I
> extract the encoded text back into a file? I looked for a solution
> with mimetools (the way I'd approach it for email) but found nothing.
>
> Here's a long snippet of the message:
>
> >>> n.article('116431')
>
> ('220 116431 <D8PANK... at news.ap.org> article', '116431',
> '<D8PANK... at news.ap.org>', ['MIME-Version: 1.0', 'Message-ID:
> <D8PANK... at news.ap.org>', 'Content-Type: Multipart/Mixed;', '
> boundary="------------Boundary-00=_A5NJCP3FX6Y5BI3BH890"', 'Date: Thu,
> 24 May 2007 07:41:34 -0400 (EDT)', 'From: Newsclip <newsc... at ap.org>',
> 'Path: newsclip.ap.org!flounder.ap.org!flounder', 'Newsgroups:
> ap.spanish.online,ap.spanish.online.business', 'Keywords: MUN ECO
> PETROLEO PRECIOS', 'Subject: MUN ECO PETROLEO PRECIOS', 'Summary: ',
> 'Lines: 108', 'Xref: newsclip.ap.org ap.spanish.online:938298
> ap.spanish.online.business:116431', '', '',
> '--------------Boundary-00=_A5NJCP3FX6Y5BI3BH890',
> 'Content-Type: Text/Plain', 'Content-Transfer-Encoding: 8bit',
> 'Content-Description: text, unencoded', '',
> '(AP) Precios del crudo se mueven sin rumbo claro',
> 'Por GEORGE JAHN', 'VIENA', 'Los precios
>
> ... (truncated for length) ...
>
> '', '___', '', 'Editores: Derrick Ho, periodista de la AP en Singapur,
> contribuy\xf3 con esta informaci\xf3n.', '', '',
> '--------------Boundary-00=_A5NJCP3FX6Y5BI3BH890',
> 'Content-Type: Text/Xml', 'Content-Transfer-Encoding: base64',
> 'Content-Description: text, base64 encoded', '',
> 'PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiPz4KPCFET0NUWVBFIG5pdGYgU1lT',
> 'VEVNICJuaXRmLmR0ZCI+CjxuaXRmPgogPGhlYWQ+CiAgPG1ldGEgbmFtZT0iYXAtdHJhbnNyZWYi',
> 'IGNvbnRlbnQ9IlNQMTQ3MiIvPgogIDxtZXRhIG5hbWU9ImFwLW9yaWdpbiIgY29udGVudD0ic3Bh',
> 'bm9sIi8+CiAgPG1ldGEgbmFtZT0iYXAtc2VsZWN0b3IiIGNvbn

This looks like what you might be looking for:
http://mail.python.org/pipermail/python-list/2004-June/265018.html
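
For what it's worth, here is a rough sketch of one way to do it with the
standard email package (the successor to mimetools). It is not necessarily
what the linked post does. It assumes the header and body lines you got back
from n.article() are plain strings, as in your snippet; the function name
extract_parts, the sample article and its boundary are all made up for
illustration.

```python
import base64
import email

def extract_parts(article_lines):
    """Return (content_type, decoded_bytes) for each non-multipart part.

    article_lines is assumed to be the list of header and body lines of
    one article, already as str (hypothetical input shape).
    """
    raw = "\n".join(article_lines)
    msg = email.message_from_string(raw)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the container, keep only the leaf parts
        # decode=True undoes base64 / quoted-printable and returns bytes
        parts.append((part.get_content_type(), part.get_payload(decode=True)))
    return parts

# Made-up stand-in for a real article, shaped like the one quoted above.
xml_bytes = b'<?xml version="1.0" encoding="UTF-8"?>\n<nitf>\n</nitf>\n'
sample_lines = [
    'MIME-Version: 1.0',
    'Content-Type: Multipart/Mixed;',
    ' boundary="DEMO-BOUNDARY"',
    'Subject: demo',
    '',
    '--DEMO-BOUNDARY',
    'Content-Type: Text/Plain',
    'Content-Transfer-Encoding: 8bit',
    '',
    'hello',
    '--DEMO-BOUNDARY',
    'Content-Type: Text/Xml',
    'Content-Transfer-Encoding: base64',
    '',
    base64.b64encode(xml_bytes).decode("ascii"),
    '--DEMO-BOUNDARY--',
]

parts = extract_parts(sample_lines)
```

Each payload comes back as bytes, so getting the XML into a file should just
be a matter of opening a file in "wb" mode and writing the payload of the
text/xml part to it.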

Not sure whether you'll need it, but here's some background on encoding
and decoding text (Unicode) in Python:
http://www.jorendorff.com/articles/unicode/python.html

There are lots of ways to parse XML. I use the xml.dom.minidom module myself.
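
As a small illustration of the minidom route, something along these lines
should work once you have the decoded bytes. The sample document is a
made-up fragment loosely modelled on the start of your attachment, and
meta_values is just a name I picked.

```python
from xml.dom import minidom

def meta_values(xml_bytes):
    """Map each <meta name="..."> to its content attribute."""
    doc = minidom.parseString(xml_bytes)
    return {
        meta.getAttribute("name"): meta.getAttribute("content")
        for meta in doc.getElementsByTagName("meta")
    }

# Hypothetical sample, not the real attachment.
sample_xml = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<nitf>\n'
    b' <head>\n'
    b'  <meta name="ap-transref" content="SP1472"/>\n'
    b'  <meta name="ap-origin" content="spanol"/>\n'
    b' </head>\n'
    b'</nitf>\n'
)

values = meta_values(sample_xml)
```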

Mike



