[Catalog-sig] HTML in long description

Tarek Ziadé ziade.tarek at gmail.com
Fri Aug 21 16:51:37 CEST 2009


On Fri, Aug 21, 2009 at 4:35 PM, Fred Drake<fdrake at gmail.com> wrote:
> On Fri, Aug 21, 2009 at 10:33 AM, "Martin v. Löwis"<martin at v.loewis.de> wrote:
>> Which way should PyPI go: escape all markup if ReST rendering fails?
>> Or else allow arbitrary HTML to be embedded? I'm worried that somebody
>> would create a cross-site attack out of that...
>
> Same here; the text in the <pre> should be properly escaped.

FWIW, lxml.html is pretty convenient for removing dangerous tags: it's a
one-liner that gets rid of any <form>, <script>, <embed>, etc.
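
Roughly, the cleanup could look like the sketch below. The function name
strip_dangerous_tags is made up for illustration, and the fallback to plain
escaping when lxml's cleaner cannot be imported is my own assumption, not
something PyPI does today.

```python
import html


def strip_dangerous_tags(fragment):
    """Return `fragment` without <script>, <form>, <embed> and friends."""
    try:
        # On recent lxml releases this import needs the separate
        # lxml_html_clean package to be installed.
        from lxml.html.clean import Cleaner
    except ImportError:
        # Assumption: with no cleaner available, escape everything.
        return html.escape(fragment)
    cleaner = Cleaner(scripts=True, forms=True, embedded=True)
    return cleaner.clean_html(fragment)


cleaned = strip_dangerous_tags("<p>hello</p><script>alert(1)</script>")
```

Either branch leaves no live <script> element in the result; the lxml branch
keeps the harmless markup, while the fallback escapes all of it.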

But in any case, I find the current situation fuzzy:

The reStructuredText format is an implicit rule from PyPI, and trying an
rst2html process on the server side, no matter what long_description
contains, seems like bad practice to me.

I'd like to see the nature of long_description explicitly declared in
the metadata.

For example, we could have a "long_description_format" field that would
be 'text', 'html' or 'restructuredtext'.

If present, PyPI could use this info to decide what it should do with
long_description (although this does not remove the need to clean it up
on the server side for security reasons, of course).
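
A minimal sketch of that dispatch, to make the idea concrete. The
"long_description_format" field is only the proposal above, and the names
render_long_description, escape_as_pre, render_rest and clean_html are
hypothetical. Treating a missing field as reStructuredText (the current
implicit rule) and escaping unknown values are my assumptions.

```python
import html


def escape_as_pre(text):
    """Escape all markup and wrap the result in <pre>."""
    return "<pre>%s</pre>" % html.escape(text)


def render_long_description(metadata,
                            render_rest=escape_as_pre,
                            clean_html=escape_as_pre):
    """Pick a renderer from the proposed long_description_format field.

    render_rest and clean_html are pluggable callables; both default to
    plain escaping so the sketch is safe without any extra library.
    """
    text = metadata.get("long_description", "")
    # Assumption: a missing field means reStructuredText, as today.
    fmt = metadata.get("long_description_format", "restructuredtext")
    if fmt == "restructuredtext":
        return render_rest(text)
    if fmt == "html":
        # Declared HTML still has to be cleaned on the server side.
        return clean_html(text)
    # 'text' and any unknown value: never emit raw markup.
    return escape_as_pre(text)


page = render_long_description(
    {"long_description": "<b>x</b>", "long_description_format": "text"})
```

The point of the pluggable callables is that the declared format only
selects the code path; the server-side cleanup still happens in every branch.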

Last, note that there's a new command in distutils called "check" that
can be used to check whether the long_description content compiles
cleanly as reStructuredText. This client-side step is a convenient way
to avoid errors or warnings on the PyPI page.

(It's only available if docutils is installed, of course.)
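
For illustration, here is a rough sketch of that kind of client-side check.
It is not the code of the distutils command itself; the helper name
rest_warnings is made up, and returning None when docutils is missing is
my assumption.

```python
import io


def rest_warnings(text):
    """Return docutils warning/error lines for `text`.

    Returns None when docutils is not installed, so the caller can tell
    "could not check" apart from "no problems found".
    """
    try:
        from docutils.core import publish_string
    except ImportError:
        return None
    stream = io.StringIO()
    publish_string(
        text,
        settings_overrides={
            "warning_stream": stream,  # capture messages instead of stderr
            "report_level": 2,         # report warnings and above
            "halt_level": 5,           # never abort on a problem
        },
    )
    markers = ("(WARNING/", "(ERROR/", "(SEVERE/")
    return [line for line in stream.getvalue().splitlines()
            if any(marker in line for marker in markers)]


problems = rest_warnings("Some *unclosed emphasis in the description.\n")
```

An empty list means the text compiled without warnings; anything else is
worth fixing before uploading.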


>
>
>  -Fred
>
> --
> Fred L. Drake, Jr.    <fdrake at gmail.com>
> "Chaos is the score upon which reality is written." --Henry Miller
> _______________________________________________
> Catalog-SIG mailing list
> Catalog-SIG at python.org
> http://mail.python.org/mailman/listinfo/catalog-sig
>



-- 
Tarek Ziadé | http://ziade.org

