encoding problems

Wed Aug 29 03:29:31 EDT 2007

Hi,

I would like to convert reST content to HTML, but I am running into
problems with accented characters.

Here is a simplified version, using Docutils 0.5 from SVN:

%-------------------------------------------------------------

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from docutils.core import publish_parts

class Post(object):
     def __init__(self, title='', content=''):
         self.title = title
         self.content = content

     def _get_html_content(self):
         return publish_parts(self.content,
             writer_name="html")["html_body"]
     html_content = property(_get_html_content)

# Instantiate two Post objects
p1 = Post()
p1.title = "First post without accented chars"
p1.content = """This is the first.
...blabla
... end of post..."""

p2 = Post()
p2.title = "Second post with accented chars"
p2.content = """Ce poste possède des accents : é à ê è"""

for post in [p1,p2]:
     print post.title, "\n" +"-"*30
     print post.html_content

%-------------------------------------------------------------

The output is:

First post without accented chars
------------------------------
<div class="document">
<p>This is the first.
...blabla
... end of post...</p>
</div>

Second post with accented chars
------------------------------
Traceback (most recent call last):
  File "C:\Documents and Settings\kib\Bureau\Projets\python\dbTest\rest_error.py", line 30, in <module>
    print post.html_content
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 39: ordinal not in range(128)
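
The traceback points at the print statement rather than at publish_parts, so the failure may be reproducible without Docutils at all. Here is a minimal sketch under two unverified assumptions: that publish_parts returns a unicode string, and that the console codec is ASCII. The html_body value is a hand-written stand-in for the real output, and first_unencodable is a hypothetical helper, not part of Docutils.

```python
# -*- coding: utf-8 -*-
# Assumption: this unicode string mimics what publish_parts() returned for
# the second post; it is typed by hand, not produced by Docutils.
html_body = (u'<div class="document">\n'
             u'<p>Ce poste poss\xe8de des accents : \xe9 \xe0 \xea \xe8</p>\n'
             u'</div>\n')


def first_unencodable(text, encoding):
    """Return the index of the first character that `encoding` cannot
    represent, or None if the whole text encodes cleanly."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc.start
    return None


# Encoding with ASCII fails at the same position the traceback reports.
ascii_failure = first_unencodable(html_body, 'ascii')

# Encoding with UTF-8 succeeds, so an explicit encode before printing
# would avoid the implicit ASCII step.
utf8_failure = first_unencodable(html_body, 'utf-8')

# Alternative for HTML output: stay ASCII-only and let the codec replace
# accented characters with numeric character references.
ascii_safe = html_body.encode('ascii', 'xmlcharrefreplace')
```

If the assumptions hold, ascii_failure is 39, which matches the position in the error message, and printing html_body.encode(...) with the console's real encoding instead of the bare unicode object should get past the error.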

Any idea what I've missed?

Thanks.