encoding problems
Diez B. Roggisch
deets at nospam.web.de
Wed Aug 29 04:48:25 EDT 2007
tool69 wrote:
> Hi,
>
> I would like to transform reST contents to HTML, but got problems
> with accented chars.
>
> Here's a rather simplified version using SVN Docutils 0.5:
>
> %-------------------------------------------------------------
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
This declaration only affects unicode literals; plain (byte) string literals keep whatever bytes the source file contains.
> from docutils.core import publish_parts
>
> class Post(object):
>     def __init__(self, title='', content=''):
>         self.title = title
>         self.content = content
>
>     def _get_html_content(self):
>         return publish_parts(self.content,
>                              writer_name="html")["html_body"]
>     html_content = property(_get_html_content)
By the way, you can write the property with a decorator instead:
    @property
    def html_content(self):
        ...
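A fuller sketch of the decorator form, for illustration only. The render_html helper is a hypothetical stand-in for the publish_parts call, so that the snippet runs without Docutils installed:

```python
def render_html(text):
    # Hypothetical stand-in for publish_parts(...)["html_body"];
    # it only wraps the text so the example needs no Docutils.
    return u"<p>%s</p>" % text


class Post(object):
    def __init__(self, title=u'', content=u''):
        self.title = title
        self.content = content

    @property
    def html_content(self):
        # Computed on every attribute access; no separate property() call.
        return render_html(self.content)


post = Post(title=u"Demo", content=u"hello")
html = post.html_content  # read like an attribute, not called
```

The behaviour is the same as with html_content = property(_get_html_content); the decorator just saves the extra name.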
> # Instanciate 2 Post objects
> p1 = Post()
> p1.title = "First post without accented chars"
> p1.content = """This is the first.
> ...blabla
> ... end of post..."""
>
> p2 = Post()
> p2.title = "Second post with accented chars"
> p2.content = """Ce poste possède des accents : é à ê è"""
This needs to be a unicode-literal:
p2.content = u"""Ce poste possède des accents : é à ê è"""
Note the u in front.
> for post in [p1,p2]:
>     print post.title, "\n" +"-"*30
>     print post.html_content
>
> %-------------------------------------------------------------
>
> The output gives me :
>
> First post without accented chars
> ------------------------------
> <div class="document">
> <p>This is the first.
> ...blabla
> ... end of post...</p>
> </div>
>
> Second post with accented chars
> ------------------------------
> Traceback (most recent call last):
> File "C:\Documents and
> Settings\kib\Bureau\Projets\python\dbTest\rest_error.py", line 30, in
> <module>
> print post.html_content
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in
> position 39:
> ordinal not in range(128)
You need to encode a unicode string explicitly into the encoding you want before printing it.
Otherwise, the default (ascii) is used, and that cannot represent accented characters.
So
print post.html_content.encode("utf-8")
should work.
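To see the difference outside of Docutils, here is a minimal sketch. The variable names are made up for the example, and it sticks to u"" literals and .encode()/.decode() so it behaves the same on Python 2 and on current Python:

```python
# -*- coding: utf-8 -*-
text = u"Ce poste possède des accents : é à ê è"

# Encoding to ASCII is what fails in the traceback above.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding to UTF-8 succeeds and yields a byte string.
utf8_bytes = text.encode("utf-8")

# Decoding those bytes with the same codec gives the original text back.
round_trip = utf8_bytes.decode("utf-8")
```

Here ascii_ok ends up False, while utf8_bytes is longer than text because each accented character takes two bytes in UTF-8.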
Diez