Spanish Accents

Thu Dec 22 10:25:46 EST 2011

On Thu, Dec 22, 2011 at 10:58 AM, Chris Angelico <rosuav at gmail.com> wrote:

> Firstly, are you using Python 2 or Python 3? Things will be slightly
> different, since the default 'str' object in Py3 is Unicode.
>

Python 2 (2.4, going by the paths in the traceback below).

>
> I would guess that your page is being output as UTF-8; you may find
> that the solution is as easy as declaring the encoding of your text
> file when you read it in.
>

So I tried this:

file = open(p + "2.txt")
for line in file:
  print unicode(line, 'utf-8')

and got this error:

   142   print unicode(line, 'utf-8')
   143
   144 print '''<br /><br /><form id="signup" action="http://13gems.com/Sign_Up.py" method="post" target="_blank">
 builtin unicode = <type 'unicode'>, line = '<span class="text">\r\n'

 /usr/lib64/python2.4/encodings/utf_8.py <file:///usr/lib64/python2.4/encodings/utf_16.py> in decode(input=<read-only buffer ptr 0x2b197e378454, size 21>, errors='strict')
    14
    15 def decode(input, errors='strict'):
    16     return codecs.utf_16_decode(input, errors, True)
    17
    18 class StreamWriter(codecs.StreamWriter):
 global codecs = <module 'codecs' from '/usr/lib64/python2.4/codecs.pyc'>, codecs.utf_16_decode = <built-in function utf_16_decode>, input = <read-only buffer ptr 0x2b197e378454, size 21>, errors = 'strict', builtin True = True

UnicodeDecodeError: 'utf16' codec can't decode byte 0x0a in position 20: truncated data
      args = ('utf16', '<span class="text">\r\n', 20, 21, 'truncated data')
      encoding = 'utf16'
      end = 21
      object = '<span class="text">\r\n'
      reason = 'truncated data'
      start = 20
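
The "truncated data" reason is consistent with the input length. The line in the traceback is 21 bytes long, and UTF-16 works in 2-byte units, so the last byte (0x0a at position 20) has no partner. A minimal sketch of that arithmetic follows, written for Python 3 rather than the Python 2.4 shown in the paths above; the helper name utf16_failure_reason is made up for illustration.

```python
# The line reported in the traceback: object = '<span class="text">\r\n'
line = b'<span class="text">\r\n'


def utf16_failure_reason(data):
    """Return the decode error reason, or None if data decodes as UTF-16."""
    try:
        data.decode('utf-16')
    except UnicodeDecodeError as exc:
        return exc.reason
    return None


# An odd byte count cannot be split into complete 2-byte UTF-16 units.
byte_count = len(line)
leftover = byte_count % 2

# The same bytes are plain ASCII, so decoding them as UTF-8 succeeds.
as_text = line.decode('utf-8')

print(byte_count, leftover, utf16_failure_reason(line))
```

Since this particular line is pure ASCII, it says nothing about how the accented characters elsewhere in the file are encoded.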

I tried it with utf-16 as well, with the same results.
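
For comparison, here is a minimal sketch of "declaring the encoding of your text file when you read it in", as suggested above. It assumes the file really is UTF-8, which the thread has not confirmed; if it turns out to be Latin-1, pass 'latin-1' instead. The function name read_text_lines and the throwaway sample file are hypothetical. The sketch targets Python 3 (or 2.6+); on Python 2.4, which has no io module or with statement, codecs.open(path, 'r', 'utf-8') is the closest equivalent.

```python
import io
import os
import tempfile


def read_text_lines(path, encoding='utf-8'):
    """Return the file's lines as Unicode text, without line endings."""
    # io.open decodes while reading; errors='replace' substitutes U+FFFD
    # for undecodable bytes instead of raising UnicodeDecodeError.
    with io.open(path, 'r', encoding=encoding, errors='replace') as handle:
        return [line.rstrip('\r\n') for line in handle]


# Throwaway file standing in for the real text file (an assumption).
sample = u'ma\u00f1ana\r\nni\u00f1o\r\n'.encode('utf-8')
fd, demo_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as out:
    out.write(sample)
try:
    lines = read_text_lines(demo_path)
finally:
    os.remove(demo_path)
```

Decoding at the point of reading keeps the rest of the script working with Unicode text only, so the encoding question is settled in one place.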

TIA,

Stan