Text Encoding - Like Wrestling Oiled Pigs

apotheos at gmail.com
Fri Dec 8 11:26:08 EST 2006


So I've got a problem.

I've got a database of information that is encoded in Windows/CP1252.
What I want to do is dump this to a UTF-8 encoded text file (an RSS
feed).

While the overall problem seems to be related to the conversion, the
only error I'm getting is a

"UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position
163: ordinal not in range(128)"

So somewhere there's an implicit conversion to ASCII that I'm missing,
which is completely aggravating my brain.
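
For reference, here is a minimal sketch of how that message can arise. It
assumes the database values arrive as plain byte strings in CP1252 (nothing
above confirms what psycopg actually returns), and the sample string is made
up for illustration. In Python 2, calling unicode() on a byte string without
naming an encoding decodes it with the default 'ascii' codec; the explicit
decode("ascii") below stands in for that implicit step so the snippet behaves
the same on Python 2 and 3.

```python
# Byte 0x92 is the right single quotation mark in CP1252.
# It is outside the ASCII range (0-127), so an ASCII decode rejects it.
raw = b"It\x92s here"  # hypothetical sample standing in for a database value

try:
    raw.decode("ascii")          # what an implicit ASCII conversion attempts
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc            # "'ascii' codec can't decode byte 0x92 ..."

# Naming the real source encoding makes the decode succeed.
text = raw.decode("cp1252")

# Once the value is text, encoding it as UTF-8 is straightforward.
utf8_bytes = text.encode("utf-8")
```

If the assumption holds, the position reported in the error would simply be
the offset of the first non-ASCII byte in the string being converted.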

So, what fundamental issue am I completely overlooking?

Code follows.

def GenerateNoticeRSS():
    output = codecs.open(FILEBASE + 'noticeboard.xml','w','utf-8')
    conn = psycopg.connect(DSN)
    curs = conn.cursor()
    sql_query = "select story.subject as subject, story.content as content, story.summary as summary, story.sid as sid, posts.bid as board, posts.date_to_publish as date from story$
    curs.execute(sql_query)
    rows = curs.fetchall()

    output.write('<?xml version="1.0" encoding="utf-8"?>\n')
    output.write('<rss version="2.0">\n')
    output.write('<channel>\n')
    output.write('<title>U of L Notice Board</title>\n')
    output.write('<link>http://www.uleth.ca/notice</link>\n')
    output.write('<description>University of Lethbridge News and Events</description>\n')

    for each in rows:
        output.write('<item>\n')
        output.write('<title>' + rssTitlePrefix(each[4]) +
                     unicode(each[0]) + '</title>\n')
        output.write('<link>http://www.uleth.ca/notice/display.html?b=' +
                     str(each[4]) + '&s=' + str(each[3]) + '</link>\n')
        output.write('<guid>http://www.uleth.ca/notice/display.html?b=' +
                     str(each[4]) + '&s=' + str(each[3]) + '</guid>\n')
        descript = each[2] + '<BR><BR>' + each[1]
        output.write(u'<description>' + unicode(descript) +
                     u'</description>\n')     # this is the line that causes the error.
        output.write('</item>\n')

    output.write('</channel>\n')
    output.write('</rss>\n')
    output.close()

    return 0
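
One possible way around the failing line, sketched under the assumption that
the row values are CP1252 byte strings: decode each value explicitly before
it is concatenated with anything else, and let the codecs-wrapped file do the
UTF-8 encoding on write. The helper names to_text and write_description are
hypothetical and not part of the code above.

```python
import codecs
import os
import tempfile


def to_text(value, encoding="cp1252"):
    """Return value as text, decoding byte strings with the given encoding.

    Assumption: byte strings from the database are CP1252. Values that are
    already text are returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def write_description(path, summary, content):
    """Write one <description> element to a UTF-8 encoded file."""
    out = codecs.open(path, "w", "utf-8")
    try:
        # Every piece is text before concatenation, so no implicit
        # ASCII decode is needed anywhere on this line.
        out.write(u"<description>" + to_text(summary) + u"<BR><BR>" +
                  to_text(content) + u"</description>\n")
    finally:
        out.close()


# Usage with made-up values: one CP1252 byte string, one text string.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.xml")
write_description(demo_path, b"It\x92s", u"ok")
```

The design choice here is to convert at the boundary: bytes become text as
soon as they leave the database, and text becomes bytes only when written.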



