UnicodeEncodeError in Windows

geoff_ness geoffness8 at gmail.com
Mon Sep 17 06:38:16 EDT 2007


Hello - and apologies in advance for the length of this post.

I am having a hard time understanding the errors being generated by a
program I've written. The code is intended to parse text files which
are copied and pasted from web pages from an online game. The encoding
of the pages is ISO-8859-1, but the text that gets copied contains
characters from character sets other than latin-1.
For instance, one of the lines I need to be able to read is:
196679	Daimyo 石 Druid	145	27	12/09/07 21:40:04	[ Expel ]

I start with the file 'citizen_list' and use this function to read it
and return a list of names (for instance, Daimyo 石 Druid) and ID
numbers:

import codecs

# builds the list of names from the citizens list
def getNames(f):
    """Builds a list from the town list of names

    Returns a list"""
    newlist = []
    for line in f:
        # strip the trailing columns so only the ID and name remain
        namewords = line.rstrip('[Expel]\n\t ')\
            .rstrip(':/0123456789 ').rstrip('\t ').rstrip('0123456789 ')\
            .rstrip('\t ').rstrip('0123456789 ').rstrip('\t ').split()
        # each entry has the form "ID;name"
        entry = ";".join([namewords[0], " ".join(namewords[1:])])
        newlist.append(entry)
    return newlist

citizens = codecs.open('citizen_list', 'r', 'utf-8', 'strict')
listNames = getNames(citizens)
citizens.close()
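
For comparison, since the sample line above appears to be tab-separated, the same ID and name could probably be pulled out by splitting on tabs instead of chaining rstrip calls. This is only a sketch: the name parse_citizen_line is made up for illustration, and it assumes the copied text keeps its tabs, with the ID always in the first field and the name in the second.

```python
def parse_citizen_line(line):
    """Return u'ID;name' for one tab-separated line of the citizen list.

    Hypothetical helper: assumes field 0 is the ID and field 1 the name.
    """
    fields = line.rstrip(u"\n").split(u"\t")
    citizen_id = fields[0].strip()
    # Collapse runs of whitespace inside the name to single spaces,
    # which is what split() followed by " ".join() does in getNames.
    name = u" ".join(fields[1].split())
    return u";".join([citizen_id, name])


# The sample line from the post, with the CJK character written as an escape.
sample = u"196679\tDaimyo \u77f3 Druid\t145\t27\t12/09/07 21:40:04\t[ Expel ]\n"
entry = parse_citizen_line(sample)
```

Splitting on tabs would also avoid trimming a name that happens to end in a digit, which the rstrip chain would cut short.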

I've specified 'utf-8' as the encoding as this seemed to be the best
candidate for picking up all the names in the list. I use the names in
other functions - for example:

def getdamage(warrior, rpt):
    """reads each line of war report

    records damage, kills, death and rounds on the warrior object"""
    for line in rpt:
        if (line.startswith(warrior.name) or
                line.startswith('A blue aura surrounds ' + warrior.name)) \
                and line.find('weapon') > 0:
            # the damage figure sits between 'caused ' and ' damage'
            warrior.addDamage(
                int(line[line.find('caused ') + 7:line.find(' damage')]))
            if rpt.next().find('is dead') > 0:
                warrior.addKill()
        elif line.startswith(warrior.name + ' is dead'):
            warrior.dies()
            break
        elif line.startswith('Starting round'):
            warrior.addRound()
for cit in listNames:
        c = Warrior(cit.split(';')[0], cit.split(';')[1])
        totalnum += 1
        report = codecs.open('war_report','r', 'utf-8', 'strict')
        getdamage(c, report)
        report.close()
--[snip]--

def buildString(warrior):
        """Build a string from a warrior's stats

        Returns string for output to warStat."""
        return "!tr!!td!!id!"+str(warrior.ID)+"!/id!!/td!"+\
        "!td!"+str(warrior.damage)+"!/td!!td!"+str(warrior.kills)+\
        "!/td!!td!"+str(warrior.survived)+"!/td!!/tr!"

This code runs fine on my Linux machine, but when I sent the code to a
friend running Python on Windows, he got the following error:

Traceback (most recent call last):
  File "D:\Python25\Lib\SITE-P~1\PYTHON~1\pywin\framework\scriptutils.py", line 310, in RunScript
    exec codeObject in __main__.__dict__
  File "C:\Documents and Settings\Administrator\Desktop\reparser_014(2)\parser_1.0.py", line 63, in <module>
    "".join(["%s" % buildString(c) for c in citlistS[:100]])+"!/table!")
  File "C:\Documents and Settings\Administrator\Desktop\reparser_014(2)\iotp_alt2.py", line 169, in buildString
    "!/td!!td!"+str(warrior.survived)+"!/td!!/tr!"
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)

As I understand it, the error means the ASCII codec cannot encode the
unicode character u'\ufeff'.
What puzzles me is that this error doesn't show up on my machine, even
though ASCII is the default encoding there as well. Any thoughts or
assistance would be welcomed.
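
One possible explanation, offered as a guess and not a diagnosis: u'\ufeff' is the Unicode byte order mark (BOM). Some Windows editors, Notepad for example, write it at the start of a file saved as UTF-8, and the 'utf-8' codec decodes it as an ordinary character without dropping it. If the Windows copy of 'citizen_list' starts with a BOM, the first ID would come out as u'\ufeff196679', and str() on that value in Python 2 would attempt an ASCII encode and fail in the way the traceback shows. The sketch below reproduces this with in-memory bytes. The sample data is invented, and the 'utf-8-sig' codec (available from Python 2.5) is shown as one way to strip the mark.

```python
import codecs

# Invented sample: the bytes a BOM-prefixed UTF-8 file would start with.
raw = codecs.BOM_UTF8 + u"196679\tDaimyo \u77f3 Druid".encode("utf-8")

# The plain 'utf-8' codec keeps the BOM as the first decoded character.
with_bom = raw.decode("utf-8")
first_id = with_bom.split()[0]  # the ID now carries the BOM in front

# Encoding that value to ASCII fails. In Python 2, calling str() on a
# unicode object performs this encode implicitly.
try:
    first_id.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

# The 'utf-8-sig' codec strips a leading BOM if one is present.
without_bom = raw.decode("utf-8-sig")
clean_id = without_bom.split()[0]
```

If this is what is happening, passing 'utf-8-sig' in place of 'utf-8' to codecs.open should be enough, and files without a BOM would still be read the same way.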

Cheers



