encoding problem with BeautifulSoup - problem when writing parsed text to file

Greg gregor.hochschild at googlemail.com
Wed Oct 5 19:35:59 EDT 2011


Hi, I am having encoding problems when I parse text from a
non-English website using BeautifulSoup and then write the results to
a txt file.

I have the text both as a normal byte string (text) and as a unicode
string (utext):
print repr(text)
'Branie zak\xc2\xb3adnik\xc3\xb3w'

print repr(utext)
u'Branie zak\xb3adnik\xf3w'

Both print text and print utext show the 'wrong' symbols (as does
fileSoup.prettify()):
Branie zak³adników


Now I am trying to save this to a file but I never get the encoding
right. Here is what I tried (plus lots of different things with encode,
decode...):
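import codecs

# Attempt 1: write the byte string as-is
outFile = open(filePath, "w")
outFile.write(text)
outFile.close()

# Attempt 2: write the unicode string, encoded as UTF-8 by codecs
outFile = codecs.open(filePath, "w", "UTF8")
outFile.write(utext)
outFile.close()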

Thanks!!
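
One possible reading of the repr output above, offered as a sketch and not a confirmed diagnosis: \xb3 and \xf3 are the ISO-8859-2 bytes for the Polish letters "ł" and "ó", so the page may have been ISO-8859-2 and been decoded as Latin-1 somewhere along the way. Under that assumption, re-encoding utext as Latin-1 recovers the original bytes, and decoding those as ISO-8859-2 gives the intended text. The output path below is hypothetical and only there to make the example self-contained.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# The unicode string exactly as shown by repr(utext) in the post.
utext = u'Branie zak\xb3adnik\xf3w'

# Assumption: the page bytes were ISO-8859-2 but were decoded as Latin-1.
# Encoding back to Latin-1 recovers the original bytes; decoding them as
# ISO-8859-2 then yields the intended Polish characters.
fixed = utext.encode('latin-1').decode('iso-8859-2')

# Hypothetical output path, used only for this illustration.
file_path = os.path.join(tempfile.mkdtemp(), 'out.txt')

# io.open encodes unicode text on write (works on Python 2.6+ and Python 3).
with io.open(file_path, 'w', encoding='utf-8') as out_file:
    out_file.write(fixed)
```

If the assumption holds, the cleaner fix would be to tell the parser the page's real encoding up front so the text never gets mis-decoded; the round trip above only repairs a string that is already damaged.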