[Tutor] scraping and saving in file

Dave Angel davea at ieee.org
Wed Dec 29 13:25:31 CET 2010


Tommy Kaas wrote:
> Steven D'Aprano wrote:
>> But in your case, the best way is not to use print at all. You are writing
>> to a file -- write to the file directly, don't mess about with print.
>> Untested:
>>
>>
>> f = open('tabeltest.txt', 'w')
>> url = 'http://www.kaasogmulvad.dk/unv/python/tabeltest.htm'
>> soup = BeautifulSoup(urllib2.urlopen(url).read())
>> rows = soup.findAll('tr')
>> for tr in rows:
>>       cols = tr.findAll('td')
>>       output = "#".join(cols[i].string for i in (0, 1, 2, 3))
>>       f.write(output + '\n')  # don't forget the newline after each row
>> f.close()
>
> Steven, thanks for the advice.
> I see the point. But now I have problems with the Danish characters. I get
> this:
>
> Traceback (most recent call last):
>    File "C:/pythonlib/kursus/kommuner-regioner_ny.py", line 36, in <module>
>      f.write(output + '\n')  # don't forget the newline after each row
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 5: ordinal not in range(128)
>
> I have tried to add # -*- coding: utf-8 -*- to the top of the script, but it
> doesn't help.
>
> Tommy
>
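The coding line only affects how characters in the source module are
interpreted.  For each file you read or write, you also need to decide
which encoding to use.  A file opened with plain open() expects byte
strings, so Python 2 tries to encode your unicode output as ASCII, and
that fails on the first Danish character.  As Peter said, you probably need
     codecs.open(filename, "w", encoding="utf-8")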
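
To make that concrete, here is a minimal sketch of writing the rows through
codecs.open.  The rows list is made-up sample data standing in for the
strings BeautifulSoup would return, and write_rows and the temporary path
are names I invented for illustration, so adapt them to your script:

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

# Hypothetical sample data: unicode strings with Danish characters,
# standing in for the cell contents scraped from the table.
rows = [
    [u"K\xf8ge", u"Sj\xe6lland"],
    [u"Aarhus", u"Midtjylland"],
]


def write_rows(path, rows, sep=u"#"):
    """Write each row as one sep-joined line, encoded as UTF-8."""
    # codecs.open returns a file object that encodes unicode on write,
    # so no implicit ASCII conversion takes place.
    with codecs.open(path, "w", encoding="utf-8") as f:
        for cols in rows:
            f.write(sep.join(cols) + u"\n")


# Write to a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "tabeltest.txt")
write_rows(path, rows)
```

If whatever reads the file expects a different encoding (latin-1 is common
for Danish text on Windows), pass that name as the encoding argument instead.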

DaveA


