Question about working with html entities in python 2 to use them as filenames

Paul Rubin no.email at nospam.invalid
Wed Nov 23 00:43:46 EST 2016


Steven Truppe <steven.truppe at chello.at> writes:

> # here I would like to create a directory named after the content of
> # the title... I always get this error:
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2

The title has a non-ascii character in it: byte 0xc3 is most likely the
first byte of a two-byte utf8 sequence (read as latin-1 it would be Ã,
capital A with tilde), and there is no corresponding ascii character.
So Python 2 can't implicitly decode that byte string as ascii.  You can
decode it from utf8 explicitly and encode it to utf8 or whatever for the
filename, but are you on an OS that recognizes utf8 in filenames?  Maybe
you want to transcode it somehow.  Otherwise you may be asking for
trouble if some of those html strings have control characters, slashes
and stuff in them.
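
One way to do that transcoding, as a rough sketch rather than a
recommendation: unescape the entities, decompose accented letters, drop
whatever is still non-ascii, and replace everything outside a small safe
set.  The function name title_to_dirname, the "untitled" fallback, and
the guess that byte strings are utf8 are all assumptions of mine, not
anything from your script.  It should run on both Python 2 and 3.

```python
import re
import unicodedata

try:
    from html import unescape            # Python 3.4+
except ImportError:
    from HTMLParser import HTMLParser    # Python 2
    unescape = HTMLParser().unescape


def title_to_dirname(title, fallback="untitled"):
    """Turn an HTML title string into a conservative ASCII directory name.

    Hypothetical helper for illustration only.
    """
    if isinstance(title, bytes):
        # Assumption: the page is UTF-8; use the real charset if known.
        title = title.decode("utf-8", "replace")
    # Resolve entities such as &atilde; or &#233; into real characters.
    text = unescape(title)
    # Split accented letters into base letter + combining mark,
    # then drop everything that is still not ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of anything outside a small safe set (this removes
    # control characters, slashes, colons, whitespace) into "_".
    text = re.sub(r"[^A-Za-z0-9._-]+", "_", text).strip("._")
    return text or fallback
```

The result can then go straight to os.makedirs().  The price is that
accents are lost and titles in non-Latin scripts collapse to the
fallback, so if your filesystem does handle utf8 you may prefer to keep
the decoded text and only strip the unsafe characters.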

Also, if you're scraping web pages, you may have an easier time with
BeautifulSoup (search web for it) than HTMLParser.


