Question about working with HTML entities in Python 2 to use them as filenames

Steven Truppe steven.truppe at chello.at
Tue Nov 22 15:33:30 EST 2016


Hi all,


I'm using Linux and Python 2 and want to process a file line by line,
executing a command for each line (with os.system).
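Because each line (and later each title) ends up in a shell command, one alternative to os.system is the subprocess module with an argument list. Each line is then passed to the program as a single argument, with no shell quoting to get wrong. This is a minimal sketch for Python 3.5+ (for subprocess.run). The helper `run_for_each_line` is hypothetical, and the command list stands in for whatever program is really being run:

```python
import subprocess


def run_for_each_line(path, command):
    """Run `command` once per non-empty line of `path`, passing the line as the last argument.

    Hypothetical helper for illustration; `command` is a list such as ["echo"].
    Returns the captured stdout (bytes) of each run.
    """
    results = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # No shell is involved, so spaces, quotes or "&" in `line` need no escaping.
            completed = subprocess.run(command + [line], stdout=subprocess.PIPE, check=True)
            results.append(completed.stdout)
    return results
```

For example, `run_for_each_line("list.txt", ["echo"])` would echo each URL once. Passing a list instead of a single string is the design choice that removes the quoting problem.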

My problem is that I open the file and parse the title, but I'm not
able to turn it into a normal filename:


import os, sys
import urllib, re, cgi
import HTMLParser, unicodedata
import htmlentitydefs
import chardet

# Capture the text between <title> and </title>
these_regex = "<title>(.+?)</title>"
pattern = re.compile(these_regex)

for URL in open('list.txt', "r").readlines():
    htmlfile = urllib.urlopen(URL)
    htmltext = htmlfile.read()      # raw bytes (a Python 2 str)
    title = re.findall(pattern, htmltext)[0]
    # unescape() has to be called on an HTMLParser instance
    title = HTMLParser.HTMLParser().unescape(title)
    print "title = ", title

    # here I would like to create a directory named after the content of the title
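Once the title is a Unicode string, turning it into a directory name mostly means removing characters that cannot or should not appear in a single path component. The following is a hedged sketch in Python 3. The `safe_dirname` and `make_title_dir` helpers and their replacement rules (swap `/` for `_`, drop NUL, collapse whitespace, fall back to a placeholder) are assumptions, not a standard:

```python
import os
import re


def safe_dirname(title, fallback="untitled"):
    """Return a single path component derived from `title` (hypothetical helper)."""
    # "/" separates path components and NUL is never allowed in Linux filenames.
    name = title.replace("/", "_").replace("\0", "")
    # Collapse runs of whitespace (including newlines) into single spaces.
    name = re.sub(r"\s+", " ", name).strip()
    # "." and ".." are special directory entries, so avoid them as well.
    if name in ("", ".", ".."):
        name = fallback
    return name


def make_title_dir(base, title):
    """Create base/<sanitized title> if it does not exist yet and return its path."""
    path = os.path.join(base, safe_dirname(title))
    os.makedirs(path, exist_ok=True)  # exist_ok needs Python 3.2+
    return path


print(safe_dirname("Café & Bäckerei / News"))  # Café & Bäckerei _ News
```

This sketch does not enforce the usual 255-byte limit on a single filename, so very long titles may still need trimming.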


I always get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2
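For reference, 0xc3 is the first byte of a two-byte UTF-8 sequence. For example, "ä" is encoded as the bytes 0xc3 0xa4. So the error suggests that UTF-8 page bytes are being decoded with the default ASCII codec somewhere. The following small illustration is written in Python 3. In Python 2 the byte string would be a plain `str`, but `.decode()` behaves the same way. The helper `describe_decode` is hypothetical:

```python
def describe_decode(raw):
    """Return (ascii_error_message_or_None, utf8_text) for a byte string."""
    try:
        raw.decode("ascii")
        ascii_error = None
    except UnicodeDecodeError as exc:
        ascii_error = str(exc)  # same kind of message as in the traceback above
    return ascii_error, raw.decode("utf-8")


err, text = describe_decode(b"\xc3\xa4")  # UTF-8 bytes for "ä"
print(err)
print(text)  # ä
```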



I've played around with .encode('latin-1') and .encode('utf8'), but I
have not yet been able to solve this simple issue.
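One common approach is to decode the downloaded bytes into text first and only then extract and unescape the title, so that no step mixes byte strings with Unicode. The following is a sketch in Python 3, assuming the page is UTF-8. A real page may declare another charset in its Content-Type header or in a <meta> tag. `extract_title` is a hypothetical helper, and `html.unescape` is the Python 3 counterpart of the `HTMLParser` unescaping used in the code above:

```python
import html
import re

TITLE_RE = re.compile(r"<title>(.+?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(page_bytes, encoding="utf-8"):
    """Decode raw page bytes, then return the unescaped <title> text (or None)."""
    # Decode first: everything after this line works on str (text), never bytes.
    text = page_bytes.decode(encoding, errors="replace")
    match = TITLE_RE.search(text)
    if match is None:
        return None
    # Turn entities such as &amp; or &auml; into real characters.
    return html.unescape(match.group(1)).strip()


page = b"<html><head><title>Caf\xc3\xa9 &amp; B&auml;ckerei</title></head></html>"
print(extract_title(page))  # Café & Bäckerei
```

In Python 2 the analogous steps would be `htmltext.decode('utf-8')` before the regex, followed by `HTMLParser.HTMLParser().unescape(...)` on the resulting unicode string.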


Thanks in advance,

Truppe Steven




More information about the Python-list mailing list