What do I do to read html files on my pc?

mikcec82 michele.cecere at gmail.com
Wed Aug 29 06:22:26 EDT 2012


On Monday, 27 August 2012 12:59:02 UTC+2, mikcec82 wrote:
> Hello,
> 
> I have an HTML file on my PC and I want to read it to extract some text.
> 
> Can you tell me which libraries I should use and how to do it?
> 
> Thank you so much.
> 
> Michele

Hi Peter, and thanks for your valuable help.
Fortunately, the file contains no runs of "X" with lengths other than 2 or 4.
Starting from your code, I wrote the following (I'm posting it in case it helps other people):
# fileorig is the path to the HTML file (defined elsewhere)
with open(fileorig, 'r') as f:
    nomefile = f.read()

start = nomefile.find("XX")
start2 = nomefile.find("NOT PASSED")
c0 = 0  # occurrences of "XXXX"
c1 = 0  # occurrences of "XX"
c2 = 0  # occurrences of "NOT PASSED"

while start != -1 or start2 != -1:

    # Only slice and search again while there is still an "XX" match
    if start != -1:
        if nomefile[start:start+4] == "XXXX":
            print("XXXX       found at location %d" % start)
            start += 4
            c0 += 1
        else:
            print("XX         found at location %d" % start)
            start += 2
            c1 += 1
        start = nomefile.find("XX", start)

    if start2 != -1:
        print("NOT PASSED found at location %d" % start2)
        start2 += 10
        c2 += 1
        start2 = nomefile.find("NOT PASSED", start2)

print("XXXX       %s found" % c0)
print("XX         %s found" % c1)
print("NOT PASSED %s found" % c2)

Now I'm able to find all occurrences of the strings "XXXX", "XX" and "NOT PASSED".
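
For anyone who prefers a single pass, the same counts can probably also be obtained with a regular expression. This is only a sketch, assuming Python 3 and the standard re module; the function name count_markers and the sample string are made up for illustration and are not from my real file. It relies on the same assumption as above, that runs of "X" are only 2 or 4 long.

```python
import re
from collections import Counter

# "XXXX" is listed before "XX" so a run of four is not counted as two pairs.
PATTERN = re.compile(r"XXXX|XX|NOT PASSED")


def count_markers(text):
    """Return (counts, hits): a Counter per marker and a list of (marker, offset)."""
    hits = [(m.group(), m.start()) for m in PATTERN.finditer(text)]
    counts = Counter(marker for marker, _ in hits)
    return counts, hits


if __name__ == "__main__":
    # Hypothetical sample text standing in for the contents of the HTML file.
    sample = "<td>XXXX</td><td>XX</td><td>NOT PASSED</td><td>XX</td>"
    counts, hits = count_markers(sample)
    for marker, offset in hits:
        print("%-10s found at location %d" % (marker, offset))
    for marker in ("XXXX", "XX", "NOT PASSED"):
        print("%-10s %d found" % (marker, counts[marker]))
```

Unlike the loop above, this reports the matches in the order they appear in the file, because all three strings are searched for together.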


Thank you so much.


