What do I do to read html files on my pc?

mikcec82 michele.cecere at gmail.com
Tue Aug 28 10:51:57 EDT 2012


On Monday, 27 August 2012 12:59:02 UTC+2, mikcec82 wrote:
> Hello,
> 
> I have an html file on my pc and I want to read it to extract some text.
> 
> Can you tell me which libraries I should use and how to do it?
> 
> Thank you so much.
> 
> Michele

Hi Oscar,
I tried what you suggested and wrote the code shown further down.
However, I run into a problem when the html file contains a repeated string (XX in this case), as in this situation:
CODE Target:  0201
CODE Read:    XXXX
CODE CHECK:   NOT PASSED
TEXT Target:  13
TEXT Read:    XX
TEXT CHECK:   NOT PASSED
CHAR Target:  AA
CHAR Read:    XX
CHAR CHECK:   NOT PASSED

With this code (adapted from yours):

index = nomefile.find('XXXX')
print 'XXXX_ found at location', index

index2 = nomefile.find('XX')
print 'XX_ found at location', index2

found = nomefile.find('XX')
while found > -1:
    print "XX found at location", found
    found = nomefile.find('XX', found+1)

I get output like this:

XXXX_ found at location 51315
XX_ found at location 51315
XX found at location 51315
XX found at location 51316
XX found at location 51317
XX found at location 52321
XX found at location 53328

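I wrote this to find all occurrences of the strings 'XXXX' and 'XX'. But, as you can see, the script also reports occurrences of XX at locations 51315, 51316 and 51317, which all fall inside the string XXXX: find() matches any two adjacent X characters, and because the search restarts at found+1, the overlapping pairs inside XXXX are matched too.

Is there a way to find all occurrences of XX while skipping the locations that belong to XXXX?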

Thank you.
Michele
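
One possible approach, offered only as a sketch: use a regular expression with a negative lookbehind and a negative lookahead, so that a run of X characters is reported only when it is exactly two long. The function name find_exact_runs and the sample string are hypothetical, made up for illustration, and the sketch assumes the marker is a single repeated character and that the file contents are already in a string (as nomefile is above).

```python
import re

def find_exact_runs(text, char="X", length=2):
    """Return the start offsets of runs of `char` that are exactly `length` long.

    Assumption: `char` is a single character. The lookbehind (?<!X) rejects a
    match preceded by another X, and the lookahead (?!X) rejects a match
    followed by another X, so 'XX' inside 'XXXX' is never reported.
    """
    c = re.escape(char)
    pattern = re.compile(r"(?<!%s)(?:%s){%d}(?!%s)" % (c, c, length, c))
    return [m.start() for m in pattern.finditer(text)]

# Hypothetical sample text modelled on the report shown in the post.
sample = (
    "CODE Read: XXXX\n"
    "TEXT Read: XX\n"
    "CHAR Read: XX\n"
)

print(find_exact_runs(sample))            # offsets of standalone 'XX'
print(find_exact_runs(sample, length=4))  # offsets of standalone 'XXXX'
```

With this sample, the first call reports only the two standalone XX entries and the second reports only the XXXX entry. The same loop based on find() could also be kept, checking the characters before and after each hit by hand, but the regular expression states the "exactly two" condition in one place.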


