What do I do to read html files on my pc?

mikcec82 michele.cecere at gmail.com
Tue Aug 28 06:09:11 EDT 2012


On Monday, 27 August 2012 12:59:02 UTC+2, mikcec82 wrote:
> Hello,
> 
> I have an HTML file on my PC and I want to read it to extract some text.
> 
> Can you help me with which libraries I have to use and how I can do it?
> 
> Thank you so much.
> 
> Michele

Thank you all.

Hi Chris, thank you for your hint. I'll try to do as you said. To be clear:

I have to work on an HTML file. This file is not a web page, nor does it come from the internet.
It is a file created by local software (where "local" means "on my PC").

I need to perform these operations on the file:

	1) Open the file
	2) Check for occurrences of the strings:
		2a) XXXX, in this case I have this code:
					
					<tr style="font-size: 10" align="left">
					<th>
					</th><th>
					DTC CODE Read:
					</th>
					<td>
					<samp>
					 
					 
					 
					 
					 
					</samp>
					XXXX
					</td>
					</tr>

		2b)	NOT PASSED, in this case I have this code:
		
					<tr style="color: red" align="left">
					<th>
					</th><th>
					CODE CHECK
					</th>
					<th>
					: NOT PASSED
					</th>
					</tr>
			Note: the color in <tr style="color: red" align="left"> can be "red" or "orange".
			
		2c) OK or PASSED
	   
	3) Then, I need to fill in an Excel file following these rules:
		3a) If 2a or 2b occurs in the HTML file, I'll write NOK in the Excel file
		3b) If 2c occurs in the HTML file, I'll write OK in the Excel file
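
For step 3, one simple option is to write a CSV file, which Excel opens directly. This is only a sketch under assumptions: it is written for Python 3, it assumes the outcome of step 2 is already known as a boolean, and the function name `write_results` and the column names are made up for illustration.

```python
import csv

def write_results(csv_path, results):
    """Write one row per HTML file to a CSV file that Excel can open.

    `results` is a list of (name, failed) tuples, where `failed` is True
    when case 2a or 2b was found and False when only case 2c was found.
    """
    # newline='' is what the csv module documentation asks for on Python 3
    with open(csv_path, 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(['file', 'result'])  # header row (hypothetical names)
        for name, failed in results:
            writer.writerow([name, 'NOK' if failed else 'OK'])
```

A real .xls/.xlsx file would need a third-party library; CSV keeps the sketch within the standard library.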

Note:
1) In this example, in case 2b, I have "CODE CHECK" in the code, but I could also have "TEXT CHECK" or "CHAR CHECK".
2) The search for occurrences can be done either by tag (<tr style="color: red" align="left">) or by text (NOT PASSED, PASSED). I would prefer to use the first method.
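
The first method (searching by tag) could be sketched with the standard library's `html.parser`, as below. Assumptions: Python 3, and the class name `FailedRowFinder` and the function `has_failed_row` are invented for this example. It only detects case 2b (a red or orange row); the XXXX row in case 2a has `style="font-size: 10"`, so it would still need a text check.

```python
from html.parser import HTMLParser

class FailedRowFinder(HTMLParser):
    """Sets self.failed when a <tr> has style "color: red" or "color: orange"."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.failed = False

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased from HTMLParser
        if tag != 'tr':
            return
        style = dict(attrs).get('style') or ''
        # Normalise "color: red", "color:red" and "color: red;" to one form
        compact = style.replace(' ', '').lower().rstrip(';')
        if compact in ('color:red', 'color:orange'):
            self.failed = True

def has_failed_row(html_text):
    """Return True if the HTML text contains a red or orange table row."""
    finder = FailedRowFinder()
    finder.feed(html_text)
    return finder.failed
```

Because the check looks only at the row's style, it does not matter whether the row says "CODE CHECK", "TEXT CHECK" or "CHAR CHECK".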
==================================================

In my script I used the second approach to search, i.e.:

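**
# Raw string, so the backslashes in the Windows path are not treated as escapes
fileorig = r"C:\Users\Mike\Desktop\2012_05_16_1___p0201_13.html"

# Read the whole file into a single string
with open(fileorig, 'r') as f:
    nomefile = f.read()

# Note: iterating over a string yields one character at a time,
# so the test below runs (and prints) once per character in the file
for x in nomefile:
    if 'XXXX' in nomefile:
        print('NOK')
    else:
        print('OK')
**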
But this works on characters and not on strings (i.e., the loop goes through the file character by character, not string by string).
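
A minimal sketch of the same text search without the per-character loop, assuming Python 3; the names `check_text` and `check_file` are made up here. It also checks for NOT PASSED, following rule 3a. The sketch tests for "NOT PASSED" rather than "PASSED", because "PASSED" is itself a substring of "NOT PASSED".

```python
def check_text(text):
    """Return 'NOK' if the text contains 'XXXX' or 'NOT PASSED', else 'OK'."""
    # 'in' on the whole string is already a substring search, so no loop is needed
    if 'XXXX' in text or 'NOT PASSED' in text:
        return 'NOK'
    return 'OK'

def check_file(path):
    """Read the whole file into one string and classify it once."""
    with open(path, 'r') as f:
        return check_text(f.read())
```

Calling `check_file(fileorig)` gives a single result for the whole file instead of one per character.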
		
===============================================

I hope I was clear.

Thanks for your help
Michele


