HTML parser, unexpected '<' char in declaration

Jesus Rivero (Neurogeek) jrivero at latinux.org
Tue Feb 21 13:45:40 EST 2006


Oops!

You are totally right, guys. I did miss the closing '>'; I was thinking
the errors might come from the use of ' or ".

Jesus

Tim Roberts wrote:

>"Jesus Rivero - (Neurogeek)" <jrivero at latinux.org> wrote:
>  
>
>>Hmmm, that's kind of a different issue, then.
>>
>>I can guess, from the error you pasted earlier, that the problem shown
>>is due to the fact that Python is interpreting a "<" as an expression and
>>not as a char. Review your code or try to figure out the exact input
>>you're receiving within the MTA.
>>    
>>    
>>
>
>Well, Jesus, you are 0 for 2.  Sakcee pointed out what the exact problem
>was in his original message.  The HTML he is being given is ill-formed; the
><!DOCTYPE directive is not closed.  The SGML parser finds a <html> tag
>which it thinks is inside the <!DOCTYPE, and that's illegal.
>
>  
>
>>>Well, I probably should explain more. This is part of an email. After
>>>the MTA delivers the email, it is stored in a local dir.
>>>After that, the email is parsed by the parser inside a web-based
>>>IMAP client at display time.
>>>
>>>I don't think I have the option of rewriting the message, and I don't
>>>want to reject the message altogether.
>>>
>>>I can either 1) fix the incoming HTML by tidying it up,
>>>2) strip out only the plain text and display that you have spam, or
>>>3) ignore that malformed tag and display the rest.
>>>      
>>>
>
>If this is happening with more than one message, you could check for it
>rather easily with a regular expression, or even just ''.find, and then
>either insert a closing '>' or delete everything up to the <html> before
>parsing it.
>  
>
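A minimal Python 3 sketch of the check Tim describes, assuming the broken messages look like the one in this thread: a <!DOCTYPE declaration whose closing '>' is missing, so the next '<' (usually <html>) arrives before any '>'. The helper names (has_unclosed_doctype, close_doctype, drop_prolog, repair) are hypothetical, and the regex is only a heuristic for this one defect, not a general HTML cleaner.

```python
import re

# Heuristic (assumption): a <!DOCTYPE is "unclosed" if the first '<' or '>'
# after it is a '<'. [^<>]* also matches newlines, so the declaration may
# span several lines.
_UNCLOSED_DOCTYPE = re.compile(r"<!DOCTYPE[^<>]*(?=<)", re.IGNORECASE)


def has_unclosed_doctype(html):
    """Return True if a <!DOCTYPE runs into the next '<' before any '>'."""
    return _UNCLOSED_DOCTYPE.search(html) is not None


def close_doctype(html):
    """Fix 1: insert the missing '>' at the end of the declaration."""
    return _UNCLOSED_DOCTYPE.sub(
        lambda m: m.group(0).rstrip() + ">\n", html, count=1
    )


def drop_prolog(html):
    """Fix 2: delete everything before the <html> tag, using str.find."""
    start = html.lower().find("<html")
    return html[start:] if start != -1 else html


def repair(html, keep_doctype=True):
    """Apply one of the two fixes, but only when the defect is present."""
    if not has_unclosed_doctype(html):
        return html
    return close_doctype(html) if keep_doctype else drop_prolog(html)


if __name__ == "__main__":
    broken = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"\n'
              "<html><body>hi</body></html>")
    print(repair(broken))
    print(repair(broken, keep_doctype=False))
```

Only messages that match the pattern are touched, so well-formed mail passes through unchanged. Inserting the '>' keeps the DOCTYPE; dropping everything before <html> is the blunter alternative, and it also discards anything else that precedes the tag.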



