html parser, unexpected '<' char in declaration
Tim Roberts
timr at probo.com
Tue Feb 21 02:54:35 EST 2006
"Jesus Rivero - (Neurogeek)" <jrivero at latinux.org> wrote:
>
>Hmmm, that's kind of a different issue then.
>
>I can guess, from the error you pasted earlier, that the problem shown
>is due to the fact that Python is interpreting a "<" as an expression and
>not as a char. Review your code or try to figure out the exact input
>you're receiving within the MTA.

Well, Jesus, you are 0 for 2. Sakcee pointed out what the exact problem
was in his original message. The HTML he is being given is ill-formed: the
<!DOCTYPE declaration is never closed with a '>'. The SGML parser then finds
an <html> tag which it thinks is inside the <!DOCTYPE, and that's illegal.

>> Well, probably I should explain more. This is part of an email. After
>> the MTA delivers the email, it is stored in a local dir.
>> After that, the email is parsed by the parser inside a web-based
>> IMAP client at display time.
>>
>> I don't think I have the choice of rewriting the message!? And I don't
>> want to reject the message altogether.
>>
>> I can either (1) fix the incoming HTML by tidying it up,
>> (2) strip out only the plain text and display that you have spam, or
>> (3) ignore that malformed tag and display the rest.

If this is happening with more than one message, you could check for it
rather easily with a regular expression, or even just str.find(), and then
either insert the missing closing '>' or delete everything up to the <html>
tag before parsing it.
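
Here is a rough sketch of both approaches. The function names and the
sample message are made up for illustration, and the regular expression
assumes the declaration contains no '<' or '>' of its own before the next
tag starts, so treat it as a starting point rather than a complete fix.

```python
import re

# Matches a "<!DOCTYPE ..." declaration that runs straight into the next "<"
# without a closing ">" in between. Assumption: the declaration itself
# contains no "<" or ">" characters.
_UNCLOSED_DOCTYPE = re.compile(r'<!DOCTYPE[^<>]*(?=<)', re.IGNORECASE)


def close_doctype(html):
    """Insert the missing '>' after an unclosed <!DOCTYPE declaration."""
    return _UNCLOSED_DOCTYPE.sub(
        lambda m: m.group(0).rstrip() + '>', html, count=1)


def drop_before_html(html):
    """Alternative: throw away everything before the first <html tag."""
    pos = html.lower().find('<html')
    return html[pos:] if pos != -1 else html


if __name__ == '__main__':
    # Hypothetical ill-formed message body, similar in shape to the one
    # described in this thread.
    broken = ('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"\n'
              '<html><body>hi</body></html>')
    print(close_doctype(broken))
    print(drop_before_html(broken))
```

Either function would be run on the message body before handing it to the
parser. A declaration that is already closed is left untouched by
close_doctype().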
--
- Tim Roberts, timr at probo.com
Providenza & Boekelheide, Inc.