Jargons of Info Tech industry

Sun Oct 16 15:31:25 EDT 2005

On 16 Oct 2005 00:31:38 GMT, John Bokma <john at castleamber.com> wrote:

>bokr at oz.net (Bengt Richter) wrote:
>
>> On Tue, 04 Oct 2005 17:14:45 GMT, Roedy Green
>> <my_email_is_posted_on_my_website at munged.invalid> wrote: 
>> 
>>>On Tue, 23 Aug 2005 08:32:09 -0500, l v <lv at aol.com> wrote or quoted :
>>>
>>>>I think e-mail should be text only.
>> I think that is a useful base standard, which allows easy creation of
>> ad-hoc tools to search and extract data from your archives, etc. 
>>>
>>>I disagree.  Your problem is spam, not HTML. Spam is associated with
>>>HTML and people have in Pavlovian fashion come to hate HTML.
>>>
>>>But HTML is not the problem!
>> Right, it's what the HTML-interpreting engines might do that is
>> the problem.
>
>You mean the same problem as for example using a very long header in 
>your email to cause a buffer overflow? That is possible with plain 
>ASCII, and has been done.
Are you trolling? No, I don't mean the same problem.
What an HTML interpreter does by _design_ is not in the same category
as an implementation error enabling a root exploit.

>
>>>That is like hating all choirs because televangelists use them.
>>>  
>>>HTML allows properly aligned tables, diagrams, images, use of
>>>colour/fonts to encode speakers, emphasis, hyperlinks.
>> All good stuff, but I don't like worrying about side effects when I
>> read email.
>
>Then you should ask people to print it out, and use snail mail.
_I_ should, because _you_ can't think of a better solution?
Always happy to get useful advice, though ;-)

>Exploits in email programs are not happening since HTML was added to them.
>
You mean they didn't start happening then, presumably. But I'm not talking about exploits;
I'm talking about what HTML is designed to do, which is to describe a presentation
composed of elements that in general must be retrieved separately: as the indirect
references (links) are interpreted, the data is requested from the indicated servers,
all at HTML-interpretation time, whichever client engine is doing the interpreting,
in a browser, an email reader, etc.

Don't get me wrong, I said "all good stuff," as far as control of presentation
is concerned. And I would be happy to have nice graphic email if I could get it
as a self-contained file from my ISP's mail server, and I had a presentation
engine involved that I knew was guaranteed to stick to presentation work without
communicating over the web or doing anything else without my knowledge.

I don't see any technical obstacle to that, but HTML was not designed to solve
that problem. IMO pdf comes close. I recognize that a pdf interpreter can also
have exploitable implementation errors, just like an ascii email client, but that
is not what I am talking about.

I prefilter email into plain and X/HTML-containing mailboxes, and I don't open
HTML email from unknown sources. If I am really curious, I drag and drop the
message into a "probtrash" mailbox and run a Python script that extracts the
text and other info as plain text in a console window. All the messages purportedly
from eBay, Amazon, and PayPal have been phishing attempts that would look pretty
convincing if displayed by normal X/HTML interpretation. If my ISP had a better
filter, or I improved mine, I wouldn't see them at all. But in my normal ascii
email boxes I don't have to worry about that; I just have to resist the social
engineering of the offers from Nigeria etc. ;-)
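Here is a sketch of the kind of script described above. It is not the author's actual script, just a minimal version built on the standard library email and html.parser modules. The names TextAndLinks, has_html, mailbox_for, and inspect_message are hypothetical, as are the "html"/"plain" mailbox names. The sample message, including its sender and the 203.0.113.7 address, is invented. The script routes a message by checking whether any MIME part is text/html. For inspection, it prints the visible text plus the real targets of any links, so nothing is rendered or fetched.

```python
import email
from email import policy
from html.parser import HTMLParser


class TextAndLinks(HTMLParser):
    """Collect visible text and link targets from HTML without rendering it."""

    SKIP = {"script", "style", "title"}  # their text is not shown as body text

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)  # where the link really goes

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def has_html(msg):
    """True if any MIME part is text/html (the prefilter's routing test)."""
    return any(part.get_content_type() == "text/html" for part in msg.walk())


def mailbox_for(raw_bytes):
    """Hypothetical routing: return the name of the mailbox to file into."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    return "html" if has_html(msg) else "plain"


def inspect_message(raw_bytes):
    """Report text and link targets as plain text; nothing is fetched."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    report = {"has_html": has_html(msg), "text": [], "links": []}
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype == "text/plain":
            report["text"].append(part.get_content().strip())
        elif ctype == "text/html":
            parser = TextAndLinks()
            parser.feed(part.get_content())
            parser.close()
            report["text"].append(" ".join(parser.chunks))
            report["links"].extend(parser.links)
    return report


raw = b"""From: "PayPal" <service@paypal.example>
To: me@example.org
Subject: Verify your account
MIME-Version: 1.0
Content-Type: text/html; charset="us-ascii"

<html><body><p>Please <a href="http://203.0.113.7/paypal/login">verify your account</a> today.</p>
<script>document.write('x')</script></body></html>
"""

report = inspect_message(raw)
print(mailbox_for(raw))   # html
print(report["text"])     # ['Please verify your account today.']
print(report["links"])    # ['http://203.0.113.7/paypal/login']
```

Printing the href next to the visible text is what gives a phishing message away. The "PayPal" link goes to a bare IP address, which a rendered view would hide behind the anchor text.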

>>>I try to explain Java each day both on my website and on the plaintext
>>>only newsgroups. It is so much easier to get my point across in HTML.
>
>> How about pdf?
>
>Ah, and that's exploit free?
That's not the issue. All programs can have the kind of exploit possibilities
that you are talking about. But it is easier to eliminate exploitable vulnerabilities
from a program whose single purpose is interpreting a page description and
presenting it graphically than from a program that does a lot of additional
stuff.
>
>>>Program listings are much more readable on my website.
>> IMO FOSS pdf could provide all the layout benefits while
>> avoiding (allowing for bugs) all the downsides of X/HTML in emails.
>
>Amazing, so one data format that's open is better compared to another 
>open data format based on what?
I take it you don't understand the difference between pdf and html?

A primary difference is the data-moving activity involved, and whether you can monitor it.
A pdf can have links, but they are not followed in the process of opening and presenting
the document to you (not counting what closed-source proprietary software might do at
the risk of a PR black eye).

The whole file normally comes as a single unit (though I could see the temptation
to implement automatic font downloads, enabling font-bugs analogous to web-bugs; in a
FOSS implementation, though, such [mal]features could easily be made optional).

You could say the same features can be made optional in HTML: CSS, JS, and all the
other automatic web-accessing features. But by the time you had made them all optional
and turned them off, you wouldn't see the HTML author's intended presentation. That is
not the case with pdf. Also, a single pdf file comes from one place. There is no
on-the-fly gathering of elements where you need a special tool to determine for sure
where all the requests for them went, or to prevent them from going and having the
activity logged, not to mention what the interpretation of unknown elements might do.

Regards,
Bengt Richter