Clean mail/news messages from dangerous HTML
Document may be slightly outdated

Introduction and Motivation

(Some prose)

Many of the security and privacy attacks on mail software are based on HTML messages. Also, many users dislike HTML because it practically forces a certain layout on them (although there are other ways to prevent that). That's why many advanced users have concluded that HTML in email is a bad thing.

I don't completely agree with that. If, and only if,

then I do think that HTML is superior to plaintext, because it gives semantically richer data.

Having written the plaintext mail display code, I know that much of it is basically guessing: what a URL is and where it ends, what quotes are, how to wrap, etc. That's the reason for many problems (like the embarrassing comb-effect line wrap in quotes, produced by Microsoft Outlook Express) and endless discussions (Should abbreviated URLs like www.mozilla.org be recognized? What about ben@localmailserver? Should *important* be recognized as bold? How should quotes be signified by the sender?).

HTML puts an end to all that. Nested quotes, lists, headings, links are all clearly marked and can be displayed the way the user likes it (within the capabilities of the rendering engine) and this usually greatly improves readability. Everything wraps correctly.

The idea of the HTML Sanitizer is to reduce incoming HTML to only the harmless structural markup. This reduces the amount of code that could be exploited for security/privacy attacks significantly. Thus, we get the advantages of HTML (rich information) without the disadvantages (security/privacy risk, sender-defined layout).
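
To make the idea concrete, here is a minimal sketch in Python of allowlist-based sanitizing. It is not the actual Mozilla implementation; the class name SketchSanitizer, the function sanitize and the shortened ALLOWED table are hypothetical, chosen only for illustration. Tags that are not listed are dropped while their text is kept, attributes that are not listed are dropped, and the content of script and style elements is discarded.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical, heavily shortened allowlist: tag -> permitted attributes.
ALLOWED = {
    "p": set(),
    "br": set(),
    "em": set(),
    "strong": set(),
    "blockquote": {"type", "cite"},
}
# Elements whose text content is dropped together with the tag.
DROP_CONTENT = {"script", "style"}


class SketchSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return  # unknown tag: drop the markup, keep the text
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in ALLOWED[tag]
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text so it cannot be read back as markup.
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = SketchSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="x()">Hi <em>there</em><script>alert(1)</script></p>'))
```

A real sanitizer would also have to vet URL-valued attributes and cope with malformed markup, which this sketch does not attempt.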

But some users still prefer to view all mails in plaintext. That's why I added a mode that shows mails as plaintext only. If the sender sent a plaintext version, that version is displayed and the HTML version is ignored. If the sender sent only HTML, the code converts the HTML to plaintext first and then back to HTML for rendering. This is also slightly more secure, because it is very unlikely that evil HTML will slip through due to bugs in the sanitizer. It is still vulnerable to possible bugs in Mozilla's HTML parser and networking code, but the likelihood of that is small.
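
The selection logic of the plaintext mode can be sketched with Python's standard email package. This only illustrates the decision described above and is not the Mozilla code; the names html_to_text and body_for_plaintext_mode are hypothetical, and wrapping the result in a pre element is an assumption about what "back to HTML" might look like.

```python
from email import message_from_string, policy
from html import escape
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects character data only, discarding all markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html_text):
    extractor = _TextExtractor()
    extractor.feed(html_text)
    extractor.close()
    return "".join(extractor.chunks)


def body_for_plaintext_mode(raw_message):
    msg = message_from_string(raw_message, policy=policy.default)
    # Prefer the text/plain alternative; fall back to text/html.
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""
    text = part.get_content()
    if part.get_content_type() == "text/html":
        text = html_to_text(text)  # HTML -> TXT
    # TXT -> HTML: escape everything, so no sender markup survives.
    return "<pre>" + escape(text) + "</pre>"
```

Because the final step escapes all text, nothing the sender wrote can reach the renderer as markup, which is the reason this mode depends less on the sanitizer being bug-free.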

Finally, some users want to see the HTML source. This is as secure as plaintext, but IMO very user-unfriendly. That's why there's no UI for it.

Usage

GUI

Menu:

UI option        pref values
                 prefer_plaintext   html_as   disallow_mime_handlers
Original HTML    false              0         0
Simple HTML      false              3         1*
Plaintext        true               1         1*

If you select one of the options, the prefs will be set (permanently) and the message will be reloaded with the new settings.

Any other combination of the backend prefs will cause none of the menu items to be selected initially, but you can still select them (overriding your custom prefs).
*If you manually set your backend pref disallow_mime_handlers to a value > 1, this value will be used for the UI options Simple HTML and Plaintext.
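
The mapping between menu items and backend prefs can be written down as a small lookup table. The Python sketch below only restates the table and the footnote; the function names prefs_for_option and selected_option are hypothetical. It also assumes that a manually set disallow_mime_handlers value > 1 still counts as a match for Simple HTML and Plaintext when deciding which item is initially selected, which the text above does not state explicitly.

```python
UI_OPTIONS = {
    "Original HTML": {"prefer_plaintext": False, "html_as": 0, "disallow_mime_handlers": 0},
    "Simple HTML": {"prefer_plaintext": False, "html_as": 3, "disallow_mime_handlers": 1},
    "Plaintext": {"prefer_plaintext": True, "html_as": 1, "disallow_mime_handlers": 1},
}


def prefs_for_option(option, current_disallow=0):
    """Prefs to write when the user picks a menu item."""
    prefs = dict(UI_OPTIONS[option])
    # Footnote (*): a manually set value > 1 is kept for
    # Simple HTML and Plaintext instead of being reset to 1.
    if prefs["disallow_mime_handlers"] == 1 and current_disallow > 1:
        prefs["disallow_mime_handlers"] = current_disallow
    return prefs


def selected_option(prefs):
    """Menu item matching the current prefs, or None for custom combinations."""
    for name, expected in UI_OPTIONS.items():
        mode = (prefs["prefer_plaintext"], prefs["html_as"])
        if mode != (expected["prefer_plaintext"], expected["html_as"]):
            continue
        wanted = expected["disallow_mime_handlers"]
        actual = prefs["disallow_mime_handlers"]
        # Assumption: values > 1 still match the starred (1*) rows.
        if actual == wanted or (wanted == 1 and actual > 1):
            return name
    return None
```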

Backend prefs

The backend prefs are, as often, more flexible than the GUI.

Some particularly interesting uses could be

Pref name: mailnews.display.prefer_plaintext
Default value: false
Description: Ignore HTML parts in multipart/alternative

Pref name: mailnews.display.html_as
Default value: 0
Description: How to display HTML parts.
    Value   Description
    0       Render the sender's HTML
    1       HTML->TXT->HTML
    2       Show HTML source
    3       Sanitize HTML

Pref name: mailnews.display.html_sanitizer.allowed_tags
Default value: html head title body p br div(lang,title) h1 h2 h3 h4 h5 h6 ul ol li(value,start,compact) dl dt dd blockquote(type,cite) pre noscript noframes strong em sub sup span(lang,title) acronym(title) abbr(title) del(title,cite,datetime) ins(title,cite,datetime) q(cite) a(href,name,title) img(alt,title,longdesc) base(href) area(alt) applet(alt) object(alt) var samp dfn address kbd code cite s strike tt b i table(align) caption tr(align,valign) td(rowspan,colspan,align,valign) th(rowspan,colspan,align,valign)

Pref name: mailnews.display.disallow_mime_handlers
Default value: 0
Description: Let only a few classes process incoming data. This protects from bugs (e.g. buffer overflows) and from security loopholes (e.g. allowing unchecked HTML in some obscure classes, although the user has html_as > 0).
    Value   Description
    0       Allow all available classes
    1       Use hardcoded blacklist to avoid rendering (incoming) HTML
    2       ... and inline images
    3       ... and some other uncommon content types
    100     Use hardcoded whitelist to avoid even more bugs (buffer overflows).

This mode will limit the features available (e.g. uncommon attachment types and inline images) and is for paranoid users.
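
The default value of the allowed_tags pref suggests a simple format: tag names separated by spaces, each optionally followed by a comma-separated attribute list in parentheses. Assuming that reading of the format is right (it is inferred from the default value shown, not from a specification), a value could be parsed in Python like this. The function name parse_allowed_tags is hypothetical.

```python
import re

# One entry: a tag name, optionally followed by "(attr1,attr2,...)".
_ENTRY = re.compile(r"([a-z0-9]+)(?:\(([^)]*)\))?")


def parse_allowed_tags(pref_value):
    """Return a dict mapping each allowed tag to its set of allowed attributes."""
    allowed = {}
    for match in _ENTRY.finditer(pref_value):
        tag, attrs = match.group(1), match.group(2)
        allowed[tag] = set(attrs.split(",")) if attrs else set()
    return allowed


tags = parse_allowed_tags("p br div(lang,title) li(value,start,compact)")
print(sorted(tags))         # tag names
print(sorted(tags["div"]))  # attributes allowed on div
```

A tag without parentheses, such as p or br, ends up with an empty attribute set, so all of its attributes would be stripped.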

References

Tracking bugs

Disclaimers

No guarantees that this works as advertised.