Why MathML

MathML “is a low-level specification for describing mathematics as a basis for machine to machine communication which provides a much needed foundation for the inclusion of mathematical expressions in Web pages”. It was designed by the W3C and is now part of HTML5 and EPUB 3. Unfortunately, supporting MathML has not been a priority for the big players that drive the Web, and from time to time the question “Why not deprecate MathML?” is raised.

This post tries to summarize (with lots of quotes) the very long discussions about native MathML support in the Firefox and WebKit/Chromium communities.

Note

Frédéric Wang wrote three amazing blog posts with lots of technical details about MathML support in the major web browsers:

Note

madisli created a list of pros and cons related to MathML. Download a local copy of the list.

Screenshot of madisli's list.

MathML is too specialized

Yes, it is, just like every other human language.

“I personally see mathematical writing as language by itself and so not having it in the browsers is just like not supporting Arabic or Asian scripts (BTW MathML was implemented in Gecko a long time before HTML ruby). Just to add one point: mathematical expressions are also very often mixed with other content like text or diagrams and it makes sense to have HTML+SVG+MathML+CSS well integrated together.” Frédéric Wang

And as Peter said:

  • MathML is widely used. Almost all publishers use XML workflows and in those MathML for math. Similarly, XML+MathML dominates technical writing;
  • In particular, the entire digital textbook market and thus the entire educational sector comes out of XML/MathML workflows right now;
  • MathML is the only format supported by math-capable accessibility tools right now.

MathML reinvents the wheel, poorly

“A suitable subset of TeX (not the entirety of TeX, as that is a huge, single-implementation technology that reputedly only Knuth ever fully understood) was the right choice all along, because: (1) TeX is already the universally adopted standard and (2) TeX is very friendly to manual writing, being concise and close to natural notation, with limited overhead (some backslashes and curly braces), while MathML is as tedious to handwrite as any other XML-based format.” Benoit Jacob

MathML was designed for machine to machine communication, which means that, for example, (1) “f(x)” meaning “f of x” must be different from “f times x” and (2) “ab” meaning “a times b” must be different from “variable named ‘ab’”. For us humans it is easy to resolve this ambiguity from context, but machines need more information, and that is why MathML is tedious to write by hand.
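To make the difference concrete, here is a minimal Python sketch (not part of the original discussion) that builds both readings with the standard library's xml.etree.ElementTree. It assumes the usual MathML convention of marking function application with the invisible operator U+2061 and multiplication with the invisible operator U+2062. The helper names (token, mrow and so on) are mine and purely illustrative.

```python
import xml.etree.ElementTree as ET

FUNCTION_APPLICATION = "\u2061"  # invisible "apply function" operator
INVISIBLE_TIMES = "\u2062"       # invisible multiplication operator


def token(tag, text):
    """Create a MathML token element such as <mi>x</mi> or <mo>(</mo>."""
    element = ET.Element(tag)
    element.text = text
    return element


def mrow(*children):
    """Group already-built elements in an <mrow>."""
    row = ET.Element("mrow")
    row.extend(children)
    return row


def parenthesized_x():
    return mrow(token("mo", "("), token("mi", "x"), token("mo", ")"))


def f_of_x():
    # "f(x)" read as "f applied to x"
    return mrow(token("mi", "f"), token("mo", FUNCTION_APPLICATION), parenthesized_x())


def f_times_x():
    # "f(x)" read as "f multiplied by x"
    return mrow(token("mi", "f"), token("mo", INVISIBLE_TIMES), parenthesized_x())


def a_times_b():
    # "ab" read as the product of two variables
    return mrow(token("mi", "a"), token("mo", INVISIBLE_TIMES), token("mi", "b"))


def variable_ab():
    # "ab" read as a single identifier
    return token("mi", "ab")


if __name__ == "__main__":
    for build in (f_of_x, f_times_x, a_times_b, variable_ab):
        print(build.__name__, ET.tostring(build(), encoding="unicode"))
```

Both versions of “f(x)” look the same on screen, but the markup differs by one invisible operator, which is exactly the extra information a machine (or a screen reader) needs.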

A subset of TeX won’t be sufficient because

“I’ve had to use various LaTeX packages (particularly amsmath and amssymb) in order to get all of the symbols and so on that I needed. I suspect that “heavy” users of TeX frequently need more than these two packages.” Justin Lebar

One solution for symbols is to use Unicode characters directly, encoded as UTF-8 (in this case MathML will be much smaller than TeX), but this means using a GUI that lists the Unicode symbols or memorizing their code points.
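As a small illustration of what “memorizing their codes” means in practice, the following Python sketch looks up a symbol by its official Unicode name and reports its code point and its size in UTF-8. The function name symbol_info is my own; only the standard unicodedata module is used.

```python
import unicodedata


def symbol_info(unicode_name):
    """Return (character, code point label, UTF-8 size in bytes)."""
    character = unicodedata.lookup(unicode_name)
    code_point = f"U+{ord(character):04X}"
    utf8_size = len(character.encode("utf-8"))
    return character, code_point, utf8_size


if __name__ == "__main__":
    for name in ("FOR ALL", "N-ARY SUMMATION", "GREEK SMALL LETTER ALPHA"):
        print(name, symbol_info(name))
```

The symbol named FOR ALL, for instance, is the single character U+2200, which takes three bytes in UTF-8.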

MathML is very stable (although the implementations aren’t) but

“While TeX and the basic LaTeX packages are stable, most macro packages are unreliable. Speaking as a mathematician, it’s often hard to compile my own TeX documents from a few years ago. You can also ask the arXiv folks how painful it is to do what they do.” Peter

MathML has a DOM. So what?

“That may not matter much for rendering, but it does if you want to support (WYSIWYG) editing.” Robert O’Callahan

Although (La)TeX is much older than MathML, there is no full native (La)TeX WYSIWYG editor (yes, there is LyX, but it uses its own format).

Pure (La)TeX can only build static pages.

“We expose HTML and SVG content to Web applications by structuring that content as a tree and then exposing it using standard DOM APIs. These APIs let you examine, manipulate, parse and serialize content subtrees. They also let you handle events on that content. CSS also depends on content having a DOM tree structure for selectors and inheritance to work. You definitely need to able to handle events and apply CSS to elements of your math markup.” Robert O’Callahan
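A rough sketch of what “examine, manipulate, parse and serialize content subtrees” can look like for math markup: the Python code below uses the standard xml.dom.minidom module as a stand-in for the browser DOM (an assumption made only to keep the example self-contained). It parses a small MathML fragment, sets the MathML mathcolor attribute on every identifier, and serializes the result. The function name highlight_identifiers and the sample formula are hypothetical.

```python
from xml.dom import minidom

SOURCE = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow>"
    "</math>"
)


def highlight_identifiers(mathml, color):
    """Parse MathML, color every <mi> element, and serialize the tree again."""
    document = minidom.parseString(mathml)
    for identifier in document.getElementsByTagName("mi"):
        identifier.setAttribute("mathcolor", color)
    return document.documentElement.toxml()


if __name__ == "__main__":
    print(highlight_identifiers(SOURCE, "red"))
```

With plain TeX source there is no such tree to walk: the formula would first have to be parsed by a TeX-aware tool before any single term could be addressed.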

MathML never saw much traction

“MathML never saw much traction outside of Mozilla, despite having been around for a decade. WebKit only got a very limited partial implementation recently, and Google removed it from Blink. The fact that it was just dropped from Blink says much about how little it’s used: Google wouldn’t have disabled a feature that’s needed to render web pages in the real world. Opera got an implementation too, but Opera’s engine has been phased out.” Benoit Jacob

How many users will switch to a web browser that has native MathML support? Even if all the mathematical sciences (math, mathematical physics, large parts of CS) switch, I believe that won’t be enough for Mozilla, Apple, Google, Microsoft or any other player to put money into a MathML implementation. But if all mathematical education resources move to the web using MathML, some players will start to invest (more in the Education section).

The office suites “MS Word and Libre Office produce MathML out of the box”, as Peter points out.

And “the ISO 32000-2 specification is expected to have support for MathML tagging of formulas in PDF documents.”, as Deyan reports.

Using JavaScript libraries

“High-quality mathematical typography in browsers is now possible, without using MathML. Examples include MathJax, which happily takes either TeX or MathML input and renders it without specific browser support, and of course PDF.js which is theoretically able to render all PDFs including those generated by pdftex. Both approaches give far higher quality output than what any current MathML browser implementation offers.” Benoit Jacob

MathJax is nice, but it is a huge JavaScript library, and for long documents it takes some time to convert and render the math elements properly (according to Peter, it is about 5 times slower than native support). Even people who use it would rather not need it:

“We run a large academic journal site which includes occasional math. The challenge for us with MathJax is that we either: (1) include math on every page on our site, or (2) build a math detector that includes MathJax only when we need it. Neither solution is completely ideal. We’re going with (1), but it does mean extra javascript downloads and processing on all our pages, even though most don’t actually have math on them. Caching helps, of course, but first-page views for every user are affected.” msoko...@gmail.com
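Option (2), the “math detector”, could be sketched on the server side roughly as follows. This is only an illustration in Python under loud assumptions: math is taken to appear either as a MathML math element or between common TeX delimiters, and the script URL and function names are hypothetical placeholders, not MathJax's real loading instructions.

```python
import re

# Assumption: math shows up as a <math> element or inside \( \), \[ \] or $$ $$.
MATHML_TAG = re.compile(r"<math[\s>]", re.IGNORECASE)
TEX_DELIMITERS = re.compile(r"\\\(|\\\[|\$\$")

# Hypothetical placeholder URL for a locally hosted math rendering script.
MATH_SCRIPT = '<script src="/static/math-renderer.js" async></script>'


def page_has_math(html):
    """Return True if the page seems to contain MathML or TeX math."""
    return bool(MATHML_TAG.search(html) or TEX_DELIMITERS.search(html))


def add_math_loader(html):
    """Insert the math script only into pages that appear to need it."""
    if not page_has_math(html):
        return html
    if "</head>" in html:
        return html.replace("</head>", MATH_SCRIPT + "</head>", 1)
    # No <head> found: fall back to putting the script in front of the page.
    return MATH_SCRIPT + html


if __name__ == "__main__":
    print(add_math_loader("<head></head><p>No formulas here.</p>"))
    print(add_math_loader("<head></head><p>$$x^2 + y^2 = z^2$$</p>"))
```

Even this small detector has to guess (a page about prices may contain “$$” without any math), which is part of why neither option is “completely ideal”.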

Moreover

“The MathJax team is strongly in favor of native MathML implementation.” Frédéric Wang

Even IF

“tomorrow a competing browser solves these problems, and renders MathJax’s HTML output fast, we will obviously have to follow. That can easily happen, especially as neither of our two main competitors is supporting MathML.” Benoit Jacob

we have to remember that some devices (e.g. e-readers) have limited hardware for good reasons (for e-readers, battery life).

Education

Peter reminds us that math is a subject in basic education:

“It’s also something that all school children will encounter for 9-12 years. IMHO, this makes it necessary to implement mathematical typesetting functionality.”

But Benoit Jacob disputes this necessity:

“School children are only on the reading end of math typesetting, so for them, AFAICS, it doesn’t matter that math is rendered with MathML or with MathJax’s HTML+CSS renderer.”

Fortunately, Brian Smith gives a great reply to Benoit:

“School children traditionally have been on the reading end of math typesetting because they get punished for writing in their math books. However, I fully expect that scribbling in online books will be highly encouraged going forward. School children are not going to write MathML or TeX markup. Instead they will use graphical WYSIWYG math editors. The importance of MathML vs. alternatives, then, will have to be judged by what those WYSIWYG end up using. WYSIWYG editing of even basic wiki pages is still almost completely unusable right now, so I don’t think we’re even close to knowing what’s optimal as far as editing non-trivial mathematics goes.”

And

“Performance matters not only for the initial document rendering. When you do WYSIWYG editing performance characteristics matter in a lot more subtle ways. When you are editing big equations, or some really big document updates need to happen as close as possible to instant.” Mihai Sucan

Unfortunately, we are still stuck with keyboards or WYSIWYG editors:

“Finally, people are also interested in handwritting recognition (see e.g. https://www.youtube.com/watch?v=26opB8DRf3c or http://webdemo.visionobjects.com/portal.html).” Frédéric Wang

Much less important, but still worth mentioning: color in math expressions can be very useful:

“Note that I’ve seen at least one PhD dissertation that made good use of color to highlight which terms of a 2-page-long equation canceled each other to produce the final (much shorter; iirc it was 0) result.

Of course that was in the PDF version; the print version had to be black-and-white, and was a lot harder to follow.” Boris Zbarsky

Conclusions

I hope that you are already convinced that supporting MathML in web browsers is important, but here are some last amazing thoughts.

From Peter:

“MathML still feels a lot like HTML 1 to me. It’s only entered the web natively in 2012. We’re lacking a lot of tools, in particular open source tools (authoring environments, cross-conversion, a11y tools etc).

But that’s a bit like complaining in 1994 that HTML sucks and that there’s TeX which is so much more natural with chapter and section and has higher typesetting quality anyway.

...

A statistical plot has no more reason to be an image than an equation – it should be markup/data in the page and the browser should render it. Browsers may be the new printing press, but we are looking at Gutenberg’s model here, not 20th century digital offset printing.

Anyway, the MathWG has fought extremely hard for 15 years to make mathematics a first class citizen on the web. Certainly, MathML is only the beginning for math on the web. But abandoning it now will throw scientific content back 20 years.

Personally, I don’t want to wait for another Knuth to show up and fix the problem.”

From someone at Design Science:

“MathML was never intended to be typed by humans so it is no wonder that you find it a bad experience. TeX is a poor computer representation which is one reason why MathML was invented.

It is reasonable to have a discussion of the relative merits of entering math by typing TeX vs point-and-click editing of math (ie, direct manipulation editing). I am biased toward the latter but I can understand the feelings of those whose hands know TeX really well.

In short, both MathML and TeX have good reasons to exist and don’t compete with each other in their primary categories.”

From Gerd:

“Math is THE universal language, understood by every nation on this world and used to describe and analyse our world in physics, chemistry, biology and even economics.

Let me compare the situation to hypothetical one: In my native language, german, we have ‘Umlaute’: ä ö ü. Two main browers included them already in their charset. All others still have to read something like ae oe ue in their browser. To avoid this, some clever folks have come with a program, that searches for every ae an replaces it with a tiny image of ‘ä’ and so on. This only works when being online.

And now the suggestion is: No, we don`t implement the ‘Umlaute’, as the standard suggests, we make the replacement program faster. NO WAY!”

From gel:

“Students are impatient, even at the undergraduate college level. It is important for math to be fully on the same playing field as classical html.”

From rynec:

“I’m extremely disappointed to hear this. Researchers, educators and students have been waiting for a decent way to communicate on the web about technical topics for the better part of twenty years. MathML provides that.

Can we please prioritize a feature that actually has, you know, an actual social benefit?”

From William F Hammond:

“Do we want our children to grow up with the impression that math is not important enough to be in web pages?”

From Mihai Sucan:

“I do not care about the technology here - MathML or TeX. What I care about is for the web browsers to meet the technical demands for producing really good math rendering and editors. I want this not for the academics, not for professors who can write TeX documents. I want this for school children who cannot write math on paper, who are blind, or who have other physical disabilities. Manually writing LaTeX does not “cut it” at early stages, when children learn maths. Such tools are invaluable for them.”

From Frédéric Wang:

“It seems that Benoit thought that MathML was not used at all and could easily be dropped or replaced. As others have tried to explain that’s not true and there are already many concrete projects that have been developed for 15 years, several places where MathML plays a key role and many people keeping asking for MathML support in browsers. Certainly LaTeX can be parsed into a tree for WYSIWYG edition, we can convert ASCIIMath directly to HTML+CSS or accessibility tools could perhaps even read a formula without building a tree at all. But we don’t care here since we are talking about the Web and about Gecko-based applications so we want to have an SGML format, a DOM, compatibility with all the HTML5 family and build tools on top of that. There are MathML-based projects to address the needs mentioned in this thread and there are already many pages with MathML content on the Web. So there is no reason to replace MathML by something that will help for the simplest cases (already addressed by existing tools as I said above) but won’t work in general and will break all existing MathML contents and projects. The main remaining issue is the lack of browser support so dropping MathML from Gecko would definitely be the wrong choice and a very negative signal to the Web community, especially since one of the argument given is that Gecko should follow what Blink does. Mozilla should be prude to have had volunteers involved in MathML projects during all these years and see that as a benefit. Fortunately, I see that the majority of comments in this thread go in that direction.”