From Graphics to Math

Note

I recommend that you use Firefox to read this post.

Yesterday, Jure Triglav said

“I dropped by to let you know that I wrote a new post on scientific data visualization: http://juretriglav.si/standards-for-graphic-presentation/ and would love any kind of feedback you could give me”

at #sciencelab (a nice place to talk with people about how to do science in the 21st century).

Jure’s post is awesome (and you should read it). Its main point is that “a committee of several really smart people from all branches of science and engineering” wrote a document in 1914 to standardize scientific graphics.

One of the rules in this document “is essentially saying that data should accompany the figure. A hundred years later, we’re still far from applying this simple but powerful rule. Projects like The Content Mine expend a tremendous amount of energy trying to get data back from figures, and even then it’s a very lossy process. All of that could be avoided if we just follow this one simple rule.”

In Jure’s post you will find what you can do to follow this one simple rule when working with graphics. In this post I want to extend the rule to mathematical expressions.

In 1914 the data of a mathematical expression could always be found together with its “figure”. What I mean is that anyone living in 1914 who saw, for example,

$a^2 + b^2 = c^2$

would understand that “the square of c is equal to the square of a plus the square of b”.

With the arrival of computers and their use as a replacement for paper, we started to see, for example,

[Image: pythagorean_theorem.jpg, the Pythagorean theorem shown as a picture]

which looks very similar to the first example and, unless you are a program, you will probably understand it in the same way as the first example.

As with graphics, we mostly use figures to display mathematical expressions (the data), and in doing so we lose the data. And just as a committee of several really smart people wrote a document to address the problem with graphics in 1914, another committee wrote, in 1998, a document to address the problem with mathematical expressions. This document is known as the Mathematical Markup Language (MathML) and is a W3C Recommendation.
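To make the difference concrete, here is a minimal sketch of the same Pythagorean expression written as Presentation MathML and read back by a program. The markup is hand-written for this illustration, and the names PRESENTATION_MATHML and mathml_tokens are made up for this example, not part of any library. It only uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hand-written Presentation MathML for a^2 + b^2 = c^2 (illustrative only).
PRESENTATION_MATHML = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <msup><mi>a</mi><mn>2</mn></msup>
    <mo>+</mo>
    <msup><mi>b</mi><mn>2</mn></msup>
    <mo>=</mo>
    <msup><mi>c</mi><mn>2</mn></msup>
  </mrow>
</math>
"""


def mathml_tokens(markup):
    """Return the identifiers, numbers and operators in document order."""
    root = ET.fromstring(markup.strip())
    # Elements that only contain whitespace (math, mrow) are skipped.
    return [el.text.strip() for el in root.iter() if el.text and el.text.strip()]


print(mathml_tokens(PRESENTATION_MATHML))
```

A program can recover every symbol of the expression from the markup, which is not possible with the JPEG version without some kind of image recognition.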

What are we losing when the data is lost?

In the case of graphics, you need the data to (1) recreate the figures, which is very important for the reproducibility of science, (2) extract more information, and (3) combine it with other data sets to get new information.

In the case of mathematical expressions, you need the data to (1) search for them and (2) use them as input for a machine to solve. MPT wrote an article in 1999 about things that you can only accomplish with MathML, and it is a good starting point for thinking about this.
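Both uses can be sketched in a few lines, assuming the expression is available as Content MathML. The markup below is hand-written for this illustration, and the helpers local_name, identifiers and evaluate are hypothetical names that handle only the handful of elements this one expression needs; this is not a general MathML engine.

```python
import xml.etree.ElementTree as ET

# Hand-written Content MathML for a^2 + b^2 = c^2 (illustrative only).
CONTENT_MATHML = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><eq/>
    <apply><plus/>
      <apply><power/><ci>a</ci><cn>2</cn></apply>
      <apply><power/><ci>b</ci><cn>2</cn></apply>
    </apply>
    <apply><power/><ci>c</ci><cn>2</cn></apply>
  </apply>
</math>
"""


def local_name(element):
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return element.tag.split("}")[-1]


def identifiers(root):
    """Search: collect the names of all identifiers (<ci>) in the expression."""
    return {el.text.strip() for el in root.iter() if local_name(el) == "ci"}


def evaluate(node, env):
    """Machine input: evaluate the expression for the values given in env."""
    name = local_name(node)
    children = list(node)
    if name == "math":
        return evaluate(children[0], env)
    if name == "cn":
        return float(node.text)
    if name == "ci":
        return env[node.text.strip()]
    if name == "apply":
        # The first child names the operator, the rest are its arguments.
        operator = local_name(children[0])
        args = [evaluate(child, env) for child in children[1:]]
        if operator == "plus":
            return sum(args)
        if operator == "power":
            return args[0] ** args[1]
        if operator == "eq":
            return args[0] == args[1]
    raise ValueError(f"unsupported element: {name}")


root = ET.fromstring(CONTENT_MATHML.strip())
print(sorted(identifiers(root)))                   # which symbols appear?
print(evaluate(root, {"a": 3, "b": 4, "c": 5}))    # does 3, 4, 5 satisfy it?
print(evaluate(root, {"a": 1, "b": 1, "c": 2}))    # does 1, 1, 2 satisfy it?
```

The first call is the "search" case (find every expression that mentions c, say), and the other two are the "input for a machine" case. Neither works if the expression only exists as a picture.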

What can we do next?

We mostly need to convince publishers that they should deliver HTML with MathML instead of PDFs, and that they should back native MathML support in browsers (getting math to display on mobile devices is hard). If you want to know more or are interested in helping, send me an email.