Hand-writing MathML


by jefftk

1 min read · 23rd Sep 2023 · 40 comments

Tags: Logic & Mathematics, Programming, Practical

When I write posts I use raw HTML. Yes, the modern thing to do is probably Markdown, but HTML was designed for hand-coding and still works well for that if you don't want anything especially fancy. But what if you want math?

Previously when I've wanted to do math I've written it out as fixed-width ASCII:

e^(-7t)

In my editor this looks like:

<pre>
e^(-7t)
</pre>

This is reasonably readable, works anywhere, and I like the aesthetic. I probably should have stuck with it, but after helping publish a report that included some traditionally-formatted equations and learning that MathML has been supported cross-browser since the beginning of the year (thanks Igalia!), I decided to try it out. I wrote the equations in two recent posts in it, and am mixed on the experience.

It definitely does look nicer:

[rendered equation: e^(-7t), typeset with a superscript exponent]

On the other hand, here's how it looks in my editor:

<math display=block>
<msup>
  <mi>e</mi>
  <mrow>
    <mo>-</mo>
    <mn>7</mn>
    <mi>t</mi>
  </mrow>
</msup>
</math>

There's a small learning curve on when to use the different tags, but mostly it's just very verbose. And I think, needlessly so? That "-" is an operator, "7" is a number, and "t" is an identifier could all be the default. Then I could just write:

<math display=block>
<msup>
  e
  <mrow>
    -7t
  </mrow>
</msup>
</math>

And we could remove many uses of <mrow> too: a series of characters without whitespace separating them could already be treated as a group:

<math display=block>
<msup>
  e
  -7t
</msup>
</math>

Of course if you wanted to use a character for a non-traditional purpose you could still mark it up as one, but a good set of defaults would make MathML much more pleasant. I'd hate to have to read and write blog posts as:

  <word><lt>h</lt><lt>e</lt><lt>l</lt><lt>l</lt><lt>o</lt></word>
  <word><lt>w</lt><lt>o</lt><lt>r</lt><lt>l</lt><lt>d</lt></word>
  <pnct>.</pnct>

I know I'm about 25 years too late on this, and I'm happy that a pure-HTML solution is now cross-browser, but it's still sad we ended up so close to a comfortable hand-editable solution.

(Just use MathJax? Nope—I don't want a runtime dependency on JS. Though I could see including a LaTeX-to-MathML or a MathML-verbosifier step at build time.)

Mentioned in: Feedly Breaks MathML

jefftk · 7mo

Two updates:

1. I went and coded a verbosifier, so it's possible to write <msup>a 2</msup> + <msup>b 2</msup> = <msup>c 2</msup> and not <msup><mi>a</mi><mn>2</mn></msup><mo>+</mo>...: https://github.com/jeffkaufman/mathml_verbosifier
2. It turns out Feedly (and maybe other RSS readers?) strips out MathML, so even if they're using a rendering engine that supports it you see nothing. So I'll stick to ASCII math for a while longer.
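For readers wondering what such a build step might involve, here is a minimal Python sketch of the defaults proposed in the post: digits become <mn>, letters become <mi>, other characters become <mo>, and a whitespace-free run of several tokens is grouped in an <mrow>. This is an illustration only, not the code from the linked repository; the function names and the regex-based approach are assumptions made for the example.

```python
import re
from html import escape

# Elements whose text is already a finished token and must be left alone.
TOKEN_TAGS = {"mi", "mn", "mo", "mtext", "ms"}

TAG_RE = re.compile(r"(<[^>]+>)")
# An entity, a number, a single letter, or any other non-space character.
TOKEN_RE = re.compile(r"&#?\w+;|\d+(?:\.\d+)?|[^\W\d_]|\S")


def wrap_token(tok: str) -> str:
    """Wrap one bare token in the MathML element its first character implies."""
    if tok[0].isdigit():
        return f"<mn>{tok}</mn>"
    if tok.isalpha():
        return f"<mi>{tok}</mi>"
    if tok.startswith("&") and tok.endswith(";") and len(tok) > 2:
        return f"<mo>{tok}</mo>"  # pass entities such as &minus; through
    return f"<mo>{escape(tok)}</mo>"


def verbosify_text(text: str) -> str:
    """Expand bare text: each whitespace-separated run becomes one child."""
    children = []
    for run in text.split():
        tokens = [wrap_token(t) for t in TOKEN_RE.findall(run)]
        children.append(tokens[0] if len(tokens) == 1
                        else "<mrow>" + "".join(tokens) + "</mrow>")
    return "".join(children)


def verbosify(src: str) -> str:
    """Expand terse MathML, leaving explicitly marked-up tokens untouched."""
    out = []
    depth = 0  # > 0 while inside an explicit token element
    for part in TAG_RE.split(src):
        if part.startswith("<"):
            m = re.match(r"<(/?)([A-Za-z][\w:-]*)", part)
            if m and m.group(2) in TOKEN_TAGS and not part.endswith("/>"):
                depth += -1 if m.group(1) else 1
            out.append(part)
        elif depth > 0 or not part.strip():
            out.append(part)  # explicit markup or pure whitespace: keep as-is
        else:
            out.append(verbosify_text(part))
    return "".join(out)


print(verbosify("<msup>e -7t</msup>"))
```

Run on the example from the post, this prints `<msup><mi>e</mi><mrow><mo>-</mo><mn>7</mn><mi>t</mi></mrow></msup>`. Multi-letter names such as "sin" would be split into one <mi> per letter under these defaults, so they would still need explicit markup like <mi>sin</mi>, which the sketch passes through unchanged.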

Said Achmiz · 7mo

FYI, when cross-posted to Less Wrong, that math in your post gets rendered with MathJax, not browser-native MathML. (This is the case both on LW itself and on GreaterWrong.)

And it’s a good thing, too, because this is what I see in the original post, on your website:

[Image: MathML failure]

It seems that Chrome only supports MathML starting with Chrome 109, which is not compatible with my version of macOS.

Also, caniuse.com reports the following:

> Browsers based on Chromium 109+ specifically support MathML Core. While there is significant support overlap with other MathML implementations there are some differences (see details).

(The “see details” link goes to a discussion on the blink-dev mailing list, which is… less than straightforwardly informative to an ordinary web developer, much less an ordinary web user. I’m still not sure what the differences are. But it’s not exactly reassuring.)

jefftk · 7mo

> It seems that Chrome only supports MathML starting with Chrome 109, which is not compatible with my version of macOS.

That's not a great setup to be running: there have been several serious vulnerabilities since then, including the WebP zero-day.

> specifically support MathML Core

Pretty sure there's nothing I would be interested in hand-coding that's outside of MathML Core.

Said Achmiz · 7mo

> That’s not a great setup to be running: there have been several serious vulnerabilities since then, including the WebP zero-day.

Sure. The larger point is that (again, as per caniuse.com) MathML support is available to only 90% of users globally. That’s somewhat less than ideal, if you want your site to cater to a diverse user base.

jefftk · 7mo

In general I don't think it makes sense for site owners to make changes to support users who are running dangerous configurations, and skimming caniuse it looks to me like this 10% is almost all people running very old versions of browsers.

I also suspect that a lot of what looks like people running really old browsers is actually bots, since it's common to make a bot that emulates whatever the current version is at the time you made it (often because it's based on that browser version and they're not getting around to updating, or because they hardcode a UA and don't prioritize updating).

gwern · 7mo

That's a lot of readers to throw away, and if you go to 95%, it isn't that limiting, especially with various kinds of backwards compatibility.

> I also suspect that a lot of what looks like people running really old browsers is actually bots

Caniuse isn't reporting raw numbers that bots could trivially inflate, but using Statcounter's statistics, which claims to screen for bots. (How successful they are at this is unknowable, of course.)

Three-Monkey Mind · 7mo

> That’s a lot of readers to throw away

Depends on how popular you are. Even if you make the highly questionable assumption that browser statistics collected on sites like cnn.com and such are representative of the readership of jefftk.com, if jefftk.com has hundreds of readers, he's still doing a lot of work for a group that can only manage to claim that there are "dozens of us", and in any case really ought to upgrade to a proper browser (and in probably most cases, OS) anyway, for security reasons.

jefftk · 7mo

I wouldn't be ok serving pages that didn't work for 10% of my visitors, but I'd be really surprised if the number is really that high.

gwern · 7mo

You are losing >10% if you deliberately break it for 10%. Breakage is the union of all breakages. Like, your site is already on thin ice: you have nonsense like margin-left: em; in your CSS (how large is that margin, exactly?) and <li> list items which are... not... inside any <ol>/<ul> lists? (Almost like a Zen koan. If list items don't need to be inside a list, isn't everything a list item, in some sense...) Also, I have no idea what <script nonce="this-is-not-a-real-nonce" type="text/javascript"> is supposed to do, but it very much makes me wonder if it's doing what it's supposed to do for anyone at all.

Given the hell that is web dev, even if you have immaculate HTML/CSS and carefully code to the standards, you will still run into hilarious breakage for many users, particularly the many mobile and/or Mac users. And remember: silence is not golden, because on the contemporary Internet, "they'll never tell you". (I have had literally half a million people go to a website which was broken, and not a single one sent in a report or comment.) Everything has to be tested, and not taken for granted, like, say, simply assuming that pasting MathML would work out-of-the-box because it worked for you and you hadn't heard otherwise from readers...
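To put rough numbers on "breakage is the union of all breakages": the fraction of visitors hit by at least one problem is never smaller than the largest single breakage rate, and if the problems were independent it would be one minus the product of the per-problem success rates. The sketch below assumes independence and uses made-up rates purely for illustration; none of the figures come from the thread.

```python
from math import prod


def combined_breakage(rates):
    """Fraction of visitors hit by at least one breakage.

    Assumes the breakages are independent, which real-world
    breakages (old browser AND odd OS, say) often are not.
    """
    return 1 - prod(1 - r for r in rates)


# Hypothetical rates: no MathML support, a CSS quirk, a font problem.
rates = [0.10, 0.02, 0.01]
print(f"{combined_breakage(rates):.1%}")  # prints 12.7%
```

With correlated breakages (the same old browsers failing on everything) the total would sit closer to the largest single rate, which is roughly the position jefftk takes elsewhere in the thread.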

jefftk · 7mo

> Breakage is the union of all breakages. ...

I don't disagree, but none of the things you pointed out are actually breakage as far as I can tell:

> margin-left: em;

That was a typo for margin-left: 1em, but the browser ignoring the directive doesn't actually do anything because it only ever appears immediately to the right of something that has margin-right: 1em. Fixed!

> <li> list items which are... not... inside any <ol>/<ul> lists

Looks like at some point I missed the <ul>; added. (This is only semantic anyway -- I already have CSS removing all the list-specific display.)

> I have no idea what <script nonce="this-is-not-a-real-nonce" type="text/javascript"> is supposed to do, but it very much makes me wonder if it's doing what it's supposed to do for anyone at all.

The validator is complaining because type="text/javascript" is no longer something you need to write, but it's not really wrong to include it.

The nonce="this-is-not-a-real-nonce" is something I added when I temporarily served my site with a CSP (but without taking the time to fully set it up) as part of verifying that some other code I was testing on my site did the right thing in the presence of a CSP. It's not doing anything, but also not breaking anything. This is annoying enough to rip out that I'm leaving it for now.

> immaculate HTML/CSS and carefully code to the standards, you will still run into hilarious breakage for many users

As long as you verify that you're coding to a standard that's supported by the versions of the browsers you're trying to support, what sort of breakage are you thinking about? This does happen (ex: Chrome/iOS advertising in its Accept header that it supported webp when it didn't support inline webp) but it's pretty rare, especially in the last ~5y.

For my own site I normally approach this by testing in multiple browser engines: Chrome + Firefox, sometimes also Safari. When I worked in this area professionally I additionally used careful A/B tests, but that's not worth it for my personal site.

gwern · 7mo

> I don't disagree, but none of the things you pointed out are actually breakage as far as I can tell:

I didn't say they were. If you are 'skating on thin ice', you have by definition not fallen through and started to drown, because you can't skate and drown simultaneously. (At least, I can't.) My point is that you are engaged in sloppy coding practices, and so it's unsurprising that you are making mistakes like casually assuming that MathML can be copied around or would be compatible with random web applications, when you should know that the default assumption is that MathML will be broken everywhere and must be proven supported. That Internet math support is parlous is nothing new.

> because it only ever appears immediately to the right of something that has margin-right: 1em.

Until, of course, it doesn't, because you refactored or something, and hit a spot of particularly thin ice.

> but it's not really wrong to include it.

Not at all. (My site has a few instances of unnecessary type declarations not worth ripping out.) I merely quoted that for the nonce part, which did concern me. CSP is one of the most arcane and frustrating areas of web dev, and the less one has to do with it, the better. Leaving in anything to do with CSRF or CSP or framejacking is indeed tempting fate.

> As long as you verify that you're coding to a standard that's supported by the versions of the browsers you're trying to support, what sort of breakage are you thinking about?

Web dev is crack & AIDS. We run into problems all the time where we code to a standard and then it breaks in Chrome or Firefox.

The day before yesterday I discovered that when I added dropcaps to my essay on why cats knock things over, it looked fine in Chrome... and bad in Firefox, because they define 'first letter' differently for the opening word 'Q-tips'. (Firefox includes the hyphen in the "first letter", so the hyphen was getting blown up to the size of the drop cap!) My solution was to put a space and write it 'Q -tips'. Because we live in a world without a just and loving god and where standards exist to be honored in the breach.

Especially in Safari, which was created by a fallen demiurge in a twisted mockery of real browsers. Yesterday, Said had to fix a Safari-specific bug where the toggle bar breaks & vanishes on Safari. Worked fine everywhere else, coded against the standard... He also had to polyfill the standardized crypto.randomUUID (2021) for iOS.

And today Said removed the CSS-standardized-and-deployed-since-at-least-2015 property box-decoration-break and -webkit-box-decoration-break from Gwern.net because it breaks in Safari. ('webkit' = 'Safari', for the non-web-devs reading this. Yes, that's right, the Safari version breaks in Safari, on top of the standardized version breaking in Safari for which the Safari version was supposed to be the fix. Good job, Apple! Maybe you can fix that after you get around to fixing your Gill Sans which renders everything written in it full of random typos? And then make your browser hyphenation not suck?) He also had to remove hanging-punctuation due to its interaction with the link text-shadows on Safari, but arguably link text-shadows are a hack which hanging-punctuation shouldn't try to play well with, so might be our fault.

I look forward to tomorrow. (That was sarcasm. If every day were like this, I would instead look forward to the sweet release of death.)

Said Achmiz · 7mo

Just use MathJax? Nope—I don’t want a runtime dependency on JS.

Have you heard the good news about server-side MathJax rendering?

Quoting gwern on the subject:

MathJax: getting well-rendered mathematical equations requires MathJax or a similar heavyweight JavaScript library; worse, even after disabling features, the load & render time is extremely high—a page like the embryo selection page which is both large & has a lot of equations can visibly take >5s (as a progress bar that helpfully pops up informs the reader).

The solution here is to prerender MathJax locally after Hakyll compilation, using the local tool mathjax-node-page to load the final HTML files, parse the page to find all the math, compile the expressions, define the necessary CSS, and write the HTML back out. Pages still need to download the fonts but the overall speed goes from >5s to <0.5s, and JavaScript is not necessary at all.

jefftk · 7mo

That's a lot more complexity than I want to be maintaining in my publishing pipeline.

I'm also not excited about requiring external fonts.

gwern · 7mo

The complexity has been quite minimal. You npm install one executable, which you run on an HTML file in place, and it's done. After the npm install, it's fairly hassle-free; you don't even need to host the webfonts if you don't want to. We chose to for some additional speed. (It's not the size, but the latency: an equation here or there will pull in a few fonts which aren't that big, but the loading of a new domain and reflow take time.) IIRC, over the, I dunno, 6 years that I've been using it, there has only been 1 actual bug due to mathjax-node-page: it broke a link in the navbox at the end of pages because the link had no anchor text (AFAICT), which I solved by just sticking in a ZERO WIDTH SPACE. All my other work related to it has been minor optimizations like rehosting the fonts, stripping a bit of unnecessary CSS, adding an optimization setting, etc. Considering how complicated this feature is, that's quite impressive reliability. Many much simpler features, which deliver far less value, screw up far more regularly than the static MathJax compilation feature does.

Said Achmiz · 7mo

What do you mean by “external” fonts? Are you referring to webfonts, in general?

If you don’t use webfonts at all, your website is very unlikely to ever look particularly good (much less to look good consistently across platforms)…

jefftk · 7mo

Yes, I don't use any webfonts. I'm happy with the default fonts across platforms, don't care whether my site looks consistent across platforms, and don't want the performance penalty of requiring each visitor to load a new font to view my page.

Three-Monkey Mind · 7mo

Daring Fireball, a site you've probably heard of, seems to do OK with only browser-supplied fonts:

	font-family: Verdana, system-ui, Helvetica, sans-serif;

Also, jefftk said "requiring". Sure, he could have a site that uses Inter, either loaded from his own site or from a CDN like Google Fonts, but if Inter doesn't load (most likely because of user preference), then everything will be fine.

If TeX fonts don't load…then what happens? Does the user see raw TeX, or nothing at all, or…?

gwern · 7mo

Daring Fireball is a site one has primarily heard of for being an Apple/Mac shill, so perhaps not the best example of a website relying on OS-supplied fonts...

Said Achmiz · 7mo

Daring Fireball also uses:

"Gill Sans MT", "Gill Sans", "Gill Sans Std", Georgia, serif

Because of this, and what you quoted, a page that, on a Mac, looks like this:

[Image: Daring Fireball page, as seen on a Mac]

on a Linux, looks like this:

[Image: Daring Fireball page, as seen on a Linux]

i.e., it looks bad.

And that is what happens when you don’t use webfonts.

> If TeX fonts don’t load…then what happens? Does the user see raw TeX, or nothing at all, or…?

The user sees the rendered equations, set in whatever font is inherited by the equation element (most likely, the font of the surrounding text block). This might be fine:

[Image: Equation set in default body text font on GreaterWrong]

(Or, it could be very bad. You never know!)

jefftk · 7mo

> i.e., it looks bad.

For what it's worth I think the Linux screenshot is fine -- that's the default font on that system.

Said Achmiz · 7mo

> that’s the default font on that system

I have no trouble believing it, but that speaks more about Linux’s generally sloppy and incompetent approach to typography than it does about whether leaving your website to the whims of OS-provided fonts has good results or not…

jefftk · 7mo

I don't know -- when I used Linux on my main machine I was happy with how things looked and generally preferred sites and programs that fit in with the rest of the environment. And Linux users are disproportionately the kind of people who, if they don't like their system's default font, will pick something they prefer.

Lorenzo · 7mo

> Though I could see including a LaTeX-to-MathML or a MathML-verbosifier step at build time.

This should be something GPT excels at, if your editor supports GPT plugins: https://chat.openai.com/share/7e19d5e1-1a17-484c-a2ea-e0d7d2cfd56b

gwern · 7mo

You can also use GPT to convert LaTeX to HTML/Unicode, incidentally. For simple inline expressions, this is very good. Like, there is not actually a need to use LaTeX or MathML to render <em>e</em><sup><em>i</em>π</sup>. That works fine in HTML+Unicode, and winds up looking better than an obtrusive MathML/LaTeX block, where even something as simple as $1$ winds up looking visibly alien and inserted.
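For very simple expressions the same HTML+Unicode output could also be produced deterministically. The following is a minimal Python sketch, not the GPT-based approach described above: it handles only single letters, a small hypothetical table of Greek commands, and `^` / `_` with either a single character or a braced group. The function names and the supported subset are assumptions made for the example.

```python
import re
from html import escape

# Tiny illustrative table; a real converter would need many more commands.
GREEK = {"alpha": "α", "beta": "β", "theta": "θ", "lambda": "λ",
         "mu": "μ", "pi": "π", "sigma": "σ"}


def parse_atom(src, pos):
    """Parse one atom starting at src[pos]; return (html, next_pos)."""
    if pos >= len(src):
        raise ValueError("expected something after ^ or _")
    ch = src[pos]
    if ch == "{":
        return parse_seq(src, pos + 1, closing="}")
    if ch == "\\":
        m = re.match(r"[A-Za-z]+", src[pos + 1:])
        if not m or m.group() not in GREEK:
            raise ValueError(f"unsupported command at position {pos}")
        return GREEK[m.group()], pos + 1 + m.end()
    if ch.isalpha():
        return f"<em>{ch}</em>", pos + 1  # variables are set in italics
    if ch == "-":
        return "\u2212", pos + 1  # use a real minus sign, not a hyphen
    return escape(ch), pos + 1


def parse_seq(src, pos, closing):
    """Parse atoms until `closing` (or end of input when closing is None)."""
    out = []
    while pos < len(src):
        ch = src[pos]
        if ch == closing:
            return "".join(out), pos + 1
        if ch in "^_":
            tag = "sup" if ch == "^" else "sub"
            inner, pos = parse_atom(src, pos + 1)
            out.append(f"<{tag}>{inner}</{tag}>")
        else:
            atom, pos = parse_atom(src, pos)
            out.append(atom)
    if closing is not None:
        raise ValueError("unclosed {")
    return "".join(out), pos


def latex_to_html(src):
    html, _ = parse_seq(src, 0, closing=None)
    return html


print(latex_to_html(r"e^{i\pi}"))
```

On the input `e^{i\pi}` this prints `<em>e</em><sup><em>i</em>π</sup>`, the same markup as in the comment above. Anything beyond this subset (fractions, sums, matrices) raises an error or is outside what plain HTML can lay out well, which is where MathML or LaTeX rendering remains the better fit.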

duck_master · 7mo

Speaking of MathML, are there other ways for one to put mathematical formulas into HTML? I know Wikipedia uses <math> and its own template {{math}} (here's the help page), but I'm not sure about any others. There's also LaTeX (which I think is the best program for putting mathematical formulas into text in general), as well as some other bespoke things in Google Docs and Microsoft Word that I don't quite understand.

jefftk · 7mo

In terms of what browsers support, MathML is the best way to do it in a modern browser. In an older browser you could do canvas, images, or something with custom fonts.

Most users, though, are in authoring environments that offer something else, usually a way to write LaTeX-style math and have it automatically converted into something the browser can handle.

RHollerith · 7mo

If LW would suddenly change so that math could be saved for reading at a later time when I'm not connected to the internet, the amount of thinking I do about math would probably suddenly triple.

Details: the main way I save text from the web for reading at a later time is by copying a part of the web page, then pasting into a file on my local machine. That does not work for most text containing math.

gwern · 7mo

> If LW would suddenly change so that math could be saved for reading at a later time when I'm not connected to the internet, the amount of thinking I do about math would probably suddenly triple.

The way the static approach on GreaterWrong & gwern.net works is that the original LaTeX is stored alongside the CSS/font/HTML stuff you actually see. Then when you copy-paste, instead of getting a bunch of gibberish letters sans formatting, a little bit of JavaScript swaps out the gibberish for the accompanying LaTeX.

So for example, if I go to a random recent page with LaTeX in it, like https://www.greaterwrong.com/posts/wR8CFTasFpfCQZKKn/if-influence-functions-are-not-approximating-leave-one-out , and I copy-paste the first complicated-yet-abstractly-beautiful math expression, I get: LOO(\hat x,\hat y) = \text{argmin}_\theta \frac 1 N\sum_{(x,y)\sim D-\{(\hat x,\hat y)\}}L(f_\theta(x),y) in my Emacs text buffer. This is what the author originally wrote, so it's as lossless as it gets, and if you are able to understand what it means, you presumably already know how to read the LaTeX version, and your text editor can render it or whatever else you need to do with it. I haven't seen any better solutions.

(Whereas for OP, written in MathML, I get e−7t on LW, and Equation on GW. Hypothetically, they could try to decompile or interpret it as LaTeX, but needless to say, they do not. And even if they copy-pasted it as MathML - what destination programs would support MathML? Very few, I imagine.)

RHollerith · 7mo

There is a bug in GW around the functionality you describe: navigate to an article posted today, namely,

https://www.lesswrong.com/posts/JCgs7jGEvritqFLfR/evaluating-hidden-directions-on-the-utility-dataset

Then use the mouse to select the equation that occurs right after "the projection (aka the scalar product)"

When you paste that equation (I tried 2 programs, Emacs and gnome-text-editor, as the destination of the paste operation), you get

P(x_i) =

--with the right-hand side of the equation completely missing.

gwern · 7mo

I can't replicate this with my Ubuntu Linux/MATE/Firefox/Emacs setup. I get the whole equation no matter how I copy it.

(Note that there is one catch to the JS copy-paste listener: confusingly to contemporary users, X.org has multiple copy-paste buffers, 'primary' / 'secondary' / 'clipboard', of which browsers will apparently only allow web page JS to affect the first one. Since the browser doesn't cooperate, this cannot be fixed by the webpage. So if you copy-paste in X.org, depending on how you do it, you may get the intended P(xi)=<xi,v> or you may get that newline-after-every-character version that jefftk quotes. If you are unsure what is going on, you can investigate using the xclip utility, like xclip -o -selection clipboard vs xclip -o -selection primary.)

Said Achmiz · 7mo

Hmm, I can’t replicate this bug on GreaterWrong. Could you please say what browser/version/platform you are using?

Also, do other equations on other posts work?

RHollerith · 7mo

Chrome downloaded from Google, running on Fedora 38 using the standard graphical environment (Gnome on Wayland).

Firefox works correctly.

> Also, do other equations on other posts work?

5 other instances of LaTeX (some paragraph equations, some not) on 3 other posts work.

Said Achmiz · 7mo

Which version of Chrome, please? (You can find this out by putting chrome://version into your URL bar.)

> 5 other instances of LaTeX (some paragraph equations, some not) on 3 other posts work.

Hmm, so it is just that one specific post, and the equations in that one post copy-paste incorrectly, while the equations in every other post you’ve tried copy-paste correctly? Is that right?

RHollerith · 7mo

Chrome reports as 117.0.5938.92 (Official Build) (64-bit).

I already described the problem with the first paragraph equation (display equation) on the page.

The second paragraph equation, which can be located by searching for "log-likelihood", also has the problem. In particular, it copies as

\text{PPL}(X) = \exp\left(-\frac{1}{n}\sum_i^n \log p_\theta(x_i|x_{

The third one, locatable via "concept vector v", works correctly:

P_{\perp}(x_i) = x_i - \frac{}{||v||^2}v\,.

There is no fourth paragraph equation on the page.

Let me know if you want me to continue to search for instances of the bug, on other pages.

Said Achmiz · 7mo

Alright, thank you.

I’ll try to figure out what might be causing this, though I can’t promise it’ll be soon, unfortunately.

jefftk · 7mo

Copying that equation from LW with Chrome on Mac, anything I paste it into (pbpaste, standard website, Google Docs) I get:

P
(
x
i
)
=<
x
i
,
v
>

But when I use the GW version I get:

P(x_i) = <x_i, v>

Did you mean to link to the LW version of the post?

RHollerith · 7mo

Great! My being given some way to obtain the original LaTeX as written by the author is the solution I have been tending to imagine over the years when I imagined what might be the best realistically-achievable way to change LW to accommodate my current workflow!

Thanks for pointing it out!

BTW, I'd like to learn more about the workflows of people who work with math all day every day.

jefftk · 7mo

MathML should copy fine, as long as the destination program supports it. What program are you pasting it into?

RHollerith · 7mo

I've been pasting into Emacs. If you're a Linux user, I would be interested to know what program you paste math into. Or if the thing you paste into "uses web tech" (and consequently is independent of OS), tell me which web site or program it is.

jefftk · 7mo

After shooting my mouth off I went and tried it, and even programs that I would expect to handle it well (ex: Google Docs) didn't. Sorry!

(I think my statement is still literally true, except for the problem that ~no destination program currently supports it)
