Dotting i's and Crossing t's - a Journey to Publishing Elegance

by wedrifid, 14th Mar 2012

More literally a journey to making the dots of the 'i's line up just right with the 'f's and ensuring that the crossing of 'T' meets up neatly with the tip of the 'h' - all without breaking text searching and copy and paste.

Task

Now, as we all know, science isn't just about little things like peer review and double blind placebo controlled studies. Far more important is presenting your work in accordance with the grand traditions of scientific publication - all while ensuring you flatter all the right people for their sometimes obsolete and possibly only slightly relevant past works. Of course you must do this all according to standard citation formulae developed a century or two ago back when the city in which a text document was published was somehow a useful piece of information.

Some may consider people like Galileo and Bacon to be the most influential figures in science but the man who made the greatest contribution to the way humanity seeks and disseminates knowledge is of course Donald Knuth. The man who took a decade off writing his multi-volume magnum opus [The Art of Computer Programming](http://en.wikipedia.org/wiki/The_Art_of_Computer_Programming) to create TeX, the foundation of LaTeX and without which science as we know it would be unrecognizable. These days presenting academic publications without using LaTeX may be nearly as uncouth and banal as writing about your research in first person rather than the passive voice!

The above cynicism is largely sincere and only a trifle exaggerated. Yet at the same time I acknowledge that there is much value to be had in wearing a uniform and the time for lonely dissent is not on matters as trivial as presentation. The overhead of presenting work in a form that other academics are willing to accept is comparatively minor and the payoffs significant.

One of the many initiatives lukeprog has set in motion now that he is organizing things over at SingInst is the porting of all of SIAI's past publications from various ad hoc formats to LaTeX with a standard publication template. You can see an early example of the new format here.

Challenge

Unfortunately, Wei_Dai encountered a problem. In the first presentation of the converted document, copying and pasting "The" would give something like "Ļe" and copying "fi" would give "ŀ". The problem is with the implementation of ligatures. Back when typesetting was done manually - I can only imagine using a whole bunch of little metal stamp like things that could be plugged into the right places - the typesetters had an extra collection of pseudo letters to use instead of combinations like "fi", "ffi" and "Th". The reason being that those particular combinations just don't look too good if they are placed together the same way that you would place them with other letters. You wind up either having them too far apart or having parts of them overlap in a way that isn't particularly neat.

In the font SingInst uses, the non-ligature versions of 'f' and 'i' combine with the dot of the 'i' only partially overlapping the 'f', which somehow makes it jump out more easily to the reader. The way this is solved with the ligatures is to actually increase the degree of overlap such that the f smoothly blends in to the i. Someone with far more highly honed aesthetic sense than I concluded that this is the best way to present English letters and it looks fairly good to me so I'll take their word for it.

The problem is that while ligatures are easy for humans to read, "Notepad", "Word" and "Firefox" aren't nearly as smart. And unfortunately there isn't a consistent standard between fonts for which ligature means what, so we end up with all sorts of random mess if we try to copy and paste from a ligature-riddled document into our editor of choice. This left me with rather a lot of work to do while I was generating LaTeX files from those of the old SingInst publications that were only available in PDF form, and that isn't a task I would wish on all the future consumers of SingInst literature.

Opportunity

Fortunately, the PDF format and LaTeX are both advanced enough to make the visible text use the ligature characters while keeping the original text available for easy copying and pasting by the interested reader. This involves something called a 'cmap': a mapping from the character codes used to draw the glyphs back to the text characters they represent. With that cmap embedded in the PDF file, any fully featured PDF reader is able to take the pretty text, pull apart the ligatures and figure out what they were originally.
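For readers who have not seen it, here is a minimal sketch of how such a mapping is usually requested in the older pdflatex toolchain. It is not the xelatex setup used for the SingInst template, and the sample sentence is only an illustration. The `cmap` package, the `glyphtounicode` table and the `\pdfgentounicode` switch all ship with standard pdfTeX distributions and are commonly combined like this.

```latex
% Sketch for pdflatex only; xelatex and lualatex handle this differently.
\documentclass{article}
\usepackage{cmap}            % load early, before fontenc, so CMaps get embedded
\usepackage[T1]{fontenc}     % T1 encoding includes the fi/fl/ffi ligature glyphs
\input{glyphtounicode}       % pdfTeX's table of glyph name -> Unicode mappings
\pdfgentounicode=1           % write ToUnicode maps into the PDF
\begin{document}
% Illustrative text full of f-ligatures; try copying it from the PDF.
The office staff find five flowers.
\end{document}
```

If the mapping is embedded correctly, selecting the sentence in a PDF reader and pasting it into a text editor should give back plain "fi", "ff" and "fl" rather than stray symbols.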

Why then is Wei unable to copy our Th's and fi's? I haven't the slightest idea. My research suggests that the xelatex distribution we were using should just work and handle this sort of thing. So confident is it in managing such mappings that it outright rejects compatibility with the 'cmap' package, which could be used with the older 'pdflatex' compiler to handle this sort of task.

Attempted Workarounds

  • \usepackage{cmap} - Recommended as the solution to all ligature-related problems by the results of all the obvious Google searches. Unfortunately the package doesn't load in xelatex and from all reports just isn't supposed to be needed.
  • Use a different, similar font. There are plenty of alternatives to Adobe Caslon Pro - Adobe Garamond Pro for example. No luck - the problem seemed to apply to all fonts installed to the system (and thereby made accessible via xelatex's font magic).
  • Find a font that doesn't need ligatures - This works, obviously. There are plenty of fonts that keep the letters sufficiently spread - or are even monospaced. None of them looked anywhere near as good as Adobe Caslon Pro but they would have to suffice if no better alternative could be found.
  • Manually edit .map files. If I recall that helped a tad. One by one characters could be retargeted but then all ended up pointing at the basic font rather than, say, to 'bold'.
  • Extracting maps from otf (font) files - There are all sorts of Linux-based command line tools for the manipulation of fonts between various formats and the extraction of data from them. Some of them work as specified. The ones that try to do more than one step at the same time do not - at least without extensive intervention. While no doubt it would lead to eventual success, this approach is not recommended to anyone who has less than several weeks to spend learning the dark arts of font internal details.
  • autoinst - this is a tool that is supposed to 'just work' and install fonts for use even in the comparatively primitive pdflatex. Suffice it to say that it does not.
  • autoinst with manual assistance - autoinst seems to produce all the files that should be needed; the task then is to distribute them in a way that allows them to work with latex. This approach would probably work... eventually. It is far from trivial and did not work within the time I allocated to it.
  • Expert assistance - Money solves everything. Luke contacted assorted people who know about LaTeX and offered to pay them to fix our problem. Unfortunately none of the responders had a clue in this case, at least not at first glance.
  • Pristine, up to date installation of TeXLive - often the packages installed by Ubuntu are not as fresh as those to be had by installing directly from the source. Reverting the Ubuntu virtual machine to a pre-latex state and downloading 2 GB worth of TeXLive distribution could well have helped. It didn't.
  • lualatex or pdflatex instead of xelatex - no luck (yet).
  • MikTeX - the easy to use Windows-based distribution of latex may have allowed the autoinst program or perhaps xelatex magic to 'just work'. It didn't - in fact a known bug in one of the packages in that distribution prevented the SingInst template from working with MikTeX at all.
  • inbuilt font packages - success - to a degree. Fonts that come with old style latex packages in either MikTeX or TeXLive work as intended. They still don't look as good as Adobe Caslon Pro but would have been good enough.
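For orientation, the font selection that most of the attempts above revolve around can be sketched as follows. This is a minimal example, not the SingInst template itself: it assumes Adobe Caslon Pro is installed as a system font, and the commented-out ligature option is a fallback that is not among the workarounds tried in this post.

```latex
% Compile with xelatex or lualatex; fontspec does not work under pdflatex.
\documentclass{article}
\usepackage{fontspec}
% Assumption: Adobe Caslon Pro is installed where the engine can find it.
\setmainfont{Adobe Caslon Pro}
% Untried fallback: keep the font but switch off its standard ligatures.
% Older fontspec releases spell the option NoCommon, newer ones CommonOff.
% \setmainfont[Ligatures=NoCommon]{Adobe Caslon Pro}
\begin{document}
% Illustrative test text containing the troublesome letter pairs.
The first office fire.
\end{document}
```

Because the same source compiles under both engines, switching between xelatex and lualatex only means changing the command used to build the document.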

Success!

  • Running MikTeX instead of TeXLive - Reinstalling MikTeX, downloading fresh packages and then running lualatex. Success! Adobe Caslon Pro now appears in our publications without any ligature-related problems. Why did this work while lualatex on TeXLive still doesn't work correctly? I'm not entirely sure. But I'm rather glad I had the hunch to go back and try it even when my attention had moved on to more important matters.

Optimal Decision Making

An analysis could be done on what the optimal problem solving strategy would have been at any point in that process. Among other things I would note that rather early on in the process I decided that the expected value of continuing to attack the problem was rather low - so I stopped billing Luke for the time. But since I really don't like being bested by a challenge I went ahead and did it anyway. Much frustration was involved but in this case I was rewarded with a large boost of personal satisfaction and with SingInst publications that are an iota or two more beautiful!

Comments

Isn't the punchline that you solved the problem? If so, that punchline is buried.

True, fixed.

(In case you were wondering, yes, I did just write a post rather than just a reply to Wei's comment because I wanted to use "Dotting i's and crossing t's" as a double entendre in the title.)

The TeX Stackexchange is a good place to ask questions about LaTeX. (As an example, this question is similar to the problem solved here.)

One of the many threads on TeX Stackexchange that I memorized while tackling the problem. Your point is well taken. However, I have never once treated forums like that as if they grant write access. I seek out solutions that are already there but don't bother trying to have people solve mine. Perhaps because most technical problems of this nature don't end up being nearly so intractable.

Regarding LaTeX: My father (a professor of engineering) once got annoyed that the journal he submitted to wouldn't accept his Microsoft Word file as a submission, asking for a LaTeX one instead. Having no clue what the heck LaTeX was and not wanting to learn any kind of crazy new system, he managed to get the journal to accept a PostScript document instead. (I showed him LaTeX and he said anyone who wants to learn a whole programming language so they can do what Microsoft Word will do just fine must either be crazy or a professional typesetter who has the job of formatting things for a dead tree edition.)

I had to submit two book chapters in Word format recently -- just finished fiddling with the references this morning -- and I've decided that in future I'll just refrain from submitting when Word format is required. So much pain in every step of the process.

But, but, you know LaTeX! Knowing Word and having to submit in LaTeX is a nightmare. Knowing LaTeX and having to export to Word is a matter of replacing your LaTeX to pdf converter with a LaTeX to Word converter!

It's a Google search away.

Knowing LaTeX and having to export to Word is a matter of replacing your LaTeX to pdf converter with a LaTeX to Word converter!

Oh no it isn't! In fact that was the workflow I tried for one of the two chapters. It was really painful for many reasons. One of the big things that people always seem to overlook in this discussion (in both directions) is the need to use templates specified by the publisher. That messes up the workflow.

Of course, I do admit that I know LaTeX better than Word.

Word AND their template? Barbarians! What are they thinking? A research embargo upon them!

Can't tell if serious or joking :-/

Just in case, the point about the template is that you can't export from LaTeX to Word and then impose a template on that. You have to start from the template.

Have you published in academia? Almost all conferences, journals, and edited books do indeed require a template.

Can't tell if serious or joking

More or less serious with a touch of hyperbole.

Have you published in academia?

Yes, with LaTeX.

Having to submit things in Word format is always hilariously painful.

What program would you rather be using?

LaTeX. There's a learning curve but I am long long past that. I don't feel that it gets in my way.