I'm trying to export an Org table that contains the Unicode characters for moon phases (U+1F311 etc). Export to HTML works fine, but I just see blank spaces in the exported PDF. I know next to nothing about LaTeX. What can I do? Preferably from within Org. Emacs 27.2, Org 9.5.3, Manjaro.

Here's the table in full:

|---------------+-----------------+---------|
| *Phase*       | *UCS Codepoint* | *Glyph* |
|---------------+-----------------+---------|
| New moon      | =U+1F311=       | 🌑      |
| First quarter | =U+1F313=       | 🌓      |
| Full moon     | =U+1F315=       | 🌕      |
| Last quarter  | =U+1F317=       | 🌗      |
|---------------+-----------------+---------|
|               |                 |   <c>   |
Phil Hudson
  • If you are using `pdflatex` for the export, you might want to use `xelatex` or `lualatex` and modify `org-latex-pdf-process` to use one of them instead: they are much better at dealing with Unicode. And see if you can install `latexmk` or `texi2dvi` which are smarter processors (see the doc string of `org-latex-pdf-process` for some info). As an example, my setting is `"latexmk --shell-escape -pdf -xelatex -output-directory=%o %f"`. – NickD Jun 03 '22 at 15:10
  • Unlike with `latex`, exporting to `pdf` via `odt` doesn't require additional setup beyond installing `LibreOffice`. Do `(setq org-odt-preferred-output-format "pdf")` and export the `org` file to `odt` with `C-c C-e o O`. You will get a `pdf` file, with no missing characters. –  Jun 04 '22 at 01:32
  • Thought I'd try the LO approach first, since it sounded like a good solution, but what a nightmare I've run into: something in my config (?) is either preventing a crucial export directory (`~/tmp/odt-xxxxxx/META-INF`) being created or immediately deleting it, erroring the conversion process and leaving me with `Manifest.xml` buffers I can't kill until I either manually recreate the directory they're supposed to be in or restart Emacs. So... sounds great, but no joy. Web searches avail me nought so far. – Phil Hudson Jun 04 '22 at 10:28
  • @NickD I am using `pdflatex`. I had already tried transitioning to `latexmk` before posting, never got a completed run. I guess the next step is to try your precise recipe. – Phil Hudson Jun 04 '22 at 10:43
  • @NickD Using your recipe exposed a conflicting `geometry` command in my file, which I quickly fixed, and then produced very nice but indistinguishable output -- still no moon phase glyphs, just white space. I'll stick with your recipe anyway for future use, taking you at your word that it has other advantages. – Phil Hudson Jun 04 '22 at 10:52
  • I don't think the `odt` exporter will be *that* trivially broken ... Try to minimize the contents of `org` file and post the "minimal" problematic content. On linux, `temporary-file-directory` points to `/tmp/`, and in your case it is pointing to `~/tmp/`. What is special about this mount point .. I suggest you file a bug report with Emacs. Or ... if you want more arrows in your quiver try [the enhanced ODT exporter](https://github.com/kjambunathan/org-mode-ox-odt#installation) , and open a bug in *this* repo. –  Jun 04 '22 at 12:47
  • ODT export fails in the same way with an Org file consisting of nothing but headline "Foo" and subheadline "Bar". Changing `temporary-file-directory` to "/tmp" makes no difference. I'll try the enhanced exporter next. – Phil Hudson Jun 04 '22 at 13:54
  • Losing the will to live... all I wanted was some Unicode character glyphs... how much time would I have saved if I'd printed to PDF from my web browser. Hey! That works! – Phil Hudson Jun 04 '22 at 14:02
  • Can you add a minimal example to your question? – NickD Jun 04 '22 at 14:26
  • Guess what? I just tried starting from `emacs -Q` and export to PDF via ODT works now, including nice moon phase glyphs. So it *is* my config that's messing things up. – Phil Hudson Jun 04 '22 at 15:15
  • @NickD Added the literal table, is that sufficient? – Phil Hudson Jun 04 '22 at 15:28

2 Answers

Here's a better method to deal with Unicode characters that do not map to a glyph in the current font (the old answer is left here for reference: there is some useful information in there, but I don't think the \setmainfont method should be used).

This assumes that you are using XeLaTeX as your processor.

See the old answer below for the setting of org-latex-pdf-process that uses XeLaTeX.

The basic idea is font substitution. If you use Unicode characters whose glyphs are not provided by the main document font, we want XeLaTeX to take those glyphs from a different font that does provide them. Emacs, for example, does this automatically. If none of the fonts it knows about provides the necessary glyph, Emacs prints a box with the Unicode number of the character inside it: try C-h h to display the Hello file in various scripts (in my case, I don't have TaiViet fonts, so that line just shows the boxes). LaTeX, in contrast, leaves the character blank.

For LaTeX, you have to tell it what fonts to use: it doesn't have a boatload of them predefined. For XeLaTeX, that can be done by creating the following file, symbolasubst.sty:

% define a new font family
\usepackage{fontspec}
\newfontfamily{\SymbolaSubstFont}{Symbola}

% use the interchartoken mechanism for font substitution
\XeTeXinterchartokenstate=1
\newXeTeXintercharclass\SymbolaSubst

% define the chars that are going to be substituted
\XeTeXcharclass"1F311=\SymbolaSubst
\XeTeXcharclass"1F313=\SymbolaSubst
\XeTeXcharclass"1F315=\SymbolaSubst
\XeTeXcharclass"1F317=\SymbolaSubst

% enclose every "unknown" character in a group that uses the substitute font
\XeTeXinterchartoks 0 \SymbolaSubst = {\begingroup\SymbolaSubstFont}
\XeTeXinterchartoks 4095 \SymbolaSubst = {\begingroup\SymbolaSubstFont}
\XeTeXinterchartoks \SymbolaSubst 0 = {\endgroup}
\XeTeXinterchartoks \SymbolaSubst 4095 = {\endgroup}

This informs XeLaTeX that the specified Unicode characters should be substituted from the Symbola font, rather than from the main font (whatever that font might be). Here is a reference to an older and somewhat outdated version that I used and touched up lightly to come up with the version above (see the comments on that answer for the changes necessary). The reference also describes how to deal with whole swaths of characters if you have to.
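
If you need more than a handful of characters, typing one \XeTeXcharclass line per code point gets tedious. As a rough sketch (this is not the method from the reference, just an Emacs Lisp convenience), the lines can be generated and pasted into symbolasubst.sty. The function name `my/xetex-charclass-lines` is made up for this illustration:

```elisp
;; Hypothetical helper (not part of Org or XeTeX): build one
;; \XeTeXcharclass assignment per code point in the range FROM..TO.
(defun my/xetex-charclass-lines (from to class)
  "Return XeTeX charclass assignment lines for code points FROM..TO.
FROM and TO are integers (inclusive); CLASS is the name of the class
macro without its leading backslash."
  (mapconcat (lambda (cp)
               ;; %X prints the code point in uppercase hex, as XeTeX expects
               ;; after the double-quote prefix.
               (format "\\XeTeXcharclass\"%X=\\%s" cp class))
             (number-sequence from to)
             "\n"))

;; Example: all eight moon phase characters, U+1F311 through U+1F318.
(my/xetex-charclass-lines #x1F311 #x1F318 "SymbolaSubst")
```

Evaluating the last form (e.g. with C-x C-e) returns a string with eight lines in the same format as the hand-written ones above.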

Once you have the above file (maybe in the same directory as your Org mode file, but if you want to reuse it, then install it in a directory that XeLaTeX knows about), using it is easy - you just have to add the \usepackage to the preamble:

#+LATEX_HEADER: \usepackage{symbolasubst}

* Test
I'm trying to export an Org table that contains the Unicode characters for moon phases (U+1F311 etc). Export to HTML works fine, but I just see blank spaces in the exported PDF. I know next to nothing about LaTeX. What can I do? Preferably from within Org. Emacs 27.2, Org 9.5.3, Manjaro.

Here's the table in full:

|---------------+-----------------+---------|
| *Phase*       | *UCS Codepoint* | *Glyph* |
|---------------+-----------------+---------|
| New moon      | =U+1F311=       | 🌑      |
| First quarter | =U+1F313=       | 🌓      |
| Full moon     | =U+1F315=       | 🌕      |
| Last quarter  | =U+1F317=       | 🌗      |
|---------------+-----------------+---------|
|               |                 |   <c>   |

I'm pretty sure that this version will work not only with the minimal file but with any file that uses these characters. If you try it out, please let me know if there are any problems.


OLD ANSWER

Here's one way to get what you want out of the Org->LaTeX->PDF workflow.

I assume you have set org-latex-pdf-process to '("latexmk --shell-escape -pdf -xelatex -output-directory=%o %f") as mentioned (somewhat inaccurately - the value of org-latex-pdf-process is a list of strings, not a string) in my comment. The following should work with lualatex as well as xelatex, although I've only tried with xelatex. As mentioned in the comment, pdflatex's support for Unicode is rudimentary: xelatex and lualatex are better choices in this day and age.
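
For reference, a minimal sketch of how that setting might look in an init file. The command string is the one quoted above; it assumes that latexmk and xelatex are installed and on your PATH:

```elisp
;; Sketch: make Org's LaTeX->PDF export run XeLaTeX through latexmk.
;; Assumes latexmk and xelatex are installed and on the PATH.
(with-eval-after-load 'ox-latex
  ;; The value must be a list of command strings, not a bare string.
  ;; Org expands %o to the output directory and %f to the .tex file.
  (setq org-latex-pdf-process
        '("latexmk --shell-escape -pdf -xelatex -output-directory=%o %f")))
```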

The problem is that the glyphs for the Unicode code points you use come from a font that TeX does not know about (but apparently Emacs, web browsers and LibreOffice do). If you do C-u C-x = on any of the moon phase glyphs, you'll see that they all come from the Symbola font. This is installed as a system font, but not as a TeX font. However, xelatex and lualatex can use arbitrary TTF/OTF fonts fairly easily.
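
If you are not sure whether Symbola is installed at all, a quick check from within Emacs might look like the sketch below. It only tells you whether Emacs can see a font family with that name (and only in a graphical frame); it does not prove that XeLaTeX will find it, although both normally look at the same system fonts:

```elisp
;; Sketch: ask Emacs whether it can find a font family named "Symbola".
;; Only meaningful in a graphical frame; in a terminal it reports nothing useful.
(if (find-font (font-spec :family "Symbola"))
    (message "Emacs can see the Symbola font")
  (message "Emacs cannot find a font named Symbola"))
```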

To tell TeX about this font, you can use the fontspec package. (N.B. See the better answer above: I do not recommend setting the main font as is done below.) Here's a complete minimal example based on your table:

#+LATEX_HEADER: \usepackage{fontspec}

#+LATEX_HEADER: \setmainfont{Symbola}

* moon phases
|---------------+-----------------+---------|
| *Phase*       | *UCS Codepoint* | *Glyph* |
|---------------+-----------------+---------|
| New moon      | =U+1F311=       | 🌑      |
| First quarter | =U+1F313=       | 🌓      |
| Full moon     | =U+1F315=       | 🌕      |
| Last quarter  | =U+1F317=       | 🌗      |
|---------------+-----------------+---------|
|               |                 |   <c>   |

Exporting this to PDF should now work.

This is based on this blog post which I found in this TeX/LaTeX SE question.

NickD
  • Fantastic progress. This does indeed work with the minimal example. I found with the original file that I had to leave out the `\setmainfont` (it caused some completely opaque error that resulted in a zero-length `*Org PDF LaTeX Output*` buffer) and instead use inline fragments: `\fontspec{Symbola} ` etc. That worked! I finally had a PDF that looked like I wanted. Unfortunately, though, that borks my HTML output, where the inline fragments are rendered literally. So you've answered the question and deserve to have it accepted, thank you very much, but I've had to go with the ODT one. – Phil Hudson Jun 05 '22 at 05:39
  • Does the `moon symbol` render in *color* on your `Emacs` and `LibreOffice` side. Can you share the fonts that Emacs is using (Hint: `C-u C-x =`) and `LibreOffice` is using (Hint: Export to `pdf` and within `evince` see the `Properties`->`Fonts`). Also look at my remarks here https://bugs.documentfoundation.org/show_bug.cgi?id=129523#c50 . Speaking of the converter ... it is better to use `soffice`, and leave `org-odt-convert-process` at its default value of `LibreOffice`. AFAICS, using `unoconv` doesn't provide any specific advantage. I consider `unoconv` a bloat. –  Jun 05 '22 at 07:09
  • On the Emacs side I use Iosevka, which recently added the glyphs in response to my RFE. `qpdfview` indicates Symbola in the generated PDF. – Phil Hudson Jun 05 '22 at 07:39

TLDR: I suggest that you export to pdf via odt, but remember to configure LibreOffice to replace Noto Color Emoji with Symbola, Emoji One Color, or any other non-colour font. I couldn't find the Emoji One Color font on my Debian Unstable (June 2022), so I went with Symbola.


To get this

[Screenshot: ODT/PDF export with Symbola font]

do this

  1. Set your preferred output format to pdf
(setq org-odt-preferred-output-format "pdf")
  2. ... and export to odt with C-c C-e o O
#+odt_preferred_output_format: pdf

|---------------+-----------------+---------|
| *Phase*       | *UCS Codepoint* | *Glyph* |
|---------------+-----------------+---------|
| New moon      | =U+1F311=       | 🌑      |
| First quarter | =U+1F313=       | 🌓      |
| Full moon     | =U+1F315=       | 🌕      |
| Last quarter  | =U+1F317=       | 🌗      |
|---------------+-----------------+---------|
|               |                 |   <c>   |
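
If you would rather not change the global default, the same export could be wrapped in a small command that binds the output format only for one run. This is a sketch that assumes the built-in ox-odt exporter and a working LibreOffice converter; the command name `my/org-export-pdf-via-odt` is made up for this illustration:

```elisp
;; Sketch, assuming the built-in ox-odt exporter and a working
;; LibreOffice converter. The command name is hypothetical.
(require 'ox-odt)

(defun my/org-export-pdf-via-odt ()
  "Export the current Org buffer to ODT and have it converted to PDF."
  (interactive)
  ;; Bind the preferred format only for this export, leaving the
  ;; global value of `org-odt-preferred-output-format' untouched.
  (let ((org-odt-preferred-output-format "pdf"))
    (org-odt-export-to-odt)))
```

Run it with M-x my/org-export-pdf-via-odt from the Org buffer you want to export.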

Configure LibreOffice to use Symbola instead of Noto Color Emoji

To get LibreOffice to use Symbola instead of Noto Color Emoji, within the LibreOffice GUI Tools->Options, do this

[Screenshot: configure LibreOffice to translate Noto Color Emoji to Symbola]

Note: I am assuming that these days most folks have the full range of Noto fonts installed by default. In your specific case, it is possible that LibreOffice has picked a different font for rendering the moon emojis, so set the font replacement table to whatever is appropriate for you.

LibreOffice 7.3.4.1 has issues with Noto Color Emoji: the colour emojis render well on the LibreOffice UI side, but not on the PDF export side

See LO bug#129523. This is marked as a High Major bug as of June 3, 2022, so expect that it will be fixed very soon.

[Screenshot: LibreOffice has issues with Noto Color Emoji]

  • Good news: successful export, no futzing with LO settings required. Superior output, including my themed colors in code blocks. Slight bad news, and in no way your problem: due to my aforementioned config issue, where the process aborts after creation of an intermediate `.odt` file but before conversion to PDF, I have to manually run `unoconv`. I'll automate that and leave `org-odt-preferred-output-format` at default. – Phil Hudson Jun 05 '22 at 05:53