I'm working on a script that displays UTF-8 characters as output. In my Gnome Terminal, this prints out a pretty maple leaf (🍁):

$ echo -e '\xF0\x9F\x8D\x81'

In rxvt, it prints out a box (the character it uses for "unknown"). The locale is UTF-8 in both terminals, but the fonts are different. Is there a way to determine, on a user's machine, whether certain characters are supported or not?

kris

1 Answer

An application running in a terminal has no way to find out from the terminal what the glyphs that the terminal has drawn look like (or even if they are substitute/placeholder characters).

One thing the application can do is find out if the terminal supports UTF-8 at all, and if it does, if it supports variable width characters. The method is as follows:

  • Read the cursor position by writing ESC [ 6 n and expecting ESC [ line ; col R
  • Write the 2-byte sequence "\xc2\xa0". If the terminal supports UTF-8, this is a single nonbreaking space. If the terminal does not support UTF-8, it is rendered as something else that probably occupies 2 character positions (most likely Â followed by a nonbreaking space, in fact).
  • Read the cursor position again and find out whether the cursor moved by one position or two.
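
The three steps above can be sketched in Python roughly as follows. This is only an illustration under assumptions, not the original 2003 login script: the helper names (`parse_cursor_report`, `read_cursor_position`, `cursor_advance`, `terminal_speaks_utf8`) are made up for this example, it talks to `/dev/tty` directly, and it assumes a Unix-like system and a terminal that answers ESC [ 6 n at all.

```python
import os
import re
import select
import termios
import tty

CPR_QUERY = b"\x1b[6n"                           # ESC [ 6 n: request cursor position
CPR_REPLY = re.compile(rb"\x1b\[(\d+);(\d+)R")   # ESC [ line ; col R


def parse_cursor_report(reply):
    """Extract (line, col) from a cursor position report."""
    match = CPR_REPLY.search(reply)
    if match is None:
        raise ValueError("no cursor position report in %r" % (reply,))
    return int(match.group(1)), int(match.group(2))


def read_cursor_position(fd, timeout=1.0):
    """Send the query and read the reply byte by byte until the final 'R'."""
    os.write(fd, CPR_QUERY)
    reply = b""
    while not reply.endswith(b"R"):
        ready, _, _ = select.select([fd], [], [], timeout)
        if not ready:
            raise TimeoutError("terminal did not answer ESC [ 6 n")
        chunk = os.read(fd, 1)
        if not chunk:
            raise EOFError("terminal closed while waiting for the report")
        reply += chunk
    return parse_cursor_report(reply)


def cursor_advance(payload):
    """Return how many columns the cursor moved after writing payload."""
    fd = os.open("/dev/tty", os.O_RDWR)
    saved = termios.tcgetattr(fd)
    try:
        tty.setraw(fd)               # no echo, no line buffering
        os.write(fd, b"\r")          # start at column 1 to avoid line wrapping
        _, before = read_cursor_position(fd)
        os.write(fd, payload)
        _, after = read_cursor_position(fd)
        os.write(fd, b"\r\x1b[K")    # erase the probe from the current line
        return after - before
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
        os.close(fd)


def terminal_speaks_utf8():
    """One column for the bytes C2 A0 suggests they were decoded as UTF-8."""
    return cursor_advance(b"\xc2\xa0") == 1
```

Run from an interactive session, `terminal_speaks_utf8()` should return `True` when the two bytes were treated as a single character. Note that the probe overwrites the current line, so it is best called when the cursor is on an empty line.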

If the terminal does support UTF-8, you can find out whether it supports variable character widths with essentially the same trick. Read the cursor position, write a character that is supposed to be double-width in monospace fonts, such as あ, then read the cursor position again. If the terminal does not support double-width characters, the cursor will probably have naively moved by only one position.
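
As a rough sketch of how the result of that second measurement might be interpreted, the snippet below compares a measured cursor advance with the width suggested by Python's `unicodedata` tables. The function names and the returned labels are hypothetical, the width rule is a simplification, and the measured advance is assumed to come from the cursor-position technique described above. A terminal's own width tables may legitimately differ from Python's.

```python
import unicodedata


def expected_columns(ch):
    """Simplified guess at the column width of a single character."""
    if unicodedata.combining(ch):
        return 0                      # combining marks do not advance the cursor
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2                      # Wide and Fullwidth characters
    return 1


def classify_wide_support(measured_advance, ch="あ"):
    """Interpret the cursor advance observed after writing a wide character."""
    expected = expected_columns(ch)
    if expected != 2:
        raise ValueError("%r is not a double-width test character" % ch)
    if measured_advance == 2:
        return "double-width"         # terminal advanced two columns
    if measured_advance == 1:
        return "single-width"         # terminal naively advanced one column
    return "unknown"                  # e.g. the bytes were not decoded as UTF-8
```

For example, `classify_wide_support(1)` reports that the terminal treated あ as a single-column character. Even a "double-width" result only says what width the terminal assigned, not whether its font actually has a glyph for the character.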

Celada
  • That's only half the problem: the terminal may "know" a width for a character, but not display it due to font limitations. Also, the width may not be what you expect. – Thomas Dickey May 23 '16 at 20:46
  • As Celada correctly points out, there's no way to detect whether a glyph is displayed correctly. Although the answer correctly shows how to detect whether UTF-8 is supported, I recommend not doing this. Any emulator not supporting UTF-8 should've been ditched a long time ago. If the terminal's behavior doesn't match the locale's charset, all the applications will fall apart big time. Imagine if every app repeated this check, and then... then what? It's not feasible. Apps should assume that the underlying system is set up correctly. If you really care, I recommend adding a FAQ entry. – egmont May 24 '16 at 12:10
  • @egmont is quite right to recommend not using the trick I proposed in general. It's from a login script I wrote in 2003 whose job was to autodetect the UTF-8 support of the terminal and set the locale appropriately. A normal app should just assume the locale is set right without testing. I agree that if you need this kind of trick in 2016 you have a sorry system. – Celada May 25 '16 at 19:38
  • "An application running in a terminal has no way to find out from the terminal what the glyphs that the terminal has drawn look like (or even if they are substitute/placeholder characters)": This is not quite true, at least on Linux systems. It is sometimes possible to figure out what font a given terminal is using by reading configuration variables, and then detect supported glyphs by examining the corresponding font file. The setfont command can also be used to output a map from Unicode code points to terminal font code points, which provides clues about which glyphs can be rendered. – MRule Jun 02 '21 at 13:06