Questions tagged [unicode]

Unicode is a standard for the encoding, representation and handling of text with the intention of supporting all the characters required for written text incorporating all writing systems, technical symbols and punctuation.

Unicode

Unicode assigns each character a code point to act as a unique reference:

  • U+0041 A
  • U+0042 B
  • U+0043 C
  • ...
  • U+039B Λ
  • U+039C Μ
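As a rough illustration of the mapping above, the sketch below converts between characters and their code points using Python's built-in `ord` and `chr`. The helper name `code_point_label` is made up for this example and is not part of any standard.

```python
def code_point_label(ch: str) -> str:
    """Return the conventional U+XXXX label for a single character."""
    # ord() gives the integer code point; Unicode labels use at least
    # four uppercase hexadecimal digits.
    return f"U+{ord(ch):04X}"


# Latin capitals A, B, C followed by Greek capital lamda and mu,
# built from their code points to avoid look-alike characters.
sample = "ABC" + chr(0x039B) + chr(0x039C)

for ch in sample:
    print(code_point_label(ch), ch)
```

`chr` goes the other way: `chr(0x0041)` yields `A`.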

Unicode Transformation Formats

UTFs describe how to encode code points as byte representations. The most common forms are UTF-8 (which encodes each code point as a sequence of one to four bytes) and UTF-16 (which encodes each code point as one or two 16-bit code units, that is, two or four bytes).

Code Point          UTF-8           UTF-16 (big-endian)
U+0041              41              00 41
U+0042              42              00 42
U+0043              43              00 43
...
U+039B              CE 9B           03 9B
U+039C              CE 9C           03 9C
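The rows in the table can be reproduced with a short sketch like the one below, which relies on Python's standard `str.encode` with the `utf-8` and `utf-16-be` codecs. The helper `hex_bytes` is a hypothetical name used only for this illustration.

```python
def hex_bytes(ch: str, encoding: str) -> str:
    """Encode a string and show the resulting bytes as spaced hex pairs."""
    return " ".join(f"{b:02X}" for b in ch.encode(encoding))


# Code points taken from the table above.
for cp in (0x0041, 0x0042, 0x0043, 0x039B, 0x039C):
    ch = chr(cp)
    print(f"U+{cp:04X}", hex_bytes(ch, "utf-8"), "|", hex_bytes(ch, "utf-16-be"))
```

Code points above U+FFFF are the ones that need four bytes in both forms; for example, `hex_bytes(chr(0x1F600), "utf-16-be")` produces a surrogate pair.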

Specification

The Unicode Consortium also defines standards for sorting algorithms, rules for capitalization, character normalization and other locale-sensitive character operations.
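Two of these operations, normalization and case mapping, can be tried out with Python's standard library, as sketched below. This covers only `unicodedata.normalize` and `str.casefold`; locale-aware sorting is not shown because it needs more than the standard library offers.

```python
import unicodedata

# The same visible character can be stored composed (one code point)
# or decomposed (base letter plus a combining mark).
composed = "\u00e9"        # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"     # "e" followed by COMBINING ACUTE ACCENT

# The two strings differ code point by code point...
print(composed == decomposed)

# ...but compare equal once both are brought to the same normal form.
print(unicodedata.normalize("NFC", decomposed) == composed)
print(unicodedata.normalize("NFD", composed) == decomposed)

# Case folding is the case-insensitive comparison form; it can change
# the length of a string, as with the German sharp s.
print("Stra\u00dfe".casefold())
```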

Identifying Characters

For more general information, see the Unicode article on Wikipedia.

132 questions
25 votes, 4 answers

Line height with unicode characters

Some Unicode characters cause the line they are displayed on to be quite large. For example, a grave accent "̀ " adds about 2.5 lines of space above and below it. Other characters that cause this behavior include the Greek letter "ϕ" (phi) or…
Patrick Steele
25 votes, 1 answer

How do I set up font fallback in a robust way?

TL;DR: What's a simple way to reliably say: use Consolas as the default font, FreeMono for the characters unsupported by Consolas, and Symbola for characters unsupported by both? Since my main programming font does not cover all the mathematical…
Clément
18 votes, 3 answers

Exporting unicode characters to pdf using latex from org mode

Hello Good people of Emacs! I'm having trouble exporting unicode math symbols from buffer (org-mode) to pdf file. 1. Problem Description: Symbols are inserted to the buffer as unicode characters (via TEX input method or company-math) Here is…
Empty_Mind
17 votes, 3 answers

Tell a dash (-), an en-dash (–) and an emdash (—) apart

A dash (-), an en-dash (–) and an emdash (—) are different but difficult to tell apart. This causes problems, e.g. when writing programs. Is there some way to tell them apart more easily in Emacs? Thanks.
Tim
16 votes, 1 answer

How can I draw in unicode with artist-mode?

I'd like to produce unicode art with artist-mode. But it appears that artist-mode only supports ASCII art. How can I configure artist-mode to produce instead of +---------------+ | | | +-------+--+ | | | | …
Flow
14 votes, 5 answers

Fuzzy completion when inserting Unicode characters

I just discovered C-x 8 RET as a way to insert Unicode characters by Unicode name or hex code. I have not (yet?) memorised all Unicode character names, so I don't always find the right character. There is, for example, the character "SNOW CAPPED…
user2005
13 votes, 4 answers

How to display Unicode UTF-8 as Unicode?

I have some UTF-8-encoded text files which display strange escape codes in Emacs. For instance, this text: In ista quaestione primo exponam quid intelligendum est per hoc nomen ‘Deus’; secundo, respondebo ad quaestionem. Shows like this in…
NVaughan
12 votes, 5 answers

Ways to unobtrusively vary text rendering?

I'm writing an emacs extension for use with speech recognition, and I'm looking for help with a particular feature. Some words the speech recognizer (Dragon) recognizes consistently poorly -- it doesn't matter how many times you train it, it will…
Joseph Garvin
11 votes, 1 answer

unicode.txt slowness

Moving around point (using the cursor keys) in Xah’s unicode.txt in fundamental-mode is noticeably slower than in an ordinary text file. Are the many non-ASCII characters the issue? Anything else? About: GNU Emacs 25.2.1 (x86_64-w64-mingw32) of…
feklee
10 votes, 1 answer

How can I save UTF-8 files with a Byte Order Mark?

I am trying to configure Emacs to save UTF-8 files with a Byte Order Mark. (Yes, I know that the BOM is evil and unnecessary for UTF-8 files. However, Microsoft has decided they know better, so I want to make sure I'm able to save files with…
Scott Weldon
9 votes, 1 answer

Force a single font for all unicode glyphs

I'm using GNU Emacs 24.4.1 in a GUI on OS X. I want to force every character to be displayed using just a single font rather than allowing Emacs to choose a supposedly most appropriate. I understand that no font will include every glyph, but I use…
karl
9 votes, 1 answer

Coding System utf-8 on Mac - Which one and why as default?

I want to change my default encoding system from non defined to UTF-8 (I think that would be useful). Now I have seen many different UTF-8 coding systems: mule-utf-8 mule-utf-8-dos mule-utf-8-mac mule-utf-8-unix prefer-utf-8 …
Rainer
8 votes, 1 answer

Insert character by its Unicode name

From the documentation of insert-char, I cannot see why (insert-char "GREEK SMALL LETTER EPSILON") doesn't work. Is there a non-interactive way to insert a character given its Unicode name?
Toothrot
8 votes, 3 answers

Fast unicode symbol insertion?

Currently, I'm inserting unicode characters (mainly math symbols) using TeX input method. This is cumbersome, since, for each character, I have to do following: Switch to TeX input method pressing C-\ type latex expression like \Bbb{R} or…
Empty_Mind
7 votes, 2 answers

Idiomatic way of extending keymap for inserting unicode symbols?

I often find myself needing to insert Unicode characters that have no default binding in iso-transl-ctl-x-8-map, i.e., characters that can't be inserted using C-x 8 followed by one or more letters/punctuation characters. To add bindings to the C-x 8…
itsjeyd