Is there a way to get a character by name (e.g. GREEK SMALL LETTER LAMBDA) instead of by code point (\u03BB)?

Basil

2 Answers

You can use insert-char for interactive usage and

(cdr (assoc-string INPUT (ucs-names) t))

in Elisp programs. Here, INPUT is the character name as a string, e.g.,

(setq INPUT "GREEK SMALL LETTER LAMBDA")

See the docstrings of insert-char, ucs-names, and assoc-string for more information.


Remark:

I use org-entities to input non-ASCII characters. It already implements many translations of LaTeX control sequences into UTF-8, and I added a few more that I often need. I then wrote a command, latex-to-utf8, that replaces the LaTeX control sequence before point with the corresponding UTF-8 character, and bound it to f8.

(defvar latex-to-utf8-scripts
  '((?+ ?⁺ ?₊)
    (?- ?⁻ ?₋)
    (?0 ?⁰ ?₀)
    (?1 ?¹ ?₁)
    (?2 ?² ?₂)
    (?3 ?³ ?₃)
    (?4 ?⁴ ?₄)
    (?5 ?⁵ ?₅)
    (?6 ?⁶ ?₆)
    (?7 ?⁷ ?₇)
    (?8 ?⁸ ?₈)
    (?9 ?⁹ ?₉))
  "Alist mapping a character to its superscript and subscript forms.
Each entry has the form (CHAR SUPERSCRIPT SUBSCRIPT).")

(require 'org-entities)
(mapc (lambda (x) (add-to-list 'org-entities-user x))
      '(("bull" "\\bullet" nil "•" "*" "*" "•")
        ("plusminus" "\\pm" t "±" "+-" "+-" "±")
        ("sqrt" "\\sqrt" t "√" "sqrt" "sqrt" "√")
        ("lessless" "\\ll" t "&LessLess;" "<<" "<<" "⪡")
        ("lesseq" "\\leq" t "&lesseq;" "<=" "<=" "≤")
        ("greatereq" "\\geq" t "&greatereq;" ">=" ">=" "≥")
        ("nbsp" "\\medspace" t "&nbsp;" " " " " " ")
        ("greatergreater" "\\gg" t "&GreaterGreater;" ">>" ">>" "⪢")
        ("neg" "\\neg" t "&neg;" "/" "/" "¬")
        ("mapsto" "\\mapsto" t "&mapsto;" "|->"  "|->" "↦")
        ("times" "\\times" t "&times;" "x" "x" "×")))



(require 'cl-lib) ; provides `cl-loop' (the bare `loop' needs the obsolete `cl')

(defun latex-to-utf8 ()
  "Replace the LaTeX symbol before point by the corresponding UTF-8 symbol."
  (interactive)
  (cond
   ;; A control sequence such as \alpha: look it up in the Org entity tables.
   ((looking-back "\\\\[[:alpha:]]+" (line-beginning-position))
    (let* ((latex-sym (match-string 0))
           (utf8-sym (cl-loop for org-sym in (append org-entities-user org-entities)
                              ;; `org-entities' also contains plain strings, so skip non-lists.
                              if (and (listp org-sym)
                                      (string-equal latex-sym (nth 1 org-sym)))
                              return (nth 6 org-sym))))
      (when utf8-sym
        ;; FIXEDCASE and LITERAL: insert the symbol exactly as stored.
        (replace-match utf8-sym t t))))
   ;; A superscript or subscript such as ^2 or _-10.
   ((looking-back "\\([_^]\\)\\([+-]\\|[+-]?[0-9]+\\)" (line-beginning-position))
    (let ((scr-idx (if (string-equal (match-string-no-properties 1) "^") 1 2))
          (digits (string-to-list (match-string 2))))
      (replace-match "")
      (dolist (digit digits)
        (insert (nth scr-idx (assoc digit latex-to-utf8-scripts))))))))

(global-set-key [f8] 'latex-to-utf8)
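
To see the superscript/subscript branch in isolation, here is a minimal sketch of the same table lookup written as a pure function. The names my-script-table and my-to-script are hypothetical and not part of the command above, and the table is abbreviated to three entries for illustration.

```elisp
;; Hypothetical, abbreviated copy of the table above.
;; Each entry has the form (CHAR SUPERSCRIPT SUBSCRIPT).
(defvar my-script-table
  '((?- ?⁻ ?₋)
    (?1 ?¹ ?₁)
    (?2 ?² ?₂))
  "Abbreviated alist from ASCII characters to super- and subscripts.")

(defun my-to-script (string kind)
  "Return STRING with each character replaced by its script form.
KIND is the symbol `super' or `sub'.  Characters that are missing
from `my-script-table' are left unchanged."
  (let ((idx (if (eq kind 'super) 1 2)))
    (apply #'string
           (mapcar (lambda (ch)
                     ;; Fall back to the original character if there is no entry.
                     (or (nth idx (assq ch my-script-table)) ch))
                   string))))

(my-to-script "-12" 'super)  ; => "⁻¹²"
(my-to-script "21" 'sub)     ; => "₂₁"
```

Unlike the command, this sketch does not touch the buffer, which makes the lookup easy to try out in the scratch buffer or with M-:.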
Tobias
  • `insert-char` is bound to `C-x 8 RET`, by the way. – asjo Jun 11 '16 at 22:44
  • For usage in elisp programs, I'd recommend just using `?λ`. After all, with `insert-char` you can easily insert the λ. BTW, I use `insert-char` so much I have made a shorter, easier to type keybinding for it. I find `C-x 8 RET` really awkward to type. – Harald Hanche-Olsen Jun 12 '16 at 07:57
  • Regarding the remark: you should give the TeX input method a try. – YoungFrog Jun 14 '16 at 05:09

The question isn't very clear (Is the intended application an interactive command which prompts the user? Is it a non-interactive Elisp program? Do we know the character's name ahead of time? Will it be given as a string? Etc.), but here are some additional ways of mapping character names to code points in Emacs 26.

Emacs 26.1 introduced both the function char-from-name:

(char-from-name "GREEK SMALL LETTER LAMDA")      ; => ?λ
(char-from-name "Greek small letter Lamda" t)    ; => ?λ

and character name escape sequences in literals:

?\N{GREEK SMALL LETTER LAMDA}                    ; => ?λ
?\N{Greek small letter Lamda}                    ; => ?λ
"Lamda: \N{GREEK SMALL LETTER LAMDA}"            ; => "Lamda: λ"
"Lamda: \N{Greek small letter Lamda}"            ; => "Lamda: λ"

These two features are documented under (elisp) Character Codes and (elisp) General Escape Syntax, respectively.

Note that (ucs-names), as mentioned in Tobias' answer, is no longer an alist in Emacs 26, but rather a hash table, so it is now accessed as:

(gethash "GREEK SMALL LETTER LAMDA" (ucs-names)) ; => ?λ

But you're better off using char-from-name instead.
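
If the same code has to run on both sides of the Emacs 26 change, a small wrapper can hide the difference. The following is only a sketch: the name my-char-by-name is hypothetical, and it assumes that on Emacs versions without char-from-name, (ucs-names) still returns an alist, as in Tobias' answer.

```elisp
(defun my-char-by-name (name)
  "Return the character whose Unicode name is NAME, or nil if there is none.
NAME is matched case-insensitively."
  (if (fboundp 'char-from-name)
      ;; Emacs 26.1 and later.
      (char-from-name name t)
    ;; Assumption: older Emacs, where `ucs-names' returns an alist
    ;; of (NAME . CHAR) pairs.
    (cdr (assoc-string name (ucs-names) t))))

(my-char-by-name "GREEK SMALL LETTER LAMDA")  ; => ?λ
(my-char-by-name "Greek small letter Lamda")  ; => ?λ
(my-char-by-name "NO SUCH CHARACTER")         ; => nil
```

Returning nil for an unknown name lets the caller decide how to handle a typo in the name, rather than signalling an error inside the helper.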

Basil