1

I'm looking for a simple, generic way to input arbitrary Unicode characters into a text document on the terminal (e.g. in a terminal editor). A basic method I can imagine is a plain UTF-8 text file with two columns: the character's name or description, and the character itself. A simple script could then look up a character in this file, e.g. using dmenu or similar.

Are there existing tools that would take care of this for me? Otherwise, where can I find such mappings (name, UTF-8 value) for common Unicode characters (e.g. smileys, Greek characters, mathematical symbols)?

This seems like such a basic, common need, yet I'm having difficulty finding simple, readily available solutions.

Thanks!

  • 1
    A number of these specific subsets of Unicode can be addressed by enabling Compose Key functionality. For example, "Compose g a" gets me "α" – Chris Davies Dec 09 '22 at 08:20

2 Answers

5

If using screen (tmux probably has similar capabilities), you could build a table of characters in:

🧙 U+1F9D9 mage
🧚 U+1F9DA fairy
🧛 U+1F9DB vampire
🧜 U+1F9DC merperson
🧝 U+1F9DD elf
🧞 U+1F9DE genie
🧟 U+1F9DF zombie
🧠 U+1F9E0 brain

format with something like:

perl -Mcharnames=full -C -e '
   for $i (0xa0 .. 0xd7ff, 0xe000 .. 0x10ffff) {
     printf "%c U+%04X %s\n", $i, $i, lc charnames::viacode $i
   }' | zstd > ~/.cache/all-chars.zst

and add:

altscreen on
bindkey \33u exec .!. sh -c "zstdcat ~/.cache/all-chars.zst|fzf|awk '{printf \"%s\",substr(\$0,1,1)}'"

to your ~/.screenrc (this assumes a multibyte-aware awk, so not mawk).

Then press Alt + u within screen to bring up an fzf dialog to look up the character.

Screencast of screen solution

This doesn't work so well, though, inside an application that uses the alternate screen, which is the case for most terminal text editors: fzf also uses the alternate screen, so it clobbers the application's display and then switches away from it to the normal screen.

Instead of using exec, which hijacks the current screen window, you could start fzf in a separate screen window and have the result stuffed into the current one. You could even split the screen to show that fzf window alongside the current one:

altscreen on
bindkey \33u eval 'split -v' focus "screen sh -c 'screen -X eval focus \"stuff $(zstdcat ~/.cache/all-chars.zst|fzf|sed s/.//2g)\" only'"

(altscreen is not necessary in that case).

screencast of solution using split screen

If you are in an X11 environment (using an X11 terminal emulator), you could set up ibus-typing-booster as an alternative input method; it can also be configured to look up Unicode characters by name.

Screencast using ibus-typing-booster

0

I found that the latest version of the Unicode Character Database is available here: https://www.unicode.org/Public/UCD/latest/ucd/. I then thought about using Python to generate such a mapping.
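The UnicodeData.txt file in that directory is a semicolon-separated table whose first two fields are the hexadecimal code point and the character name, so it can be turned into the two-column format fairly directly. Below is a minimal sketch of that, not a definitive implementation; `parse_unicode_data` and `SAMPLE` are hypothetical names chosen for this example, and `SAMPLE` just holds a few lines in the file's format so the snippet runs without a download.

```python
def parse_unicode_data(lines):
    """Yield (character, lowercase name) pairs from UnicodeData.txt-style lines.

    Hypothetical helper for illustration; only the first two
    semicolon-separated fields (code point, name) are used.
    """
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) < 2:
            continue  # blank or malformed line
        name = fields[1]
        # Entries such as "<control>" or "<CJK Ideograph, First>" are
        # placeholders or range markers rather than real names; skip them.
        if name.startswith("<"):
            continue
        yield chr(int(fields[0], 16)), name.lower()


# A few lines in the UnicodeData.txt format, so the sketch is self-contained.
SAMPLE = [
    "0000;<control>;Cc;0;BN;;;;;N;NULL;;;;\n",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;\n",
    "03B1;GREEK SMALL LETTER ALPHA;Ll;0;L;;;;;N;;;0391;;0391\n",
]

PAIRS = list(parse_unicode_data(SAMPLE))
```

With a downloaded copy of the file (the local path is an assumption), passing the open file object as `lines` instead of `SAMPLE` would yield the full table.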

I can iterate through arbitrary codepoints and print their names (if they have one):

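import unicodedata as ucd

# Walk the Basic Multilingual Plane and print each named character.
for n in range(0x0000, 0x10000):
    ch = chr(n)
    try:
        print(ch, ucd.name(ch))
    except ValueError:
        # ucd.name() raises ValueError for code points without a name.
        pass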

This is a step towards building lookup tables for the codepoint ranges I'm interested in.
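As a rough sketch of where this could go, the loop can be restricted to a few Unicode blocks and paired with a keyword search. The names `RANGES`, `build_table` and `lookup` are hypothetical, and the three blocks (Greek and Coptic, Mathematical Operators, Emoticons) are only examples of ranges one might pick.

```python
import unicodedata

# Example blocks only; adjust to the ranges you care about.
RANGES = {
    "greek and coptic": range(0x0370, 0x0400),
    "mathematical operators": range(0x2200, 0x2300),
    "emoticons": range(0x1F600, 0x1F650),
}


def build_table(ranges):
    """Return a list of (character, lowercase name) for named code points."""
    table = []
    for block in ranges:
        for cp in block:
            ch = chr(cp)
            # The second argument is returned instead of raising
            # ValueError when the code point has no name.
            name = unicodedata.name(ch, None)
            if name is not None:
                table.append((ch, name.lower()))
    return table


def lookup(table, query):
    """Return entries whose name contains every word of the query."""
    words = query.lower().split()
    return [(ch, name) for ch, name in table
            if all(word in name for word in words)]


TABLE = build_table(RANGES.values())
```

Writing each pair of `TABLE` out as a `character<TAB>name` line would give the kind of two-column file described in the question, which dmenu or fzf could then filter. `lookup(TABLE, "greek alpha")` is a plain substring match, so it also returns accented and capital variants.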