How can I find the common name for a particular glyph?

Question

Sometimes, I'd like to know the name of a glyph. For example, if I see −, I may want to know if it's a hyphen -, an en-dash –, an em-dash —, or a minus symbol −. Is there a way that I can copy-paste this into a terminal to see what it is?

I am unsure whether my system knows the common names of these glyphs, but some (partial) information is certainly available, such as in /usr/share/X11/locale/en_US.UTF-8/Compose. For example,

<Multi_key> <exclam> <question>         : "‽"   U203D # INTERROBANG

Another example glyph: 🐄.

score 30 · Accepted Answer · edited Apr 27 '15 at 13:30

Try the unicode utility:

$ unicode ‽
U+203D INTERROBANG
UTF-8: e2 80 bd  UTF-16BE: 203d  Decimal: &#8253;
‽
Category: Po (Punctuation, Other)
Bidi: ON (Other Neutrals)
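
If the unicode package is not available, roughly the same facts can be looked up with Python's standard unicodedata module. This is only a minimal sketch under that assumption, not the unicode utility's actual implementation; the describe helper and its field names are invented here for illustration.

```python
import unicodedata


def describe(char):
    """Return basic Unicode facts about a single code point (hypothetical helper)."""
    return {
        "code_point": f"U+{ord(char):04X}",
        # name() raises ValueError for unnamed code points unless a default is given
        "name": unicodedata.name(char, "<unnamed>"),
        "utf8": " ".join(f"{byte:02x}" for byte in char.encode("utf-8")),
        "category": unicodedata.category(char),
        "bidi": unicodedata.bidirectional(char),
    }


if __name__ == "__main__":
    # "\u203d" is the interrobang from the question
    for key, value in describe("\u203d").items():
        print(f"{key}: {value}")
```

Like the tools in this answer, it works on one code point at a time, so a pasted glyph made of several code points has to be passed one character at a time.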

Or the uconv utility from the ICU package:

$ printf %s ‽ | uconv -x any-name
\N{INTERROBANG}

You can also get information via the recode utility:

$ printf %s ‽ | recode ..dump
UCS2   Mne   Description

203D         point exclarrogatif

Or with Perl:

$ printf %s ‽ | perl -CLS -Mcharnames=:full -lne 'print charnames::viacode(ord) for /./g'
INTERROBANG

Note that those give information on the characters that make up that glyph, not on the glyph as a whole. For instance, for é (e with combining acute accent):

$ printf é | uconv -x any-name
\N{LATIN SMALL LETTER E}\N{COMBINING ACUTE ACCENT}

Different from the standalone é character:

$ printf é | uconv -x any-name
\N{LATIN SMALL LETTER E WITH ACUTE}

You can ask uconv to recombine those (for those that have a combined form):

$ printf 'e\u0301b\u0301' | uconv -x '::nfc;::name;'
\N{LATIN SMALL LETTER E WITH ACUTE}\N{LATIN SMALL LETTER B}\N{COMBINING ACUTE ACCENT}

(é has a combined form, but not b́).
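
The same decomposed-versus-composed distinction can be reproduced with Python's standard unicodedata module. This is a sketch for illustration only (the names helper is hypothetical); it assumes nothing beyond the standard library and is not how uconv works internally.

```python
import unicodedata


def names(text):
    """Return the Unicode name of every code point in text (hypothetical helper)."""
    return [unicodedata.name(ch, "<unnamed>") for ch in text]


# e and b, each followed by U+0301 COMBINING ACUTE ACCENT
decomposed = "e\u0301b\u0301"
# NFC recombines sequences that have a precomposed form
composed = unicodedata.normalize("NFC", decomposed)

if __name__ == "__main__":
    print(names(decomposed))
    print(names(composed))
```

After NFC normalization the e and its accent collapse into a single code point, while the b keeps its separate combining accent because no precomposed form exists.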

edited Apr 27 '15 at 13:30

Gilles 'SO- stop being evil'

answered Apr 27 '15 at 12:08

Stéphane Chazelas

What is unicode? I don't appear to have that installed (and can't find it in the Arch Linux repos). Also, what on earth is exclarrogatif? [EDIT: I get that here too, although my system is not French.] – Sparhawk Apr 27 '15 at 12:13
@Sparhawk, contraction of exclamatif and interrogatif. recode was written by a French-Canadian guy in the early 80s. – Stéphane Chazelas Apr 27 '15 at 12:17
@StéphaneChazelas: Is the L in -CLS needed? It makes the answer wrong if something like LC_ALL is set to a non-UTF-8 locale. – cuonglm Apr 27 '15 at 12:20
@cuonglm, if you have LC_ALL=C, you have no business entering characters other than ASCII ones. If your locale is for instance LC_ALL=fr_FR.iso885915@euro and you enter echo é | perl..., that é will be written as 0xe9, not as its UTF-8 encoding. And you want perl to tell you about that é, not about UTF-8 characters that won't be found in the input since the locale doesn't use that charset. Try printf '\xe9' | LC_ALL=fr_FR.iso885915@euro perl... for instance. – Stéphane Chazelas Apr 27 '15 at 12:24
@StéphaneChazelas: I think that in the OP's question, he knows the symbol and wants to go from symbol to name: just paste the symbol into the terminal and get the name back. When pasting ‽, he wants INTERROBANG instead of LATIN SMALL LETTER A WITH CIRCUMFLEX (when LC_ALL=C is set). – cuonglm Apr 27 '15 at 12:33
@cuonglm, and what I'm saying is that if his locale is LC_ALL=fr_FR.iso885915@euro and he pastes é (0xe9 in that locale), -CS would give him the wrong answer. – Stéphane Chazelas Apr 27 '15 at 13:03
Ah, got it. But -CLS doesn't always give the right answer, as I showed in my comment above, right? Is there a workaround? – cuonglm Apr 27 '15 at 13:05
@cuonglm, -CLS with LC_ALL=C should give you the right answer for all the valid characters in the C locale. é and ‽ are usually not present in the C locale, there's no way you can express them there. – Stéphane Chazelas Apr 27 '15 at 13:09
@Sparhawk http://kassiopeia.juls.savba.sk/~garabik/software/unicode/ — available as the unicode package on Debian, no idea about packaging on Arch. – Gilles 'SO- stop being evil' Apr 27 '15 at 13:31
Why printf instead of simply echo in the first some examples? – Paŭlo Ebermann Apr 27 '15 at 22:21
@PaŭloEbermann See "Why is printf better than echo?". Now that you asked, you're expected to read the whole answer. There will be a test. – terdon Apr 27 '15 at 22:44
@terdon thanks for the link, I did read it all. – Paŭlo Ebermann Apr 27 '15 at 23:29
Slightly off-topic, but @StéphaneChazelas, what does the %s represent in some printf statements? – Sparhawk Apr 28 '15 at 00:51
@Sparhawk %s is a placeholder, called a format specifier (or conversion specifier). printf replaces it with the next argument, treated as a string (as opposed to a number, for example), much as you would expect from C's printf() function. See the docs (http://pubs.opengroup.org/onlinepubs/9699919799//basedefs/V1_chap05.html). – muru Apr 28 '15 at 05:57
@StéphaneChazelas: Well, perl6 seems to be better in this case. It can address Unicode text by its graphemes, its codepoints, its encoding's code units, or the bytes that make up the encoding. – cuonglm Dec 07 '15 at 03:59
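
The locale point debated in the comments above can be illustrated with a short Python sketch: the same bytes name different characters depending on the charset used to decode them. Python is used here purely for illustration and this is not what perl -CLS does internally; names_for_bytes is a hypothetical helper, and the encodings are passed explicitly rather than read from the locale.

```python
import unicodedata


def names_for_bytes(raw, encoding):
    """Decode raw bytes with the given charset and name each resulting code point."""
    text = raw.decode(encoding)
    # Control characters have no name, so fall back to a placeholder
    return [unicodedata.name(ch, f"<unnamed U+{ord(ch):04X}>") for ch in text]


if __name__ == "__main__":
    # 0xe9 is the single-byte encoding of e-acute in ISO-8859-15
    print(names_for_bytes(b"\xe9", "iso-8859-15"))
    # The UTF-8 bytes of the interrobang, decoded correctly
    print(names_for_bytes(b"\xe2\x80\xbd", "utf-8"))
    # The same bytes misread as a single-byte charset
    print(names_for_bytes(b"\xe2\x80\xbd", "latin-1"))
```

Decoding the interrobang's UTF-8 bytes as a single-byte charset yields LATIN SMALL LETTER A WITH CIRCUMFLEX first, which is the wrong answer mentioned in the comments.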

cuonglm · Answer 2 · 2015-12-07T04:39:30.273

You can use Perl's viacode function from the charnames module:

$ printf ‽ | perl -Mcharnames=:full -CLS -nle 'print charnames::viacode(ord)'
INTERROBANG
$ printf 🐄 | perl -Mcharnames=:full -CLS -nle 'print charnames::viacode(ord)'
COW

charnames was first released with perl v5.6.0.

With Perl 6 due to be production-ready this Christmas, it is worth mentioning here, since it has the best support for Unicode characters I have ever seen. You only need to call the uniname method/routine:

$ printf ‽ | perl6 -ne 'say .uniname'
INTERROBANG

é (e with combining acute accent) and standalone é character both give you:

# e with combining acute accent
$ printf é | perl6 -ne 'say .uniname'
LATIN SMALL LETTER E WITH ACUTE

# standalone é
$ printf é | perl6 -ne 'say .uniname'
LATIN SMALL LETTER E WITH ACUTE

(.uniname is the shorthand for $_.uniname)

score 5 · Answer 3 · answered Apr 27 '15 at 12:01

The best way I know is through Perl's uniprops. It comes with Perl's Unicode::Tussle module. You can install it with

sudo perl -MCPAN -e 'install Unicode::Tussle'

You can then run it on any glyph you want to test:

$ uniprops  ‽
U+203D ‹‽› \N{INTERROBANG}
    \pP \p{Po}
    All Any Assigned InPunctuation Punct Is_Punctuation Common Zyyy Po P
       General_Punctuation Gr_Base Grapheme_Base Graph GrBase Other_Punctuation
       Pat_Syn Pattern_Syntax PatSyn Print Punctuation STerm Term
       Terminal_Punctuation Unicode X_POSIX_Graph X_POSIX_Print X_POSIX_Punct

$ uniprops  🐄
U+1F404 ‹🐄› \N{COW}
    \pS \p{So}
    All Any Assigned InMiscPictographs Common Zyyy So S Gr_Base Grapheme_Base Graph
       GrBase Misc_Pictographs Miscellaneous_Symbols_And_Pictographs Other_Symbol
       Print Symbol Unicode X_POSIX_Graph X_POSIX_Print

@cuonglm yes, but the Tussle module includes all sorts of fancy tools and uniprops is far, far easier to type than explicitly calling the module. It also provides more info than just the name. — terdon, Apr 27 '15 at 12:12

score 4 · Answer 4 · answered Apr 27 '15 at 12:10

You can use unicode, which also outputs some more information than just the name:

# unicode –
U+2013 EN DASH
UTF-8: e2 80 93  UTF-16BE: 2013  Decimal: &#8211;
–
Category: Pd (Punctuation, Dash)
Bidi: ON (Other Neutrals)

answered Apr 27 '15 at 12:10

Marco

What is unicode? I don't appear to have that installed (and can't find it in the Arch Linux repos). – Sparhawk Apr 27 '15 at 12:14
@Sparhawk on my Debian, it's just a Python script installed by the unicode package. You should be able to get it by downloading the source package from the Debian repos. – terdon Apr 27 '15 at 12:19

score 1 · Answer 5 · edited Apr 27 '15 at 12:04

Create a bash script with this:

#!/bin/bash
# grep -F matches "$1" as a literal string; -- protects arguments that start with a dash
awk -F ":" '{print $2}' /usr/share/X11/locale/en_US.UTF-8/Compose | grep -F -- "$1" | awk -F "#" '{print $2}'

Name it whatever you like, for example namechar, and make it executable.

Now, you can call for example:

./namechar @

and the result will be:

COMMERCIAL AT

edited Apr 27 '15 at 12:04

terdon

answered Apr 27 '15 at 12:02

jcbermu

This is good but only matches a subset of characters, not all of Unicode. For example, it fails on `🐄`, and produces repeated results for €. The latter could be fixed by piping through | sort -u. – terdon Apr 27 '15 at 12:08
Yes, @terdon is correct. (That's why I said "partial" in the question.) This file only contains glyphs mapped to the Compose key. – Sparhawk Apr 27 '15 at 12:15
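
The Compose-file lookup discussed above, including the duplicate results noted in the comments, can be sketched in Python as follows. This is a rough sketch that assumes every relevant line has the shape shown in the question (key sequence, colon, quoted result, optional keysym, then the name after a #). The compose_names helper, the regular expression, and the sample lines for @ are illustrative assumptions, and octal escapes in the quoted result are not handled.

```python
import re

# key sequence : "result" [keysym] # NAME
COMPOSE_LINE = re.compile(
    r'^[^:"#]*:\s*"((?:[^"\\]|\\.)*)"\s*(?:[^\s#]\S*)?\s*#\s*(.*\S)'
)


def compose_names(lines):
    """Map each composed glyph to the name in its trailing comment (hypothetical helper)."""
    names = {}
    for line in lines:
        match = COMPOSE_LINE.match(line)
        if match:
            # Undo simple backslash escapes such as \" inside the quoted result
            glyph = re.sub(r"\\(.)", r"\1", match.group(1))
            # Several key sequences can yield the same glyph; keep the first name
            names.setdefault(glyph, match.group(2))
    return names


if __name__ == "__main__":
    sample = [
        "# comment lines are skipped",
        '<Multi_key> <exclam> <question>         : "\u203d"   U203D # INTERROBANG',
        '<Multi_key> <A> <T> : "@" at # COMMERCIAL AT',
        '<Multi_key> <a> <t> : "@" at # COMMERCIAL AT',
    ]
    print(compose_names(sample))
```

To run it against the real file, pass an open file object, for instance open("/usr/share/X11/locale/en_US.UTF-8/Compose", encoding="utf-8"). As noted in the comments, the result still covers only glyphs that have a Compose sequence.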
