Try the unicode utility:
$ unicode ‽
U+203D INTERROBANG
UTF-8: e2 80 bd UTF-16BE: 203d Decimal: &#8253;
‽
Category: Po (Punctuation, Other)
Bidi: ON (Other Neutrals)
Or the uconv utility from the ICU package:
$ printf %s ‽ | uconv -x any-name
\N{INTERROBANG}
You can also get information via the recode utility:
$ printf %s ‽ | recode ..dump
UCS2 Mne Description
203D point exclarrogatif
Or with Perl:
$ printf %s ‽ | perl -CLS -Mcharnames=:full -lne 'print charnames::viacode(ord) for /./g'
INTERROBANG
Note that these give information on the characters that make up that glyph, not on the glyph as a whole. For instance, for é (e followed by a combining acute accent):
$ printf é | uconv -x any-name
\N{LATIN SMALL LETTER E}\N{COMBINING ACUTE ACCENT}
That is different from the standalone (precomposed) é character:
$ printf é | uconv -x any-name
\N{LATIN SMALL LETTER E WITH ACUTE}
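Since the two forms look identical on screen, it can help to look at the raw bytes. The following is only a sketch, assuming a POSIX printf and od and UTF-8 encoded input; the variable names are made up for illustration. It builds both forms from octal escapes so that nothing depends on how the characters were pasted:

```shell
# Decomposed form: U+0065 (e) followed by U+0301 (combining acute accent).
# In UTF-8, U+0301 is the two bytes cc 81 (octal 314 201).
decomposed=$(printf 'e\314\201' | od -An -tx1 | tr -d ' \n')

# Precomposed form: U+00E9, which is c3 a9 in UTF-8 (octal 303 251).
precomposed=$(printf '\303\251' | od -An -tx1 | tr -d ' \n')

# Print the hex bytes of each form side by side.
printf 'decomposed:  %s\nprecomposed: %s\n' "$decomposed" "$precomposed"
```

The decomposed form is three bytes and the precomposed form is two, which is why the tools above report different character names for what looks like the same glyph.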
You can ask uconv to recombine those (for those that have a combined form):
$ printf 'e\u0301b\u0301' | uconv -x '::nfc;::name;'
\N{LATIN SMALL LETTER E WITH ACUTE}\N{LATIN SMALL LETTER B}\N{COMBINING ACUTE ACCENT}
(é has a combined form, but not b́).
unicode? I don't appear to have that installed (and can't find it in the Arch Linux repos). Also, what on earth is exclarrogatif? [EDIT: I get that here too, although my system is not French.] – Sparhawk Apr 27 '15 at 12:13
[...] exclamatif and interrogatif. recode was written by a French-Canadian guy in the early 80s. – Stéphane Chazelas Apr 27 '15 at 12:17
[...] L need in CLS? It makes the answer wrong if something like LC_ALL was set to a non-UTF-8 locale. – cuonglm Apr 27 '15 at 12:20
[...] LC_ALL=fr_FR.iso885915@euro and you enter echo é | perl..., that é will be written as 0xe9, not UTF-8 encoding. And you want perl to tell you about that é, not about UTF-8 characters that won't be found in the input since the locale doesn't use that charset. Try printf '\xe9' | LC_ALL=fr_FR.iso885915@euro perl... for instance. – Stéphane Chazelas Apr 27 '15 at 12:24
[...] ‽, he wants INTERROBANG instead of LATIN SMALL LETTER A WITH CIRCUMFLEX (when LC_ALL=C is set). – cuonglm Apr 27 '15 at 12:33
[...] LC_ALL=fr_FR.iso885915@euro and he pastes é (0xe9 in that locale), -CS would give him the wrong answer. – Stéphane Chazelas Apr 27 '15 at 13:03
[...] -CLS doesn't always give the right answer, as I showed in my above comment, right? Do we have any workaround? – cuonglm Apr 27 '15 at 13:05
[...] unicode package on Debian, no idea about packaging on Arch. – Gilles 'SO- stop being evil' Apr 27 '15 at 13:31
[...] %s represent in some printf statements? – Sparhawk Apr 28 '15 at 00:51
%s is like a placeholder, called a format specifier (or conversion specifier). printf will replace it with the succeeding arguments, treating each as a string (as opposed to a number, for example), generally as you would expect with C's printf() function. See the docs (http://pubs.opengroup.org/onlinepubs/9699919799//basedefs/V1_chap05.html). – muru Apr 28 '15 at 05:57
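To illustrate the %s point, here is a small sketch, assuming a POSIX printf; the sample string and variable names are hypothetical. It shows why the commands above use printf %s rather than passing the character as the format itself:

```shell
# Hypothetical input that happens to contain a conversion specifier.
input='50%s off'

# With %s as the format, the argument is copied through as a plain string,
# so any % or backslash inside it is not interpreted.
literal=$(printf '%s' "$input")

# The format is reused until all the arguments have been consumed.
joined=$(printf '[%s]' one two three)

printf '%s\n' "$literal" "$joined"
```

Here literal keeps the text 50%s off unchanged, and joined becomes [one][two][three].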