
While facing this problem, I came across two terms related to input methods: xim and uim. I only know what the names stand for: X Input Method and Universal Input Method.

I want to know: what is the difference between xim and uim in terms of use, functionality, and how they work?

Pandya
  • 24,618

2 Answers


The biggest difference is that most input method systems are implemented in a client-server fashion, whereas uim is just a library.

Most users don't need an input method system at all, or only need simple, table-based converters. Such users don't require, or are unwilling to install, a complex input method system, so uim aims to stay simple.

See the official GitHub page for further details.

Uim is an input method module library which supports various scripts and can act as a front end for a range of input methods, including Anthy, Canna, PRIME, or SKK (for Japanese), Pinyin (for Chinese), Byeoru (for Korean), and m17n (for many other languages). Most of its functions are implemented in Scheme, so it's very simple and flexible. (Source)

Now what about XIM? XIM is a pretty obsolete input method protocol which both ibus and fcitx implement for legacy-support reasons only. There is no real reason to use XIM nowadays over either of those two. The only reason you might want to set GTK_IM_MODULE="xim" is to override GTK's hardcoded Compose key settings. (Source)
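For example, a minimal sketch of that override, assuming you set it somewhere like ~/.xprofile so it is in the environment before GTK applications start:

# hand text input back to the XIM protocol so X's Compose rules (e.g. ~/.XCompose) apply
export GTK_IM_MODULE=xim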

deqi
  • 71

There are many different ways to answer the question. Here is my incomplete, biased answer.

TL;DR: xim is old, outdated, and not suited for a multilingual world that uses Unicode for text interchange. uim was built to address those limitations and is the way forward for multilingual keyboard input.

We will start our story with the ASCII character set. In the 1970s and before, ASCII was one of two ways to encode a range of characters into 8 bits, the other contender being EBCDIC. In my mind, two things led to ASCII winning out over EBCDIC. The first was that ASCII only needed 7 bits, allowing the 8th bit to be used for parity. Back when data connections ran over modems that transmitted 300 bits per second and frequently had errors, saving 12.5% of the bandwidth was significant, along with getting error detection. EBCDIC used all 8 bits. The other was that EBCDIC was designed and promoted by IBM, the big bad monopoly of the day (many say Microsoft did not earn its near-monopoly position; it was simply handed IBM's, to get IBM out of hot water over its own monopolistic activities), while ASCII came out of the teletype world that AT&T had a lot to do with, and AT&T was not friendly with IBM and also invented Unix.

For whatever reason, ASCII won on the internet, on Unix, and eventually on Windows and Apple computers. However, as these tools spread, there started to be a need for more than the standard Roman characters. The ethnocentric but practical solution that gained traction at the time was to use the 128-character space left open by ASCII (256 characters being representable by 8 bits, or one byte, a natural unit of computer processing) to add non-English variants of the Roman alphabet, such as à, ç, ñ, Ö, and so forth. This was good for the time, but it created serious headaches later when Unicode came along.

Anyway, XIM is old, in some ways the original input method for X11, and was built in this zeitgeist where the assumption was that "characters" were represented by 8-bits plus a character encoding table. (The character encoding table would allow for different interpretations of the 128 non-ASCII character codes. Famously Windows used Windows-1252 while Apple used Mac Roman. The ISO tried to standardize things with ISO-8859-1 but that did not work out, which lead to Unicode and UTF-8, but still, getting ahead of ourselves.)
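You can poke at that one-byte-plus-a-table worldview from a shell. A small sketch, assuming bash's printf, GNU iconv, and a UTF-8 terminal (your iconv may know Mac Roman as MACINTOSH or MACROMAN):

printf 'ñ' | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1   # f1: one byte under ISO-8859-1
printf 'ñ' | od -An -tx1                                  # c3 b1: two bytes under UTF-8
printf '\xf1' | iconv -f ISO-8859-1 -t UTF-8              # the byte 0xF1 read as ISO-8859-1 is ñ
printf '\xf1' | iconv -f MACINTOSH -t UTF-8               # the same byte read as Mac Roman should print a different character (Ò)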

Over the years since then, the Unicode consortium has done a remarkable job of designing a character encoding system that covers all the graphemes of all the languages in the world (plus some fictional ones) while still maintaining backward compatibility with decades of legacy code that expected one character per byte. Like xim. Because ASCII was written for English and English still dominates in computing, a lot of people just kept on using the old tools, so you still find them around today.

Adding support for Unicode is hard, because languages are complex and varied and have zillions of rules. Some go left-to-right, some go right-to-left, some go both ways, and some even go top-to-bottom. Then you have extra marks that go on characters, like ¨, which in German is an umlaut as in ü but in Dutch is a trema as in ë. So in German, how do you sort u vs ü? Does the same rule apply in Dutch with e vs ë? And it goes on and on. So there are lots of tools with varying levels of support for these kinds of things.
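Sorting is an easy one to see for yourself: the order depends entirely on which locale's collation rules you apply. A small sketch, assuming the de_DE.UTF-8 locale is installed on your system (swap in whichever locale you have):

printf 'v\nü\nu\n' | LC_ALL=C sort            # raw byte order: u, v, ü
printf 'v\nü\nu\n' | LC_ALL=de_DE.UTF-8 sort  # German collation: u, ü, v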

Most notably for xim vs uim, characters that made the early cut, like î and ñ, are represented by single Unicode "code points", which is what most software considers a character these days. If you look up ñ in Unicode you will see it is a code point called "LATIN SMALL LETTER N WITH TILDE". But at some point, Unicode realized there are far too many combinations of letters and marks to make single code points of them all. So they came up with the idea of "combining diacritical marks": a way to combine a letter with a mark to produce a marked letter. What a human would consider a character, and Unicode calls a grapheme, can now be made up of a letter followed by one or more combining marks. This gives us a way to create a wide variety of graphemes, as famously abused at the end of this post. There are still problems, as now you can type ñ (U+00F1) or ñ (U+006E U+0303), and they look the same and mean the same thing, but computers treat them differently. But at least now you can type G̃ or g̃ without having to petition the Unicode consortium to add another pair of characters.
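You can produce both spellings of ñ from a shell and see that they really are different byte sequences. A small sketch, assuming bash 4.2 or later (for the \u escapes in printf) and a UTF-8 locale:

a=$(printf '\u00f1')              # precomposed: LATIN SMALL LETTER N WITH TILDE
b=$(printf 'n\u0303')             # decomposed: n followed by COMBINING TILDE
printf '%s %s\n' "$a" "$b"        # both render as ñ
printf '%s' "$a" | od -An -tx1    # c3 b1
printf '%s' "$b" | od -An -tx1    # 6e cc 83
[ "$a" = "$b" ] && echo equal || echo different   # different: the bytes don't match

Software that cares about treating these as the same thing normally runs both strings through Unicode normalization (NFC or NFD) before comparing them.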

What this does, though, is break the underlying assumption on which xim and much of the related code was built: that a single character is the same as a single code point. X11 and xim were able to adapt to ~n producing the single character ñ, but that is about as far as they could go without a major rewrite. uim is that rewrite, built in a Unicode world to handle all of Unicode's complexity.

To use uim instead of xim, it can be as easy as adding the following to your ~/.profile, ~/.xprofile, or ~/.xinitrc:

export GTK_IM_MODULE=uim     # GTK applications use uim directly
export QT_IM_MODULE=uim      # Qt applications use uim directly
uim-xim &                    # start uim's XIM bridge for legacy X11 applications
export XMODIFIERS=@im=uim    # point XIM clients at that bridge

It varies somewhat from system to system, but you can find more complete instructions tailored to your specific setup via Google.
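After logging back in, a quick sanity check (assuming standard GNU/Linux userland tools) is to confirm the variables are set and the XIM bridge is running:

env | grep -E 'GTK_IM_MODULE|QT_IM_MODULE|XMODIFIERS'   # should all mention uim
pgrep -a uim-xim                                        # the uim XIM server should be listed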

Old Pro
  • 1,306