1

I am writing a console program in C.

I expect the Terminal that my program is running in to have its character encoding set to UTF-8. This means that I am sending UTF-8 encoded strings to the Terminal, and expecting to receive UTF-8 encoded strings from the Terminal.

But if the Terminal is set to another character encoding (other than UTF-8) while my program is running, my program will stop working as expected.

So is there a way to know what character encoding the Terminal is set to from within my program (so that I can change my program behavior accordingly)? And even if there is such a way, should I even bother making my program work with multiple character encodings, or is it enough to only make it work with UTF-8?

Jeff Schaller

1 Answer

-3

UTF-8 has several pitfalls and for this reason is not the typical encoding in central Europe.

Writing programs that assume UTF-8 is bad practice as you may not be able to even know where a "character" ends in the byte stream.

A decent program calls:

setlocale(LC_ALL, "")

at startup and later uses functions like:

mbtowc(&wc, input, amt)

to convert multibyte input read from stdin or files.

It then processes the data as wide characters and converts it back to multibyte data via:

wctomb(output, wc)

then the output is printed to e.g. stdout.

schily
  • UTF-8 is the only sensible external encoding for Unicode text. Your answer does not consider how to choose between different encodings, and thus does not answer the question at all. – Johan Myréen Jun 06 '18 at 16:58
  • You are mistaken. Unicode causes problems that people did not expect. For this reason, many people use ISO-8859-1. The question does not ask how to set up a different encoding, just how to deal with different encodings. So my answer is a good starting point for further reading, e.g. by using the man program on the mentioned interfaces. – schily Jun 06 '18 at 17:03