Questions tagged [character-encoding]

Questions about the different representations of characters and character sets, such as ASCII, UTF-8, and EBCDIC. These issues often arise when moving files between operating systems that mark line endings differently: with a carriage return, a newline character, or both.

Use this tag when you know that you are dealing with characters or character sets that are represented differently.
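
As a small illustration of what "represented differently" means, the Python sketch below encodes the same two characters under three encodings and then decodes one of them with the wrong assumption. The sample string, the `hexdump` helper, and the choice of encodings (UTF-8, ISO-8859-1, and cp037 as one example of an EBCDIC code page) are assumptions picked for this example, not something the tag description prescribes.

```python
# Illustrative sketch: the same text has different bytes under different encodings.
text = "Aä"  # sample string chosen for this example

utf8 = text.encode("utf-8")         # 'ä' takes two bytes here: c3 a4
latin1 = text.encode("iso-8859-1")  # 'ä' takes one byte here: e4
ebcdic = text.encode("cp037")       # one EBCDIC code page; even 'A' differs from ASCII


def hexdump(data: bytes) -> str:
    """Return the bytes as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in data)


for name, data in [("UTF-8", utf8), ("ISO-8859-1", latin1), ("EBCDIC cp037", ebcdic)]:
    print(f"{name:13} {hexdump(data)}")

# Decoding with the wrong assumption yields mojibake rather than an error.
mojibake = utf8.decode("iso-8859-1")
print(mojibake)  # prints AÃ¤
```

The last two lines show the typical symptom behind many questions under this tag: UTF-8 bytes read as ISO-8859-1 turn one accented letter into two unrelated characters.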

A frequent issue arises when a file (particularly one meant to be executed as a script) is saved on a Microsoft Windows platform and then transferred to a Unix platform.

For further explanation of character encodings, see the Wikipedia entry.
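
The Windows-to-Unix line-ending problem described above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for dedicated tools: the function names `has_crlf` and `to_unix_newlines` and the sample script contents are hypothetical, and the sketch assumes the file is small enough to hold in memory as bytes.

```python
# Minimal sketch: detect and normalize Windows (CRLF) line endings in a script.

def has_crlf(data: bytes) -> bool:
    """Return True if the data contains at least one CR LF pair."""
    return b"\r\n" in data


def to_unix_newlines(data: bytes) -> bytes:
    """Replace every CR LF pair with a bare LF; lone CR bytes are left alone."""
    return data.replace(b"\r\n", b"\n")


# Hypothetical script as it might look after being saved on Windows.
# On Linux the trailing CR becomes part of the interpreter path ("/bin/sh\r"),
# which is why such a script typically fails to start.
script = b"#!/bin/sh\r\necho hello\r\n"

print(has_crlf(script))  # prints True
fixed = to_unix_newlines(script)
print(fixed)             # prints b'#!/bin/sh\necho hello\n'
```

Working on bytes rather than decoded text keeps the sketch independent of the file's character encoding, at least for ASCII-compatible encodings where CR and LF are the single bytes 0x0D and 0x0A.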

408 questions
13
votes
4 answers

Which terminal encodings are default on Linux, and which are most common?

I need to make a decision regarding whether a complicated commercial program that I work on should assume a particular terminal encoding for Linux, or instead read it from the terminal (and if so, how). It's pretty easy to guess which system and…
Alan
  • 241
12
votes
1 answer

`^M` at the end of each line of text files generated under Windows

I was wondering why, if you open a text file made in Windows Notepad under Unix, you will find that it has ^M where there should be a new line? My understanding is that in Windows, every line is ended with \r\n, i.e. 0x0D0A in ASCII, while ^M has ASCII…
Tim
  • 101,790
4
votes
2 answers

Unix character set conversion

I'm confused by character-sets in Unix. I have a CSV file downloaded via SFTP: $ file -ib myfile text/plain; charset=us-ascii The purpose for this character-set quest is that the data within file is seen like: Flyers: Video Center While I…
4
votes
1 answer

Change Text File Encoding without knowing the source encoding

I want to change the charset encoding for a file in unix with a single command but since this will be an automated process it's impossible for me to know the source encoding. So I want a command that will change the encoding to UTF-8 for any source…
3
votes
1 answer

How can I convert full-width characters to half-width characters (and vice versa)?

Here is my simple problem, how can I convert half-width to full-width from the command line. I thought this would be built-in my iconv command line, but I did not find anything here: $ iconv -l | grep -i full -> nothing $ iconv -l | grep -i…
malat
  • 3,032
2
votes
0 answers

How to guess the encoding of, and rename, files with invalid encoding extracted from deleted RAR archives

I have many files and folders that were extracted from many RAR archives by JDownloader; the archives were deleted after extraction. Some of the names are: �������[�ς݂��ς݂� - �Ȃ��ꂭ������ (invalid encoding) �R�e���} (invalid encoding) ??? - ? ??? ? ���΂ɂ����� (invalid…
Kokizzu
  • 9,699
2
votes
1 answer

How to fix the UTF-8 character encoded filenames which don't look good in sub directories

I have filenames like KÃ¤yttÃ¶ohje.pdf. This should be Käyttöohje.pdf. I can do the conversion of all files in subdirectories with the command: convmv -f utf8 -t iso-8859-1 -r --notest * This converts KÃ¤yttÃ¶ohje.pdf to Käyttöohje.pdf. The…
George
  • 23
2
votes
1 answer

What encoding am I using? $LANG doesn't have an encoding

It seems like typically: echo $LANG results in something like this: en_US.UTF-8 What encoding is used when the result does not specify an encoding? echo $LANG en_US How do I figure out what the default encoding is? Using CentOS and Redhat…
2
votes
0 answers

iconv cannot replace Ø

It appears that iconv cannot, for example, replace the letter Ø. It was also noted in the second answer to https://stackoverflow.com/questions/3371697/replacing-accented-characters-php I have two questions: Can I make iconv tell me which diacritics…
MERose
  • 527
0
votes
0 answers

Repairing mixed encoding

I got some files containing Finnish text with mixed encoding, something one would get by (echo Mäntysalo ; echo Mäntysalo | recode utf-8..iso-8859-1) > problem.txt. Is there a "right" way to correct files to one encoding from a command line? In…
0
votes
1 answer

How to write and read in a different encoding from the terminal?

I'm using Terminator on Debian, and its encoding is set to UTF-8. In most cases this isn't a problem, since almost everything recent has its encoding set to UTF-8 as well. When I connect to a specific MySQL database that has a different…
wadge
  • 121
0
votes
1 answer

Change from two different encodings to UTF-8

awk -F : '$1 ~ /[[:digit:]]+[[:alnum:]]*[[:digit:]]+/ && ($3>6200) {print $5" --- "$1" --- "$3;count++} END{print"\n----------\nSuma znalezionych rekordów:"count"\n----------\n"}' /etc/passwd|iconv -f ISO8859-2 -t UTF-8 So my problem is that when I…
0
votes
1 answer

How do I detect the character encoding of a text

I have a MySQL database that contains accented characters that are being displayed incorrectly by an HTML page. The problem is that I do not trust the encoding that the database is reporting for the tables, because the whole thing was migrated from…
Duck
  • 4,674
0
votes
1 answer

iconv can't convert circled numbers to/from Japanese encodings

The unicode range of circled digits (U+2460 .. U+2468) cannot be converted to, or from, any of the Japanese encodings (EUC-JP, Shift-JIS, ISO-2022-JP), even though they exist there, and I run across them all the time. % echo ①②③③④⑤⑥⑦⑧⑨ | iconv -f…
oals
  • 371
0
votes
2 answers

iconv cannot convert given characters

I just wanted to convert a txt file to UTF-8, since cat displays it correctly, but vi or gedit doesn't: $ cat test.txt ># >‹ | || ° ├── └── _ __ $ iconv -f WINDOWS-1253 -t UTF-8 test.txt ># >β€Ή | || Β° β”iconv: illegal input sequence at position…
evachristine
  • 2,613