Questions tagged [character-encoding]

Questions that deal with the various representations of characters and character sets, such as ASCII, UTF-8, and EBCDIC, among others. These issues are often encountered when moving files between operating systems that end lines with carriage returns and/or line feed characters.

Use this tag when you know that you are dealing with characters or character sets that are represented differently.

A frequent issue arises when a file (particularly one meant to be executed as a shell script) is saved on a Microsoft Windows platform and then transferred to a Unix platform, where its carriage-return line endings are not expected.
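
As a rough illustration of the line-ending problem described above, here is a minimal Python sketch that rewrites Windows CRLF line endings as Unix LF. The function name `crlf_to_lf` and the sample script bytes are made up for this example; dedicated tools exist for the same job, and this is only one way to express the idea.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace each CRLF pair with a single LF; lone CR bytes are left untouched."""
    return data.replace(b"\r\n", b"\n")


# Hypothetical shell script as it might arrive from a Windows editor.
windows_script = b"#!/bin/sh\r\necho hello\r\n"
unix_script = crlf_to_lf(windows_script)
print(unix_script)
```

Working on bytes rather than text avoids having to guess the file's character encoding first, since the CR and LF bytes are the same in ASCII-compatible encodings.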

Which terminal encodings are default on Linux, and which are most common?

I need to make a decision regarding whether a complicated commercial program that I work on should assume a particular terminal encoding for Linux, or instead read it from the terminal (and if so, how). It's pretty easy to guess which system and…

character-encoding

asked Feb 02 '14 at 23:30

Alan

votes

1 answer

`^M` at the end of each line of text files generated under Windows

I was wondering why, if you open a text file made in Windows Notepad under Unix, you will find that it has ^M where there should be a new line. My understanding is that in Windows, every line is ended with \r\n, i.e. 0x0D0A in ASCII, while ^M has ASCII…

character-encoding

asked Jul 30 '11 at 00:53

Tim

101,790

votes

2 answers
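
For readers puzzled by the `^M` in the question above: it is caret notation for the carriage return byte (0x0D), which is the `\r` half of a Windows line ending. The sketch below is only an illustration; the helper name `caret_notation` is an assumption of this example and not part of any tool mentioned in the question.

```python
def caret_notation(byte: int) -> str:
    """Render an ASCII control byte in caret notation, e.g. 0x0D becomes ^M."""
    if byte < 0x20 or byte == 0x7F:
        # Flipping bit 6 maps a control code to its printable partner.
        return "^" + chr(byte ^ 0x40)
    return chr(byte)


# A line from a Windows text file with the trailing LF already consumed
# as the Unix line terminator; the CR is what remains visible.
visible = "".join(caret_notation(b) for b in b"hello\r")
print(visible)
```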

Unix character set conversion

I'm confused by character-sets in Unix. I have a CSV file downloaded via SFTP: $ file -ib myfile text/plain; charset=us-ascii The purpose for this character-set quest is that the data within file is seen like: Flyers:Â VideoÂ Center While I…

character-encoding

asked Dec 03 '14 at 10:39

Abhishek

votes

1 answer

Change Text File Encoding without knowing the source encoding

I want to change the charset encoding for a file in Unix with a single command, but since this will be an automated process, it's impossible for me to know the source encoding. So I want a command that will change the encoding to UTF-8 for any source…

character-encoding

asked Nov 25 '14 at 21:37

user3393046

votes

1 answer

How can I convert full-width characters to half-width characters (and vice versa)?

Here is my simple problem: how can I convert half-width to full-width from the command line? I thought this would be built into my iconv command line, but I did not find anything here: $ iconv -l | grep -i full -> nothing $ iconv -l | grep -i…

character-encoding

asked Mar 01 '24 at 08:18

malat

3,032

votes

0 answers

How to guess and rename files with invalid encoding extracted from a deleted rar

I have many files and folders that were extracted from many rar archives by jdownloader, and the archives were deleted after extraction. Some of the names are: ��[�ς݂��ς݂� - �Ȃ��ꂭ�� (invalid encoding) �R�e��} (invalid encoding) ??? - ? ??? ? ��΂ɂ�� (invalid…

character-encoding

asked Sep 08 '13 at 10:26

Kokizzu

9,699

votes

1 answer

How to fix the UTF-8 character encoded filenames which don't look good in sub directories

I have filenames like KÃ¤yttÃ¶ohje.pdf. This should be Käyttöohje.pdf. I can do the conversion of all files in sub directories with the command: convmv -f utf8 -t iso-8859-1 -r --notest * This converts KÃ¤yttÃ¶ohje.pdf to Käyttöohje.pdf. The…

character-encoding

asked Jan 11 '22 at 07:52

George

votes

1 answer
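
The garbled name in the question above is what UTF-8 bytes look like when they are decoded as ISO-8859-1. A minimal Python sketch of that round trip follows; the variable names are assumptions of this example, and it handles only a single string, not the recursive renaming the question asks about.

```python
# The correct name, as given in the question.
correct = "Käyttöohje.pdf"

# Misreading the UTF-8 bytes as ISO-8859-1 produces the garbled form.
garbled = correct.encode("utf-8").decode("iso-8859-1")
print(garbled)

# Reversing the mistake: re-encode as ISO-8859-1, then decode as UTF-8.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired)
```

The repair step only works if every character of the garbled name fits in ISO-8859-1; otherwise `encode` raises `UnicodeEncodeError`, which is a useful sign that the name was damaged in some other way.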

What encoding am I using? $LANG doesn't have an encoding

It seems like typically: echo $LANG results in something like this: en_US.UTF-8 What encoding is used when the result does not specify an encoding? echo $LANG en_US How do I figure out what the default encoding is? Using CentOS and Redhat…

character-encoding

asked Nov 12 '15 at 17:38

sixtyfootersdude

votes

0 answers

iconv cannot replace Ø

It appears that iconv cannot, for example, replace the letter Ø. It was also noted in the second answer to https://stackoverflow.com/questions/3371697/replacing-accented-characters-php I have two questions: Can I make iconv tell me which diacritics…

character-encoding

asked Dec 25 '14 at 18:29

MERose

votes

0 answers

Repairing mixed encoding

I got some files containing Finnish text with mixed encoding, something one would get by (echo Mäntysalo ; echo Mäntysalo | recode utf-8..iso-8859-1) > problem.txt. Is there a "right" way to correct files to one encoding from a command line? In…

character-encoding

asked Nov 15 '22 at 10:17

Jori Mäntysalo

votes

1 answer
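
One common heuristic for a file like the one described above is to try each line as UTF-8 and fall back to ISO-8859-1 when that fails. The sketch below assumes the mix is strictly per line and that those are the only two encodings involved; the function name `decode_mixed_line` is hypothetical, and this is not presented as the "right" way the question asks for.

```python
def decode_mixed_line(raw: bytes) -> str:
    """Decode a line as UTF-8 if it is valid UTF-8, otherwise as ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


# Recreate the two-line sample from the question: one UTF-8 line, one ISO-8859-1 line.
name = "Mäntysalo"
problem = name.encode("utf-8") + b"\n" + name.encode("iso-8859-1") + b"\n"

fixed_lines = [decode_mixed_line(line) for line in problem.splitlines()]
print(fixed_lines)
```

The heuristic can misfire: an ISO-8859-1 line whose bytes happen to form valid UTF-8 will be decoded as UTF-8, so the result is worth checking by eye.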

How to write and read in a different encoding from the terminal?

I'm using terminator on Debian, and its encoding is set to UTF-8. In most cases this isn't a problem since almost everything that's recent has the encoding set to UTF-8 as well. When I connect to a specific MySQL database that has a different…

character-encoding

asked Apr 21 '20 at 09:59

wadge

votes

1 answer

Change from two different encodings to UTF-8

awk -F : '$1 ~ /[[:digit:]]+[[:alnum:]]*[[:digit:]]+/ && ($3>6200) {print $5" --- "$1" --- "$3;count++} END{print"\n----------\nSuma znalezionych rekordów:"count"\n----------\n"}' /etc/passwd|iconv -f ISO8859-2 -t UTF-8 So my problem is that when I…

character-encoding

asked Mar 28 '20 at 09:24

Martyna Michalska

votes

1 answer

How do I detect the character encoding of a text

I have a MySQL database that contains accented characters that are being displayed incorrectly by an HTML page. The problem is that I do not trust the encoding that the database is reporting for the tables, because the whole thing was migrated from…

character-encoding

asked May 02 '17 at 21:19

Duck

4,674

votes

1 answer

iconv can't convert circled numbers to/from Japanese encodings

The Unicode range of circled digits (U+2460 .. U+2468) cannot be converted to, or from, any of the Japanese encodings (EUC-JP, Shift-JIS, ISO-2022-JP), even though they exist there, and I run across them all the time. % echo ①②③③④⑤⑥⑦⑧⑨ | iconv -f…

character-encoding

asked Apr 09 '15 at 11:40

oals

votes

2 answers

iconv cannot convert given characters

I just wanted to convert a txt file to UTF-8, since cat displays it correctly, but vi or gedit doesn't: $ cat test.txt ># >‹ | || ° ├── └── _ __ $ iconv -f WINDOWS-1253 -t UTF-8 test.txt ># >β€Ή | || Β° β”iconv: illegal input sequence at position…

character-encoding

asked Apr 22 '14 at 11:22

evachristine

2,613
