How to encode the content of a buffer to UTF-8?

Question

Windows 10, Emacs 25.1

When I open a file that is not encoded in UTF-8, I get the following incorrect result:

I tried the command `set-buffer-file-coding-system`, but it did not help.

Here is the result:

As you can see, the command does not help.

I think that, to fix this, I need to convert the content to UTF-8.

Note: If I open the file in Notepad++, it is displayed correctly:

Here is a test case in which I apply @Tobias's function `find-file-with-coding-system` with KOI8-R, but it does not help.

I also tried with `emacs -Q`, but the result is the same.

Here is part of my file, encoded with `base64-encode-region`:

MQowMDowMDo0MCwzODMgLS0+IDAwOjAwOjQzLDkyOArR5eLl8Cwg5+Dv4OQsIP7jLCDk7urr4OTi
4Ony5S4KCjIKMDA6MDA6NTAsMzkzIC0tPiAwMDowMDo1Miw1MjcKyuDq4u4g6PHq4Pgg5OAg6uDm
5fg/CgozCjAwOjAwOjUyLDY4NyAtLT4gMDA6MDA6NTYsMTU1Csrg6uLu8u4g6uDn4PUuCgo0CjAw
OjAwOjU2LDMxNSAtLT4gMDA6MDE6MDAsMjg1Cs3g5P/i4Pgg8eUg5OAg7vbl6+XlLgoKNQowMDow
MTowMCw0NDUgLS0+IDAwOjAxOjAzLDE2MgrS7uLgIO3lIOUg5O7x8uDy+vft7i4KCjYKMDA6MDE6
MDMsMzIyIC0tPiAwMDowMTowNSw2MjMKwOruIPLoIOTw5ezl+OUuLi4KLSDK4PDrLg==

How can I do this?

Your problem is actually related to the *decoding* of the file at `find-file` and not to encoding. — Tobias, Dec 12 '17 at 15:26

Tobias · Accepted Answer · 2017-12-13T23:44:04.413

Emacs decodes a text file with an appropriate coding system when you open it with find-file. If for some reason you have to tell Emacs which coding system to use for decoding the file, you must do so before you open the file.

You can set the coding system for the next command with Options -> Multilingual Environment -> Set Coding Systems -> For Next Command. That next command can be find-file for your file subtitle.srt.

The file is then read with the coding system of your choice.
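The same one-shot override can be sketched in Lisp. As far as I can tell, the menu entry runs `universal-coding-system-argument` (also bound to `C-x RET c`), which binds `coding-system-for-read` around the next command. The helper name `my-find-file-as` is hypothetical, and `windows-1251` in the example call is only an illustrative value.

```elisp
(defun my-find-file-as (file coding-system)
  "Visit FILE, decoding its bytes with CODING-SYSTEM.
`my-find-file-as' is a hypothetical helper, not part of Emacs."
  ;; Binding `coding-system-for-read' around `find-file' overrides the
  ;; automatic detection for this one read only.
  (let ((coding-system-for-read coding-system))
    (find-file file)))

;; Example call; the file name is the one used in this answer.
;; (my-find-file-as "subtitle.srt" 'windows-1251)
```

Like plain `find-file`, this returns an existing buffer unchanged if the file is already being visited, so kill that buffer first.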

If you often need to open files with a specific coding system, you can also use the following command. Paste the code into your init file and restart Emacs. Afterwards you can use the new menu item Find File With Coding System in the File menu.

(defun find-file-with-coding-system (file coding-system)
  "Find FILE with CODING-SYSTEM."
  (interactive (list
                (read-file-name "File Name:")
                (read-coding-system "Coding System:")))
  (let (buf (coding-system-for-read coding-system))
    (if (setq buf
              (catch :exit
                (while (setq buf (find-buffer-visiting file))
                  (if (y-or-n-p (format "Kill buffer \"%s\" visiting file \"%s\"?" buf file))
                      (kill-buffer buf)
                    (throw :exit buf)))))
        (error "Buffer \"%s\" is still visiting \"%s\". In that case `find-file-with-coding-system' does not work as expected." buf file)
      (find-file file))))

(require 'easymenu)
(easy-menu-add-item menu-bar-file-menu nil ["Find File With Coding System" find-file-with-coding-system t] "Filesets")

Note that the coding system of a file is normally detected automatically. For newly created buffers, and for buffers whose coding system cannot be uniquely identified, the option default-buffer-file-coding-system determines which coding system is preferred. (That variable has been obsolete since Emacs 23.2; newer code sets the default value of buffer-file-coding-system instead.)

As far as I know there is no predefined function changing the buffer file coding-system for decoding without re-reading the file.

That is what you actually asked me in one of your comments:

And one last question: is it possible to decode the file correctly AFTER opening it?

Perhaps that function is not provided because decoding is not necessarily injective and translating between coding systems can fail.

Nevertheless, the next lisp function tries to change the coding system for decoding without re-reading the file.

(defun translate-buffer-encoding (target-encoding &optional source-encoding)
  "Translate buffer encoding from SOURCE-ENCODING to TARGET-ENCODING.
SOURCE-ENCODING defaults to `buffer-file-coding-system'."
  (interactive (list (read-non-nil-coding-system "Target encoding: ")))
  (unless source-encoding
    (setq source-encoding buffer-file-coding-system))
  ;; Turn the mis-decoded characters back into the file's original bytes.
  (encode-coding-region (point-min)
                        (point-max)
                        source-encoding)
  ;; Decode those bytes again, this time with the intended coding system.
  (decode-coding-region (point-min)
                        (point-max)
                        target-encoding)
  (set-buffer-file-coding-system target-encoding))
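The round trip that `translate-buffer-encoding` performs can be tried on a single string. This is a minimal sketch: `my-redecode-string` is a hypothetical helper, and the sample bytes are the windows-1251 bytes that start the first subtitle line of the base64 sample in the question.

```elisp
(defun my-redecode-string (text wrong-coding right-coding)
  "Re-decode TEXT, which was decoded with WRONG-CODING, using RIGHT-CODING.
`my-redecode-string' is a hypothetical helper, not part of Emacs."
  ;; Encoding with the wrong coding system recovers the original bytes
  ;; (only if that decoding lost no information); decoding them again
  ;; with the right coding system yields the intended text.
  (decode-coding-string (encode-coding-string text wrong-coding)
                        right-coding))

;; The bytes D1 E5 E2 E5 F0 read as Latin-1 give "Ñåâåð".
(let ((wrong (decode-coding-string
              (unibyte-string #xD1 #xE5 #xE2 #xE5 #xF0) 'iso-latin-1)))
  (my-redecode-string wrong 'iso-latin-1 'windows-1251))
;; => "Север"
```

If re-reading the file is acceptable, the built-in command `revert-buffer-with-coding-system` (`C-x RET r`) avoids the round trip altogether.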

@Alex The text about `default-buffer-file-coding-system` was only a remark about the normal way when nothing goes wrong. But the coding system selection goes wrong for you. You have to kill the buffer, set the coding system for the next command as I described it at the beginning of my answer and then call `find-file` for the file `subtitle.srt`. By the way, it is strange that the automatic coding system selection goes wrong for you. That indicates that something is wrong with your file. – Tobias Dec 12 '17 at 10:40
@Alex I've added a function which you can paste into your init file. It provides you with a new _File_ menu item _Find File With Coding System_ that should be fail-safe. If that also does not work, your problem is probably not really encoding-related. At least not on the side of Emacs. – Tobias Dec 12 '17 at 14:45
@Alex I've added a test for buffers already visiting the file. The user is given a chance to kill those buffers. If he does not, `find-file` just returns such a buffer without decoding the file. That potential source of error is now signaled. – Tobias Dec 12 '17 at 15:29
@Alex That means the file **is** read with `utf-8` decoding. So your problem is probably not the decoding on the Emacs side. – Tobias Dec 12 '17 at 17:44
@Alex Also note that you can read the raw bytes with `find-file-literally`. – Tobias Dec 12 '17 at 18:04
I updated my post. If I open the file in Notepad++, it is shown correctly, but in Emacs it is not. – Alex Dec 12 '17 at 18:28

Tobias · Answer 2 · 2017-12-13T16:12:22.800

Your problem has nothing to do with utf-8. Your screenshot of Notepad++ indicates Windows-1251 encoding for that file as you can see at the right end of the status line of Notepad++. Therefore you have to select windows-1251 and not utf-8 as coding system. You can use my other answer for a first try with windows-1251 encoding.

Note that 8-bit encodings can seldom be told apart, so Emacs has (almost) no chance to detect windows-1251 without further information. (I used the word almost here since there would always be a text analysis as a last resort.)
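A small sketch of why detection is so hard: the same five bytes decode without any error under several 8-bit coding systems, and only a human reader (or a language model of the text) can tell which result is the intended one. The helper name `my-try-codings` is hypothetical; the bytes are the ones that start the first subtitle line of the base64 sample in the question.

```elisp
(defun my-try-codings (bytes codings)
  "Return an alist (CODING . TEXT) of BYTES decoded with each of CODINGS.
`my-try-codings' is a hypothetical helper, not part of Emacs."
  (mapcar (lambda (coding)
            (cons coding (decode-coding-string bytes coding)))
          codings))

(my-try-codings (unibyte-string #xD1 #xE5 #xE2 #xE5 #xF0)
                '(windows-1251 koi8-r iso-latin-1))
;; => ((windows-1251 . "Север") (koi8-r . "яЕБЕП") (iso-latin-1 . "Ñåâåð"))
```

Every byte value is a valid character in each of these coding systems, so none of the three decodings signals a problem.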

It seems that Notepad++ is set up for a Russian language environment. If you want to permanently set up your Emacs for Windows-1251, you can consult the page on Emacs Russification at EmacsWiki.

I cite here the most important settings for your init file.

(set-language-environment 'Cyrillic-KOI8)
(setq default-buffer-file-coding-system 'koi8-r)
(prefer-coding-system 'koi8-r)

(setq-default coding-system-for-read 'koi8-r)
;; (setq-default coding-system-for-write 'koi8-r) ;; Maybe not!
(codepage-setup 1251)
(define-coding-system-alias 'windows-1251 'cp1251)
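Since the file in the question turned out to be Windows-1251 rather than KOI8-R, a narrower alternative is to tell Emacs which coding system to use for one file-name pattern and leave the language environment alone. This is only a sketch under the assumption that all of your `.srt` files are Windows-1251; the pattern is illustrative and is not part of the cited wiki settings.

```elisp
;; Assumption: every *.srt file you open is encoded in Windows-1251.
;; A UTF-8 encoded .srt file would be mis-decoded by this rule.
(modify-coding-system-alist 'file "\\.srt\\'" 'windows-1251)
```

This adds an entry to `file-coding-system-alist`, so it should coexist with `(set-language-environment "UTF-8")` in the same init file.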

edited Dec 13 '17 at 16:12

answered Dec 12 '17 at 21:32

Tobias

I tried all the `koi8` encodings, but nothing helps. I tried `koi8-r`, `cyrillic-koi8`, and so on. Also, I can't set the default coding system to `koi8-r` because my init.el uses `(set-language-environment "UTF-8")`. – Alex Dec 13 '17 at 08:13
@Alex Did you really use `find-file-with-coding-system` with the code I provided in my other answer? Changing the coding system after opening the file **does not work**. If you did everything exactly as I wrote, we need the attention of someone more experienced with character encodings than me. Maybe Stefan or Drew can help. After you have double-checked that my proposed answers do not work, I would try to draw the attention of a moderator by flagging the question. Unfortunately, I cannot put a bounty on this question as Stefan did with some other question. I don't know why I cannot do that. – Tobias Dec 13 '17 at 08:33
In my Emacs (25.1) there is no function `find-file-with-coding-system`. I did everything as you described and put your code in my init file, but it does not help. – Alex Dec 13 '17 at 08:45
@Alex If `find-file-with-coding-system` is not defined the init file has not been evaluated. For a test you can paste the code into the `*scratch*` buffer and run `M-x eval-buffer` for that buffer. Afterwards the menu item `Find File With Coding System` should exist in the `File` menu. Use that menu item to open your file. The command will ask you for the coding system. – Tobias Dec 13 '17 at 08:48
I followed all of your instructions. It does not help. – Alex Dec 13 '17 at 09:21
@Alex We have to wait for tomorrow. Then I can put a bounty on this question. – Tobias Dec 13 '17 at 10:11
@Alex You wrote "I followed all of your instructions. It does not help." What exactly happened? Did you have the menu item and were you able to start `find-file-with-coding-system`? Is there an `R:` at the start of the modeline after you loaded the file with `find-file-with-coding-system`? – Tobias Dec 13 '17 at 10:18
1. The menu item is shown. 2. I ran `find-file-with-coding-system`. 3. The character `R` is shown in the mode line. But the file is not decoded correctly. – Alex Dec 13 '17 at 11:57
I updated my post and added a screenshot. – Alex Dec 13 '17 at 13:07
@Alex That screenshot looks strange. I would expect `R:` instead of `1:` at the beginning of the modeline. The `R:` stands for `koi8-r` while `1:` stands for `iso-latin-1`. Could you retry with `emacs -Q`? That means that you start Emacs with the command line option `-Q`, which essentially prevents the loading of the customization files. Afterwards paste the lisp code into `*scratch*`, evaluate the buffer with `eval-buffer` and open the file with `find-file-with-coding-system` (e.g., via the menu). – Tobias Dec 13 '17 at 13:32
I updated my post. It does not help. If you want, I can send you the file (test.srt). Maybe you can decode it in your Emacs. – Alex Dec 13 '17 at 15:17
@Alex You can provide a significant part of the file for everybody in your question. Use `find-file-literally` for the file, run `base64-encode-region` for a not too large but significant region of the buffer. Paste exactly that encoded region as code-block into the question. The inverse steps for testing would be: `find-file-literally` for a non-existing file, paste the base64-encoded stuff there, run `base64-decode-region` with the full buffer selected, save the buffer. Afterwards, open the just newly created file again with `find-file`. Maybe, cite that comment in the question. – Tobias Dec 13 '17 at 15:26
I updated my post and added part of the file in base64. – Alex Dec 13 '17 at 16:01
@Alex: We just selected the wrong coding system. Use `windows-1251` instead of the `koi8-r` stuff. Have fun! Maybe you should adapt part of https://www.emacswiki.org/emacs/GnuEmacsRussification to your needs. – Tobias Dec 13 '17 at 16:07
Yes, with `windows-1251` it works!!! Thank you. – Alex Dec 13 '17 at 16:11
@Alex I edited the answer accordingly. Would be nice if you accepted it. – Tobias Dec 13 '17 at 16:12
And one last question: is it possible to decode the file correctly AFTER opening it? – Alex Dec 13 '17 at 16:52
@Alex I've added a function `translate-buffer-encoding` to my other answer. (It fits better there.) – Tobias Dec 13 '17 at 23:47
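The reproduction steps from the comments (decode the base64 text, save the raw bytes, then open the file again) can also be sketched non-interactively. The helper name `my-write-base64-sample` and the file name in the example call are hypothetical; the base64 string stands for the block posted in the question.

```elisp
(defun my-write-base64-sample (base64 file)
  "Decode the string BASE64 and write the resulting raw bytes to FILE.
`my-write-base64-sample' is a hypothetical helper, not part of Emacs."
  ;; `binary' writes the bytes unchanged, without any character encoding
  ;; or end-of-line conversion.
  (let ((coding-system-for-write 'binary))
    (write-region (base64-decode-string base64) nil file)))

;; Example call with a placeholder for the base64 text from the question:
;; (my-write-base64-sample "<base64 text from the question>" "test.srt")
```

Opening the resulting file with `find-file-with-coding-system` and `windows-1251` should then show readable Cyrillic text.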
