9

When I copy non-ASCII text from Windows and paste it into Emacs, it shows up as an octal escape sequence. For example, if I paste ä into Emacs it shows up as \344.

I could type C-q 344 to get the ä back in Emacs. That's annoying, but it's tolerable if there's only one character. But if there are many characters turned into octal escape sequences, it would be convenient to run some command on a region to convert everything inside. Is there already such a command? If not, how would you write a function to do it?

[I set my default coding system to utf-8 in my .emacs file, and I use the same .emacs file on Windows and Linux. But the problem only happens when copying from a Windows application into Emacs. Copying from Emacs to another Windows application works fine.]

John D. Cook
  • I think that what you want is `revert-buffer-with-coding-system` (see its documentation). Emacs shows the characters this way because you copied them from an environment that was in a different coding system (assuming ANSI with so-called high ASCII characters used to render Latin with diacritics), but your buffer must be using something like UTF-8 (for which ASCII characters with high bits set have no meaning, i.e. are invalid). – wvxvw Dec 18 '14 at 13:53
  • Or, maybe even `set-clipboard-coding-system`. Try `C-h a coding-system` to see what other functions in this group are available. – wvxvw Dec 18 '14 at 13:54
  • The \344 you see is the result of a configuration problem. Rather than a command to "fix" it after the fact, you should investigate why you get it in the first place. E.g. start with `emacs -Q` and if you see the problem there already, `M-x report-emacs-bug`. – Stefan Dec 18 '14 at 14:36
  • @Stefan Sometimes, "why you get it" is obvious, but that will not help you fix it after the fact. For example, I just had this issue as a result of `insert-file-literally` (and it was too late to either undo or delete/reinsert the file). – T. Verron Dec 18 '14 at 15:11
  • @Stefan there could be so many misconfigurations outside Emacs that can cause this. To name a few: someone saved a BOM into a file that was originally in some cp-12XX single-byte encoding, which confused the source editor the text was copied from; the source editor incorrectly reported the type of content in the clipboard; etc. I used to see this a lot when editing some ancient ASP sources which were originally incorrectly encoded. – wvxvw Dec 18 '14 at 15:31
  • Of course, the problem could be a bug in the program from which the text is copied. But I think it's a good idea to start by making sure the problem is not on Emacs's side. There are also many ways to misconfigure Emacs, e.g. by blindly copying snippets of config from the web into your ~/.emacs. – Stefan Dec 18 '14 at 15:40
  • Thanks for the suggestions. The `revert-buffer-with-coding-system` and `set-clipboard-coding-system` functions have not solved the problem, though I may not be using them skillfully. When I start emacs with -Q the text pastes correctly. – John D. Cook Dec 18 '14 at 15:47
  • By the way, I have these lines in my .emacs file: `(prefer-coding-system 'utf-8)` `(setq locale-coding-system 'utf-8)` `(set-terminal-coding-system 'utf-8)` `(set-keyboard-coding-system 'utf-8)` `(set-selection-coding-system 'utf-8)` – John D. Cook Dec 18 '14 at 15:48

2 Answers

4

It turns out the offending part of my .emacs file was (set-selection-coding-system 'utf-8). Once I removed that line, Emacs behaved as expected.
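
For a .emacs shared between Windows and Linux, one possible arrangement is to leave the selection coding system at its default on Windows and set it only elsewhere. This is only a sketch: the `system-type` guard is an assumption and was not part of the original configuration.

```elisp
;; Coding-system settings from the original .emacs, unchanged.
(prefer-coding-system 'utf-8)
(setq locale-coding-system 'utf-8)
(set-terminal-coding-system 'utf-8)
(set-keyboard-coding-system 'utf-8)

;; Assumption: only force UTF-8 for the clipboard when NOT on Windows,
;; so that Emacs keeps its own default there.
(unless (eq system-type 'windows-nt)
  (set-selection-coding-system 'utf-8))
```

Simply deleting the `set-selection-coding-system` line, as described above, is the change that was actually reported to work.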

John D. Cook
2

I once wrote this:

(defun umlaute ()
  "Fix wrongly inserted characters, commonly from pasting."
  (interactive "*")
  (save-excursion
    ;; Each regexp is meant to match either the raw byte (written as
    ;; an octal escape) or the character with the same code.
    (goto-char (point-min))
    (while (re-search-forward (concat "\\\344\\|" (list 228)) nil t 1)
      (replace-match "ä"))
    (goto-char (point-min))
    (while (re-search-forward (concat "\\\304\\|" (list 196)) nil t 1)
      (replace-match "Ä"))
    (goto-char (point-min))
    (while (re-search-forward (concat "\\\366\\|" (list 246)) nil t 1)
      (replace-match "ö"))
    (goto-char (point-min))
    (while (re-search-forward (concat "\\\326\\|" (list 214)) nil t 1)
      (replace-match "Ö"))
    (goto-char (point-min))
    (while (re-search-forward (concat "\\\374\\|" (list 252)) nil t 1)
      (replace-match "ü"))
    (goto-char (point-min))
    (while (re-search-forward (concat "\\\334\\|" (list 220)) nil t 1)
      (replace-match "Ü"))
    (goto-char (point-min))
    (while (re-search-forward (concat "\\\337\\|" (list 223)) nil t 1)
      (replace-match "ß"))
    ;; Finally, delete any stray \201 bytes.
    (goto-char (point-min))
    (while (re-search-forward "\\\201" nil t 1)
      (replace-match ""))))

from misc-utils.el at https://launchpad.net/s-x-emacs-werkstatt
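
For a region full of such escapes, a more general variant might look like the sketch below. It is not from misc-utils.el, and the name `my-raw-bytes-to-latin1` is made up for illustration. It walks the region and replaces every raw byte (which Emacs displays as an octal escape such as \344) with the character that has the same code point. It assumes the bytes were Latin-1, and it relies on raw bytes being represented internally as the characters #x3FFF80 to #x3FFFFF, as in Emacs 23 and later.

```elisp
(defun my-raw-bytes-to-latin1 (beg end)
  "Replace raw bytes between BEG and END with Latin-1 characters.
Hypothetical helper: assumes the raw bytes were meant as Latin-1."
  (interactive "*r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (while (not (eobp))
        (let ((c (char-after)))
          (if (>= c #x3FFF80)
              ;; Raw byte: its value is C minus #x3FFF00, and in
              ;; Latin-1 that value is also the character's code point.
              (progn
                (delete-char 1)
                (insert (- c #x3FFF00)))
            ;; Anything else is left untouched.
            (forward-char 1)))))))
```

Select the affected text and run `M-x my-raw-bytes-to-latin1`. Under the Latin-1 assumption, bytes in the range \200 to \237 become C1 control characters, so text that was really Windows-1252 would need a different mapping for those.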

Andreas Röhler