I'm really confused about this issue: it should be doable, but I fail to understand it.

I have a file with broken encoding (a PDF file, actually), which I copy some text from. When I paste this text into Emacs (or anywhere else) I get something like "íàïðèìåð", while the correct variant should be the Cyrillic "например". My buffer encoding is utf-8 and the broken text has something to do with cp1251. My goal is, obviously, to repair this broken encoding and insert the result in place of the broken pasted text.

I tried different combinations of `encode-coding-string` and `decode-coding-string` with utf-8 and cp1251, but they gave me some other broken encodings or just a bunch of spaces (that's kinda weird).

Sorry for my messy statement, I just really don't get all this encoding stuff.


What I achieved is a (half-)solution for the reverse problem: if I write `(encode-coding-string "например" 'cp1251)` and press C-x C-e, it prints "íàïðèìåð" to the messages. It's strange, but if I try to insert the result of this computation into a buffer, I get some nonsense yet again... Maybe I'm missing another aspect of this problem...

heinwol
  • `(decode-coding-string (encode-coding-string "например" 'cp1251) 'cp1251)` gives `#("например" 0 8 (charset windows-1251))`. So you have kind of a round trip and you should try `(decode-coding-string garbage 'cp1251)`. But if you yank in Emacs maybe you should use the `cp1251` coding system for the yank command. – Tobias May 24 '21 at 19:13
  • @Tobias, thanks for the reply! Unfortunately, your solution didn't help. I tried changing things which interact with the clipboard like `set-clipboard-coding-system` (as suggested [here](https://stackoverflow.com/questions/22647517/emacs-encoding-of-pasted-text)) and `selection-coding-system` (as described [here](https://emacs.stackexchange.com/questions/372/character-encoding-when-copying-some-text-from-somewhere-to-emacs-and-saving-to)), but this didn't help either – heinwol May 25 '21 at 13:57

1 Answer

It seems I was suspecting the wrong encoding. The following code (using dash)

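(require 'dash)  ; provides the `->' threading macro

(-> "íàïðèìåð"
    (encode-coding-string 'iso-8859-1) ; garbled characters back to raw bytes
    (decode-coding-string 'cp1251)     ; read those bytes as cp1251
    insert)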

inserts "например" into a buffer, though I don't have a clue what's going on.
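
A plausible reading of why this works, as far as I can tell: the PDF text is really a sequence of cp1251 bytes, but it arrives in Emacs as if each byte were an ISO-8859-1 (Latin-1) character. For example, byte 0xED is "н" in cp1251 but "í" in Latin-1. Encoding the garbled string as `iso-8859-1` recovers the original bytes, and decoding those bytes as `cp1251` yields the Cyrillic text. Below is a minimal sketch of the same round trip without dash, plus a command that repairs the selected region in place. The names `my/fix-cp1251-mojibake` and `my/fix-cp1251-mojibake-region` are made up for illustration; only `encode-coding-string` and `decode-coding-string` come from Emacs itself.

```elisp
;; Hypothetical helper: the function names are illustrative, not part of Emacs.
(defun my/fix-cp1251-mojibake (text)
  "Return TEXT with Latin-1 mojibake reinterpreted as cp1251.
Assumes every character of TEXT is representable in ISO-8859-1."
  (decode-coding-string
   ;; Step 1: turn the garbled characters back into the raw bytes.
   (encode-coding-string text 'iso-8859-1)
   ;; Step 2: read those bytes as cp1251.
   'cp1251))

;; Hypothetical command: replace the selected garbage with the repaired text.
(defun my/fix-cp1251-mojibake-region (start end)
  "Replace the text between START and END with its repaired version."
  (interactive "r")
  (let ((fixed (my/fix-cp1251-mojibake
                (buffer-substring-no-properties start end))))
    (save-excursion
      (goto-char start)
      (delete-region start end)
      (insert fixed))))
```

To use it, select the pasted garbage and run `M-x my/fix-cp1251-mojibake-region`. This sketch assumes the garbage consists only of Latin-1 characters; anything outside that range would not survive the first step. Emacs also appears to ship a built-in `recode-region` command intended for this kind of re-decoding, which may be worth trying before defining your own helper.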

That's a total victory against these silly encodings, ladies and gentlemen!

heinwol