
When I copy text from web pages, it sometimes contains special characters such as ', - etc. that are not valid UTF-8, as you can see in the following screenshot:

[screenshot]

I checked some similar questions on this site, but I couldn't find what I want. I want to query/replace all characters with invalid UTF-8 encoding in the buffer, just like [M-%] (query-replace) does in Emacs.

stardiviner
  • This is an interesting question, especially because I don't know whether a regular expression can match every byte sequence that is invalid UTF-8. What would be relatively easy, though, is to write a function that parses the text in the buffer as UTF-8 and prompts for a replacement whenever it encounters something it cannot parse. I think I had a UTF-8 parser in Elisp somewhere; maybe I'll try to write something like this towards the end of the day. – wvxvw Nov 07 '18 at 06:07
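The parse-and-replace idea from the comment can be sketched outside Emacs. The following is only an illustration in Python, not Elisp and not the commenter's parser: it relies on Python's built-in UTF-8 decoder to locate byte spans that fail to decode, then asks a callback for a replacement for each span. The names `find_invalid_utf8`, `replace_invalid_utf8`, and `choose` are hypothetical, and treating the stray bytes as Windows-1252 in the usage lines is an assumption about where the copied text came from.

```python
from typing import Callable, List, Tuple


def find_invalid_utf8(data: bytes) -> List[Tuple[int, int]]:
    """Return (start, end) byte offsets of every span that is not valid UTF-8."""
    spans = []
    pos = 0
    while pos < len(data):
        try:
            # Decode the remainder; success means no further invalid bytes.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice being decoded.
            start, end = pos + err.start, pos + err.end
            spans.append((start, end))
            pos = end
    return spans


def replace_invalid_utf8(data: bytes, choose: Callable[[bytes], str]) -> str:
    """Decode data as UTF-8, substituting choose(bad_bytes) for each invalid span."""
    out = []
    pos = 0
    for start, end in find_invalid_utf8(data):
        out.append(data[pos:start].decode("utf-8"))  # valid text before the span
        out.append(choose(data[start:end]))          # replacement picked by the caller
        pos = end
    out.append(data[pos:].decode("utf-8"))           # valid tail
    return "".join(out)


if __name__ == "__main__":
    # Assumption: the stray bytes are Windows-1252 punctuation (0x92, 0x96).
    raw = b"it\x92s fine \x96 ok"
    print(find_invalid_utf8(raw))
    print(replace_invalid_utf8(raw, lambda bad: bad.decode("cp1252")))
```

In an interactive version, `choose` would prompt for each span, which is the part that corresponds to the query step of [M-%].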

0 Answers