Fix invalid UTF-8 characters in a buffer by query/replace

Asked Nov 07 '18 at 01:33

Active Nov 07 '18 at 01:33

Viewed 348 times

When I copy text from web pages, the result contains some special characters such as ', - etc. that are not valid UTF-8, as you can see in the following screenshot:

I checked some similar questions on this site:

But I can't find what I want. I want to query/replace all invalid UTF-8 characters in the buffer, just like [M-%] in Emacs.

stardiviner

This is an interesting question, especially in the sense that I don't know whether it is possible to write a regular expression that matches all byte sequences that constitute invalid UTF-8. What would be relatively easy, though, is to write a function that parses the text in the buffer as UTF-8 and prompts for a replacement when it encounters something it cannot parse. I think I had a UTF-8 parser in Elisp somewhere; maybe I'll try to write something like this towards the end of the day. – wvxvw Nov 07 '18 at 06:07
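The idea in the comment above (decode the text as UTF-8 and prompt whenever decoding fails) can be sketched roughly as follows. This is only an illustration in Python, not Elisp and not the commenter's parser. The names `invalid_utf8_spans`, `query_replace_invalid`, and the `ask` callback are hypothetical, and `ask` stands in for the interactive prompt that something like [M-%] would show.

```python
from typing import Callable, Iterator, Tuple


def invalid_utf8_spans(data: bytes) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) byte offsets of each run that fails strict UTF-8 decoding."""
    pos = 0
    while pos < len(data):
        try:
            # Strict decoding raises at the first malformed sequence.
            data[pos:].decode("utf-8")
            return  # the rest of the input is valid
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice we decoded.
            start, end = pos + err.start, pos + err.end
            yield start, end
            pos = end  # resume scanning right after the bad bytes


def query_replace_invalid(data: bytes, ask: Callable[[bytes], str]) -> str:
    """Decode `data`, calling `ask(bad_bytes)` for a replacement at each invalid run."""
    out = []
    pos = 0
    for start, end in invalid_utf8_spans(data):
        out.append(data[pos:start].decode("utf-8"))  # valid text before the bad run
        out.append(ask(data[start:end]))             # caller-supplied replacement
        pos = end
    out.append(data[pos:].decode("utf-8"))           # valid tail
    return "".join(out)


if __name__ == "__main__":
    # Non-interactive demo: replace every invalid run with "?".
    # An interactive version could pass a function that calls input() instead.
    sample = b"caf\xc3\xa9 \x92s ok\xff"
    print(query_replace_invalid(sample, lambda bad: "?"))
```

Re-decoding the remaining slice after every error keeps the sketch short at the cost of quadratic work on heavily damaged input; a real buffer command would more likely walk the bytes once.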

0 Answers