21

Especially when copying text from things like Google Docs, I would like Emacs to automatically remove smart double quotes, smart single quotes, and all manner of em-dash and en-dash characters, replacing them with their ASCII equivalents.

Is there a way to configure Emacs to do this automatically? Or, barring that, a function I can call that will do it on the buffer or region?

Lee H
  • I like this idea. In the past I've used `(occur "[^[:ascii:]]")` to find non-ascii characters in a buffer for manual cleanup, but automatically replacing the common ones would be great. – glucas Nov 12 '14 at 14:50
  • Is there anywhere that might list all the 'smart' characters and their ascii equivalents? – Jonathan Leech-Pepin Nov 12 '14 at 17:29

2 Answers

18

Based on SU: How to remove smart quotes in copy paste

You can try something like the following:

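(defcustom smart-to-ascii '(;; left and right double quotes
                            ("\x201C" . "\"")
                            ("\x201D" . "\"")
                            ;; left and right single quotes
                            ("\x2018" . "'")
                            ("\x2019" . "'")
                            ;; en-dash
                            ("\x2013" . "-")
                            ;; em-dash
                            ("\x2014" . "-"))
  "Alist mapping smart characters to their ASCII replacements."
  :type '(repeat (cons (string :tag "Smart Character  ")
                       (string :tag "Ascii Replacement"))))

(defun replace-smart-to-ascii (beg end)
  "Replace the characters listed in `smart-to-ascii' between BEG and END."
  (interactive "r")
  (format-replace-strings smart-to-ascii
                          nil beg end))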

The list is a defcustom so that you can add or adjust characters to match what you need.
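
For the "buffer or region" part of the question, a small wrapper could use the active region when there is one and fall back to the whole buffer otherwise. This is only a sketch: `my/replace-smart-dwim` is a made-up name, and the `defvar` just supplies a fallback copy of the alist so the snippet runs on its own (it does nothing if `smart-to-ascii` is already defined).

```elisp
;; Fallback copy of the alist; ignored if `smart-to-ascii' already has a value.
(defvar smart-to-ascii
  '(("\u201C" . "\"") ("\u201D" . "\"")   ; double quotes
    ("\u2018" . "'")  ("\u2019" . "'")    ; single quotes
    ("\u2013" . "-")  ("\u2014" . "-"))   ; en-dash, em-dash
  "Alist mapping smart characters to their ASCII replacements.")

;; Hypothetical helper, not part of the answer above.
(defun my/replace-smart-dwim ()
  "Replace smart characters in the active region, or in the whole buffer."
  (interactive)
  (let ((beg (if (use-region-p) (region-beginning) (point-min)))
        (end (if (use-region-p) (region-end) (point-max))))
    (format-replace-strings smart-to-ascii nil beg end)))
```

With no region active, `M-x my/replace-smart-dwim` cleans the entire buffer; with a region selected it only touches the selection.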

Jonathan Leech-Pepin
  • That won't really be a full solution, unicode has many symbols each for various kinds of quotes and dash-like-characters (e.g. non-breaking hyphen \u2011) and they all occasionally appear. I'm not even sure if an exhaustive list would stay exhaustive over time as unicode grows. – Peteris Nov 12 '14 at 20:30
  • @Peteris assuming the list was kept current (would need a list/reference of such) it would work in the long run. My selection was based entirely on those that Lee H mentioned. I was not trying to provide an exhaustive list in this case, simply a starting point that could be customized to fit any others that are retrieved. – Jonathan Leech-Pepin Nov 12 '14 at 20:54
  • After replacing whatever characters are defined in the alist, you could call `highlight-regexp` to highlight any remaining non-ASCII characters in the region. – glucas Nov 13 '14 at 15:11
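
A rough sketch of the follow-up suggested in the comments: after running the replacement, find whatever non-ASCII characters are still left so they can be added to the alist. Both function names here are hypothetical. Note that `highlight-regexp` highlights matches in the whole buffer rather than only the region, so the first helper is the one to use when you want a list for a specific range.

```elisp
;; Hypothetical helper: collect leftovers so they can be added to the alist.
(defun my/remaining-non-ascii (beg end)
  "Return the distinct non-ASCII characters between BEG and END, in order."
  (let (found)
    (save-excursion
      (goto-char beg)
      (while (re-search-forward "[^[:ascii:]]" end t)
        (let ((ch (char-before)))
          (unless (memql ch found)
            (push ch found)))))
    (nreverse found)))

;; Hypothetical command: visually mark leftovers in the current buffer.
(defun my/highlight-non-ascii ()
  "Highlight every non-ASCII character in the current buffer."
  (interactive)
  (highlight-regexp "[^[:ascii:]]" 'hi-yellow))
```

Calling `(my/remaining-non-ascii (point-min) (point-max))` returns character codes, which you can inspect with `M-x describe-char` or turn into strings with `string` before adding them to the customizable list.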
8

To add to what @Jonathan posted, you can make that automatic (so that yanked text is cleaned up as soon as it is inserted) by advising `yank`:

(advice-add 'yank :after
            (lambda (&rest _args)
              ;; After a yank, the inserted text lies between mark and point.
              (replace-smart-to-ascii (min (mark t) (point))
                                      (max (mark t) (point))))
            '((name . replace-smart)))
Drew
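
A possible variation on the advice, offered as a sketch rather than a tested setup: using a named function instead of a lambda makes the advice easy to remove again and lets the same function be attached to `yank-pop`, which inserts text without going through `yank`. The name `my/yank-replace-smart` is made up, and the function assumes `replace-smart-to-ascii` from the first answer is loaded (it quietly does nothing otherwise).

```elisp
;; Hypothetical advice function; relies on `replace-smart-to-ascii' from above.
(defun my/yank-replace-smart (&rest _args)
  "Clean up the text just yanked, which lies between mark and point."
  (when (and (fboundp 'replace-smart-to-ascii) (mark t))
    (replace-smart-to-ascii (min (mark t) (point))
                            (max (mark t) (point)))))

(advice-add 'yank :after #'my/yank-replace-smart)
(advice-add 'yank-pop :after #'my/yank-replace-smart)

;; To turn the behaviour off again:
;; (advice-remove 'yank #'my/yank-replace-smart)
;; (advice-remove 'yank-pop #'my/yank-replace-smart)
```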