
I mean these:

[Screenshot: characters displayed as rectangles containing their hex codes, e.g. 01F 91C and 01F 91B]

I'd like to remove them, so I'm looking for a function that finds all characters in the buffer that cannot be displayed properly with the current font and therefore show up as rectangles.

Tom
  • While it is certainly possible to remove certain characters from a buffer, the question would be more usable to future forum participants if it were *tweaked* a bit to identify those characters that cannot properly be displayed due to current encoding and font selected, etc. Once those characters are identified programmatically, then the user can choose whichever course of action is appropriate; e.g., setting the `buffer-display-table` to display those characters with another character that is supported, such as an ascii ☺ in lieu of `01F 91C` and ☻ for `01F 91B` as depicted in the photo. – lawlist Jul 26 '18 at 05:03
  • OK, I changed the title and reworded the question a bit to be more generic. I left in the removal, because that is what I want, but if someone posts an answer to find these characters then others can use that answer for other purposes. – Tom Jul 26 '18 at 05:11
  • Doesn't function `char-displayable-p` help here? `Return non-nil if we should be able to display CHAR. On a multi-font display, the test is only whether there is an appropriate font from the selected frame’s fontset to display CHAR’s charset in general. Since fonts may be specified on a per-character basis, this may not be accurate.` – Drew Jul 26 '18 at 16:01
  • `char-displayable-p` doesn't seem to do the right thing for this case. `(char-displayable-p #x1f91c)` returns `unicode`, even when my font can't display that character. – rpluim Jul 27 '18 at 12:47
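The display-table idea from the first comment could look roughly like the sketch below. It assumes the problem characters are already known; `my-display-chars-as` is a hypothetical helper name, not an Emacs function, while `make-display-table` and `buffer-display-table` are standard.

```elisp
;; Sketch only: `my-display-chars-as' is a hypothetical helper, not part of Emacs.
(defun my-display-chars-as (chars placeholder)
  "In the current buffer, display each character in CHARS as PLACEHOLDER.
Only the display changes; the buffer text is left untouched."
  (let ((table (or buffer-display-table (make-display-table))))
    (dolist (ch chars)
      ;; A display-table entry is a vector of glyphs shown in place of CH.
      (aset table ch (vector placeholder)))
    ;; `buffer-display-table' is always buffer-local when set.
    (setq buffer-display-table table)))
```

For example, `(my-display-chars-as '(#x1f91c #x1f91b) ?☺)` would show both characters from the screenshot as ☺, provided the font has a glyph for ☺.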

2 Answers


Robert Pluim has already proposed a solution. The credit for using `describe-char-display` belongs to him.

Here I detail what I meant in my comment to his answer.

I have the impression that this solution is more efficient, because it only tests non-ASCII characters, and that it is at least as simple as his. That judgment may be subjective, though.

(defun delete-non-displayable ()
  "Delete characters not contained in the used fonts and therefore non-displayable."
  (interactive)
  (require 'descr-text) ;; for `describe-char-display'
  (save-excursion
    (goto-char (point-min))
    (while (re-search-forward "[^[:ascii:]]" nil 1)
      (unless (describe-char-display (1- (point)) (char-before))
        (replace-match "")))))
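For readers who only want to find the characters and decide later what to do with them, the same search loop can collect matches instead of deleting them. This is a sketch: `my-list-non-displayable` and its optional `displayable-p` argument are hypothetical names. The predicate defaults to `describe-char-display`, whose result depends on the frame and fonts in use.

```elisp
;; Sketch only: `my-list-non-displayable' and DISPLAYABLE-P are hypothetical
;; names, not part of Emacs.
(require 'descr-text) ; for `describe-char-display'

(defun my-list-non-displayable (&optional displayable-p)
  "Return a list of (POS . CHAR) for non-ASCII characters failing the check.
DISPLAYABLE-P is called with a buffer position and the character there
and should return non-nil if the character can be displayed.  It
defaults to `describe-char-display'.  The buffer is not modified."
  (let ((displayable-p (or displayable-p #'describe-char-display))
        (found '()))
    (save-excursion
      (goto-char (point-min))
      ;; Only non-ASCII characters are candidates, as in the deleting version.
      (while (re-search-forward "[^[:ascii:]]" nil t)
        (let ((pos (match-beginning 0)))
          (unless (funcall displayable-p pos (char-after pos))
            (push (cons pos (char-after pos)) found)))))
    ;; Matches were pushed in reverse; return them in buffer order.
    (nreverse found)))
```

Passing the predicate as an argument keeps the search separate from the font check, so a different test can be swapped in without touching the loop.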
Tobias
  • I like that it's simpler, in the sense that it doesn't have to check EOL and tab separately and it doesn't check every character. It's quicker too. – Tom Jul 26 '18 at 11:01
  • It needs to do less per-character checking, which makes it faster, and it's probably more idiomatic than mine. – rpluim Jul 26 '18 at 11:15

It might be better to find fonts that can be used to display those characters, but if you really want to remove them:

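(defun delete-non-displayable ()
  "Delete characters in the buffer that the current fonts cannot display."
  (interactive)
  (require 'descr-text) ;; for `describe-char-display'
  (save-excursion
    (goto-char (point-min))
    (while (not (eobp))
      ;; Line ends and tabs are always kept; any other character is kept
      ;; only if `describe-char-display' returns non-nil for it.
      (if (or (eolp)
              (looking-at "\t")
              (describe-char-display (point) (char-after)))
          (forward-char)
        ;; After a deletion the next character is already at point.
        (delete-char 1)))))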
rpluim
  • You could make the code more efficient by only checking characters that match `(re-search-forward "[^[:ascii:]]" nil 1)` with `(describe-char-display (1- (point)) (char-before))`. – Tobias Jul 26 '18 at 09:45
  • Sure, but simplicity and correctness before speed. – rpluim Jul 26 '18 at 09:52
  • Thanks, it seems to do the job. The buffer is fed into MySQL, which also has problems with these characters. That's why I'm removing them; they are useless emojis anyway. (They could be replaced with ASCII versions, but it's not really worth the effort.) – Tom Jul 26 '18 at 10:58
  • I accept your solution, because you posted it first. Thank you. – Tom Jul 26 '18 at 11:06