6

People more knowlegable than I have suggested that intelligent fuzzy matching and sorting for both minibuffer completion and in-buffer completion could be accomplished by implementing a completion-style.

A style that can match according to an arbitrary predicate and can sort according to an arbitrary comparison function would be ideal.

completion-styles is a variable defined in minibuffer.el. Its value is (basic partial-completion emacs22)

Documentation: List of completion styles to use. The available styles are listed in completion-styles-alist.

Note that completion-category-overrides may override these styles for specific categories, such as files, buffers, etc.

You can customize this variable.

This variable was introduced, or its default value was changed, in version 23.1 of Emacs.

I'm would personally love to see fuzzy in-buffer completion (à la Visual Studio, Sublime Text, Vim+Neocomplete, Vim+YCM. However, after reading minibuffer.el, I'm completely lost.

How can this be achieved?

There is a lot of interest in this problem:

PythonNut
  • 10,243
  • 2
  • 29
  • 75

2 Answers2

4

Some comments:

  • The Emacs completion system is focused on completion, i.e. actually putting characters into the buffer; while other systems are focused on narrowing down a list of candidate for display and letting the user make a choice.

  • Sorting is determined by the completion table (i.e. the source) and not the completion-style. I think one option would be to use completion-in-region-function and implement a different front-end, which may query the current completion-style via some to-be-defined interface about how to sort it's candidates.

  • How to write a completion-style is left as an exercise to ther user. Seriously, it is not very well documented, especially how to handle the boundaries right for filename completion.

Anyway, here is some working code.

(defun completion-naive-fuzzy-completion (string table predicate point
                                                 &optional all-p)
  (let* ((beforepoint (substring string 0 point))
         (afterpoint (substring string point))
         (boundaries (completion-boundaries beforepoint table predicate afterpoint))
         (prefix (substring beforepoint 0 (car boundaries)))
         (infix (concat
                 (substring beforepoint (car boundaries))
                 (substring afterpoint 0 (cdr boundaries))))
         (suffix (substring afterpoint (cdr boundaries)))
         ;; |-              string                  -|
         ;;              point^
         ;;            |-  boundaries -|
         ;; |- prefix -|-    infix    -|-  suffix   -|
         ;;
         ;; Infix is the part supposed to be completed by table, AFAIKT.
         (regexp (concat "\\`"
                         (mapconcat
                          (lambda (x)
                            (concat "[^" (string x) "]*?" (string x)))
                          infix
                          "")
                         ".*\\'"))
         (candidates (cl-remove-if-not
                      (apply-partially 'string-match-p regexp)
                      (all-completions prefix table predicate))))
    (if all-p
        ;; Implement completion-all-completions interface
        (when candidates
          ;; Not doing this may result in an error.
          (setcdr (last candidates) (length prefix))
          candidates)
      ;; Implement completion-try-completions interface
      (cond
       ((and (= (length candidates) 1)
             (equal infix (car candidates)))
        t)
       ((= (length candidates) 1)
        ;; Avoid quirk of double / for filename completion. I don't
        ;; know how this is *supposed* to be handled.
        (when (and (> (length (car candidates)) 0)
                   (> (length suffix) 0)
                   (char-equal (aref (car candidates)
                                     (1- (length (car candidates))))
                               (aref suffix 0)))
          (setq suffix (substring suffix 1)))
        (cons (concat prefix (car candidates) suffix)
              (length (concat prefix (car candidates)))))
       ;; Do nothing, i.e leave string as it is.
       (t (cons string point))))))

(defun completion-naive-fuzzy-try-completion (string table predicate point)
  (completion-naive-fuzzy-completion string table predicate point))
(defun completion-naive-fuzzy-all-completions (string table predicate point)
  (completion-naive-fuzzy-completion string table predicate point 'all))

(add-to-list 'completion-styles-alist
             '(naive-fuzzy
               completion-naive-fuzzy-try-completion
               completion-naive-fuzzy-all-completions
               "Simple naive-fuzzy completion, which never alters the string to complete, unless a unique match exists."))

;; (setq-local completion-styles '(naive-fuzzy))
politza
  • 3,316
  • 14
  • 16
  • I'm not entirely sure what you mean by the first point. For the 2nd point, is there any other way we can sort the candidates? Working code would be awesome. – PythonNut Jun 28 '15 at 13:48
  • Me neither, but compare ido/helm to partial-completion. Apropos sorting: Since you're able to redefine anything, it's just a matter of how ugly it's going to get. – politza Jun 29 '15 at 19:53
  • I think this pretty much answers as far as fuzzy matching is concerned... I'll post another question that involves sorting. – PythonNut Jun 29 '15 at 21:27
  • Good answer. I wonder who dared downvoting it. – Dmitry Jun 30 '15 at 20:48
  • @Dmitry if you look at the edit history, you will see that it was once a good deal less helpful. – PythonNut Jun 30 '15 at 20:57
  • @PythonNut Oh, ok. – Dmitry Jun 30 '15 at 21:27
  • @politza You should be able to improve its performance by using the trick from `ido.el` in the recent Emacs versions. See how `ido-set-matches-1` constructs its flex regexp near the end. – Dmitry Jun 30 '15 at 21:28
  • @Dmitry I already had that implemented, so I've added it. – PythonNut Jun 30 '15 at 21:35
2

Others will no doubt provide information more directly specialized for in-buffer text completion, which is what I think your question is mainly asking about. For my part, I offer this info.

  1. There are many kinds of matching that are loosely called "fuzzy". Some are quite simple; others are complex and sometimes costly in terms of computation. Find out about what different ones are, and judge whether you think they fit your different use cases - just as there are different kinds of fuzzy matching, so there are different uses of fuzzy matching/completion.

  2. Out of the box, completion-styles in vanilla Emacs provides some match methods that one could very loosely call "fuzzy", but which are not typically thought of as such - in particular, the so-called partial-completion style.

  3. A common "poor man's" fuzzy matching is sometimes used, which is cheap to use and easy to implement. It amounts to using a regexp that matches any succession of characters between a sequence of specified characters. That is, you type some characters in succession and they are matched in order against text that can contain arbitrary strings of characters between them.

    In Emacs, Ido mode calls this flex matching, and Icicles (see next) calls it scatter matching (the chars you type are matched in a scattered, but ordered way against the target).

  4. I am most familiar with my library Icicles, which supports several kinds of fuzzy matching whenever completion is used. The rest of my answer describes Icicles support for fuzzy matching.

    1. Icicles completion typically uses the minibuffer, even when you complete buffer text. This is different from most Emacs UIs for completing buffer text, which typically either (1) insert a completion candidate and let you hit a key to cycle to the next candidate or (2) pop up a menu of candidates for you to cycle through, to select one.

      Icicles uses the traditional buffer *Completions* to show completion candidates. You can cycle among them to select one, or you can type some more characters or otherwise edit your minibuffer input to narrow the selection and choose one.

      So for example, in Icicle mode if you complete buffer text using standard Emacs library completion.el's dynamic completion mode (M-RET) then whenever there are multiple completion candidates for the text you want to complete Icicles pops them up in buffer *Completions*. Likewise, if you use the dynamic abbreviation of standard Emacs library dabbrev.el (C-M-/). Or if you complete file or command names in shell mode.

      This page summarizes Icicles in-buffer completion of text.

    2. More typically, you use Icicles completion for input that responds to a minibuffer prompt - input that can be used for anything - not necessarily inserted in the current buffer as a completion of text that precedes it.

      But it does not matter which kind of completion you use in Icicles - in-buffer text completion or minibuffer input completion. For all completion with Icicles you can use fuzzy matching.

    3. And you can change the matching mode on the fly, at any time. The doc page about fuzzy Icicles completion covers this, and it is described further in #4, below. As far as I know, this feature is not available from other completion interfaces. Different kinds of matching can be more or less useful for different kinds of completion in different contexts.

    4. The Icicles matching methods are grouped on two different completion keys - by default TAB and S-TAB. The TAB methods I refer to loosely as "prefix" completion methods, and the S-TAB methods I refer to loosely as `apropos" completion methods. The various methods are described on the fuzzy matching page of the Icicles doc.

      • TAB groups (1) basic prefix matching, (2) the vanilla matching defined by Emacs option completion-styles (what you get with vanilla Emacs), (3) fuzzy matching, and (4) swank matching methods. C-( during completion cycles among these.

      • S-TAB groups (1) apropos, (2) scatter, (3) Levenshtein, (4) Levenshtein strict, and (5) Jaro-Winkler matching methods. M-( during completion cycles among these.

      By default (i.e., unless you cycle to another method), TAB does basic prefix completion and S-TAB does apropos completion, which uses regexp matching. You can use C-` to toggle whether apropos completion uses regexp matching or substring matching (escaping regexp special chars).

      (You can customize which method is used by default for TAB and for S-TAB. So for example, if you generally like vanilla Emacs completion, as controlled by option completing-styles, then you can make vanilla, instead of basic, be the default method for TAB.)

Drew
  • 75,699
  • 9
  • 109
  • 225
  • It's important to note that intelligent sorting is absolutely essential for effective flex/fuzzy searching. That's the reason that `scatter` in Icicles (which I'm a huge fan of, BTW) is simply not all that useful (as you point out in the Icicles EmacsWiki)—there are far too many collateral matches that interfere with the candidates you're probably looking for. An intelligent scoring algorithm like flx will ensure that `plp` _will_ get you `package-list-packages` as the first match and not `image-dired-toggle-dired-display-properties` (which is a hopelessly bad match). – PythonNut Jun 28 '15 at 21:52
  • @PythonNut: There are pros & cons to each fuzzy matching approach. Those that one might consider more "intelligent" can also be more rigid or more costly in terms of performance (scoring and sorting being one reason). Each has its place. In Icicles, you can easily combine multiple matchings, including fuzzy matchings (of the same or different kinds), and such combination is typically a lot easier, quicker, and more intelligent than relying on a single, sophisticated method. But you are right that I'm not a big fan of fuzzy matching, in general. – Drew Jun 28 '15 at 23:11
  • @PythonNut: The other thing to say is that users should control the sorting. In Icicles you can easily change the sort order. You might consider that flx sorting is great, but someone else, or perhaps even you in a different context, might want to see candidates ordered alphabetically, or by length, or... One size does not fit all. – Drew Jun 28 '15 at 23:16