I am trying to implement a minor mode that displays in the modeline the current word count and the number of new words since the file was last opened.

There are some existing implementations with this type of functionality (e.g., nanowrimo.el). But as far as I can tell, all of the implementations I've found so far re-count the entire buffer after each change (i.e., after each character insertion or deletion, kill, yank, undo, etc.). This makes Emacs slow and unresponsive on large files. This would seem to be a very common problem, but I haven't found a solution.

I am trying to keep track of the current word count in a variable and use before-change-functions and after-change-functions to update that count after every insertion, deletion, kill, yank, undo, etc. This involves counting only the changed character(s) and the immediately surrounding characters. (For example, if a space is inserted between two word characters, the word count increases by 1.) This approach is much faster and more efficient, but I haven't quite figured out all the details to make it work.

Is there existing code that already does this, either using change hooks or any other method? Or, any thoughts about a good approach to do this?

Drew
B. Bub
  • I would suggest using a system similar to `flyspell` "displacement-commands" and *also* an idle-timer. If idle, then check. If not idle, then if `this-command` is `eq` to the `last-command`, then don't check -- otherwise, check. The `pre-command-hook` can record `this-command` and the `post-command-hook` can check to see if `this-command` is `eq` to `last-command`. You can also have a list of commands that are excluded from the check, or a list of commands that are specifically included in the check. – lawlist Dec 04 '16 at 05:22
  • Be sure to read `(info "(elisp) Change Hooks")`, because these hooks may not work as you'd expect, i.e. they may not be called in a balanced manner in every case. – politza Dec 04 '16 at 10:13
  • How big does the file have to be for a word counting function to lag noticeably? From a programming perspective, it would be much easier to re-count all words in a buffer, thus giving results independent of the state (you don't really want to try to guess all the ways users may insert text in the buffer, much less add hooks to fundamental text-inserting functions, unless absolutely necessary). – wvxvw Dec 04 '16 at 10:40
  • @lawlist That's a simple solution that might solve the problem (if I can find an idle time that works for me). Will try it. Could you say a bit about the logic behind checking only when `this-command` is not `eq` the `last-command`? – B. Bub Dec 04 '16 at 16:35
  • @politza I ran into exactly that issue. I think I can make it work by using before-change-functions to count deletions and after-change-functions to count insertions. What I'm having trouble with is figuring out how to make the hooks buffer local. I tried: `(make-variable-buffer-local 'before-change-functions)` but the function I added to `before-change-functions` seems to be called in all buffers. – B. Bub Dec 04 '16 at 17:07
  • If you type a word, Emacs sees each regular keypress as `self-insert-command` -- likely to be repetitive. An idle timer can be set to something like 0.5 seconds, so when you stop typing, the word count is checked. If you move left or right, then those are different commands, so the word count will be checked. If you delete a word or letter, that is also a different command. If `this-command` is `eq` to `last-command`, then it may be a repetitive series of the same thing. You might also want to turn it off for repetitive movements, such as left, right, up, down, scroll. – lawlist Dec 04 '16 at 17:16
  • @wvxvw It's noticeable with about 5000 words. Agree about the simplicity of counting all words. However, in my testing it seems as if all changes are captured in either `after-change-functions` or `before-change-functions`. Are there some buffer modifications that are not accounted for in those functions? – B. Bub Dec 04 '16 at 17:25
  • Here is a link to an interesting thread where I sought help to test the number of the previous same commands -- "**Test whether all elements/symbols of a list are the same (eq)**": http://emacs.stackexchange.com/questions/26771/test-whether-all-elements-symbols-of-a-list-are-the-same-eq/26772 I was toying with the idea of permitting a certain number of repetitive commands to trigger a function, and do something different (or nothing at all) otherwise. – lawlist Dec 04 '16 at 17:26
  • @B.Bub You should use `add-hook` with a non-nil local argument. – politza Dec 04 '16 at 18:14
  • It may be feasible to use `while-no-input` or even `iter-defun`. – politza Dec 04 '16 at 22:38
  • @politza Is this supposed to work: `(make-variable-buffer-local 'after-change-functions) (add-hook 'after-change-functions 'update-counts t)`? When I add this to the minor mode code and evaluate `after-change-functions` in various buffers, the `update-counts` function seems to be added to `after-change-functions` in the appropriate buffers only (`org-mode` buffers, on which the minor mode is activated). But the `update-counts` function runs in all buffers, including the minibuffer. Any thoughts on what I'm doing wrong? – B. Bub Dec 05 '16 at 01:24
  • @lawlist Thanks for the clear explanation. – B. Bub Dec 05 '16 at 01:26
  • @lawlist I've implemented the idle functionality and it works beautifully for small files. I'm realizing though that word-counting the entire buffer will still be an impediment on large files above 25,000 or so words. – B. Bub Dec 05 '16 at 05:07
  • Depending upon how important this feature is, you may wish to consider setting up a cache of the previous count and only looking at the current line or paragraph, adding, or subtracting to the cache. When leaving the current line and modifying the buffer, then deal with the new line in a similar way. Cut or paste would trigger a full recount, or get really fancy and examine what has been cut and what is being pasted to avoid a full recount. I've only given this a few minutes of thought, but it sounds completely doable and very efficient if done correctly. – lawlist Dec 06 '16 at 18:54
  • Hi all, thanks for the suggestions. I have a partial solution that works well enough for my use (though it does rely on the assumption that `before-change-functions` and `after-change-functions` will be called in pairs). What is the recommended next step on this site? Should I write up the solution and post it here? – B. Bub Dec 08 '16 at 14:24
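On the buffer-local question raised in the comments above: `add-hook` takes its arguments as `(add-hook HOOK FUNCTION &optional DEPTH LOCAL)` (the third argument is called APPEND in older Emacs versions). In `(add-hook 'after-change-functions 'update-counts t)` the `t` therefore lands in the third slot and the function is added to the global hook value, which would explain why it runs in every buffer. A minimal sketch of the local form follows; the names `my-wc-change-count`, `my-wc-note-change`, `my-wc-enable`, and `my-wc-disable` are hypothetical and only for illustration.

```elisp
(defvar-local my-wc-change-count 0
  "Hypothetical counter: number of changes seen in this buffer.")

(defun my-wc-note-change (_beg _end _prev-len)
  "Hypothetical `after-change-functions' handler; counts changes."
  (setq my-wc-change-count (1+ my-wc-change-count)))

(defun my-wc-enable ()
  "Install the handler in the current buffer only."
  ;; The fourth argument, t, is LOCAL.  No call to
  ;; `make-variable-buffer-local' is needed; `add-hook' makes the
  ;; hook variable buffer-local in this buffer by itself.
  (add-hook 'after-change-functions #'my-wc-note-change nil t))

(defun my-wc-disable ()
  "Remove the handler from the current buffer."
  ;; The third argument of `remove-hook' is LOCAL.
  (remove-hook 'after-change-functions #'my-wc-note-change t))
```

Calling `my-wc-enable` from a minor mode's activation code should leave other buffers, including the minibuffer, untouched.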

1 Answer

I went with an approach that uses before-change-functions and after-change-functions to update the word count after each buffer modification. A simplified version of the code is:

;; Function that counts words in buffer
(defun wc-buffer ()
...
)

;; Function that counts words in a region
(defun wc-region (rbeg rend)
...
)

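(defvar-local a1 nil
  "Word count around the pending change, recorded before it happens.")
(defvar-local a2 nil
  "Word count around the same change, recorded after it happens.")
(defvar-local curr-wc nil
  "Running word count for the current buffer.")

(defun init-function ()
  "Initialize `curr-wc' with a full count of the current buffer."
  (interactive)
  (save-excursion
    (setq curr-wc (wc-buffer))))

(defun wc-update-before (change-beg change-end)
  "Record in `a1' the word count around the text about to change.
The counted region extends one character past each end of the change."
  ;; Bind the positions locally instead of creating global variables,
  ;; and clamp to `point-min' so this also works in a narrowed buffer.
  (let ((pos1 (max (point-min) (1- change-beg)))
        (pos2 (min (point-max) (1+ change-end))))
    (setq a1 (wc-region pos1 pos2))))

(defun wc-update-after (change-beg change-end _prev-len)
  "Adjust `curr-wc' by the difference in word count around the change."
  ;; Skip the update if no before-change count has been recorded yet.
  (when a1
    (let ((pos1 (max (point-min) (1- change-beg)))
          (pos2 (min (point-max) (1+ change-end))))
      (setq a2 (wc-region pos1 pos2))
      (setq curr-wc (+ curr-wc (- a2 a1))))))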

(init-function)
(add-hook 'before-change-functions 'wc-update-before nil t) 
(add-hook 'after-change-functions 'wc-update-after nil t)
(setq inhibit-modification-hooks nil)

(Code for the wc-buffer and wc-region functions needs to be added to use this.) The variable curr-wc holds the current word count. This runs efficiently and causes no noticeable slowdown even on large files (tested with 100,000 words). It seems to work correctly for all modification types I've tried (inserts, deletes, yanks, kills, and undos). The main concern with the approach is that, according to the documentation, the before-change-functions and after-change-functions are not necessarily called in pairs. But in my testing so far this doesn't seem to have affected the accuracy of the count.
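For readers who want something to put in the two placeholders, here is one possible pair of definitions. They are an assumption, not the code used in this answer: a "word" is taken to be a maximal run of word-syntax characters (`\w+`), so the result depends on the buffer's syntax table and may differ from `count-words`. A word cut off by a region boundary still counts as one word, which is what the before/after difference needs in order to cancel out the untouched neighbouring characters.

```elisp
(defun wc-region (rbeg rend)
  "Return the number of word-character runs between RBEG and REND.
A run cut off by either boundary still counts as one word."
  ;; Preserve point and match data, since this runs inside change hooks.
  (save-excursion
    (save-match-data
      (let ((count 0))
        (goto-char rbeg)
        ;; REND bounds the search, so a match never extends past it.
        (while (re-search-forward "\\w+" rend t)
          (setq count (1+ count)))
        count))))

(defun wc-buffer ()
  "Return the number of words in the accessible part of the buffer."
  (wc-region (point-min) (point-max)))
```

If the change hooks turn out not to be called in pairs, one option is to call `init-function` again from time to time (for example from `run-with-idle-timer`) so that a full recount corrects any drift in `curr-wc`.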

B. Bub
  • Hi, can you provide the whole answer? I am incapable of writing the code for wc-buffer and wc-region. Can you please provide the whole solution? – Pandian Le Nov 23 '19 at 10:39