How to wrap line at some characters other than space?

Question

How can I set Emacs to wrap long lines (using Visual Line Mode) at -, _, ( and ) too, instead of only at spaces?

A sample line:

This is a long line with long words like long-long-long-long-long-long-long-long-long-long-long-long-long-long-long-long-long-word, long_long_long_long_long_long_long_long_long_long_long_long_long_long_long_long_long_long_long_long_long_word, (very)long(very)long(very)long(very)long(very)long(very)long(very)long(very)long(very)long(very)long(very)long-word.

Drew · Answer 1 · 2015-12-23T06:01:22.653

This is what you need to do (e.g., in the mode where you will be filling text):

(modify-category-entry ?- ?|)

That gives the hyphen character (-) the category of |, which means line breakable. Do the same for _, (, and ), if you want lines to be broken at each of those characters also. (But I suspect that you do not really want lines to be broken after ( -- just after ).)
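As a rough sketch of "do the same for the other characters", something like the following might work. It assumes you want breaks at -, _ and ) but not at (, as suggested above, and it modifies a copy of the category table so that only the current buffer is affected. The variable name `my-breakable-chars` is made up for illustration.

```elisp
;; Hypothetical variable name; the characters come from the question.
(defvar my-breakable-chars '(?- ?_ ?\))
  "Characters that should be treated as line breakable.")

;; Work on a copy of the standard category table so the change
;; applies to the current buffer only.
(let ((table (copy-category-table)))
  (dolist (ch my-breakable-chars)
    ;; ?| is the "line breakable" category.
    (modify-category-entry ch ?| table))
  (set-category-table table))
```

Evaluate it in the buffer you intend to fill, or put the `let` form in a mode hook.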

You can use M-x describe-categories to see the available categories.

(I should say that I did not bother to try this with visual-line-mode (I don't use or like visual-line-mode), but only with ordinary text filling. If (modify-category-entry ?- ?|) is not sufficient for visual-line-mode then you might need to tinker a bit more with it or perhaps look at the visual-line-mode code to see how it does filling (if it is different from the usual filling).)

How I found this out:

I looked at the code that does filling. I started with just command fill-paragraph (bound to M-q), and I eventually got to function fill-move-to-break-point.
Function fill-move-to-break-point does this:

    (re-search-backward "[ \t]\\|\\c|.\\|.\\c|" linebeg 0)

Not being familiar with \c, I looked in the Elisp manual, under regexps, node Regexp Backslash, and saw that "\cC matches any character whose category is C.". I followed the link there to node Categories.
I knew nothing about categories at this point, although I had some familiarity with character tables. So I read about categories.
I found, from M-x describe-categories and from the fill-move-to-break-point code, that | is the category for a character to be line breakable. So I tried modifying the category entry for the character - (hyphen) to be line-breakable (|). It worked.
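To check a character's categories without filling anything, you can inspect its category set directly. This is a minimal sketch using `char-category-set` and `category-set-mnemonics`; the helper name `my-char-line-breakable-p` is hypothetical. All three forms consult the current buffer's category table, so evaluate them in the buffer you care about.

```elisp
(defun my-char-line-breakable-p (char)
  "Return non-nil if CHAR has the | (line breakable) category.
CHAR is looked up in the current buffer's category table."
  ;; A category set is a bool-vector indexed by category character.
  (aref (char-category-set char) ?|))

;; All category mnemonics currently assigned to the hyphen, as a string.
(category-set-mnemonics (char-category-set ?-))

;; The same question asked through the \c regexp construct that
;; fill-move-to-break-point relies on.
(string-match-p "\\c|" "-")
```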

It doesn't work with either `visual-line-mode` on (still wrap at spaces) or off (still wrap at window edge). — aggu, Dec 23 '15 at 09:57
@aggu Drew said **in the mode where you will be filling text** so you have to enable `auto-fill-mode` and/or use `fill-paragraph` (`M-q`). — JeanPierre, Dec 23 '15 at 11:45

score 3 · Answer 2 · answered Jun 16 '17 at 04:28

Looking at the source code, it doesn't seem to be possible, as the definition of white-space relevant for visual-line-mode appears to be hard-coded in the C part of the source code.

I'm not sure about my conclusions, but here is the sequence of my source archaeology (I'll be linking to github for ease of reference, though when actually carrying out the investigation I had used C-h f, C-h v and finally a recursive grep of the C code.)

visual-line-mode enables word-wrap by setting the variable word-wrap to t. (You can test this by just setting the variable to true, without enabling visual-line-mode itself.)
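For reference, that test might look like the snippet below, evaluated in the buffer you want to try it on (e.g. with M-:). Clearing `truncate-lines` is my own addition, on the assumption that word wrapping has no visible effect while long lines are being truncated.

```elisp
;; Turn on word wrapping in the current buffer only,
;; without enabling visual-line-mode.
(setq-local word-wrap t)
;; Make sure long lines are continued rather than truncated,
;; otherwise there is nothing to wrap.
(setq-local truncate-lines nil)
```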

The variable word-wrap is defined in buffer.c:

DEFVAR_PER_BUFFER ("word-wrap", &BVAR (current_buffer, word_wrap), Qnil,
[...]

Guessing (and my terminology will be off, because I'm a horrible person who hasn't yet read all of the texinfo manual), this means that the elisp-accessible variable "word-wrap" sets the C variable word_wrap.

Searching for where word_wrap is used, we find:

it->line_wrap = NILP (BVAR (current_buffer, word_wrap))
  ? WINDOW_WRAP : WORD_WRAP;

i.e. if word_wrap is set in the current buffer, then it->line_wrap is set to WORD_WRAP (A ? B : C is a ternary operator...).

Looking further on, we get, among other things, this condition:

  if (it->line_wrap == WORD_WRAP && it->area == TEXT_AREA)
    {
      if (IT_DISPLAYING_WHITESPACE (it))
          may_wrap = true;
    [...]

I'm eliding quite a large amount of code, which I don't fully understand, but it seems that unless IT_DISPLAYING_WHITESPACE (it) holds, either may_wrap will be explicitly set to false, or we'll escape the block and do something interesting elsewhere.

IT_DISPLAYING_WHITESPACE in turn, is defined here:

#define IT_DISPLAYING_WHITESPACE(it)                    \
  ((it->what == IT_CHARACTER && (it->c == ' ' || it->c == '\t'))    \
   || ((STRINGP (it->string)                        \
    && (SREF (it->string, IT_STRING_BYTEPOS (*it)) == ' '       \
        || SREF (it->string, IT_STRING_BYTEPOS (*it)) == '\t')) \
       || (it->s                            \
       && (it->s[IT_BYTEPOS (*it)] == ' '               \
           || it->s[IT_BYTEPOS (*it)] == '\t'))         \
       || (IT_BYTEPOS (*it) < ZV_BYTE                   \
       && (*BYTE_POS_ADDR (IT_BYTEPOS (*it)) == ' '         \
           || *BYTE_POS_ADDR (IT_BYTEPOS (*it)) == '\t')))) \

suggesting that the only two recognised white-space characters are the space and the tab.

Hence, it appears that tabs and spaces are hard-coded as the only characters at which wrapping is allowed to occur, but as already written above, I might (hopefully) be wrong.

It seems that I was unfortunately correct: [see this comment by Eli Zaretskii](https://debbugs.gnu.org/cgi/bugreport.cgi?bug=13399#8) in a very interesting bug thread dedicated to trying to fix this. There seems to be an attempt at a fix [here](https://debbugs.gnu.org/cgi/bugreport.cgi?bug=13399#113), but apparently it doesn't work and there hasn't been any recent progress. (I'm not sure if I'm up to dealing with something like this properly...) — aplaice, Jun 16 '17 at 14:34

dlukes · Answer 3 · 2022-04-13T10:56:19.550

As of Emacs 28.1, this is now possible by setting word-wrap-by-category to t and customizing the categories of the relevant characters as detailed in Drew's answer.

You can either do this globally:

(setq word-wrap-by-category t)
;; Add the | (= line-breakable) category to the - char.
(modify-category-entry ?- ?| (standard-category-table))

Or on a buffer-local basis with a hook (e.g. for all Org Mode buffers):

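;; Copy the standard category table so the global one stays untouched.
(defvar my-category-table (copy-category-table))
;; Add the | (= line-breakable) category to the - char, in the copy only.
(modify-category-entry ?- ?| my-category-table)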
(defun my-soft-wrap-hook ()
  (set-category-table my-category-table)
  (setq-local word-wrap-by-category t)
  ;; Presumably, you also want to enable visual-line-mode
  ;; or visual-fill-column-mode as part of this hook?
  (visual-line-mode))
(add-hook 'org-mode-hook #'my-soft-wrap-hook)

The latter might be more desirable if you use soft-wrapping in some modes (e.g. Org Mode) but hard-wrapping in others (e.g. programming modes), because modifying character categories also affects hard-wrapping (its original purpose), and you probably don't want a-long-elisp-variable-name to be split into multiple lines on - when applying hard-wrapping.
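If your config is shared with Emacs versions older than 28.1, where `word-wrap-by-category` does not exist, a guarded variant could look like the sketch below. It also covers _ and ) from the question. The function name `my-soft-wrap-setup` and the choice of `text-mode-hook` are assumptions for illustration, not part of the setup above.

```elisp
(defun my-soft-wrap-setup ()
  "Hypothetical hook function: soft-wrap at -, _ and ) in this buffer."
  ;; `word-wrap-by-category' is unbound before Emacs 28.1, so do
  ;; nothing at all on older versions.
  (when (boundp 'word-wrap-by-category)
    ;; Give this buffer its own copy of the standard category table.
    (let ((table (copy-category-table)))
      (dolist (ch '(?- ?_ ?\)))
        (modify-category-entry ch ?| table))
      (set-category-table table))
    (setq-local word-wrap-by-category t)
    (visual-line-mode 1)))

;; Assumption: text-mode-hook is where you want this; modes derived
;; from text-mode run it too.
(add-hook 'text-mode-hook #'my-soft-wrap-setup)
```

Copying the table once per buffer keeps the sketch short; sharing a single copied table, as in the Org Mode hook above, avoids the repeated copies.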

Here's a screenshot demonstrating how it works on your first example line:

[screenshot]
