
I call count-words-region (M-=) on a string that mixes Latin, Cyrillic, and IPA characters:

HelloПривheləʊ

The following message is printed:

Region has 1 line, 4 words, and 14 characters.
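
The same count can be checked non-interactively. This is a minimal sketch, assuming a temporary buffer with the default syntax table and default category settings; going by the message above, it should report 4 words:

```elisp
;; Reproduce the word count in a scratch buffer instead of a region.
(with-temp-buffer
  (insert "HelloПривheləʊ")
  ;; count-words counts the words between two buffer positions.
  (count-words (point-min) (point-max)))
```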

All of the characters have word (w) syntax, but they differ in script:

(char-syntax ?H) ; ?w
(char-syntax ?П) ; ?w
(char-syntax ?ʊ) ; ?w
(aref char-script-table ?H)  ; script: latin
(aref char-script-table ?П)  ; script: cyrillic
(aref char-script-table ?ʊ)  ; script: phonetic

Does that mean that a word boundary is defined not only by character syntax but also by character script?

I would like to disable this behavior in selected modes, so that word navigation stops at word boundaries but not at script changes. How can this be achieved?

UPDATE Useful further discussion on debbugs.

gavenkoa

2 Answers


This specific behaviour of forward-word can be controlled by the variables word-combining-categories and word-separating-categories. If you want to ignore the script completely, it is sufficient to add the pair (nil . nil) to the first list, e.g.

(let ((word-combining-categories (cons '(nil . nil)
                                       word-combining-categories)))
  (forward-word))

You can also change that variable with setq-local if you want the effect in a specific buffer.
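
For the per-mode case from the question, one possible arrangement is to set the variable from a mode hook. This is only a sketch: the function name my/ignore-script-word-boundaries is hypothetical, and text-mode-hook is just an example of a mode you might pick.

```elisp
;; Hypothetical helper: the name is an illustration, not part of Emacs.
(defun my/ignore-script-word-boundaries ()
  "Make word commands in this buffer ignore changes of script."
  ;; Buffer-local copy with the (nil . nil) pair added at the front.
  (setq-local word-combining-categories
              (cons '(nil . nil) word-combining-categories)))

;; Example only: replace text-mode-hook with the hook of the mode you want.
(add-hook 'text-mode-hook #'my/ignore-script-word-boundaries)
```

Because setq-local makes the binding buffer-local, buffers in other modes keep the default behaviour.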

YoungFrog

Indeed, forward-word and backward-word also show there are several words here. It does make sense to me that characters from different scripts can't be in the same word, but the documentation should be made explicit about that (here). I suggest M-x report-emacs-bug about it.

If you just want to move across "words" while ignoring script, you can use skip-syntax-forward and skip-syntax-backward (described here).
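
As a rough sketch of that idea, the two commands below move over a whole run of word-syntax characters regardless of script. The command names are hypothetical, and they deliberately ignore everything else forward-word takes into account (numeric arguments, field boundaries, and so on).

```elisp
;; Hypothetical commands built only on skip-syntax-forward/backward.
(defun my/forward-word-ignoring-script ()
  "Move point to the end of the next run of word-syntax characters."
  (interactive)
  (skip-syntax-forward "^w")   ; first skip anything that is not a word char
  (skip-syntax-forward "w"))   ; then skip the whole run of word chars

(defun my/backward-word-ignoring-script ()
  "Move point to the start of the previous run of word-syntax characters."
  (interactive)
  (skip-syntax-backward "^w")
  (skip-syntax-backward "w"))
```

They could be bound to keys of your choice in the modes where script boundaries get in the way.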

JeanPierre