Does word syntax take script into account?

Question

I call count-words-region (M-=) on a US/RU/IPA string:

HelloПривheləʊ

The following message is printed:

Region has 1 line, 4 words, and 14 characters.

All of these characters have word (w) syntax, but they belong to different scripts:

(char-syntax ?H) ; ?w
(char-syntax ?П) ; ?w
(char-syntax ?ʊ) ; ?w
(aref char-script-table ?H)  ; script: latin
(aref char-script-table ?П)  ; script: cyrillic
(aref char-script-table ?ʊ)  ; script: phonetic
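To see where the script changes fall, the test string can be split wherever `char-script-table` changes value. The helper below is a hypothetical illustration (the name `my-script-runs` is not part of Emacs). It looks only at scripts and ignores syntax, yet it yields the same number of pieces, four, that count-words-region reports.

```elisp
(defun my-script-runs (string)
  "Split STRING into maximal runs of characters that share a script.
Hypothetical helper for illustration: it only consults
`char-script-table' and ignores syntax entirely."
  (let ((runs nil)
        (start 0))
    (dotimes (i (length string))
      ;; Close the current run whenever the script of character I
      ;; differs from the script of the preceding character.
      (when (and (> i 0)
                 (not (eq (aref char-script-table (aref string i))
                          (aref char-script-table (aref string (1- i))))))
        (push (substring string start i) runs)
        (setq start i)))
    ;; Flush the final run, if the string was not empty.
    (when (> (length string) 0)
      (push (substring string start) runs))
    (nreverse runs)))

(my-script-runs "HelloПривheləʊ")
;; => ("Hello" "Прив" "hel" "əʊ")
```

Note that "heləʊ" splits into the Latin "hel" and the phonetic "əʊ", which accounts for the fourth word.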

Does that mean word boundaries are determined not only by character syntax but also by character script?

I would like to disable this behavior in selected modes, so that word motion stops at word boundaries but not at changes of script. How can this be achieved?

UPDATE: There is useful further discussion on debbugs.

score 8 · Accepted Answer · answered Apr 19 '16 at 07:43


This specific behaviour of forward-word can be controlled by the variables word-combining-categories and word-separating-categories. If you want to ignore the script completely, it is sufficient to add the pair (nil . nil) to the first list, e.g.

(let ((word-combining-categories (cons '(nil . nil)
                                       word-combining-categories)))
  (forward-word))

You can also change that variable with setq-local if you want the effect in a specific buffer.
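Since the question asks for this only in selected modes, the setq-local suggestion can be wrapped in a mode hook. This is a minimal sketch, assuming text-mode is the mode of interest; the function name `my-ignore-script-word-boundaries` is hypothetical.

```elisp
(defun my-ignore-script-word-boundaries ()
  "Let word motion in the current buffer run across script changes.
Hypothetical helper: prepending (nil . nil) to a buffer-local copy of
`word-combining-categories' lets any two adjacent word-constituent
characters of different scripts belong to the same word."
  (setq-local word-combining-categories
              (cons '(nil . nil) word-combining-categories)))

;; Assumption: `text-mode' is only an example; add the function to the
;; hooks of whichever modes should get this behaviour.
(add-hook 'text-mode-hook #'my-ignore-script-word-boundaries)
```

Because the variable is made buffer-local, the global value, and therefore word motion in every other buffer, is left unchanged.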


YoungFrog


How did you find out about these variables? I see no mention of them in the Elisp manual... – JeanPierre Apr 19 '16 at 08:06
@JeanPierre I looked at the source (and 100% agree that it should be documented!) – YoungFrog Apr 19 '16 at 08:44

Please `M-x report-emacs-bug` to have the documentation updated. – phils Apr 19 '16 at 09:23

score 2 · Answer 2 · answered Apr 18 '16 at 19:48

Indeed, forward-word and backward-word also treat this string as several words. It makes sense to me that characters from different scripts can't be in the same word, but the documentation should make that explicit (here). I suggest reporting this with M-x report-emacs-bug.

If you just want to move across "words" while ignoring script, you can use skip-syntax-forward and skip-syntax-backward (described here).
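A command built on skip-syntax-forward and skip-syntax-backward might look like the sketch below. The name `my-forward-word-ignoring-script` is hypothetical, and it deliberately treats a word as nothing more than a run of characters with w syntax, so it never consults `char-script-table`, `word-combining-categories` or `word-separating-categories`.

```elisp
(defun my-forward-word-ignoring-script (&optional arg)
  "Move point forward ARG words, a word being any run of word-syntax chars.
Hypothetical command: unlike `forward-word', it never stops at a
change of script.  With a negative ARG, move backward."
  (interactive "p")
  (setq arg (or arg 1))
  (if (>= arg 0)
      (dotimes (_ arg)
        ;; Skip any non-word characters, then the whole word run.
        (skip-syntax-forward "^w")
        (skip-syntax-forward "w"))
    (dotimes (_ (- arg))
      ;; Mirror image: skip non-word characters backward, then the word.
      (skip-syntax-backward "^w")
      (skip-syntax-backward "w"))))
```

On "HelloПривheləʊ rest", a single call from the beginning of the buffer lands after "heləʊ", whereas forward-word stops after "Hello".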
