7

When defining a syntax table for a major mode, I can define both word constituents and symbol constituents.

When should I use each? For example, given a programming language that writes variables in snake case foo_bar, should _ be part of a word or symbol?

Drew
Wilfred Hughes
  • 1
    The default `c-mode` that comes with emacs uses `_` as a symbol constituent but not a word constituent. So I expect there to be a good reason for that. Not to mention the plethora of navigation commands that will work as expected. Principle of least surprise and all that. – Vamsi Oct 11 '14 at 12:47

3 Answers

4

Word constituents: ‘w’

Parts of words in human languages. These are typically used in variable and command names in programs. All upper- and lower-case letters, and the digits, are typically word constituents.

Symbol constituents: ‘_’

Extra characters used in variable and command names along with word constituents. Examples include the characters ‘$&*+-_<>’ in Lisp mode, which may be part of a symbol name even though they are not part of English words. In standard C, the only non-word-constituent character that is valid in symbols is underscore (‘_’).

Courtesy of gnu.org
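As a rough illustration of the two classes quoted above, here is a minimal sketch of a syntax table for a snake_case language. The names `my-lang-mode` and `my-lang-mode-syntax-table` are hypothetical, and treating `?` as an identifier character is an assumption made only for the example. The standard syntax table already gives `_` symbol syntax, so that entry mostly documents the intent.

```elisp
;; Sketch only: `my-lang-mode-syntax-table' is a hypothetical name,
;; not part of any real mode.
(defvar my-lang-mode-syntax-table
  (let ((table (make-syntax-table)))   ; inherits from the standard table
    ;; Underscore joins words into one identifier: symbol constituent.
    (modify-syntax-entry ?_ "_" table)
    ;; Assumption for illustration: the language allows `?' in names.
    (modify-syntax-entry ?? "_" table)
    table)
  "Syntax table for the hypothetical `my-lang-mode'.")

;; Inspect the classes: letters stay word constituents (?w),
;; while `_' and `?' are symbol constituents (?_).
(with-syntax-table my-lang-mode-syntax-table
  (list (char-syntax ?a) (char-syntax ?_) (char-syntax ??)))
```

With a table like this, letters and digits keep word syntax, so word commands still stop at each `_`, while symbol-aware commands see `foo_bar` as one unit.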

When should I use each? For example, given a programming language that writes variables in snake case foo_bar, should _ be part of a word or symbol?

It depends on what you want to achieve. I would say '_' should be part of a word. That way, foo_bar will be treated as one word.

Please take a look at that question; it has an interesting answer about superword-mode and subword-mode.

Nsukami _
  • 2
    Since c-mode treats `_` as a symbol constituent, why do you suggest treating it as a word constituent? – Wilfred Hughes Oct 11 '14 at 23:51
  • @WilfredHughes Sorry, but nobody here was talking about c-mode. Sure, he should be careful about the mode he is working on. And as I said in my answer, it's only for the case where he wants to treat foo_bar as one word. Please check that [question](http://emacs.stackexchange.com/questions/983/treat-symbols-as-words-in-prog-modes) – Nsukami _ Oct 12 '14 at 00:21
  • 7
    Please don't use word syntax for characters like `_`, since that completely defeats the distinction between words and symbols. If you like your `M-f`, `M-b` and friends to move by identifiers rather than by words, then rather than change the major mode's syntax table you should use `superword-mode`. – Stefan Oct 12 '14 at 02:34
  • 5
    When writing your major-mode, _don't_ make symbols and words be the same thing. A lot of people _use_ the distinction, and you'd be making their life harder. People who don't like the distinction already have the option of `superword-mode`. – Malabarba Oct 12 '14 at 07:59
  • I do like to use `M-f` to move by symbols, so I bind it to `forward-symbol`. Anyway, @Stefan and @Malabarba: would you then suggest that `!` and `?` should also be symbol constituents? Sounds like words should only be [a-zA-Z0-9] in all programming languages. – Wilfred Hughes Oct 13 '14 at 22:35
  • 3
    @WilfredHughes: indeed, the definition of words should pretty much never be changed. – Stefan Oct 14 '14 at 01:43
4

This answer addresses your question title: "What's the difference between words and symbols". It is not limited to the body of your question, which is about symbol syntax and has been answered well by @Nsukami.

There are two very different meanings of the word symbol in Emacs:

  1. symbol syntax, which involves word-constituent characters plus symbol-constituent characters. This can apply to any mode, in particular any programming mode/language. @Nsukami described this meaning quite well.

  2. Emacs-Lisp symbols. This is not a syntax category. A Lisp symbol is a Lisp entity, or object, which has certain characteristics: a name, a plist (property list), and potentially a value and a function definition.

    By default, the value cell and function cell are empty, and the plist is the empty list (nil). The Lisp functions `symbol-name`, `symbol-value`, `symbol-function`, and `symbol-plist` return the components.
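The components of a Lisp symbol described above can be inspected directly. The following is a small sketch; the helper `my-demo-symbol-cells` and the symbol name "my-demo" are hypothetical names introduced for illustration. Because `symbol-value` and `symbol-function` behave differently on empty cells, the helper checks `boundp` and `fboundp` first and reports `:unbound` instead.

```elisp
(defun my-demo-symbol-cells (sym)
  "Return SYM's name, value, function, and plist as a list.
Empty value or function cells are reported as :unbound."
  (list (symbol-name sym)
        (if (boundp sym) (symbol-value sym) :unbound)
        (if (fboundp sym) (symbol-function sym) :unbound)
        (symbol-plist sym)))

;; A fresh, uninterned symbol has a name but empty cells and a nil plist.
(let ((sym (make-symbol "my-demo")))
  (my-demo-symbol-cells sym)      ; ("my-demo" :unbound :unbound nil)
  ;; Fill each cell in turn.
  (set sym 42)                    ; value cell
  (fset sym #'car)                ; function cell
  (put sym 'note "demo")          ; property list
  (my-demo-symbol-cells sym))     ; ("my-demo" 42 car (note "demo"))
```

None of this involves the syntax table: a Lisp symbol is an object, whereas symbol syntax is a property of characters in a buffer.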

Note, BTW, that there is no mention of these things in the Emacs manual; they are covered only in the Elisp manual. And more significantly, perhaps: There is no index entry that matches "symbol" for the first meaning (syntax). All index entries that include the term "symbol" refer to the second meaning (Lisp symbol). In particular, there is no entry that contains both "symbol" and "syntax". Whether there should be such an index entry, I don't know (probably). But this can give you some idea of the importance to Lisp of symbols as Lisp objects.

For more info, see:

Drew
2

This is explained in the ELisp manual.

Emacs uses terminology from Lisp. What Lisp and Emacs call a symbol is what many other programming languages call an *identifier*. That's the name of a variable, function, etc.

A word is a run of letters or digits with no intervening punctuation.

Most programming languages allow the underscore _ character in identifiers, in addition to letters and digits. Some languages, such as Lisp, allow more. In programming languages, characters such as _ that can appear in identifiers have the class “symbol constituent” _; letters and digits have the class “word constituent” w.

Commands that act on words treat a sequence of word constituents as a word. Commands that act on expressions (e.g. forward-sexp) move by whole identifiers at once. Syntax highlighting usually treats all symbol constituents identically.
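To see the difference in practice, here is a minimal sketch that compares where word, symbol, and expression motion stop on the same text. The helper name `my-demo-motion-ends` is hypothetical, and the sketch assumes a plain temporary buffer using the standard syntax table, in which `_` is a symbol constituent.

```elisp
(require 'thingatpt)  ; provides `forward-symbol'

(defun my-demo-motion-ends (text)
  "Return where word, symbol, and sexp motion stop from the start of TEXT.
The result is a list of three buffer positions."
  (with-temp-buffer
    (insert text)
    (list (progn (goto-char (point-min)) (forward-word 1) (point))
          (progn (goto-char (point-min)) (forward-symbol 1) (point))
          (progn (goto-char (point-min)) (forward-sexp 1) (point)))))

;; Word motion stops after "foo"; symbol and sexp motion stop
;; after the whole identifier "foo_bar".
(my-demo-motion-ends "foo_bar baz")   ; (4 8 8)
```

If `_` were given word syntax instead, all three positions would coincide, and the distinction that word commands rely on would be lost.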