What is a good strategy for locating function arguments in a buffer?

Question

I'd like to create a few routines for manipulating function arguments in buffers, which of course requires me to first locate the arguments. Suppose I'm operating on a buffer containing code in a language with C-like syntax; I could then search for commas to delimit the arguments. The difficulty arises when arguments are themselves calls to other functions, because those can also contain commas that I don't want to consider, as in the following example (the carets point to the commas that I want to find).

somefun(w, f(x1, g(y1, y2)), h(z1, z2))
         ^                 ^

Assuming that I've already located the start of the argument list, my current plan is to repeatedly use forward-sexp to advance past the nested elements in the syntax tree and then check whether the next non-whitespace character is a comma. But is there a more robust or idiomatic way to approach this?

How C-like is the language? Does [semantic](https://www.gnu.org/software/emacs/manual/html_node/emacs/Semantic.html) work? — Tobias, May 06 '19 at 09:10
@Tobias that would probably be the way to go, but since the number of languages supported by `semantic` is fairly limited, I think it may be better to have an imperfect solution that is more widely applicable. In particular, the language that I use most, R, isn't supported. — dpritch, May 07 '19 at 02:27
I've used the `(forward-sexp)` solution in many Emacs modules, and it works surprisingly well. I would recommend using it over a "full parser" solution, as those tend to break when the program isn't syntactically correct, which it sometimes isn't when you're in the middle of an edit. — Lindydancer, Jun 06 '19 at 21:24

score 2 · Accepted Answer · answered Jun 06 '19 at 20:28

As a general rule, any widely used programming language, R included, has an Emacs major mode, and such modes usually already contain functions that parse function arguments. For R there is the large package Emacs Speaks Statistics (ESS).

ESS includes a parser, ess-r-syntax. Its comment marks it as "not yet stable", but four people are working on it, including senior scientists, so I assume it is better than a homebrew solution.

Also note that there are already commands that go to the beginning of a function definition (ess-goto-beginning-of-function-or-para) and that complete function arguments, so parsing of function arguments must already be built in.

score 1 · Answer 2 · answered May 07 '19 at 02:41

On the theory that some answer is better than none, I've posted my hacky solution below. I would welcome a better answer than this one.

(defun find-next-fcn-arg-separator ()
  "Find the next argument separator in a function call.

Move point to the next function argument separator.  Point is
expected to be at the opening parenthesis of the function
argument list or on one of the separators of the function
argument list.

Signals an error if there are no more top-level argument
separators."

  ;; move past the current character and any whitespace
  (if (eolp)
      (beginning-of-line 2)
    (forward-char))
  (skip-chars-forward "[:space:]")

  ;; each iteration moves point across one sexp and any trailing whitespace.
  ;; Note that `forward-sexp' throws an error if we reach the closing
  ;; parenthesis of the function argument list, as desired.  The call to `eobp'
  ;; is to protect against an infinite loop in the event of a malformed function
  ;; or if the function is called outside of a function argument list.
  (while (not (eq (char-after) ?,))
    (forward-sexp)
    (skip-chars-forward "[:space:]")
    (when (eobp)
      (error "reached buffer end without finding the function end")))

  ;; if we've made it here then an argument separator was found
  (point))

(defun find-fcn-arg-separators ()
  "Find the function argument separator positions.

Returns a list of the function argument separator positions.
Assumes that point is at the opening parenthesis of the function
argument list."

  ;; SEP-POSITIONS accumulates the separator positions found so far, while
  ;; CURR-POS is bound to the most recently found separator position, or nil
  ;; once no separators remain.  `ignore-errors' turns the error signaled by
  ;; `find-next-fcn-arg-separator' at the end of the argument list into nil.
  ;; (The search is inlined rather than wrapped in a nested `defun', which
  ;; would define a global function on every call.)
  (let ((sep-positions nil)
        (curr-pos (ignore-errors (find-next-fcn-arg-separator))))

    ;; each iteration records the most recently found separator position and
    ;; then searches for the next one
    (while curr-pos
      (push curr-pos sep-positions)
      (setq curr-pos (ignore-errors (find-next-fcn-arg-separator))))

    (nreverse sep-positions)))

If you use `(forward-comment (buffer-size))` instead of `(skip-chars-forward "[:space:]")`, your code will also work when there are comments in the middle of the argument list. (Passing `(buffer-size)` as the count is an idiom recommended in the Elisp manual for skipping over all comments and whitespace.) — Lindydancer, Jun 06 '19 at 21:17
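
Following up on the `forward-comment` suggestion above, here is a minimal sketch of how the scanning loop might look with that change. The function name `my-arg-separator-positions` is made up for illustration. I'm assuming the buffer's syntax table treats the comma as punctuation and gives parentheses and strings their usual syntax, which is the case in `fundamental-mode`; other modes may differ.

```elisp
;; Hypothetical helper: the name and structure are illustrative only.
(defun my-arg-separator-positions ()
  "Return the positions of the top-level commas in an argument list.
Point is assumed to be on the opening parenthesis of the argument
list.  Point is left where it was."
  (save-excursion
    (let ((positions nil)
          (done nil))
      ;; Step inside the opening parenthesis.
      (forward-char)
      (while (not done)
        ;; Skip whitespace, newlines, and any comments in one go.
        (forward-comment (buffer-size))
        (cond
         ;; Guard against looping forever on a malformed argument list.
         ((eobp) (setq done t))
         ;; A comma seen here is at the top level, because nested lists
         ;; are always crossed as a whole by `forward-sexp' below.
         ((eq (char-after) ?,)
          (push (point) positions)
          (forward-char))
         (t
          ;; `forward-sexp' signals `scan-error' at the closing
          ;; parenthesis of the argument list, which ends the scan.
          (condition-case nil
              (forward-sexp)
            (scan-error (setq done t))))))
      (nreverse positions))))

;; Usage on the example from the question.
(with-temp-buffer
  (insert "somefun(w, f(x1, g(y1, y2)), h(z1, z2))")
  (goto-char (point-min))
  (search-forward "(")
  (backward-char)                       ; point is now on the opening paren
  (my-arg-separator-positions))
```

In a temporary buffer (which is in `fundamental-mode`) the last form evaluates to `(10 28)`, the buffer positions of the two commas marked in the question. Checking for the comma before calling `forward-sexp` matters, since `forward-sexp` would otherwise step over the comma together with the expression that follows it.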
