
I'm working with strings which may have any number of leading and trailing spaces, tabs, newlines, etc. Currently I have this:

(replace-regexp-in-string
 "^[^[:alnum:]]*\\(.*\\)[^[:alnum:]]*$"
 "\\1" my-string)
Drew
user23847

What's the idiomatic (or best) way to trim surrounding whitespace from a string?

The built-in library subr-x.el has included the inline functions string-trim-left, string-trim-right, and string-trim since Emacs 24.4:

(eval-when-compile (require 'subr-x))

(string-trim "\n\r\s\tfoo\n\r\s\t") ; => "foo"

Since Emacs 26.1 these inline functions also accept optional regexp arguments:

(eval-when-compile (require 'subr-x))

(string-trim "aabbcc" "a+" "c+") ; => "bb"

Since Emacs 28.1 these functions are preloaded (no need to load subr-x), and they are no longer inline.
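For one-sided trimming, string-trim-left and string-trim-right can be called on their own. A small illustration follows. It loads subr-x with a plain require, which is needed before Emacs 28.1 and harmless afterwards. The first two calls rely on the default regexp "[ \t\n\r]+", and the last two use the optional REGEXP argument available since Emacs 26.1:

```elisp
;; subr-x provides the string-trim functions on Emacs 24.4-27.x;
;; on Emacs 28.1+ they are preloaded and this require is a no-op for them.
(require 'subr-x)

;; Default: strip spaces, tabs, newlines and carriage returns on one side.
(string-trim-left  "  foo  ")          ; => "foo  "
(string-trim-right "  foo  ")          ; => "  foo"

;; Emacs 26.1+: an optional REGEXP replaces the default whitespace class.
(string-trim-left  "xxfooxx" "x+")     ; => "fooxx"
(string-trim-right "xxfooxx" "x+")     ; => "xxfoo"
```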

Basil

The third-party string manipulation library s.el implements trimming of whitespace and newlines at the beginning and end of a string as the function s-trim. Here is that function together with the helpers it depends on:

(defun s-trim-left (s)
  "Remove whitespace at the beginning of S."
  (declare (pure t) (side-effect-free t))
  (save-match-data
    (if (string-match "\\`[ \t\n\r]+" s)
        (replace-match "" t t s)
      s)))

(defun s-trim-right (s)
  "Remove whitespace at the end of S."
  (declare (pure t) (side-effect-free t))
  (save-match-data
    (if (string-match "[ \t\n\r]+\\'" s)
        (replace-match "" t t s)
      s)))

(defun s-trim (s)
  "Remove whitespace at the beginning and end of S."
  (declare (pure t) (side-effect-free t))
  (s-trim-left (s-trim-right s)))

Compared with your first attempt

(replace-regexp-in-string
 "^[^[:alnum:]]*\\(.*\\)[^[:alnum:]]*$"
 "\\1" my-string)

the following differences are worth noting:

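  1. ^ as the first character does not match the beginning of the string but the beginning of any line in the string. Similarly, $ matches not the end of the string but the end of any line. Use \` for the beginning of the string and \' for the end (written "\\`" and "\\'" inside a Lisp string).
  2. Do not match text that you do not actually need to analyze. This applies to \\(.*\\), which matches the whole string to be returned: it may be long, and you force replace-regexp-in-string to scan it. Moreover, because .* is greedy, it also swallows any trailing blanks, so the final [^[:alnum:]]* matches the empty string and trailing spaces or tabs are not removed.
  3. The character class [:alnum:] matches only letters and digits, so [^[:alnum:]] matches every other character, including punctuation and symbol constituents such as the hyphen, underscore, or period. Your regexp therefore also trims such characters from the ends, not just whitespace.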
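Putting the three points together, a single-pass variant of the original replace-regexp-in-string call might look like the sketch below. The name my-trim is hypothetical, and the whitespace set [ \t\n\r] is an assumption borrowed from s-trim; widen it if you need other characters. The last call runs the original regexp for comparison: because the greedy \\(.*\\) also consumes the trailing blanks, the final [^[:alnum:]]* matches nothing and the trailing blanks survive.

```elisp
;; Hypothetical helper for illustration; not part of Emacs or s.el.
(defun my-trim (string)
  "Return STRING without leading and trailing spaces, tabs, newlines and CRs."
  ;; \\` and \\' anchor to the ends of the whole string (point 1).
  ;; Only the surrounding whitespace is matched; the text in between is
  ;; never captured (point 2).
  ;; An explicit whitespace class replaces [^[:alnum:]] (point 3).
  ;; The trailing t t arguments are FIXEDCASE and LITERAL.
  (replace-regexp-in-string "\\`[ \t\n\r]+\\|[ \t\n\r]+\\'" "" string t t))

(my-trim "\n\t foo-bar. \r\n")   ; => "foo-bar."

;; For comparison, the original regexp keeps the trailing blanks:
(replace-regexp-in-string
 "^[^[:alnum:]]*\\(.*\\)[^[:alnum:]]*$"
 "\\1" "  foo  ")                ; => "foo  "
```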
NickD
Tobias
  • Thanks for s.el! As to your three points: 1. I thought the escaped backquote and escaped apostrophe were for buffers. 2. Good point! 3. I'm not worried about non-alnum characters in this case, but in other cases I might be. – user23847 Jun 29 '19 at 22:44
  • @user23847 About 1.: The manual uses the phrase "string or buffer". I [cite](https://www.gnu.org/software/emacs/manual/html_node/emacs/Regexp-Backslash.html#Regexp-Backslash): \` matches the empty string, but only at the beginning of the string or buffer (or its accessible portion) being matched against. \' matches the empty string, but only at the end of the string or buffer (or its accessible portion) being matched against. – Tobias Jun 29 '19 at 22:58
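To see the difference the manual describes, one can compare the anchors on a two-line string. This is a minimal illustration; string-match returns the index where the match starts, or nil if there is none:

```elisp
;; ^ and $ match at line boundaries inside the string;
;; \\` and \\' match only at the very start and end of the whole string.
(string-match "^foo"   "bar\nfoo")   ; => 4   (^ matches after the newline)
(string-match "\\`foo" "bar\nfoo")   ; => nil
(string-match "bar$"   "bar\nfoo")   ; => 0   ($ matches before the newline)
(string-match "bar\\'" "bar\nfoo")   ; => nil
```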

string-trim was moved from subr-x.el to subr.el in a commit from March 2021.


Note: I don't have enough reputation to post this as a comment under @Basil's answer.

debsingh