Is there a function or package that produces "wide string literals" as defined by the C/C++ standards? That is, one that replaces all non-ASCII characters with escaped "\x.." sequences.
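The following function transforms Unicode strings into literal C strings. Note, however, that the transformation depends on the encoding: the string is encoded with the buffer's `buffer-file-coding-system`, and every resulting byte above 127 is written as a "\x.." escape.
For interactive use, write your C string with multibyte characters, place point in the string, and call M-x string-to-cStr.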
(require 'seq)
(require 'cl-lib)

(defun string-to-cStr (string)
  "Return STRING encoded as literal c-string.
For interactive use, write your string with multibyte characters,
place point in the string, and call M-x `string-to-cStr'."
  (interactive (string-to-cStr-interactive-form))
  (let ((ret (seq-mapcat (lambda (ch)
                           (if (<= ch 127)
                               (format "%c" ch)
                             (format "\\x%.2x" ch)))
                         (encode-coding-string string buffer-file-coding-system)
                         'string)))
    (when (called-interactively-p 'any)
      (string-to-cStr-interactive-form string ret))
    ret))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
(defun string-to-cStr-interactive-form (&optional string ret)
  "Handle all the bits for the interactive use of `string-to-cStr'."
  (if (stringp string)
      (save-excursion
        (kill-region (get-text-property 0 'start string)
                     (get-text-property 0 'end string))
        (insert ret))
    (let ((ppss (syntax-ppss)))
      (cl-assert (derived-mode-p 'c-mode 'c++-mode))
      (cl-assert (eq (syntax-ppss-context ppss) 'string))
      (let* ((start (nth 8 ppss))
             (end (save-excursion
                    (goto-char start)
                    (forward-sexp)
                    (point))))
        (list (propertize (buffer-substring-no-properties start end)
                          'start start 'end end))))))
A small demonstration in a UTF-8 buffer:
printf("Test unicode characters such as ⇒ and ∫.");
/* becomes */
printf("Test unicode characters such as \xe2\x87\x92 and \xe2\x88\xab.");
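
One caveat with "\x.." escapes in general: in C and C++ a hexadecimal escape absorbs every hex digit that follows it, so a string such as "∫abc" cannot simply become "\xe2\x88\xababc". The demonstration above is unaffected because each escape is followed by a space or a period. The Lisp function emits escapes byte by byte without looking at the next character, so as far as I can tell it would need extra care for such inputs. The same byte-wise transformation with that case handled can be sketched in C as below. The name `escape_non_ascii` is hypothetical and not part of the Lisp code above. The sketch assumes its input is already in the target encoding (for example UTF-8) and is already a valid literal body, so quotes and backslashes are copied unchanged.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Write SRC (raw bytes in the source encoding, e.g. UTF-8) into DST as the
   body of a C string literal: ASCII bytes are copied, every other byte
   becomes a \xNN escape.  Returns the number of characters written (not
   counting the terminating NUL), or -1 if DST is too small.
   Hypothetical helper for illustration only. */
int escape_non_ascii(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;
    int after_escape = 0;  /* was the previous byte emitted as \xNN? */

    assert(src != NULL && dst != NULL);
    if (dst_size == 0)
        return -1;

    for (const unsigned char *p = (const unsigned char *)src; *p; ++p) {
        char buf[8];
        int n;

        if (*p > 127) {
            n = snprintf(buf, sizeof buf, "\\x%02x", (unsigned)*p);
            after_escape = 1;
        } else {
            /* A hex escape swallows all hex digits that follow it, so
               close and reopen the literal ("") before such a character. */
            n = snprintf(buf, sizeof buf, "%s%c",
                         (after_escape && isxdigit(*p)) ? "\"\"" : "", *p);
            after_escape = 0;
        }
        if (out + (size_t)n >= dst_size)
            return -1;  /* no room left for this piece plus the NUL */
        memcpy(dst + out, buf, (size_t)n);
        out += (size_t)n;
    }
    dst[out] = '\0';
    return (int)out;
}
```

For the UTF-8 bytes of "∫abc" this writes \xe2\x88\xab""abc. The compiler concatenates the adjacent literals, so the last escape ends at \xab instead of running on into the following letters.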

Tobias
-
-
Now I wonder why you define `string-to-unibyte*` instead of just using the existing `string-to-unibyte`? :p – npostavs Jul 03 '17 at 18:23
-
@npostavs: Because of the following remark in the documentation of `string-to-unibyte`: "If STRING contains a non-ASCII, non-‘eight-bit’ character, an error is signaled." The original string contains multibyte characters. These are re-interpreted as a sequence of unibyte characters through `(set-buffer-multibyte nil)` in the starred variant `string-to-unibyte*`. The non-starred variant `string-to-unibyte` bails out in such a case. – Tobias Jul 03 '17 at 19:15
-
Ah, I think you really want `encode-coding-string` which will also let you choose the encoding (or use `buffer-file-coding-system` if you don't want to choose). – npostavs Jul 03 '17 at 20:12
-
-
@npostavs The help string of `encode-coding-string` didn't say that that function returns a unibyte-string but the info file does: "-- Function: encode-coding-string ... The result of encoding is a unibyte string." So I accepted your proposal. Thanks again. – Tobias Jul 04 '17 at 07:14
-
Thanks, the provided function seems to do the job. It's strange Emacs doesn't have something like this bundled. – dilettant Jul 05 '17 at 08:21