Is there a function or package that produces "wide string literals" as defined by the C/C++ standards? That is, one that replaces all non-ASCII characters with escaped "\x.." sequences.

dilettant
  • Note there were some old definitions in my code and I forgot to require `seq`. Both problems are corrected now. – Tobias Jul 03 '17 at 15:51

1 Answer

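The following function converts a Unicode string into a C string literal in which every non-ASCII byte is written as a "\x.." escape. Note, however, that the result depends on the encoding: the string is encoded with the buffer's `buffer-file-coding-system` before it is escaped.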

For interactive use, write your C string with multibyte characters, place point inside the string, and call M-x string-to-cStr.

(require 'seq)
(require 'cl-lib)

(defun string-to-cStr (string)
  "Return STRING encoded as a literal C string.
STRING is encoded with `buffer-file-coding-system'; every byte above
127 is written as a hex escape.
For interactive use, write your string with multibyte characters,
place point in the string and call M-x `string-to-cStr'."
  (interactive (string-to-cStr-interactive-form))
  ;; Encode to a unibyte string, then escape each non-ASCII byte.
  (let ((ret (seq-mapcat (lambda (ch) (if (<= ch 127) (format "%c" ch)
                                        (format "\\x%.2x" ch)))
                         (encode-coding-string string buffer-file-coding-system) 'string)))
    ;; When called interactively, replace the string in the buffer.
    (when (called-interactively-p 'any)
      (string-to-cStr-interactive-form string ret))
    ret))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(defun string-to-cStr-interactive-form (&optional string ret)
  "Handles all the bits for the interactive use of `string-to-cStr'."
  (if (stringp string)
      (save-excursion
        (kill-region (get-text-property 0 'start string)
                     (get-text-property 0 'end string))
        (insert ret))
    (let ((ppss (syntax-ppss)))
      (cl-assert (derived-mode-p 'c-mode 'c++-mode))
      (cl-assert (eq (syntax-ppss-context ppss) 'string))
      (let* ((start (nth 8 ppss))
             (end (save-excursion
                    (goto-char start)
                    (forward-sexp)
                    (point))))
        (list (propertize (buffer-substring-no-properties start end) 'start start 'end end))))))
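
The encoding dependence mentioned above can be illustrated on the C side. The following is only a sketch: the word "café" and the names `cafe_utf8`, `cafe_latin1` and the two length helpers are made up for illustration and do not come from the answer. It assumes the same text was escaped once from a UTF-8 buffer and once from a Latin-1 buffer.

```c
#include <string.h>

/* "cafe" with an e-acute (U+00E9), as escaped from a UTF-8 buffer:
   the accented letter becomes two bytes. */
static const char cafe_utf8[] = "caf\xc3\xa9";

/* The same text as escaped from a Latin-1 buffer:
   the accented letter is a single byte. */
static const char cafe_latin1[] = "caf\xe9";

/* Hypothetical helpers that report the byte length of each literal. */
size_t cafe_utf8_length(void)   { return strlen(cafe_utf8); }
size_t cafe_latin1_length(void) { return strlen(cafe_latin1); }
```

The two literals differ in length (5 bytes versus 4), so the escaped output is only meaningful to a program that expects the same encoding the buffer used.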

A small demonstration in a UTF-8 buffer:

printf("Test unicode characters such as ⇒ and ∫.");

/* becomes */

printf("Test unicode characters such as \xe2\x87\x92 and \xe2\x88\xab.");
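
To check that the escaped form really carries the same bytes, a small C sketch such as the one below may help. The function names `arrow_bytes_match` and `split_literal_length` are hypothetical. The second function illustrates a caveat worth keeping in mind: a C hex escape consumes every hex digit that follows it, so an escaped byte directly followed by a character such as `a` or `3` would be misread by the compiler. Splitting the literal in two, as shown, is one common way around that, and the function above does not do this automatically.

```c
#include <string.h>

/* Returns 1 if the escaped literal holds the three UTF-8 bytes of
   U+21D2 (the double arrow from the demonstration), plus the NUL. */
int arrow_bytes_match(void) {
    const char escaped[] = "\xe2\x87\x92";
    const unsigned char expected[] = {0xE2, 0x87, 0x92};
    return sizeof escaped == 4 && memcmp(escaped, expected, 3) == 0;
}

/* Adjacent string literals are concatenated after escapes are
   processed, so the split keeps "\xab" from absorbing "abc". */
int split_literal_length(void) {
    const char s[] = "\xe2\x88\xab" "abc";
    return (int)strlen(s); /* 3 bytes for the integral sign + 3 letters */
}
```

Called from a `main`, `arrow_bytes_match()` returns 1 and `split_literal_length()` returns 6.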
Tobias
  • @npostavs I've corrected my answer according to your comments. Thanks. – Tobias Jul 03 '17 at 16:32
  • Now I wonder why you define `string-to-unibyte*` instead of just using the existing `string-to-unibyte`? :p – npostavs Jul 03 '17 at 18:23
  • @npostavs: Because of the following remark in the documentation of `string-to-unibyte`: "If STRING contains a non-ASCII, non-‘eight-bit’ character, an error is signaled." The original string contains multibyte characters. These are re-interpreted as a sequence of unibyte characters through `(set-buffer-multibyte nil)` in the starred variant `string-to-unibyte*`. The non-starred variant `string-to-unibyte` bails out in such a case. – Tobias Jul 03 '17 at 19:15
  • Ah, I think you really want `encode-coding-string` which will also let you choose the encoding (or use `buffer-file-coding-system` if you don't want to choose). – npostavs Jul 03 '17 at 20:12
  • @npostavs `cl` was just an old habit. It is corrected. – Tobias Jul 04 '17 at 04:17
  • @npostavs The help string of `encode-coding-string` didn't say that that function returns a unibyte-string but the info file does: "-- Function: encode-coding-string ... The result of encoding is a unibyte string." So I accepted your proposal. Thanks again. – Tobias Jul 04 '17 at 07:14
  • Thanks, the provided function seems to do the job. It's strange that Emacs doesn't have something like this bundled. – dilettant Jul 05 '17 at 08:21