
Is it possible to search for a byte sequence in hexl-mode and possibly highlight it?

For example, in the file below I want to search for the byte sequence f9beb4d9. isearch does not work because it searches the text displayed in the buffer, not the bytes of the original file.

00000000: f9be b4d9 1d01 0000 0100 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 3ba3 edfd  ............;...
00000030: 7a7b 12b2 7ac7 2c3e 6776 8f61 7fc8 1bc3  z{..z.,>gv.a....
00000040: 888a 5132 3a9f b8aa 4b1e 5e4a 29ab 5f49  ..Q2:...K.^J)._I
00000050: ffff 001d 1dac 2b7c 0101 0000 0001 0000  ......+|........
00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
gdkrmr
  • Please clarify what you mean by "search a byte sequence" with an example. – Stefan Dec 09 '18 at 14:18
  • @Stefan I think "search a byte sequence" has to be replaced by "search **for** a byte sequence". The OP should correct that. If the corrected version is right it is pretty clear what "search for a byte sequence" in a hexl-buffer means. One has to take into account that the hexl buffer actually represents the byte sequence of the original buffer. The user does not need to know how hexl works and that there is a fine difference between the byte sequence in the hexl buffer and the byte sequence in the original one. – Tobias Dec 09 '18 at 14:24
  • The main question is how does he intend to specify the sequence he's looking for. – Stefan Dec 09 '18 at 14:31
  • I hope the question is clear now. – gdkrmr Dec 09 '18 at 15:35
  • @Stefan, bonus points if you can search for both the string and hex representation :-) I am only interested in the hex representation of the bytes. – gdkrmr Dec 09 '18 at 15:46

2 Answers

The following Lisp code adds an entry "Hexl Isearch Mode" to the "Hexl" menu.

That menu item toggles the minor mode hexl-isearch-mode. While the mode is active, isearch searches the binary data instead of the text of the hexl buffer.

The search string is read with read, so all escape sequences for Lisp strings work. For example, you can search for \x0d\x0a or \^M\n to find DOS line endings.
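
As a rough illustration of that decoding step (a sketch, not part of the answer's code): the helper names my-hexl-decode-search-string and my-hexl-hex-to-escapes below are hypothetical. The first mirrors the read call in hexl-isearch-fun; the second is a convenience, assumed here, for turning plain hex digits such as f9beb4d9 into the escaped form that can be typed at the isearch prompt.

```elisp
;; Hypothetical helpers for illustration; they are not part of hexl-isearch.

(defun my-hexl-decode-search-string (typed)
  "Decode TYPED the way `hexl-isearch-fun' does.
TYPED is wrapped in double quotes and passed to `read', so Lisp
string escapes in it are resolved."
  (read (concat "\"" typed "\"")))

(defun my-hexl-hex-to-escapes (hex)
  "Convert the hex digits in HEX into a string of \\xNN escapes.
Characters that are not hex digits (for example spaces) are dropped.
An unpaired trailing digit is left unescaped."
  (let ((digits (replace-regexp-in-string "[^[:xdigit:]]" "" hex)))
    ;; FIXEDCASE is t so that the replacement text is never case-converted.
    (replace-regexp-in-string "[[:xdigit:]]\\{2\\}" "\\\\x\\&" digits t)))

;; The text to type at the isearch prompt for the bytes f9 be b4 d9:
(my-hexl-hex-to-escapes "f9be b4d9")

;; What the search function then sees: a string made of four raw bytes.
(append (my-hexl-decode-search-string (my-hexl-hex-to-escapes "f9be b4d9")) nil)
```

One thing to keep in mind when typing escapes by hand: the Lisp reader treats every hex digit that follows \x as part of the escape, so a hex escape that is followed directly by a literal hex digit has to be terminated with a backslash and a space.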

The code is not perfect.

  1. Suppose you search for the string ELF\x01, which occurs only at the beginning of the file, and that a string ELF\x00 occurs later in the binary. When you have typed ELF\x0, Emacs finds the later match (ELF\x00). If you then go on typing to ELF\x01, Emacs reports no occurrences, because point has already moved to ELF\x0, which comes later in the file than ELF\x01. It is worth doing an overlapped search in such a case. (That problem is already fixed in the git version of the package.)

  2. Only the byte sequence is highlighted correctly in the hexl buffer, not the string representation on the right-hand side.

  3. If the search string spans two lines in the hexl buffer the string representation at the end of the line and the address at the beginning of the line are also highlighted. That is not because they belong to the match but because they are in the way when highlighting the byte sequence.

(require 'hexl)

(defvar-local hexl-isearch-raw-buffer nil
  "Buffer with the dehexlified content of the hexl buffer for hexl-isearch-mode.
This variable is set in the original hexl-mode buffer.")

(defvar-local hexl-isearch-original-buffer nil
  "This variable is set in the buffer with the dehexlified content.
It points to the corresponding hexl buffer.")

(defun hexl-address (position)
  "Return address of hexl buffer POSITION."
  (save-excursion
    (goto-char position)
    (hexl-current-address)))

(defun hexl-isearch-startup ()
  "Prepare hexl buffer for `hexl-isearch'."
  (let ((original-buf (current-buffer)))
    (setq-local hexl-isearch-raw-buffer (generate-new-buffer " hexl"))
    (setq-local isearch-search-fun-function (lambda () #'hexl-isearch-fun))
    (with-current-buffer hexl-isearch-raw-buffer
      (set-buffer-multibyte nil)
      (setq-local hexl-isearch-original-buffer original-buf)
      (insert-buffer-substring original-buf 1 (buffer-size original-buf))
      (dehexlify-buffer))))

(defun hexl-isearch-end ()
  "Cleanup after `hexl-isearch'."
  (let ((isearch-raw-buffer hexl-isearch-raw-buffer))
    (setq-local hexl-isearch-raw-buffer nil)
    (when (buffer-live-p isearch-raw-buffer)
      (kill-buffer isearch-raw-buffer))))

(defun hexl-isearch-fun (string &optional bound noerror count)
  "Search for byte sequence of STRING in hexl buffer.
The arguments BOUND and NOERROR work like in `search-forward'."
  (when bound (setq bound (1+ (hexl-address bound))))
  (setq string (read (concat "\"" string "\"")))
  (let ((point (1+ (hexl-current-address)))
        match-data)
    (with-current-buffer hexl-isearch-raw-buffer
      (goto-char point)
      (setq point (funcall (if isearch-forward #'re-search-forward #'re-search-backward)
                           (if isearch-regexp
                               string
                             (regexp-quote string))
                           bound noerror count))
      (setq match-data (match-data t nil t)))
    (when point
      (prog1
          (hexl-goto-address (1- point))
        (set-match-data
         (mapcar (lambda (el)
                   (if (integerp el)
                       (hexl-address-to-marker (1- el))
                     el))
                 match-data))))))

(define-minor-mode hexl-isearch-mode
  "Search for binary string with isearch in hexl buffer."
  :lighter " hi"
  (if hexl-isearch-mode
      (progn
        ;; `isearch-search-fun-function' must hold a function that
        ;; returns the search function, not the search function itself.
        (setq-local isearch-search-fun-function (lambda () #'hexl-isearch-fun))
        (add-hook 'isearch-mode-hook #'hexl-isearch-startup t t)
        (add-hook 'isearch-mode-end-hook #'hexl-isearch-end t t))
    (setq-local isearch-search-fun-function #'isearch-search-fun-default)
    (remove-hook 'isearch-mode-hook #'hexl-isearch-startup t)
    (remove-hook 'isearch-mode-end-hook #'hexl-isearch-end t)))

(easy-menu-add-item hexl-mode-map '(menu-bar Hexl)
                    ["Hexl Isearch Mode"
                     (if hexl-isearch-mode (hexl-isearch-mode -1) (hexl-isearch-mode))
                     :style toggle :selected hexl-isearch-mode]
                    "Go to address")
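
One possible way to use the code above, assuming it has already been evaluated or loaded from your init file: besides the menu entry, the mode can be toggled with M-x hexl-isearch-mode in a hexl buffer, or enabled automatically through hexl-mode-hook. The hook setup below is a sketch and not something the answer itself prescribes.

```elisp
;; Assumes the definitions above are already loaded.
;; Turn on byte-sequence isearch in every buffer that enters hexl-mode.
(add-hook 'hexl-mode-hook #'hexl-isearch-mode)
```

With the mode active, typing C-s \xf9\xbe\xb4\xd9 searches for the four bytes from the question.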
Tobias

If you use nhexl-mode (available from your neighborly GNU ELPA archive), then you can do C-s f9beb4d9 and it will search for the sequence of 4 bytes with codes f9 be b4 d9 (as well as for the 8-byte text f9beb4d9, of course, and also for the bytes at addresses whose hex representation includes f9beb4d9).
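
If the package is not installed yet, a minimal init-file sketch for fetching it could look like this. It assumes that GNU ELPA is listed in package-archives (the default) and that package.el has been initialized, which happens automatically in Emacs 27 and later.

```elisp
;; Sketch for an init file; assumes GNU ELPA is in `package-archives'.
(require 'package)
(unless (package-installed-p 'nhexl-mode)
  (package-refresh-contents)
  (package-install 'nhexl-mode))
```

After that, M-x nhexl-mode in the buffer visiting the file turns the mode on.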

Stefan
  • thanks for this, it works, but performance in large files is terrible. – gdkrmr Dec 10 '18 at 19:52
  • @gdkrmr: Part of my motivation for the development of nhexl-mode was to circumvent performance problems in hexl-mode in large files. But I'm not completely surprised you're experiencing performance issues, to be honest. Please report them via `M-x report-emacs-bug` giving as many details as possible (a URL to a sample large file might also be useful, since performance can be significantly affected by the file's contents). – Stefan Dec 10 '18 at 21:19
  • Hi Stefan, I am just searching the binary and translating addresses into positions in the hexl buffer. It looks like that does not have the performance issues. Maybe that is an alternative for hexl or nhexl? I've already fixed a problem in the git repository: https://github.com/TobiasZawada/hexl-isearch. – Tobias Dec 11 '18 at 12:01