For me, one of Emacs' most useful functions is delete-duplicate-lines. I call it this way, which very helpfully leaves blank lines intact:

(defun delete-duplicate-lines-keep-blanks ()
  (interactive)
  (delete-duplicate-lines (region-beginning) (region-end) nil nil t)) 

However, I'd like to be queried. How can I direct Emacs to query me, showing me each duplicate line, and giving me the option to delete it or leave it intact?

Jordon Biondo
incandescentman

2 Answers

It seems to be too difficult to make the original delete-duplicate-lines behave in the way you want. Here's something that might do the job though:

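;; `selection' in `my/suggest-delete-line' is bound dynamically by
;; `my/delete-duplicate-lines', so evaluate this without `lexical-binding'.
(require 'cl-lib)

(defun my/update-lines (bunches pos keep)
  "Drop POS from every position list in BUNCHES.
Unless KEEP is non-nil, the line at POS was deleted, so later
positions are shifted up by one."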
  (cl-loop with dec = (if keep 0 1)
           for line being the hash-key of bunches
           using (hash-value positions) do
           (puthash
            line
            (cl-loop for p in positions
                     if (< p pos) collect p
                     else if (> p pos) collect (- p dec))
            bunches)))

(defun my/suggest-delete-line (line)
  (let ((len (length line)))
    (move-overlay selection (point) (+ (point) len))
    (let* ((inhibit-quit t)
           (answer 
            (with-local-quit
              (read-key
               (format "Delete '%s%s'? [y]es/[n]o"
                       (substring line 0 (min len 13))
                       (cond
                        ((> len 16) "...")
                        ((> len 13) (substring line 13 len))
                        (t "")))))))
      (when (= answer ?y)
        (delete-region
         (point)
         (progn
           (move-end-of-line 1)
           (forward-char)
           (point))))
      answer)))

(defun my/delete-duplicate-lines (beg end)
  (interactive
   (if (region-active-p)
       (list (region-beginning) (region-end))
     (list (point-min) (point-max))))
  (let ((ignore-white (< (prefix-numeric-value current-prefix-arg) 1))
        (ignore-blank (< (prefix-numeric-value current-prefix-arg) 4))
        (bunches (make-hash-table :test 'equal))
        (selection (make-overlay 1 1)))
    (overlay-put selection 'face 'secondary-selection)
    (save-excursion
      (goto-char beg)
      (move-beginning-of-line 1)
      (cl-loop for lnum = (count-lines (point-min) beg)
               then (1+ lnum)
               for line = (buffer-substring-no-properties
                           (point)
                           (progn
                             (move-end-of-line 1)
                             (point)))
               while (< (point) end) do
               (forward-char)
               (unless
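                   ;; Match lines consisting only of whitespace, not any
                   ;; line that merely contains a space or tab.
                   (or (and (string-match "\\`[ \t]+\\'" line) ignore-white)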
                       (and (string-match "^$" line) ignore-blank))
                 (puthash line (cons lnum (gethash line bunches)) bunches))))
    (cl-loop for line being the hash-key of bunches 
             using (hash-value positions)
             unless (cdr positions) do
             (remhash line bunches))
    (cl-loop named :outer for line being the hash-key of bunches do
             (cl-loop for positions = (gethash line bunches)
                      while positions do
                      (cl-loop with continue = t
                               for pos in positions
                               while continue do
                               (goto-char (point-min))
                               (forward-line pos)
                               (recenter)
                               (cl-case (my/suggest-delete-line line)
                                 (?\C-g (cl-return-from :outer))
                                 (?y)
                                 (otherwise (setf continue nil)))
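                               ;; `continue' is t only when the line was deleted,
                               ;; so the line was kept exactly when it is nil.
                               (my/update-lines bunches pos (not continue)))))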
    (delete-overlay selection)))

This could certainly be improved, but at first glance it seems to do what you want.
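The bookkeeping behind the code above (a table from line text to the line numbers where it occurs, with unique lines dropped) can be sketched in isolation. This is only an illustration, not the answer's code: `my/sketch-duplicate-line-numbers` is a hypothetical helper, it scans the whole buffer rather than a region, and it skips only empty lines.

```elisp
(defun my/sketch-duplicate-line-numbers ()
  "Return an alist (LINE . NUMBERS) for lines that occur more than once.
NUMBERS are zero-based line numbers in ascending order.  Empty lines
are skipped.  Hypothetical helper, for illustration only."
  (let ((table (make-hash-table :test 'equal))
        (lnum 0)
        result)
    (save-excursion
      (goto-char (point-min))
      (while (not (eobp))
        (let ((line (buffer-substring-no-properties
                     (line-beginning-position) (line-end-position))))
          ;; Record the current line number under the line's text.
          (unless (string= line "")
            (push lnum (gethash line table))))
        (setq lnum (1+ lnum))
        (forward-line 1)))
    ;; Keep only lines with at least two occurrences.
    (maphash (lambda (line numbers)
               (when (cdr numbers)
                 (push (cons line (reverse numbers)) result)))
             table)
    result))
```

Calling it with point anywhere in a buffer returns the duplicated lines and their positions; the order of the alist entries follows the hash table and should not be relied on.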

wvxvw
  • Thanks! I tried this, answering yes to the queries, and it deleted all kinds of lines that weren't duplicates. – incandescentman Feb 17 '16 at 04:03
  • @incandescentman can you give an example file where it deletes non-duplicate lines? – wvxvw Feb 17 '16 at 06:08
  • PS: I've reworked the code to ignore blank and all-whitespace lines by default. – wvxvw Feb 17 '16 at 11:50
  • That seems to have fixed it! – incandescentman Feb 17 '16 at 21:54
  • Is there a way to make it `recenter-top-bottom`? It's a bit tough to scan for the cursor in different parts of the screen each time. – incandescentman Feb 17 '16 at 23:17
  • @incandescentman I do not know whether you got my version running already. But I think the best way to inform the user of the region to be deleted is to mark it by some distinct text property. I use the face for the secondary selection for that purpose. – Tobias Feb 18 '16 at 22:17
  • @Tobias That sounds great, how would I implement that? – incandescentman Feb 18 '16 at 22:20
  • @incandescentman It is already implemented in the code of my answer. Ah, it is not text properties, it is an overlay. Sorry for that. Just search for `overlay-put`. – Tobias Feb 18 '16 at 22:23
  • Hi wvxvw, you could move an overlay to the line in question and use `recenter`. Just have a peep at my answer;-). On the other hand, I do not have your fancy truncated string;-) in the prompt of the query. Note, that I don't like code doubling. That is the main reason why I posted my alternative. Hope you are okay with that. Best regards, Tobias. – Tobias Feb 18 '16 at 22:49
  • @Tobias it's a bit late here now, but I'll do that tomorrow. – wvxvw Feb 18 '16 at 23:01
  • @incandescentman I've tried to incorporate some of Tobias' suggestions, plus fixes for some minor issues (e.g. the overlay is now deleted once you give up interacting with this function). – wvxvw Feb 19 '16 at 13:58
  • This seems to delete both instances of a duplicated line if I'm not careful (i.e. if I answer yes twice), which is never desirable. Can we change this behavior, so that when the user is prompted, declining to delete one instance of a duplicate line means I necessarily and automatically want to preserve the other instance? – incandescentman May 02 '16 at 20:34
  • Also, if there's a long block of duplicated lines, this asks me line-by-line whether I want to keep each line. Is it possible to direct this function to look for entire duplicated sections, rather than duplicated lines? We'd define a duplicated section as the longest continuous sequence of duplicated lines. Thus some duplicated sections would be just one line, others would be many lines long. For each duplicated section, it would ask the user whether to keep it or discard (rather than asking for each line separately). I realize this is asking a lot, but it's the more realistic use case. – incandescentman May 03 '16 at 02:49
  • @incandescentman Sorry, I'm so slow to reply. I'll probably get back to working on this some time after Sunday. I think that 1 shouldn't be problematic, except that it may actually make sense: consider someone who wants to delete all duplicated lines that happen after the first one (which they want to keep), this means they'd answer "no" the first time, but "yes" for all later instances. This probably needs to have a more complicated way of answering the prompt: "yes to all", "yes to this one", "no to all", "no to this one". – wvxvw May 05 '16 at 09:20
  • @wvxvw Thanks! In my case, I never want to delete all instances of duplicates. For me, the only problem with duplicates is that they're duplicated. I'll always want to preserve one of them. – incandescentman May 06 '16 at 16:48

The following code advises delete-duplicate-lines to get what you want while retaining as much of the original behavior as possible.

EDIT: Note that in the byte-compiled file sort.elc, the call to delete-region is compiled to a built-in instruction, so advice on delete-region has no effect there. We therefore need to reload the source file sort.el.gz so that delete-region can be advised within delete-duplicate-lines.
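The effect described in this note can be checked with a small experiment. The sketch below is not part of the answer's code; all `my/demo-...` names are hypothetical, and it assumes the forms are evaluated as source (not byte-compiled beforehand) on an Emacs where delete-region has its own bytecode instruction.

```elisp
(defvar my/demo-advice-calls 0
  "Hypothetical counter: how often the demo advice on `delete-region' ran.")

(defun my/demo-count-delete (orig beg end)
  "Count the call, then delete the region with ORIG as usual."
  (setq my/demo-advice-calls (1+ my/demo-advice-calls))
  (funcall orig beg end))

(defun my/demo-delete-first-char ()
  "Delete the first character of the current buffer."
  (delete-region (point-min) (1+ (point-min))))

(defun my/demo-advice-calls-for (fn)
  "Call FN in a temp buffer with the counting advice active.
Return how many times the advice ran."
  (setq my/demo-advice-calls 0)
  (advice-add 'delete-region :around #'my/demo-count-delete)
  (unwind-protect
      (with-temp-buffer
        (insert "abc")
        (funcall fn)
        my/demo-advice-calls)
    ;; Always remove the advice again, even if FN signals an error.
    (advice-remove 'delete-region #'my/demo-count-delete)))

;; First run the interpreted definition, then byte-compile it and run it again.
(let ((interpreted (my/demo-advice-calls-for #'my/demo-delete-first-char))
      (compiled (my/demo-advice-calls-for
                 (byte-compile 'my/demo-delete-first-char))))
  (message "advice ran %d time(s) interpreted, %d time(s) byte-compiled"
           interpreted compiled))
```

If the byte-compiled count is lower than the interpreted one, the compiled call bypassed the advice, which is the reason for loading the source version of sort.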

Make sure that the Emacs Lisp sources are installed on your system (for example, the emacs24-el package).

If you have sort.el instead of sort.el.gz, adapt the load-library line in the code accordingly. You can find out where delete-duplicate-lines is defined with C-h f delete-duplicate-lines. If the help page just says "compiled Lisp function" without naming a source file, you have only the byte-compiled file sort.elc, and you need to obtain the corresponding source before you can try the code below. If the help page says that delete-duplicate-lines is defined in sort.el, click the corresponding link. Whether the link leads to sort.el or sort.el.gz, the code below should work.
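The same check can be made from Lisp instead of through the help page. This is a minimal sketch under the assumption that the sources, if installed, are on `load-path`; `my/sort-source-file` and `my/load-sort-source` are hypothetical names, not part of the answer.

```elisp
(defun my/sort-source-file ()
  "Return the file name of the `sort' library source, or nil if not found.
Compressed sources such as sort.el.gz are found as long as
`auto-compression-mode' is enabled.  Hypothetical helper."
  (locate-library "sort.el"))

(defun my/load-sort-source ()
  "Load the source version of `sort', or signal an error if it is missing.
Hypothetical helper."
  (if (my/sort-source-file)
      (load-library "sort.el")
    (error "Source of the `sort' library not found; install the Emacs Lisp sources")))
```

Evaluating `(my/sort-source-file)` shows which file `load-library` would pick up, so a missing source package is noticed before the advice silently fails to apply.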

(eval-when-compile
  (require 'cl-lib)) ;; provides `cl-case'

(defun delete-duplicate-lines-interactive (oldfun &rest args)
  "Make `delete-region' interactive in `delete-duplicate-lines'.
OLDFUN is the original `delete-duplicate-lines'; ARGS are its arguments."
  (let ((reverse (nth 2 args))
        (adjacent (nth 3 args))
        (arg-interactive (nth 5 args)))
    (if arg-interactive
        (let ((ol (make-overlay 1 1)))
          (overlay-put ol 'face 'secondary-selection) ;; could be customizable
          (unwind-protect
              (let (continue
                    (deleted-lines 0))
                (catch :quit
                  (advice-add 'delete-region
                              :around
                              (lambda (delete-region-original start end)
                                (if continue
                                    ;; After `!': delete without asking, but keep counting.
                                    (progn
                                      (funcall delete-region-original start end)
                                      (setq deleted-lines (1+ deleted-lines)))
                                  (recenter)
                                  (move-overlay ol start end)
                                  (cl-case (let (mark-active)
                                             (read-key "Delete line? ([y]es, [n]ext, [!] all, [q]uit, any other key is equivalent to next):"))
                                    (?y
                                     (funcall delete-region-original start end)
                                     (setq deleted-lines (1+ deleted-lines)))
                                    (?!
                                     (setq continue t)
                                     (funcall delete-region-original start end)
                                     (setq deleted-lines (1+ deleted-lines)))
                                    (?q
                                     (throw :quit nil))))
                                (when reverse (goto-char start)))
                              '((name . interactive)))
                  (apply oldfun args))
                (message "Deleted %d %sduplicate line%s%s"
                         deleted-lines
                         (if adjacent "adjacent " "")
                         (if (= deleted-lines 1) "" "s")
                         (if reverse " backward" "")))
            ;; Always remove the temporary advice, even after an error or quit.
            (advice-remove 'delete-region 'interactive))
          (delete-overlay ol))
      (apply oldfun args)))) ;; non-interactive case

(advice-add 'delete-duplicate-lines :around 
        #'delete-duplicate-lines-interactive)

;; The byte-compiled "sort.elc" calls `delete-region' without going
;; through the advice, so load the source version again.
(load-library "sort.el")

(defun delete-duplicate-lines-keep-blanks ()
  (interactive)
  (delete-duplicate-lines (region-beginning) (region-end) nil nil t t))
Tobias
  • How does it work? Do I need to somehow add `call-interactively` to `(defun delete-duplicate-lines-keep-blanks () (interactive) (delete-duplicate-lines (region-beginning) (region-end) nil nil t))`? – incandescentman Feb 17 '16 at 21:58
  • I changed the detection of interactive calls. Instead of relying on `called-interactively-p` I test the `interactive`-argument of `delete-duplicate-lines` now. The definition of `delete-duplicate-lines-keep-blanks` is shown at the end of the code. – Tobias Feb 17 '16 at 22:41
  • @incandescentman Sorry, I forgot to add your name to the answer. Therefore, I added this comment to notify you. – Tobias Feb 17 '16 at 23:09
  • I tried it and it deleted my lines without querying me. Emacs 24.5.1, org 8.3.3. – incandescentman Feb 17 '16 at 23:13
  • @incandescentman I have been able to reproduce the problem. I did not notice it before since I always debugged the stuff and therefore always loaded the source version of `sort.el.gz`. This also indicates the workaround: load the source version of the library `sort`. I have added it to the code. Note that the source of `sort` must be installed on the system. If you are writing elisp yourself this is most likely the case. – Tobias Feb 18 '16 at 05:35
  • @incandescentman I've corrected some minor issues to make the code work with `emacs -Q`. Happy deleting;-). The only minor issue that remains is the number of deleted duplicates in the final message. The original version of `delete-duplicate-lines` assumes at this point that all duplicate lines have been deleted. – Tobias Feb 18 '16 at 08:54
  • @incandescentman The small issues mentioned in the last comment are also avoided now. The final message is now generated by the advice. – Tobias Feb 18 '16 at 09:12
  • @incandescentman I have replaced `(load-library "sort.el.gz")` by `(load-library "sort.el")`. This should work for all source versions, compressed and uncompressed ones. – Tobias Feb 18 '16 at 22:41