
Sample text:

This is first line.
This is over_second line.
This is over_fourth line.
This is third line.
This is over_fourth delete it.
This is over_fourth and one more.
This is over_second with another text.

I need to delete lines where a partial match occurs: if over_second occurs again in a later line, that whole line should be deleted. So the output should be as follows:

This is first line.
This is over_second line.
This is over_fourth line.
This is third line.

I could only come up with over_\w+ for selecting the relevant part of the text, but I don't know how to recognize the duplicates and delete the whole line.

Drew
msinfo
  • Is the requirement "for each specific match for the regexp, retain only the first line in the buffer containing that text"? – phils Nov 29 '18 at 22:11
  • Yes. The line with the first instance of a match should be preserved, while lines with any later matches should be deleted. – msinfo Nov 29 '18 at 22:14
  • Once this is done for over_second, the same should be repeated for over_fourth. – msinfo Nov 29 '18 at 22:16

3 Answers

  1. Try delete-duplicate-lines, which is part of the standard Emacs distribution.

  2. Emacs Wiki page Duplicate Lines might help.

    • It points to a blog post about it.

    • It explains why interactive search-and-replace might not help.

    • It explains how to do it with Lisp, in various ways.

    • It explains how to do it with the UNIX / GNU/Linux commands sort or uniq.
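A quick way to see what delete-duplicate-lines actually does is to call it non-interactively on a temporary buffer. This is a minimal sketch, assuming Emacs 24.4 or later (where the command exists); the function name my-exact-dedup-demo is hypothetical. It shows that only lines that are exact copies of an earlier line are removed, so a line that merely shares over_second with an earlier line survives:

```elisp
(defun my-exact-dedup-demo ()
  "Return demo text after running `delete-duplicate-lines' on it.
Hypothetical helper: shows that only *exact* duplicate lines go."
  (with-temp-buffer
    (insert "This is over_second line.\n"
            ;; Exact copy of the previous line: will be deleted.
            "This is over_second line.\n"
            ;; Only a partial match: will be kept.
            "This is over_second with another text.\n")
    ;; Keep the first occurrence of each distinct line in the buffer.
    (delete-duplicate-lines (point-min) (point-max))
    (buffer-string)))
```

Evaluating (my-exact-dedup-demo) returns "This is over_second line.\nThis is over_second with another text.\n": the partial duplicate is untouched.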

Drew
  • These are good suggestions, but OP was asking about deleting *partial duplicates*, i.e., lines that contain the same string, but aren't full duplicates of each other. I don't think delete-duplicate-lines does that? – Tyler Nov 29 '18 at 22:48
  • @Tyler: Oh, right. Maybe the other stuff on that page will help in some way... – Drew Nov 30 '18 at 00:40

If you don't need to do this all the time, the quickest solution might be using interactive replacement: query-replace-regexp, bound to C-M-%. Start with point at the top of the buffer.

Note that if you want to delete entire lines, you'll need to include a newline in your regexp. You enter it at the prompt with C-q C-j.

So, for over_second, call C-M-%, then enter the regular expression:

C-q C-j.*over_second.*

This matches an entire line that contains the string over_second, together with the newline that precedes it.

Then enter the empty string (just type <enter>) for the replacement value.

The first line that matches the regexp should now be highlighted. This is the one you want to keep, so type n to tell Emacs to skip it. The next line will be highlighted. You can delete this by typing y (or <space>).

You can keep typing y until all the matches are deleted, or you can type ! to delete all remaining matches at once.
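If you end up doing this often, the same keystrokes can be sketched as a small command. This is an illustration under assumptions, not part of the answer above: the name my-delete-later-lines-matching is hypothetical. It searches for the same newline-prefixed regexp, skips the first match (like typing n) and deletes every later one (like typing y or !). Like the interactive recipe, it cannot see a match on the very first line of the buffer, because the regexp requires a preceding newline.

```elisp
(defun my-delete-later-lines-matching (string)
  "Delete every line containing STRING except the first such line.
Hypothetical helper automating the C-M-% recipe.  The regexp includes
the preceding newline, so a match on the buffer's first line is missed."
  (interactive "sDelete later lines containing: ")
  (save-excursion
    (goto-char (point-min))
    ;; Same pattern as typed at the prompt: newline, then the whole line.
    (let ((re (concat "\n.*" (regexp-quote string) ".*"))
          (keep-next t))
      (while (re-search-forward re nil t)
        (if keep-next
            ;; First match: keep it, like answering n.
            (setq keep-next nil)
          ;; Later matches: delete the newline and the line, like answering y.
          (replace-match "" t t))))))
```

For the sample text, run it once with over_second and once with over_fourth.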

Tyler

I could swear this is a duplicate, but I couldn't find it.

Try this:

(defun my-delete-duplicate-matches (regexp)
  "Delete matching lines, except the first instance of each specific match."
  (interactive (list (read-regexp "Regexp: ")))
  (save-restriction
    (when (use-region-p)
      (narrow-to-region (region-beginning) (region-end)))
    (save-excursion
      (goto-char (point-min))
      (let ((matches (make-hash-table :test #'equal)))
        (save-match-data
          (while (re-search-forward regexp nil :noerror)
            (if (not (gethash (match-string 0) matches))
                (puthash (match-string 0) t matches)
              (forward-line 0)
              (delete-region (point) (progn (forward-line 1)
                                            (point))))))))))

Caveats:

  • If the same matching text appears twice on the first line in which it is found, the line will be deleted.

  • More generally, if a line contains both the first instance of a given match, and a repeat instance of an earlier match, the line will be deleted.

  • Multi-line patterns are not supported.
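One possible way around the first two caveats is to collect all matches on a line before deciding whether to delete it. This is a sketch of a different policy, not the function above; the name my-delete-redundant-match-lines is hypothetical. Here a line is deleted only when every match on it was already recorded from an earlier kept line. Otherwise the line is kept and all of its matches are recorded. Lines without a match are always kept, and multi-line patterns are still not supported.

```elisp
(defun my-delete-redundant-match-lines (regexp)
  "Delete lines all of whose REGEXP matches already appeared on kept lines.
A line with at least one new match is kept, and all of its matches
are recorded.  Lines with no match are always kept."
  (interactive (list (read-regexp "Regexp: ")))
  (save-excursion
    (save-match-data
      (goto-char (point-min))
      (let ((seen (make-hash-table :test #'equal)))
        (while (not (eobp))
          (let ((bol (line-beginning-position))
                (eol (line-end-position))
                (found nil))
            ;; Collect every match lying on the current line.
            (while (and (< (point) eol)
                        (re-search-forward regexp eol t))
              (push (match-string 0) found)
              ;; Step past empty matches so the loop always advances.
              (when (and (= (match-beginning 0) (match-end 0))
                         (< (point) eol))
                (forward-char 1)))
            (if (and found
                     (not (memq nil (mapcar (lambda (m) (gethash m seen))
                                            found))))
                ;; Nothing new on this line: delete it, newline included.
                (delete-region bol (progn (goto-char bol)
                                          (forward-line 1)
                                          (point)))
              ;; Keep the line and remember everything it matched.
              (dolist (m found) (puthash m t seen))
              (goto-char bol)
              (forward-line 1))))))))
```

With this policy, "a X X" followed by "b X" keeps the first line, and in "A", "B A", "B" (regexp [AB]) the middle line survives because it holds the first B.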

phils