7

What is the simplest way of counting how many times each distinct line is repeated in a region?

For example, from

THIS IS LINE A
THIS IS LINE A
THIS IS LINE A
THIS IS LINE B
THIS IS LINE B
THIS IS LINE C

I would like to get

THIS IS LINE A    3
THIS IS LINE B    2
THIS IS LINE C    1

The output could be made over the same region (replacing the current selection).

rsenna

2 Answers

10

On Linux, and I assume Mac, you can pipe the region through the uniq shell command to get almost exactly what you want.

  1. Mark the region

  2. Sort the lines with M-x sort-lines

  3. Call shell-command-on-region with the prefix key: C-u M-|

  4. Enter uniq --count

The contents of the region will be replaced by:

  3 THIS IS LINE A
  2 THIS IS LINE B
  1 THIS IS LINE C

You can further automate this with keyboard macros etc., but this may be good enough as is.

EDIT: as @phils points out, you can do the sorting with a shell command instead of with the Emacs function. In this case, drop step 2, and for step 4 enter sort | uniq -c instead of just uniq --count (-c is the short form of --count).
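
The output above puts the count before the line, whereas the question asks for the line first. Here is a minimal sketch of one way to swap the two columns, assuming a POSIX shell with sort, uniq and awk available. The function name line_counts and the four-space separator are illustrative choices, not part of the original answer:

```shell
# Hypothetical helper: read lines on stdin, print "LINE    COUNT".
line_counts() {
  sort | uniq -c |
    awk '{
      n = $1                               # count written by uniq -c
      sub(/^[ \t]*[0-9]+[ \t]/, "")        # strip the count, keep the line
      print $0 "    " n
    }'
}
```

The pipeline inside the function should also work when typed on a single line at the C-u M-| prompt. Note that uniq only merges adjacent identical lines, which is why the sort step comes first.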

Tyler
  • Nice! On a Mac `uniq` takes the `-c` option to prepend counts, and I don't think you need to sort before using `uniq`. (Also, the OP asked about processing *the region*, not the whole buffer.) – Constantine Dec 11 '14 at 21:43
  • Thanks. On Linux `-c` and `--count` are synonyms, and you do need to sort, but maybe the Mac version uses different defaults. I will correct step 1! – Tyler Dec 11 '14 at 21:45
  • I just `ssh`'d into a box running `Ubuntu 14.04.1 LTS`: still no sorting needed for me. – Constantine Dec 11 '14 at 21:48
  • Strange. Did you try on a region with the lines not already in sorted order? I'm running Debian, and without sorting the results are not collated properly. – Tyler Dec 11 '14 at 21:52
  • My bad: I failed to test it with the input that is not sorted. – Constantine Dec 11 '14 at 21:53
  • 1
    Tyler: `C-u M-|` `sort | uniq -c` – phils Dec 11 '14 at 23:29
  • This is nice. Unfortunately I happen to use Windows at work, and that's where I'm in need of that. – rsenna Dec 12 '14 at 12:04
  • @rsenna: Well, my answer should work on Windows, too. :-) Just evaluate that code and then do `M-x insert-line-stats` after highlighting the region. – Constantine Dec 12 '14 at 16:04
  • @Constantine I'm aware it works on Windows, tested it with success. I +1'd both your answer and Tyler's btw. I ended up accepting Tyler's because it is surely *simpler* than yours (considering that I asked about the simplest way, and I did not specify that it should work in Windows). – rsenna Dec 12 '14 at 16:13
  • @rsenna: You are the one who posed – Constantine Dec 12 '14 at 16:16
  • @Constantine yup. I'm explaining all this because I really wasn't sure about which answer I should accept. Being an OP on Stack Exchange is hard sometimes... :) – rsenna Dec 12 '14 at 16:21
  • 1
    Ugh. I'm too slow to edit comments. Here's what I intended to say: "@rsenna: You're the one who posed the question; glad to know that it worked for you. (I don't care about reputation points; I appreciate a +1, but I *absolutely agree* that my answer does not give the "simplest way".)" – Constantine Dec 12 '14 at 16:23
5

I see three tasks here:

  1. Get a list of lines in a region, without duplicates.
  2. For each line in this list count how many times it occurred in the original region and collect this information.
  3. Insert the summary.

 

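(defun uniqify-lines (beg end)
  "Return a list of the distinct lines between BEG and END.
Empty lines are omitted."
  (let ((text (buffer-substring-no-properties beg end)))
    (with-temp-buffer
      (insert text)
      (delete-duplicate-lines (point-min) (point-max))
      (split-string (buffer-string) "\n" t))))

(defun count-duplicates (beg end)
  "Count duplicate lines between BEG and END.
Return a list of the form ((LINE . COUNT) ...)."
  (save-restriction
    ;; Narrow so that ^ and $ also match at the region boundaries.
    (narrow-to-region beg end)
    ;; Match case-sensitively, as `delete-duplicate-lines' does.
    (let ((case-fold-search nil))
      (mapcar (lambda (str)
                ;; Anchor the regexp so that only whole lines are counted,
                ;; not lines that merely contain STR.
                (cons str (how-many (concat "^" (regexp-quote str) "$")
                                    (point-min) (point-max))))
              (uniqify-lines (point-min) (point-max))))))

(defun insert-line-stats (beg end)
  "Replace the region with its distinct lines, each followed by its count."
  (interactive "r")
  (let ((stats (count-duplicates beg end)))
    ;; `delete-region' leaves the kill ring alone, unlike `kill-region'.
    (delete-region beg end)
    (goto-char beg)
    (mapc (lambda (line)
            (insert (format "%s %d\n" (car line) (cdr line))))
          stats)))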
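
For comparison with the shell-based answer, the same three tasks can be sketched as a single awk pass that needs no sorting and keeps the lines in first-occurrence order, as the Lisp version does. This is only an illustration, assuming a POSIX shell with awk available; the function name count_lines_in_order is made up for the example:

```shell
# Hypothetical helper: count each distinct line on stdin in one pass.
count_lines_in_order() {
  awk '
    $0 == "" { next }                     # omit empty lines
    !($0 in count) { order[++n] = $0 }    # task 1: remember each distinct line once
    { count[$0]++ }                       # task 2: tally occurrences
    END {                                 # task 3: print the summary
      for (i = 1; i <= n; i++) print order[i], count[order[i]]
    }'
}
```

Unlike the Lisp functions, this depends on an external command, so it does not help on a system without awk.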
Constantine
  • I didn't know `how-many` or `delete-duplicate-lines` existed - sometimes it seems like you can just string English words together with hyphens and Emacs knows what to do! I suspect there's a built-in Emacs version of `uniq` as well, but I didn't find it. – Tyler Dec 11 '14 at 21:56
  • 2
    This is a very good answer. And since it does not depend on any external command, it also works in Windows. – rsenna Dec 12 '14 at 16:15