2

Suppose I have a section in a file which is in csv form, e.g.

     a1, bbb1,  cc1, ddd1,  ee1
    aa2,   b2, ccc2,  dd2,   e2
   aaa3,  bb3,   c3,   d3,  ee3
    aa4,   b4,  cc4, ddd4, eee4

Then, suppose I want to replace an entire column, say one of the a's, b's, c's, d's, e's with the same value. To make this a little more general, assume I want to do this for any delimiter, not just comma. So, it might be a space or some other character as delimiter.

Furthermore, I want to do this interactively, treating that section of the file as a region using emacs point and mark. I'd also prefer to do this without the assistance of a package. Are there any good approaches using just the standard tools in Emacs? I'm using Emacs 24.3, but I don't want a method that is very version-sensitive.

OPTIONAL EXTRA: To make this even more general, if one could use the same method to replace an entire row with the same value, that would be even better.

Faheem Mitha
  • 406
  • 4
  • 16
  • Related: http://emacs.stackexchange.com/a/7446/115 In that solution have a look at the link to Rectangle commands. `C-x r t` or `string-rectangle` will get what you want. – Kaushal Modi Jan 28 '15 at 12:19
  • @kaushalmodi Yes, I'm familiar with the rectangle commands, but I don't see how to apply them directly. I should be clear that the entries in the hypothetical delimited file aren't all the same length, and afaik the rectangle commands all operate based on the number of characters from the beginning of the line, so that would not apply directly. – Faheem Mitha Jan 28 '15 at 13:30
  • @kaushalmodi Changed the example so that the lengths of the text in the columns isn't the same on each line. – Faheem Mitha Jan 28 '15 at 13:38
  • The rectangle method applied regardless. Select the rectangle that covers the whole column, kill that rectangle and use string-rectangle to type in the identical string you want in all columns. If the delimiters are not aligned to begin with, use align-regexp before doing this. I'll post a detailed solution when I get a chance. – Kaushal Modi Jan 28 '15 at 13:56
  • @kaushalmodi Ah, Ok. I didn't think of aligning first. That sounds like a reasonable approach. – Faheem Mitha Jan 28 '15 at 14:47

3 Answers3

2

Solution

Quick summary

Align the text based on delimiters (, in this case), mark the column and type the identical text of your choice in that column using C-x r t.

Walkthrough

Let's start with a non-aligned version of your example table, assuming that you start from a csv file snippet.

a1,bbb1,cc1,ddd1,ee1
aa2,b2,ccc2,dd2,e2
aaa3,bb3,c3,d3,ee3
aa4,b4,cc4,ddd4,eee4

Select that block and align based on the delimiter using align-regexp:

C-u M-x align-regexp , RET RET RET y

If you'd like to understand that, there is a good explanation in this Emacs SE answer.

That will give you

a1   ,bbb1 ,cc1  ,ddd1 ,ee1
aa2  ,b2   ,ccc2 ,dd2  ,e2
aaa3 ,bb3  ,c3   ,d3   ,ee3
aa4  ,b4   ,cc4  ,ddd4 ,eee4

Let's say you want to replace the 2nd column with xyz.

First mark the second column. Place the point and mark as shown below by the black and white rectangles. The black rectangle is hiding the b character underneath.

a1   ,▮bb1 ,cc1  ,ddd1 ,ee1
aa2  ,b2   ,ccc2 ,dd2  ,e2
aaa3 ,bb3  ,c3   ,d3   ,ee3
aa4  ,b4  ▯,cc4  ,ddd4 ,eee4

Do C-x r t x y z RET ( string-rectangle command )

Result

a1   ,xyz ,cc1  ,ddd1 ,ee1
aa2  ,xyz ,ccc2 ,dd2  ,e2
aaa3 ,xyz ,c3   ,d3   ,ee3
aa4  ,xyz ,cc4  ,ddd4 ,eee4

Replacing values in the same row

There can't be a general approach for replacing the values in rows and columns. For replacing the values in a row, we can use the query-replace-regexp command or C-M-%.

Continuing from the previous result ...

a1   ,xyz ,cc1  ,ddd1 ,ee1
aa2  ,xyz ,ccc2 ,dd2  ,e2
aaa3 ,xyz ,c3   ,d3   ,ee3
aa4  ,xyz ,cc4  ,ddd4 ,eee4

Let's say you want to replace the values in the first row. So select that row and do C-M-% \b\w+\b RET uvw RET. The \b\w+\b selects a string with word boundaries and containing alpha-numeric characters.

Result

uvw   ,uvw ,uvw  ,uvw ,uvw
aa2  ,xyz ,ccc2 ,dd2  ,e2
aaa3 ,xyz ,c3   ,d3   ,ee3
aa4  ,xyz ,cc4  ,ddd4 ,eee4
Kaushal Modi
  • 25,203
  • 3
  • 74
  • 179
1

If you were using Emacs 24.4, I might recommend doing this with rectangle-mark-mode (C-x SPC), kill-ring-save (M-w) and kill-region (C-w).

With earlier versions, you could take largely the same approach by using CUA mode's rectangular selection and replacement tools. However, this is also pretty easy to do with a keyboard macro.

You can also switch modes to org mode, and convert the region to a table, make your manipulations and export it back to CSV. Just select the whole table and execute org-table-convert-region. This should work no matter the alignment. You can use M-<arrow keys> to move columns around, and M-S-left to delete a column completely. See here for working with org tables.

wdkrnls
  • 3,657
  • 2
  • 27
  • 46
  • See my edit above. Doesn't using rectangle commands assume that the text lengths between the delimiters is the same on every line? "You can also switch modes to org mode, and convert the region to a table, make your manipulations and export it back to CSV." sounds interesting. Could you give a sketch of how to do this? I'm not familar with org mode. – Faheem Mitha Jan 28 '15 at 13:33
  • Changed the example so that the lengths of the text in the columns isn't the same on each line. – Faheem Mitha Jan 28 '15 at 13:39
  • @FaheemMitha Here's a related emacs SE question: http://emacs.stackexchange.com/q/7612/115 in which solutions suggest using org-mode and multiple cursors to edit tables of text. – Kaushal Modi Jan 28 '15 at 14:02
  • Ok, thanks for the link. I'll play with org-mode. – Faheem Mitha Jan 28 '15 at 14:48
  • `M-x org-mode-delete-column` is easier to remember, though longer to type... – Faheem Mitha Jan 28 '15 at 15:19
  • This might be too trivial for a separate question, but is there a command for deleting multiple columns at once? – Faheem Mitha Jan 28 '15 at 16:11
  • A significant downside of this method is that org-mode doesn't seem to have a way to convert a region in-place from the org-mode table format back to csv. There is the export function, but that refuses to overwrite. :-( – Faheem Mitha Jan 28 '15 at 16:46
  • These methods only work for small tables. Eventually, you get to the level complexity that you might as well write an R script to do this. – wdkrnls Jan 28 '15 at 18:28
  • Sure, I could use R, but sometimes working interactively inside an editor is more convenient, especially for smaller files. More generally, I'm interested in learning more about what facilities Emacs offers for interactive text manipulation. – Faheem Mitha Jan 28 '15 at 19:16
0

To answer the question you added in the comments, here's something I could think about:

The idea

If this is something you are going to do on a regular basis, sifting the data through Org table with some additional functions may automate the process. For example, you could create an org file and paste the CSV data into it, then wrap the CSV with #+begin_example ... #+end_example macro. Then give it some id. (Look at the birthdays_csv for a concrete example).

Then, you could have added a function to process the CSV, an example of such function is csv-to-org. This function takes the CSV data and produces an Org table from it.

There are lots of manipulations possible on Org tables, so that you could apply any of them that you like to the produced table. Then, you have a function which complements the csv-to-org--org-to-csv to generate the CSV from the Org table once again. The advantage of doing something like this over doing this in some statistical language is that you always have the tables in front of you and can more accurately control what happens to the data when it travels back and froth from CSV to Org table.

The code

#+NAME: birthdays
| Name   | Age          | Date of Birth    | Income |
|--------+--------------+------------------+--------|
| Joe    | 30 years old | <1984-02-20 Mon> |  84000 |
| Mark   | 40 years old | <1974-04-15 Mon> | 120000 |
| Sophie | 44 years old | <1970-08-28 Fri> | 150000 |
| Anna   | 26 years old | <1988-02-29 Mon> | 160000 |
#+TBLFM: $2=(now()-$3)/365;%d years old

#+NAME: org-to-csv
#+BEGIN_SRC emacs-lisp :var birthdays=birthdays :results raw output :wrap example
   (cl-loop for row in birthdays do
            (pp (car row))
            (cl-loop for cell in (cdr row)
                     for i from 0 do
                     (princ ",")
                     (if (= i 2)
                         (princ (format "$%.2f" (/ cell 1000)))
                       (pp cell)))
            (terpri))
#+END_SRC

#+NAME: birhdays_csv
#+RESULTS:
#+BEGIN_example
"Joe","30 years old","<1984-02-20 Mon>",$84.00
"Mark","40 years old","<1974-04-15 Mon>",$120.00
"Sophie","44 years old","<1970-08-28 Fri>",$150.00
"Anna","26 years old","<1988-02-29 Mon>",$160.00
#+END_example

#+NAME: csv-to-org
#+BEGIN_SRC emacs-lisp :var birthdays=birhdays_csv :results raw output
  (with-temp-buffer
    (insert birthdays)
    (org-table-convert-region (point-min) (point-max))
    (princ (buffer-substring-no-properties (point-min) (point-max))))
#+END_SRC

#+RESULTS:
| Joe    | 30 years old | <1984-02-20 Mon> | $84.00  |
| Mark   | 40 years old | <1974-04-15 Mon> | $120.00 |
| Sophie | 44 years old | <1970-08-28 Fri> | $150.00 |
| Anna   | 26 years old | <1988-02-29 Mon> | $160.00 |
wvxvw
  • 11,222
  • 2
  • 30
  • 55
  • I'm unclear what your answer is supposed to do. I'm also the lines starting with # are for. Are they just comments/delimiters, or are they supposed to be functional code? Can you elaborate? Thanks. – Faheem Mitha Jan 28 '15 at 19:15
  • Oh, is this to convert from the org table format to csv in place? I was hoping for a way to do this with the built-in org mode commands. – Faheem Mitha Jan 28 '15 at 19:21
  • @FaheemMitha starting with `#+` are Org mode commands (macros actually) they have special interpretations, eg. `#+NAME:` gives a name to the following block, so that it can be referred to elsewhere. To answer your other question: this doesn't replace the column in place, but it can apply arbitrary complex transformation to all columns at once and output it as CSV, which you could then feed back to where you started. – wvxvw Jan 28 '15 at 21:02
  • I see. I'm not at all familar with org mode, sorry. You'll probably need to spell things out a bit more for me. – Faheem Mitha Jan 28 '15 at 21:07