1

I have a file in which the fields are separated by commas.

Some example input:

col1,"1,2",col3
col1,"1,2,3",col3
col1,"1  2,3",col3
col1,"1 "2,3"",col3

Now, I have to fetch the second column, so that I get:

"1,2"
"1,2,3"
"1  2,3"
"1 "2,3""

cut -d, -f2 file doesn't do what I want, because it also splits on the commas inside the quotes.

So, how can I retrieve column 2 from the above input?

gus
  • 11
  • Can you do anything about the format of the input? This seems to need almost a full-fledged CSV parser, and even then might not be perfect. Particularly, can you avoid or at least somehow escape the inner quotation marks, or use something other than a quotation mark to signify strings with whitespace? (It would be much easier if there's at least one specific character which only means "until this occurs again, treat everything as belonging to the same column".) – user Jul 22 '13 at 11:23
  • 1
    You may find hints on telling literal and separator commas apart in Remove comma between the quotes only in a comma delimited file. – manatwork Jul 22 '13 at 11:41

2 Answers

0

If you have OpenOffice, you can open the file with it and, while opening, set the separator to comma. Then you can copy the second column.

Kartik
  • 2,004
  • Unfortunately at least my version of LibreOffice doesn't seem to like the inner quotation marks, and gets confused. – user Jul 22 '13 at 11:41
  • Actually, I have to write a shell script and then fetch a column and perform certain checks for each value in a column. So, Open Office is not going to help. – gus Jul 22 '13 at 11:41
0

Python has a csv module that will deal with this automatically. For instance, here's a short script that does what you want, reading from stdin:

import csv
import sys

# Print the second field of every record read from stdin.
for columns in csv.reader(sys.stdin, delimiter=","):
    print(columns[1])

You can save this in a file, or run it directly from the command line:

... | python -c "import csv, sys; print('\n'.join(c[1] for c in csv.reader(sys.stdin, delimiter=',')))"
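Note that the csv module strips the surrounding quotes from each field, and the last sample line (with unescaped inner quotes) is not well-formed CSV, so a CSV parser may not return it the way the question expects. If you can assume that the first and last columns never contain a comma (an assumption on my part, though it holds for the sample input), a rough sketch is to keep everything between the first and the last comma. The function name second_column is just made up for illustration:

```python
def second_column(line):
    """Return the text between the first and the last comma of a line.

    Assumes exactly three columns, where the first and the last
    column contain no commas themselves (hypothetical constraint).
    """
    line = line.rstrip("\n")
    # Drop the first column: everything up to the first comma.
    _, _, rest = line.partition(",")
    # Drop the last column: everything after the last comma.
    middle, sep, _ = rest.rpartition(",")
    # If there was no further comma, the remainder is the column itself.
    return middle if sep else rest


# Sample lines taken from the question.
sample = [
    'col1,"1,2",col3',
    'col1,"1,2,3",col3',
    'col1,"1  2,3",col3',
    'col1,"1 "2,3"",col3',
]

for line in sample:
    print(second_column(line))
```

This keeps the quotes exactly as they appear in the input, but it is not a CSV parser: it breaks as soon as the first or last column may contain a comma, or the file has more than three columns.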
rahmu
  • 20,023