
Possible Duplicate:
Is there a robust command line tool for processing csv files?

I can use cut to extract columns from a file:

$ cat foo
foo,bar,hello
bash,baz,world

$ cut -d\, -f2 foo
bar
baz

But what if a field contains the delimiter, protected by enclosing quotes, like this?

$ cat foo
foo,"hello, world",bar
bash,goodbye,baz

$ cut -d\, -f2 foo
"hello
goodbye

Is there some way to tell cut to respect fields enclosed in double quotes ("...")?

Cory Klein

2 Answers


Definitely not for GNU cut, at least:

/* The delimeter character for field mode. */
static unsigned char delim;

(as seen in the GNU coreutils source). The field delimiter is a single character, and cut has no notion of quoting.

Leonid

cut could do it if you first preprocess its input to escape the characters inside the quotes (for instance, replace "_" with "_u" and "," with "_c" inside quotes, or replace every character with its 2-byte hex notation), and then postprocess its output to undo that escaping.

Something like:

perl -pe 's/"(.*?)"/"\"".unpack("H*", $1)."\""/ge' |
  cut -d, -f2 |
  perl -pe 's/"(.*?)"/"\"".pack("H*",$1)."\""/ge'

(This assumes there are no escaped quotes inside or outside the quoted fields.)

But given the effort that requires, you might as well use a proper CSV parser, or do the whole job with a Perl-like regular expression engine.
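As one illustration of the proper-CSV-parser route, here is a minimal sketch that calls Python's standard `csv` module from the shell. It assumes `python3` is available; `csv_field` is a hypothetical name, and the field number is 1-based like `cut -f`. Unlike the hex trick, the csv module also handles doubled quotes ("") inside a quoted field, and it strips the enclosing quotes from the value it prints.

```shell
#!/bin/sh
# Hypothetical helper: print field N (1-based, like cut -f N) of each CSV
# record read from stdin. Python's standard csv module does the parsing,
# so commas inside double-quoted fields are not treated as separators.
# Assumes python3 is on PATH.
csv_field() {
  python3 -c '
import csv, sys
n = int(sys.argv[1]) - 1
for row in csv.reader(sys.stdin):
    # Rows with too few fields yield an empty line.
    print(row[n] if n < len(row) else "")
' "$1"
}

csv_field 2 <<'EOF'
foo,"hello, world",bar
bash,goodbye,baz
EOF
```

This prints `hello, world` and `goodbye`, without the surrounding quotes.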