
I want to parse a csv file in zsh row by row and store each row in an array (without the commas). Is it possible to import a single row into an array in zsh, then afterwards grab the next row?

The issue is that I am using a large csv file and cannot import it all quickly. I tried using the code below:

arr_csv=() 
while IFS= read -r line 
do
    arr_csv+=("$line")
done < import.csv

but since the file is large, I want to read and store a single line (or access a single line).

I understand I could modify the code such that

arr_csv=() 
while IFS= read -r line 
do
    arr_csv=("$line")
    # some modifications
done < import.csv

but when looping over the file, it would be easier if I could use an index corresponding to the row number in the csv file. Also, this method does not remove the commas separating the fields.

Jeff Schaller

1 Answer


I'd say this rather calls for a proper programming language with CSV support, such as perl or python, than for a shell.

But if you have to use zsh and don't mind newline and carriage return characters inside individual cells being removed, you could use csvkit's csvformat to reformat the csv into something that zsh's read can deal with more easily:

< file.csv csvformat -SU3 -P'\' |
  while IFS=, read -A array; do
    typeset array # or anything with $array
  done

For instance, on an input like:

"foo bar ", "x,y", "blah""blah","new
line"
1,,2,"\\"

Which includes samples of typical pitfalls associated with CSV files, that gives:

array=( 'foo bar ' x,y 'blah"blah' newline )
array=( 1 '' 2 '\\' )

Note the absence of -r, so that read does recognise \ as an escape character. Unfortunately, while csvformat escapes a <newline> as \<newline>, read interprets that as a line continuation rather than an escaped newline, which is why the "new<newline>line" cell comes out as newline above.

If you know of two characters that never occur in the input, you could use those as the field delimiter and record delimiter respectively. For instance, the ASCII Unit Separator and Record Separator control characters would seem fitting here.

us=$'\x1f' rs=$'\x1e'
< file.csv csvformat -SU3 -D$us -M$rs -Q$rs |
  while IFS=$us read -rd$rs -A array; do
    typeset array # or anything else with $array
  done

Which, this time, on the same input gives:

array=( 'foo bar ' x,y 'blah"blah' $'new\nline' )
array=( 1 '' 2 '\\' )