82

I work with CSV files and sometimes need to quickly check the contents of a row or column from the command line. In many cases cut, head, tail, and friends will do the job; however, cut cannot easily deal with situations such as

"this, is the first entry", this is the second, 34.5

Here, the first comma is part of the first field, but cut -d, -f1 disagrees. Before I write a solution myself, I was wondering if anyone knew of a good tool that already exists for this job. At the very least, it would have to handle the example above and return a column from a CSV-formatted file. Other desirable features include the ability to select columns based on the column names given in the first row, support for other quoting styles, and support for tab-separated files.

If you don't know of such a tool but have suggestions regarding implementing such a program in Bash, Perl, or Python, or other common scripting languages, I wouldn't mind such suggestions.
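For reference, this is roughly the behaviour I'm after; Python's standard csv module (one of the languages mentioned above) already parses the tricky line correctly. A quick sketch, with io.StringIO standing in for a real file:

```python
import csv
import io

# The problematic line from above, wrapped as a file-like object
line = '"this, is the first entry", this is the second, 34.5\n'
row = next(csv.reader(io.StringIO(line), skipinitialspace=True))
print(row)
# → ['this, is the first entry', 'this is the second', '34.5']
```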

Steven D
  • 46,160
  • https://github.com/eBay/tsv-utils looks nice, but I don't have much experience with it. One caveat: csv2tsv is lossy by design: it converts inner tabs & newlines to spaces. – Beni Cherniavsky-Paskin Jun 20 '23 at 09:30

24 Answers

76

I am probably a little bit too late, but there is another tool worth mentioning: csvkit.

It has a lot of command-line tools that can:

  • reformat CSV files,
  • convert to and from CSV from various formats (JSON, SQL, XLS),
  • provide CSV-aware equivalents of cut, grep, sort and others,
  • join different CSV files,
  • run general SQL queries on data from CSV files.
romaia
  • 869
  • 7
    An excellent tool that meets the question criteria wonderfully (in particular it doesn't require jumping into a programming language and is well crafted to fit with other Unix utilities). – mm2001 Nov 04 '14 at 20:10
  • I'm looking for a csvkit alternative because csvkit does not support concatenating columns of the same file. – ptrcao Dec 21 '19 at 09:12
  • csvkit's csvjoin seems to be broken as of version 1.0.2 – Mihail Kostira Jan 28 '22 at 11:26
51

You can use Python's csv module.

A simple example:

import csv

with open("test.csv", newline="") as f:  # newline="" as the csv docs recommend
    reader = csv.reader(f)
    for row in reader:
        for col in row:
            print(col)
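The same module also covers selecting columns by the names in the header row, via csv.DictReader; a minimal sketch with made-up data (pass delimiter="\t" to the reader for tab-separated files):

```python
import csv
import io

# Hypothetical data; a real file would be opened with open("test.csv", newline="")
data = "name,age,score\nalice,30,9.5\nbob,25,8.1\n"
rows = list(csv.DictReader(io.StringIO(data)))  # each row keyed by header names
for row in rows:
    print(row["name"], row["score"])
# → alice 9.5
# → bob 8.1
```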
Wildcard
  • 36,499
dogbane
  • 29,677
22

Miller is another nice tool for manipulating name-based data, including CSV (with headers). To extract the first column of a CSV file, without caring about its name, you’d do something like

printf '"first,column",second,third\n1,2,3\n' |
  mlr --csv -N cut -f 1
Stephen Kitt
  • 434,908
  • 2
    Miller is very impressive. I'd compare it to awk, but highly DSV-aware. – Derek Mahar Apr 18 '18 at 21:59
  • mlr is hands down my favorite tool for handling structured data via the command line, and is likely the most performant of all of those I have tried. – signus Aug 04 '23 at 17:36
20

Sounds like a job for Perl with Text::CSV.

perl -MText::CSV -ne '
    BEGIN {$csv = Text::CSV->new();}
    $csv->parse($_) or die;
    @fields = $csv->fields();
    print join(",", @fields[1,3]), "\n";
'

See the documentation for how to handle column names. The separator and quoting style can be tuned with parameters to new. See also Text::CSV::Separator for separator guessing.

  • Is there a one liner you can compact this into. I like perl, but only when I can invoke it directly from the command line rather than with a script – Sridhar Sarnobat Jan 26 '17 at 06:04
  • 2
    @user7000, unless your shell is (t)csh, that command would work just fine at the prompt of your shell. You can always join those lines together if you want it on one line; a newline is generally treated just like a space in Perl syntax, as in C. – Stéphane Chazelas Sep 12 '17 at 12:07
  • I guess. Though squashing more than 2 lines into 1 isn't what I really mean by a one liner. I was hoping there was some syntactic sugar that would do some of it implicitly (like how the -e creates an implicit loop). – Sridhar Sarnobat Sep 12 '17 at 22:33
  • 1
    @SridharSarnobat with all due respect to the excellent Perl answers on this U&L webpage, you might like the Raku (a.k.a. Perl6) answer below, especially for Unicode work. – jubilatious1 Mar 04 '23 at 18:12
16

I'd recommend xsv, a "fast CSV command line toolkit written in Rust".

Written by Ripgrep's author.

Featured in How we made our CSV processing 142x faster (Reddit thread).

13

I've found csvfix, a command-line tool that does the job well. You will need to build it yourself, however:

http://neilb.bitbucket.org/csvfix

It does all the things you'd expect, such as ordering/selecting columns and splitting/merging, plus many you wouldn't, like generating SQL inserts from CSV data and diffing CSV data.

11

If you want to work from the command line (and not write an entire program for the job), you may like rows, a project I'm working on: it's a command-line interface to tabular data, but also a Python library to use in your programs. With the command-line interface you can pretty-print any file in CSV, XLS, XLSX, HTML or any other tabular format supported by the library with a simple command:

rows print myfile.csv

If myfile.csv is like this:

state,city,inhabitants,area
RJ,Angra dos Reis,169511,825.09
RJ,Aperibé,10213,94.64
RJ,Araruama,112008,638.02
RJ,Areal,11423,110.92
RJ,Armação dos Búzios,27560,70.28

Then rows will print the contents in a beautiful way, like this:

+-------+-------------------------------+-------------+---------+
| state |              city             | inhabitants |   area  |
+-------+-------------------------------+-------------+---------+
|    RJ |                Angra dos Reis |      169511 |  825.09 |
|    RJ |                       Aperibé |       10213 |   94.64 |
|    RJ |                      Araruama |      112008 |  638.02 |
|    RJ |                         Areal |       11423 |  110.92 |
|    RJ |            Armação dos Búzios |       27560 |   70.28 |
+-------+-------------------------------+-------------+---------+

Installing

If you are a Python developer and already have pip installed on your machine, just run inside a virtualenv or with sudo:

pip install rows

If you're using Debian:

sudo apt-get install rows

Other Cool Features

Converting Formats

You can convert between any supported format:

rows convert myfile.xlsx myfile.csv

Querying

Yes, you can run SQL queries against a CSV file:

$ rows query 'SELECT city, area FROM table1 WHERE inhabitants > 100000' myfile.csv
+----------------+--------+
|      city      |  area  |
+----------------+--------+
| Angra dos Reis | 825.09 |
|       Araruama | 638.02 |
+----------------+--------+

The query output can be written to a file instead of stdout using the --output parameter.

As a Python Library

You can use rows in your Python programs, too:

import rows
table = rows.import_from_csv('myfile.csv')
rows.export_to_txt(table, 'myfile.txt')
# `myfile.txt` will have same content as `rows print` output

Hope you enjoy it!

Álvaro Justen
  • 415
  • 1
    The project looks interesting. I tried to read the docs but the following warning (as of May 2021) made it look unprofessional. You might want to beef up your site. Thanks.

    HTTPS-Only Mode Alert Secure Connection Not Available

    You’ve enabled HTTPS-Only Mode for enhanced security, and a HTTPS version of turicas.info is not available.

    https://support.mozilla.org/en-US/kb/https-only-prefs?as=u&utm_source=inproduct

    – Leonid May 24 '21 at 23:23
  • "rows" says it has parquet support, unique vs other tools – yzorg Oct 20 '22 at 16:46
  • No commits on develop branch for a while, possibly abandoned by original author. Any forks to recommend? – yzorg Oct 20 '22 at 16:46
9

I used csvtool once and it saved me a lot of time and trouble. Callable from the shell.

http://caml.inria.fr/cgi-bin/hump.en.cgi?contrib=447

nat
  • 91
9

If you want a visual / interactive tool in the terminal, I wholeheartedly recommend VisiData.

[screenshot: VisiData displaying a frequency table]

It has frequency tables (shown above), pivot, melting, scatterplots, filtering / computation using Python, and more.

You can pass CSV files like so:

vd hello.csv

There are csv specific options: --csv-dialect, --csv-delimiter, --csv-quotechar, and --csv-skipinitialspace for fine-tuned handling of csv files.

8

R is not my favorite programming language, but it is good for things like this. If your csv file is

***********
foo.csv
***********
 col1, col2, col3
"this, is the first entry", this is the second, 34.5
'some more', "messed up", stuff

Inside the R interpreter type

> x=read.csv("foo.csv")

> x
                     col1                col2   col3
1 this, is the first entry  this is the second   34.5
2              'some more'           messed up  stuff
> x[1]  # first col
                      col1
1 this, is the first entry
2              'some more'
> x[1,] # first row
                      col1                col2  col3
1 this, is the first entry  this is the second  34.5

With regard to your other requests, for "the ability to select columns based on the column names given in the first row" see

> x["col1"]
                      col1
1 this, is the first entry
2              'some more'

For "support for other quoting styles" see the quote argument to read.csv (and related functions). For "support for tab-separated files" see the sep argument to read.csv (set sep to '\t').

For more information see the online help.

> help(read.csv)
Faheem Mitha
  • 35,108
  • 2
    I'm very familiar with R, but my goal was to have something I could use easily from Bash. – Steven D Apr 06 '11 at 15:03
  • 1
    @Steven: R can easily be run from the command line, in the same way as Python or Perl, if that is your only concern. See Rscript (part of the base R distribution) or the addon package littler. You can do #!/usr/bin/env Rscript or similar. – Faheem Mitha Apr 07 '11 at 18:21
  • 1
    Ah yes. I'm pretty proficient in R but hadn't used it much to create this type of utility. I have something working in Python but I may try to create something in R as well. – Steven D Apr 10 '11 at 01:05
7

One of the best tools is Miller. It is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON.

For example,

echo '"this, is the first entry", this is the second, 34.5' | \
mlr --icsv --implicit-csv-header cat

gives you

1=this, is the first entry,2= this is the second,3= 34.5

If you want a TSV

echo '"this, is the first entry", this is the second, 34.5' | \
mlr --c2t --implicit-csv-header cat

gives you (it's possible to remove the header)

1       2       3
this, is the first entry         this is the second      34.5

If you want first and third column, changing their order

echo '"this, is the first entry", this is the second, 34.5' | \
mlr --csv --implicit-csv-header --headerless-csv-output cut -o -f 3,1

gives you

 34.5,"this, is the first entry"
aborruso
  • 2,855
4

cissy will also do command-line csv processing. It's written in C (small/lightweight) with rpm and deb packages available for most distros.

Using the example:

echo '"this, is the first entry", this is the second, 34.5' | cissy -c 1
"this, is the first entry"

or

echo '"this, is the first entry", this is the second, 34.5' | cissy -c 2
 this is the second

or

echo '"this, is the first entry", this is the second, 34.5' | cissy -c 2-
 this is the second, 34.5
slass100
  • 183
3

To use python from the command line, you can check out pythonpy (https://github.com/Russell91/pythonpy):

$ echo $'a,b,c\nd,e,f' | py '[x[1] for x in csv.reader(sys.stdin)]'
b
e
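Since the pythonpy repository has since disappeared, plain python3 with only the standard library does the same job; a sketch with the echoed input inlined as a string:

```python
import csv
import io

# Equivalent of: echo $'a,b,c\nd,e,f' | py '[x[1] for x in csv.reader(sys.stdin)]'
data = io.StringIO("a,b,c\nd,e,f\n")
cols = [x[1] for x in csv.reader(data)]
print("\n".join(cols))
# → b
# → e
```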
  • GitHub repository https://github.com/Russell91/pythonpy no longer exists and nor does user https://github.com/Russell91. – Derek Mahar Jan 25 '23 at 14:27
2

The github repo Structured Text Tools has a useful list of relevant linux command line tools. In particular, the Delimiter Separated Values section lists several CSV capable tools that directly support the operations requested.

JonDeg
  • 301
1

There is also a Curry library for reading/writing files in CSV format: CSV.

1

csvq is a command-line tool that uses SQL to query CSV files:

$ echo '"this, is the first entry", this is the second, 34.5' |
  csvq --no-header "SELECT * FROM STDIN"
+--------------------------+----------------------+--------+
|            c1            |          c2          |   c3   |
+--------------------------+----------------------+--------+
| this, is the first entry |  this is the second  |  34.5  |
+--------------------------+----------------------+--------+
1

Using Raku (formerly known as Perl_6)

These answers use the Raku programming language, specifically in conjunction with its Text::CSV module. The answer below is in two parts: the first reads the file line by line, the second reads the file in all at once. Either method can be used to wrangle text, outputting RFC-4180 (and possibly RFC-7111) conforming files.


Linewise: Reading a file linewise, as a 'one-liner' accepting commandline input:

~$ raku -MText::CSV -ne 'my $csv = Text::CSV.new; $csv.parse($_); put $csv.strings.raku;'  MS.csv

Sample Input:

https://www.microsoft.com/en-us/download/details.aspx?id=45485

Sample Output:

["User Name", "First Name", "Last Name", "Display Name", "Job Title", "Department", "Office Number", "Office Phone", "Mobile Phone", "Fax", "Address", "City", "State or Province", "ZIP or Postal Code", "Country or Region"]
["chris\@contoso.com", "Chris", "Green", "Chris Green", "IT Manager", "Information Technology", "123451", "123-555-1211", "123-555-6641", "123-555-9821", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"]
["ben\@contoso.com", "Ben", "Andrews", "Ben Andrews", "IT Manager", "Information Technology", "123452", "123-555-1212", "123-555-6642", "123-555-9822", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"]
["david\@contoso.com", "David", "Longmuir", "David Longmuir", "IT Manager", "Information Technology", "123453", "123-555-1213", "123-555-6643", "123-555-9823", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"]
["cynthia\@contoso.com", "Cynthia", "Carey", "Cynthia Carey", "IT Manager", "Information Technology", "123454", "123-555-1214", "123-555-6644", "123-555-9824", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"]
["melissa\@contoso.com", "Melissa", "MacBeth", "Melissa MacBeth", "IT Manager", "Information Technology", "123455", "123-555-1215", "123-555-6645", "123-555-9825", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"]

The above uses Raku's -ne linewise flags to take input line by line and output parsed strings. Drop the call to .raku to output parsed text without doublequotes/escapes. Alternatively, the code below takes input line by line using $csv.getline(), which stores each row internally as a Raku array, which is then output:

Linewise: Raku code as a script (useful for data-wrangling):

use Text::CSV;

my @rows;
my $csv = Text::CSV.new;
my $fh  = open "MS.csv", :r, :!chomp;
while ($csv.getline($fh)) -> $row {
    @rows.push: $row;
}
$fh.close;
$_.raku.put for @rows;

Linewise: Raku code as a 'one-liner' accepting STDIN:

~$ raku -MText::CSV -e 'my @rows; my $csv = Text::CSV.new;  \
                        while ($csv.getline($*IN)) -> $row { @rows.push: $row; };  \
                        $_.raku.put for @rows;' <  MS.csv

Sample Output (last two code examples above: below shows Raku's internal data representation):

$["User Name", "First Name", "Last Name", "Display Name", "Job Title", "Department", "Office Number", "Office Phone", "Mobile Phone", "Fax", "Address", "City", "State or Province", "ZIP or Postal Code", "Country or Region"]
$["chris\@contoso.com", "Chris", "Green", "Chris Green", "IT Manager", "Information Technology", "123451", "123-555-1211", "123-555-6641", "123-555-9821", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"]
$["ben\@contoso.com", "Ben", "Andrews", "Ben Andrews", "IT Manager", "Information Technology", "123452", "123-555-1212", "123-555-6642", "123-555-9822", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"]
$["david\@contoso.com", "David", "Longmuir", "David Longmuir", "IT Manager", "Information Technology", "123453", "123-555-1213", "123-555-6643", "123-555-9823", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"]
$["cynthia\@contoso.com", "Cynthia", "Carey", "Cynthia Carey", "IT Manager", "Information Technology", "123454", "123-555-1214", "123-555-6644", "123-555-9824", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"]
$["melissa\@contoso.com", "Melissa", "MacBeth", "Melissa MacBeth", "IT Manager", "Information Technology", "123455", "123-555-1215", "123-555-6645", "123-555-9825", "1 Microsoft way", "Redmond", "Wa", "98052", "United States"]

If all you want is the first two lines (e.g. header plus first data row), change:

while ($csv.getline($*IN)) ...to... for ($csv.getline($*IN) xx 2)

Printing just a single row (e.g. the header) will return a \n newline separated list (useful for checking correct parsing):

"User Name"
"First Name"
"Last Name"
"Display Name"
"Job Title"
"Department"
"Office Number"
"Office Phone"
"Mobile Phone"
"Fax"
"Address"
"City"
"State or Province"
"ZIP or Postal Code"
"Country or Region"

Of course, individual columns can be output as well. Simply change the final put statement to:

.[2].raku.put for @rows;

...to print out the 3rd Column ("Last Name", zero-index = 2):

"Last Name"
"Green"
"Andrews"
"Longmuir"
"Carey"
"MacBeth"


Reading a file all at once: Raku's Text::CSV module provides a high-level csv(…) function to alter/validate CSV/TSV text, and output RFC-4180 (and possibly RFC-7111) conforming files.

Convert a CSV file to a TSV file (reads stdin, outputs columns with internal whitespace as double-quoted by default):

$ cat MS.csv | raku -MText::CSV -e 'my @a = csv(in => $*IN); csv(in => @a, sep => "\t", out => $*OUT);' > MS.tsv

Sample Output (saved and re-opened in Vim w/ CSV module):

         "User Name"     "First Name"    "Last Name"        "Display Name"    "Job Title"                  Department    "Office Number"     "Office Phone"  "Mobile Phone"           Fax              Address      City     "State or Province"     "ZIP or Postal Code"    "Country or Region"
   chris@contoso.com            Chris          Green         "Chris Green"   "IT Manager"    "Information Technology"             123451       123-555-1211    123-555-6641  123-555-9821    "1 Microsoft way"   Redmond                      Wa                    98052        "United States"
     ben@contoso.com              Ben        Andrews         "Ben Andrews"   "IT Manager"    "Information Technology"             123452       123-555-1212    123-555-6642  123-555-9822    "1 Microsoft way"   Redmond                      Wa                    98052        "United States"
   david@contoso.com            David       Longmuir      "David Longmuir"   "IT Manager"    "Information Technology"             123453       123-555-1213    123-555-6643  123-555-9823    "1 Microsoft way"   Redmond                      Wa                    98052        "United States"
 cynthia@contoso.com          Cynthia          Carey       "Cynthia Carey"   "IT Manager"    "Information Technology"             123454       123-555-1214    123-555-6644  123-555-9824    "1 Microsoft way"   Redmond                      Wa                    98052        "United States"
 melissa@contoso.com          Melissa        MacBeth     "Melissa MacBeth"   "IT Manager"    "Information Technology"             123455       123-555-1215    123-555-6645  123-555-9825    "1 Microsoft way"   Redmond                      Wa                    98052        "United States"

Raku's Text::CSV module can handle alternative column separators (e.g. \t), double-quoted column values with embedded commas, newlines, etc. Most importantly, when wrangling text input you'll have access to Raku's powerful Regex pattern-matching interface, should you need to filter/substitute characters (e.g. column-names), rows, or columns. Each Raku code sample above correctly parses the OP's short sample text and outputs the following (or similar; declare Text::CSV.new(:allow_whitespace) to trim whitespace before/after each field):

["this, is the first entry", "this is the second", "34.5"]

See the URLs below for storing data in a hash (or array-of-hashes), and/or for setting column-names, sep-char, escape-char, formula-handling, binary, strict parameters, etc.

https://raku.land/github:Tux/Text::CSV
https://github.com/Tux/CSV
https://raku.org

jubilatious1
  • 3,195
0

An awk solution

awk -v q='"' '
# Split each line on double quotes: pieces at odd indices lie outside
# quotes (commas there are real separators), even indices were quoted.
func csv2del(n) {
  for (i = n; i <= c; i++) {
    if (i % 2 == 1) gsub(/,/, OFS, a[i])  # unquoted piece: commas become OFS
    else a[i] = (q a[i] q)                # quoted piece: restore the quotes
    out = (out) ? out a[i] : a[i]
  }
  return out
}
{
  c = split($0, a, q); out = X            # X is unset, i.e. empty
  if (a[1]) $0 = csv2del(1)               # line does not start with a quote
  else      $0 = csv2del(2)
}1' OFS='|' file
Srini
  • 161
0

Throwing another one on the pile...

csvprintf

It works like printf(1).

For example, with this input:

FirstName,LastName,IdNumber
George,Washington,1
Betsy,Ross,2

And this command line:

$ cat input.csv | csvprintf -i '%{IdNumber}04d: %{LastName}s, %{FirstName}s\n'

You get this output:

0001: Washington, George
0002: Ross, Betsy

Disclaimer: I wrote this

Archie
  • 123
0

GNU Datamash works pretty well for me.

For example, here is how to compute a group-wise sum of the values in column 3, grouped by the values in column 2. There are many other use cases.

$ echo "id,group,value
1,1,2
1,2,1" | datamash -t, --header-in --header-out --sort groupby 2 sum 3

GroupBy(group),sum(value)
1,2
2,1

cmc
  • 174
  • Note that datamash is not fully CSV-aware (does not know of CSV quoting rules), but that it works pretty well with comma-delimited input. – Kusalananda Oct 28 '22 at 06:59
0

You could also use Lua with the ftcsv rock:

$ lua -l ftcsv - filename.csv column_index <<SCRIPT
function getcol(data, headers, colindex, rowindex)
  if rowindex == nil then rowindex = 1 end
  if rowindex > #data then return end
  print(data[rowindex][headers[colindex]])
  return getcol(data, headers, colindex, rowindex+1)
end
data, headers = ftcsv.parse(arg[1], ",")
getcol(data, headers, tonumber(arg[2]))
SCRIPT
0

Another option that is quite powerful, and hopefully already installed on your system, is sqlite3:

sqlite3 :memory: '.import --csv file.csv t' 'select * from t;'
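The same idea works from Python's built-in sqlite3 module, without the CLI; a sketch with hypothetical inline data standing in for file.csv:

```python
import csv
import io
import sqlite3

data = "a,b,c\nd,e,f\n"  # hypothetical stand-in for file.csv
rows = list(csv.reader(io.StringIO(data)))

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (c1, c2, c3)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
result = con.execute("SELECT * FROM t").fetchall()
print(result)
# → [('a', 'b', 'c'), ('d', 'e', 'f')]
```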
AdminBee
  • 22,803
Alpaca
  • 101
-1

pup (installable with go install) is a CLI parser that can produce JSON, which you can then format with jq (a CLI JSON parser). It's quite powerful.

chovy
  • 2,069