20

I have a CSV file from which I need to remove one column from it.The problem is I have exported the CSV file without headers.So how can I remove the column from the CSV file.For example if I have the example.csv I want to remove the last column from it which is a boolean data and have the file as input.csv.

input.csv

1,"data",100.00,TRUE
2,"code",91.8,TRUE
3,"analytics",100.00,TRUE

output.csv

1,"data",100.00
2,"code",91.8
3,"analytics",100.00
countermode
  • 7,533
  • 5
  • 31
  • 58
priyanka3
  • 331
  • 2
    What's wrong with having no headers? – countermode Oct 19 '16 at 06:32
  • 1
    You should clearly indicate that your complete CSV is this simplistic, i.e. no fields spanning multiple lines. Either give an example matching [tag:csv] or use [tag:csv-simple]. CSV is much more complex format than most assume, and in only a extremely small subset of CSV files can be handled with "standard" text processing tools (sed, awk), the vast majority require parsing e.g. with Python/Perl/Ruby libraries. – Anthon Oct 19 '16 at 06:34
  • Almost the same question: https://unix.stackexchange.com/questions/34646/is-there-a-command-line-spell-to-drop-a-column-in-a-csv-file – Alexander Pozdneev Jul 29 '20 at 07:23

9 Answers9

33

To remove the fourth column,

$ cut -d, -f4 --complement example.csv > input.csv

Adjust the -f option to match the column number.

If the CSV file is more complicated, you could use some perl and the Text::CSV package,

$ perl -MText::CSV -E '$csv = Text::CSV->new({binary=>1}); 
  while ($row = $csv->getline(STDIN)) 
  {
    print "$row->[0],$row->[1],$row->[2]\n"
  }' < example.csv > input.csv
James
  • 520
  • 1
    This is incorrect! CSV files may have multiline cells and those would be broken by this command. – adius Jan 17 '20 at 10:54
  • @adius yes, the first command is pretty simple and can be broken by newlines, commas in quotes, etc.

    The second command handles a much wider range of CSV quirks, including multi-line cells.

    – James Jan 18 '20 at 11:25
8

I suggest using miller (AKA mlr). Miller is a small (< 1 MB) program written in C (without dependencies) for processing tabular data in a variety of formats (e.g., TSV, CSV, tabular JSON, key-value pairs) in a streaming fashion. It is available from most Linux distros' repositories (usually under the name miller). Its CSV parser handles multi-line cells (i.e., cells with embedded newline characters). Although it assumes CSV files with headers, it has options (--implicit-csv-header and --headerless-csv-output) that handle header-less CSV files. For your use-case, assuming you want to remove the fourth column (column indexing begins with 1), you could do something like

mlr --csv --implicit-csv-header --headerless-csv-output cut -x -f 4 input.csv > output.csv

If you want to remove the last column without having to count them, you could reverse each CSV line prior to processing the data, and reverse them again afterwards, like so:

< input.csv rev |
mlr --csv --implicit-csv-header --headerless-csv-output cut -x -f 1 |
rev > output.csv

Hope that helps.

sebbit
  • 81
7

Use the csv module in Python:

python -c 'import sys,csv
w = csv.writer(sys.stdout)
for row in csv.reader(sys.stdin):
    w.writerow(row[0:-1])' < input.csv > output.csv

If you want the output to contain say columns 2, 3, and 1, in that order, use

python -c 'import sys,csv
w = csv.writer(sys.stdout)
for row in csv.reader(sys.stdin):
    w.writerow([row[1], row[2], row[0]])' < input.csv > output.csv

Python array indexes start at 0, so the column numbers are counted starting at zero too, above; that's why they are 1, 2, 0 for columns 2, 3, 1, respectively.

  • This answer is perfect and it helps that you've answered both the include and exclude variants of the module. – aashima Jul 08 '21 at 09:46
4

sed:

sed 's/,[^,]\+$//' file.csv
  • ,[^,]\+ matches , followed by any number of characters except ,, and the match is replaced with empty string

awk:

awk -F, '{for(i=1; i<NF; i++) printf("%s%s", $i, FS); print ""}' file.csv
  • -F, sets field delimiter as ,

  • for(i=1; i<NF; i++) printf("%s", $i); print "" iterates over the fields except the last one, and print the output in desired format

Example:

% cat file.txt
1,"data",100.00,TRUE
2,"code",91.8,TRUE
3,"analytics",100.00,TRUE

% sed 's/,[^,]\+$//' file.txt                                     
1,"data",100.00
2,"code",91.8
3,"analytics",100.00

% awk -F, '{for(i=1; i<NF; i++) printf("%s%s", $i, FS); print ""}' file.txt
1,"data",100.00,
2,"code",91.8,
3,"analytics",100.00,
heemayl
  • 56,300
2

CSV format is sometimes tricky: CSV-parsers is the way to go.

Taking the simplistic approach, removing last field:

1) remove last field with Awk

awk 'BEGIN{FS=OFS=","}{NF--;print}'

2) regex with perl

perl -pe 's/.*\K,.*//'
JJoao
  • 12,170
  • 1
  • 23
  • 45
2

In addition to CSV-specific custom code in Python, Perl, or whatever else, there are projects, like miller, focused on processing CSV/TSV on the command line.

There's the Python project, csvkit, which aims to offer some CSV-specific variants of common CL tools like cut and grep. That project also inspired GoCSV which aims to be faster and more memory efficient.

They both have cut-like syntax for specifying columns/fields with ranges, like:

% csvcut -c 1-3 input.csv 
1,data,100.00
2,code,91.8
3,analytics,100.00
% gocsv select -c 1-3 input.csv 
1,data,100.00
2,code,91.8
3,analytics,100.00

I'm not sure how these two compare to miller, but I like how the syntax looks shorter/clearer.

Zach Young
  • 220
  • 2
  • 5
1

just change the column number in this awk command. Below command removes the 3rd column. ( $3 )

awk -F, '{$3=""}1' OFS=, test.txt  | sed "s/^,//;s/,$//;s/,,/,/"
Kamaraj
  • 4,365
1
cut -d "," -f 4 --complement input.csv > output.csv

Here, -d is followed by the delimiter i.e. semi-colon(;) within double quotes, and -f is followed by the number of the column to be removed i.e. 4. The column numbers are starting from 1 and proceed in this context.

The --complement option indicates the complement of the selection. (This option enables the cut command to display all the bytes/characters or fields except the selected bytes)

In this scenario, the complement of the cut operation is all the columns delimited by , except the 4th column.

AdminBee
  • 22,803
0

Very generic solution to remove number of last characters:

awk '{print substr($0, 1, length($0)-5)}' list1.txt
1,"data",100.00
2,"code",91.8
3,"analytics",100.00

-5 is the number of character will be removed