1

I am sorry if my question is more a typographical error, but I have been trying to sort this out for a while now and sadly, I cannot get this to work. Perhaps I should use the sed command, but I haven't figured out how to specify a column in sed and despite being a beginner, I have a bit more experience with the awk command.

So here is the goal; I have a CSV file, file1, that has a column (14) where some of the rows have null (blank) values, while other rows have values. I still want all the other columns in the output, but just to change the blank (empty [null]) columns in column 14 to have a new value of NA.

Example:

Column14
Value1
Value2

Value3

I am trying to use the awk command to locate any blank row in column 14 and if found enter a new text value of NA to the cell.

Here is the command I was trying, but my new file still has blank cells in column 14. I would appreciate any help. Thank you.

Command:

awk -F"," 'BEGIN {OFS=","} $14 == "" { $14 = "NA" }  {print}' file1 > file2


GOAL:

Example:

Column14
Value1
Value2
NA
Value3

Thank you all for taking the time to read and assist.

UPDATE

As requested, here is some sample data.

"employee_number","employee_login","is_active","send_pkg_email","send_na_email","last_name","first_name","department","title","phone_number","employee_type","email","charge_code","area_code","mailstop","roomid"
"103293","jsmith@company.com","Y","","","Smith","Jessica","","","+1 (650) 3530975","Employee","jsmith@company.com","","LOC0028.03","","03.C.01H"
"103295","fredl@company.com","Y","","","Long","Fred","","","+1 (415) 9449428","Employee","fredl@company.com","","LOC0025.01","","01.D.04B"
"103297","lcheng@company.com","Y","","","Cheng","Laura","","","+1 (650) 8623342","Contingent","lcheng@company.com","","","",""
"103307","cfb@company.com","Y","","","Brown","Chris","","","+1 (512) 9644927","Employee","cfb@company.com","","ATX0607.16","","16.B.10D"
"103310","stanwang@company.com","Y","","","Williams","Stan","","","+1 (650) 8048591","Employee","stanwang@company.com","","LOC0061.03","","03.D.01B"
cas
  • 78,579
Jesse
  • 25
  • 2
    Please [edit] your question and show some sample data that actually has (at least) 14 columns with both empty and non-empty values and that reproduces your problem. Check if your input file has rows that contain whitespace only in column 14. – Bodo Sep 25 '19 at 16:55
  • Your awk script works for me. – suspectus Sep 25 '19 at 17:03
  • @Bodo I added the sample data you requested. When I run my command my output file still shows the blank cells as blank "". – Jesse Sep 25 '19 at 21:35
  • Unlike a lot of "CSV" questions here, your input file is properly formatted CSV so you should probably use a real CSV parser to work with it (just splitting fields on commas isn't good enough). See Is there a robust command line tool for processing csv files? - personally, i'd use either csvkit or perl's Text::CSV module. miller is pretty good too. – cas Sep 26 '19 at 02:41

2 Answers2

1

Column 14 does not contain an empty string, but "", so you need to check for two quotes (which have to be escaped):

awk -F"," 'BEGIN {OFS=","} $14 == "\"\"" { $14 = "NA" } {print}' file1 > file2
Freddy
  • 25,565
  • Thank you for contributing. Between yours and the comments from @cas, I was able to get this working. Very much appreciated. – Jesse Sep 26 '19 at 18:36
1
$ perl -MText::CSV=csv -e '
  $csv = Text::CSV->new();
  while(my $row = $csv->getline(ARGV)) {
    $row->[13] = "NA" if ($row->[13] eq "");
    $csv->say(STDOUT, $row);
  };' input.csv

Note that perl arrays begin from 0, not 1 - so the 14th field is element 13 of the $row arrrayref.

employee_number,employee_login,is_active,send_pkg_email,send_na_email,last_name,first_name,department,title,phone_number,employee_type,email,charge_code,area_code,mailstop,roomid
103293,jsmith@company.com,Y,,,Smith,Jessica,,,"+1 (650) 3530975",Employee,jsmith@company.com,,LOC0028.03,,03.C.01H
103295,fredl@company.com,Y,,,Long,Fred,,,"+1 (415) 9449428",Employee,fredl@company.com,,LOC0025.01,,01.D.04B
103297,lcheng@company.com,Y,,,Cheng,Laura,,,"+1 (650) 8623342",Contingent,lcheng@company.com,,NA,,
103307,cfb@company.com,Y,,,Brown,Chris,,,"+1 (512) 9644927",Employee,cfb@company.com,,ATX0607.16,,16.B.10D
103310,stanwang@company.com,Y,,,Williams,Stan,,,"+1 (650) 8048591",Employee,stanwang@company.com,,LOC0061.03,,03.D.01B

The line with employee_number 103297 now has NA in the 14th field.

BTW, the output fields here are double-quoted only when necessary (e.g. when they contain a space. or if any of them contained a comma, they'd be quoted too). If you prefer all fields in the output to be quoted as in your input file, change the $csv = Text::CSV->new(); line to:

$csv = Text::CSV->new({always_quote => 1});

Text::CSV has numerous other options. e.g. if you use $csv = Text::CSV->new({always_quote => 1, strict => 1}); it will also trigger an error if any of the input rows have a different number of fields. See man Text::CSV for details.


Alternatively, there's a simple fix to your awk script:

awk -F"," 'BEGIN {OFS=","}; $14 == "\"\"" { $14 = "\"NA\"" };1' input.csv

This highlights a problem with just comma-splitting CSV files. It's impossible to distinguish between " characters as wrappers around field data and " characters being part of the field data...because there is no such distinction with this simple split method.

Field 14 isn't empty when you're just splitting the input line by commas. It contains two quote characters ("").

This awk one-liner will also break if any of the fields contain a comma character. That's another reason why it's better to use a CSV parser.

See Is there a robust command line tool for processing csv files?.

There's also a good awk csv parser at https://github.com/geoffroy-aubry/awk-csv-parser

cas
  • 78,579
  • Because my reputation is under 15, I cannot up-vote this, but this worked for me. I see your points around using a proper CSV parser and I will look into this as well. I can confirm the AWK works for me and I see my mistake in that I needed with the "" characters. Thank you very much for your assistance with this. – Jesse Sep 26 '19 at 18:36
  • @jesse i made a minor fix to the script. the previous version would modify field 14 if it was empty or if it contained 0. now it only modifies it if it is empty. – cas Sep 27 '19 at 00:09
  • Okay wonderful. Thank you for revisiting and making the adjustment @cas. – Jesse Sep 28 '19 at 01:17