1

I'm having some problems with a script parsing a .csv file

Information on .csv:

#1,13/8/2020,somedude@hotmail.com,otherdude@hotmail.com,,Subject,"Dear Dude,

Information have been updated. Please login to APP to review by 30th August 2020. Thank you.

Best regards, Mr. Mack",Pending,13/8/2020 12:35

This is how information should be partioned: MsgID,Date,To,CC,BCC,Subject,Body,Status,Timestamp

MsgID=#1
Date=13/8/2020
To=somedude@hotmail.com
CC=otherdude@hotmail.com
BCC=
Subject=Subject
Body=Dear Dude,
Information have been updated. Please login to APP to review by 30th August 2020. Thank you.
Best regards,
Mr. Mack
Status=Pending
Timestamp=13/8/2020 12:35

The problems I'm having are with the body part for two reasons, the "," in the text that breaks when I'm trying to run the script and the other reason is the line breaks in the text.

This is the script that I was doing:

#!/bin/bash

export TIMESTAMP="$( date '+%d/%m/%Y %H:%M:%S' )" INPUT=/tmp/test.csv OUTPUT=/tmp/test.csv.out OLDIFS=$IFS IFS=',' [ ! -f $INPUT ] && { echo "$INPUT file not found"; exit 99; } while read msgid dat to cc bcc subject body status timesta do echo "MSG ID : $msgid" echo "Date : $dat" echo "To : $to" echo "CC : $cc" echo "BCC : $bcc" echo "Subject : $subject" echo "Body : $body" echo "Status : $status" echo "Timestamp : $timesta" echo $body | mail -s "$subject" $to -c $cc -b $bcc printf "$msgid,$dat,$to,$cc,$bcc,$subject,$body,Sent,$TIMESTAMP" >> $OUTPUT done < $INPUT IFS=$OLDIFS

  • What about using printf %q to escape strings in POSIX shell format? So the body will be single line with \n escapes and anything that could cause issue would be escaped? printf %q "$body" – Jakub Jindra Aug 14 '20 at 09:58
  • Why are you doing this in the shell? Bash, and all shells, are very bad tools for text parsing. And csv is a particularly hard format to parse, even with a proper language. Can't you use a csv parser? – terdon Aug 14 '20 at 10:06
  • Trying to use the printf %q "$body" I'll get the following output: "Dear\ Dude'''''''''' – Ruben Jose Aug 14 '20 at 10:06
  • I don't have access to a csv parser, I'm just a user of the server. – Ruben Jose Aug 14 '20 at 10:10
  • What operating system is the server running? Can you work with perl or python instead of the shell? Also, why are you setting IFS to ;, shouldn't that be ,? – terdon Aug 14 '20 at 10:15
  • Hello terdon, yes, that's just a typo when I copied it to here, in the script it's a "," It's running a RHEL6, I'm working on shell because I'm not comfortable with python. – Ruben Jose Aug 14 '20 at 10:24
  • 1
    , csv for email, really? Best regards, – alecxs Aug 14 '20 at 11:05
  • Then can you please fix the typo so people don't waste their time answering you to explain that you're using the wrong delimiter? – terdon Aug 14 '20 at 11:24

2 Answers2

3

The shell is just a very bad tool for text parsing. You should only use it for the most simple of tasks, and probably not even then. It is slow, inefficient, its very complicated syntax makes it error prone and it lacks the most basic text processing tools. On top of this, CSV is a complex format that allows nested and multi-line entries. Trying to parse a CSV file in the shell is just asking for trouble.

So, use a proper programming language with support for CSV parsing instead. For example, in python:

#!/bin/python3
import csv
import sys

with open(sys.argv[1], newline='') as csvfile: fieldnames = ("MsgID","Date","To","CC","BCC", "Subject","Body","Status","Timestamp") reader = csv.reader(csvfile, delimiter=',', for row in reader: for fieldName, value in zip(fieldnames,row): print("%s: %s" % (fieldName,value))

I saved the script above as foo.py and then:

MsgID: #1
Date: 13/8/2020
To: somedude@hotmail.com
CC: otherdude@hotmail.com
BCC: 
Subject: Subject
Body: Dear Dude,

Information have been updated. Please login to APP to review by 30th August 2020. Thank you.

Best regards, Mr. Mack Status: Pending Timestamp: 13/8/2020 12:35

As you can see, it has correctly recognized all fields. So, in order to also send the mail, you could do something like this (adapted from here):

#!/bin/python3
import csv
import sys
import subprocess
import os

def send_message(to, cc, bcc, subject, body): try: process = subprocess.Popen(['mail', '-s', subject, '-c', cc, '-b', bcc, to], stdin=subprocess.PIPE) except Exception as error: print(error) process.communicate(body)

with open(sys.argv[1], newline='') as csvfile: reader = csv.reader(csvfile, delimiter=',') for row in reader: (msgID, date, to, cc, bcc, subj, body, status, timestamp) = row send_message(to, cc, bcc, subj, body)

DISCLAIMER: You seem to be using BSD mail based on the options you have. I don't have access to a working system with that tool installed so I could not test the mail sending script above. It should at least serve as a good start though.

terdon
  • 242,166
0

You could use a tool as Miller (https://github.com/johnkerl/miller), specialized to work on structured text data.

In example starting from this input.txt file

#1,13/8/2020,somedude@hotmail.com,otherdude@hotmail.com,,Subject,"Dear Dude,

Information have been updated. Please login to APP to review by 30th August 2020. Thank you.

Best regards, Mr. Mack",Pending,13/8/2020 12:35

and running

mlr --implicit-csv-header --c2x cat then label MsgID,Date,To,CC,BCC,Subject,Body,Status,Timestamp input.txt

you will have

MsgID     #1
Date      13/8/2020
To        somedude@hotmail.com
CC        otherdude@hotmail.com
BCC
Subject   Subject
Body      Dear Dude,

Information have been updated. Please login to APP to review by 30th August 2020. Thank you.

Best regards, Mr. Mack Status Pending Timestamp 13/8/2020 12:35

If you want to remove carriage return in body field, you can run

mlr --implicit-csv-header --c2x cat then label MsgID,Date,To,CC,BCC,Subject,Body,Status,Timestamp then put '$Body=gsub($Body,"\n"," ")' input.txt

to have

MsgID     #1
Date      13/8/2020
To        somedude@hotmail.com
CC        otherdude@hotmail.com
BCC
Subject   Subject
Body      Dear Dude,  Information have been updated. Please login to APP to review by 30th August 2020. Thank you.  Best regards, Mr. Mack
Status    Pending
Timestamp 13/8/2020 12:35
aborruso
  • 2,855