A file is modified by a script using an input file-

141,141_1,BAR,HONDA,ps2_0,unassigned,ps3_0,Unassigned,ps4_0,Unassigned,ps5_0,Unassigned,ps6_0,Unassigned,ps7_3,TILL WILL,.....

Input file-

141,ps7,TILL WILL

Now I need to check whether the column ps7_3 was updated with the correct value.

So from the input file, I separated the columns-

while read -r line;
do
sub1=$(echo $line|cut -f 1 -d ',');
sub2=$(echo $line|cut -f 2 -d ',');
sub3=$(echo $line|cut -f 3 -d ',');
sub4=$(echo $sub2'.*,'$sub3|sed -e "s/\(.*\)\r/'\1'/");
echo $sub1;
echo $sub2;
echo $sub3;
echo $sub4;
grep $sub4 modded_file.csv.dat;
done<input.csv

The output being-

141
ps7
TILL WILL
'ps7.*,TILL WILL'
grep: WILL': No such file or directory

But when I run grep 'ps7.*,TILL WILL' modded_file.csv.dat, it works. How can I grep a variable as shown above, in a file?

Shashank K R
  • Instead of fooling around a lot with variables, cut, echo and sed, you could replace all of that with a simple sed -E "s/.*,(.*),(.*)/'\1.*,\2'/", but actually you don't want the single quotes to be part of the search pattern, so it's -E "s/.*,(.*),(.*)/\1.*,\2/". Protect the pattern variable with double quotes, as @Romeo indicated in his answer. – Philippos Apr 26 '17 at 07:39
  • Thanks, I replaced the variables with the sed command, but it generates a space at the beginning of the line which causes the grep to fail. – Shashank K R Apr 26 '17 at 08:23
  • What did you try? I do for pattern in "$(sed -E "s/.*,(.*),(.*)/\1.*,\2/" input.csv)"; do grep "$pattern" modded_file.csv.dat; done to iterate through your input file. You can also do the whole thing in one pass, but this requires deeper sed or awk experience – Philippos Apr 26 '17 at 08:40
  • Turns out that since the input file was windows based, it had carriage return for each line which I had to remove. Your solution worked perfect after this. Thanks! – Shashank K R Apr 26 '17 at 11:50
  • @ShashankKR That proves once more that it is always a bad idea to process CSV files (which only in a small subset of cases are line based) with line oriented tools. You should use Perl/Python/Ruby, that have libraries to deal with real CSV files. – Anthon Apr 26 '17 at 15:53
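
The carriage-return problem mentioned in the comments can be handled in the loop itself. The following is only a sketch under stated assumptions: the two files are small hypothetical samples written to a temporary directory (the real modded_file.csv.dat line is elided in the question), and `tr -d '\r'` is one of several ways to strip the Windows line endings before the fields are split.

```shell
#!/bin/sh
# Hypothetical sample data; the real files are not shown in full in the question.
tmp=$(mktemp -d)
printf '141,ps7,TILL WILL\r\n' > "$tmp/input.csv"        # Windows line ending
printf '%s\n' '141,141_1,BAR,HONDA,ps6_0,Unassigned,ps7_3,TILL WILL,x' \
    > "$tmp/modded_file.csv.dat"

# Strip carriage returns, split each line on commas, then grep with a
# double-quoted pattern so the space in "TILL WILL" stays inside one argument.
result=$(tr -d '\r' < "$tmp/input.csv" |
while IFS=, read -r id col val; do
    grep -- "${col}.*,${val}" "$tmp/modded_file.csv.dat"
done)

printf '%s\n' "$result"
rm -rf "$tmp"
```

Letting `read` split on `IFS=,` replaces the three `cut` calls, and no single quotes are added to the pattern.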

2 Answers

As you can see from your output, the variable sub4 contains a space, so rewrite this line:

grep $sub4 modded_file.csv.dat;

to be

grep -- "$sub4" modded_file.csv.dat;

(Additions by @philippos) And $sub4 should not contain the single quotes ', because they would be considered part of the search pattern.

I think your misconception concerns the order in which quoting and expansion are performed. You assume that variables are expanded first and quoting is applied afterwards, so that the single quotes stored in the variable would quote the string after expansion. In fact, quotes are processed before variable expansion, so the single quotes inside the value are ordinary characters and you need to quote the expansion itself: "$sub4".
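
To make the difference visible, here is a small sketch. `show_args` is a hypothetical helper (not part of the question) that prints how many arguments it received and what they are, which is what grep sees in each case.

```shell
#!/bin/sh
# Hypothetical helper: print the argument count and each argument in <...>.
show_args() {
    printf '%d args:' "$#"
    printf ' <%s>' "$@"
    printf '\n'
}

sub4="'ps7.*,TILL WILL'"   # the single quotes are literal characters in the value

# Unquoted: the expanded value is split at the space; the quotes do not group it.
show_args $sub4

# Double-quoted: one argument, but the literal quotes are still part of it.
show_args "$sub4"
```

In the unquoted case grep receives `'ps7.*,TILL` as the pattern and `WILL'` as a file name, which explains the "No such file or directory" message.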

Romeo Ninov

Running grep inside a loop is a massive antipattern. Try this instead.

awk -F "," 'NR==FNR { key[$1]=$2; value[$1]=$3; next }
    ($1 in key) && ($0 !~ "^" $1 ",.*," key[$1] "," value[$1] ",")' input.csv modded_file.csv.dat

I have not tried to understand why you want or expect the \r in there somehow so this probably requires some tweaking.

An Awk script consists of a sequence of condition { action } pairs, which are applied in turn to each input line. You can use next to skip the remaining script for this input line and move on to the next input, and you can omit the { action } part if you simply want to print the entire input line. (You can also omit the condition if you want to do something unconditionally.) Each line is split into fields which are available as $1, $2, etc. within the script. -F "," sets the field separator to comma (the default is a sequence of whitespace).

The NR==FNR idiom is a common way to process two input files in Awk. The overall line number NR equals the per-file line number FNR only while the first input file is being read; for every later file the comparison is false.
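
A minimal sketch of the idiom on its own, using two hypothetical sample files (`first.csv` and `second.csv` are made-up names and contents). One caveat worth knowing: if the first file is empty, NR==FNR is also true for the second file.

```shell
#!/bin/sh
# Hypothetical sample files in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' '141,ps7,TILL WILL' > "$tmp/first.csv"
printf '%s\n' '141,a' '999,b' > "$tmp/second.csv"

out=$(awk -F ',' '
    # True only while reading the first file: remember column 1, skip the rest.
    NR == FNR { seen[$1] = 1; next }
    # Reached only for the second file: report whether column 1 was seen.
    { print $1, (($1 in seen) ? "known" : "unknown") }
' "$tmp/first.csv" "$tmp/second.csv")

printf '%s\n' "$out"
rm -rf "$tmp"
```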

When we are reading the first file, we store the fields in two associative arrays, both keyed by the first field.

When we are reading the second file, we print every input line where the key is found in the key array, and the entire line does not match the expected regular expression (first field is key, followed by anything, followed by a comma, the column name we stored in key[$1], another comma, the expected value we stored in value[$1], and yet another comma).

In other words, this finds the lines where the expected condition was not met. Take out the ! if you want the matches instead.

tripleee