
I am trying to turn my Bash script into a function that takes a Bash input parameter, but AWK's syntax is causing a problem. Original AWK code:

http://stackoverflow.com/a/19602188/54964
awk -F "\"*,\"*" '{print $2}' textfile.csv

Pseudocode with Bash parameter $1

file=$(awk -v colN="$1" -F "\"*,\"*" '{print $"${colN}"}' "${input}") 
# http://stackoverflow.com/a/19602188/54964 
# http://stackoverflow.com/a/19075707/54964

The problem is the part print $"${colN}".

The current output fails to pick the second column and instead returns the whole line, e.g.

-0.21,-0.245
-0.205,-0.22

Using only print $colN is not correct either, since it then always takes the first column regardless of the value of $1.

Example of the use case where I call it with bash code.bash 2. The complete script is below; it should work without hard-coding which column (1/2) to choose from the two-column CSV files, giving the joined result of the second columns.

#!/bin/bash
ids=(101 118 201)
dir="/home/masi/Documents/CSV/"
index=0
for id in "${ids[@]}";
do
        input=$(echo "${dir}P${id}C1.csv")
        # take second column of the file here
        file=$(awk -v colN="$1" -F "\"*,\"*" '{print $colN}' "${input}") # http://stackoverflow.com/a/19602188/54964 # http://stackoverflow.com/a/19075707/54964

        Ecgs[${index}]="${file}"
        index=$index+1
done

Inputs: multi-column files 1.csv, 2.csv, 3.csv, each containing

-0.21,-0.245
-0.205,-0.22

Wanted output

101,118,201
-0.245,-0.245,-0.245
-0.22,-0.22,-0.22

OS: Debian 8.5
Bash 4.30

  • you are using the $ two times – magor Nov 03 '16 at 16:32
  • it's a variable; as far as I remember, you don't need any $, you just refer to it as colN, and yes, to refer to the column number you use a $ – magor Nov 03 '16 at 16:34
  • tried using $colN ? – magor Nov 03 '16 at 16:36
  • @Masi, you're only asking for the second column to be printed. Did you want the first two columns to be printed? – Wildcard Nov 03 '16 at 16:44
  • Why do you want the columns in a Bash array? What's the actual final end result you want? – Wildcard Nov 03 '16 at 16:48
  • Also see Why is using a shell loop to process text considered bad practice? You might consider doing the entire thing in Awk, rather than using a Bash array. But I don't know the use case. – Wildcard Nov 03 '16 at 16:49
  • to save the 2nd column from each file, paste is a better solution; a solution was provided to you earlier – magor Nov 03 '16 at 16:51
  • @Masi, can you please provide a simple example input and example output? – Wildcard Nov 03 '16 at 16:55
  • Also, does your script actually work as written? I should think you would want >>, not >. If it's a working script, you might post it on http://codereview.stackexchange.com. – Wildcard Nov 03 '16 at 16:56
  • I will say this, though: I am absolutely certain that the entire script can be replaced with a single simple command. I'm just having trouble telling what you are actually trying to do with the script. So an example input and output would help a lot. – Wildcard Nov 03 '16 at 16:58
  • @Masi, but they're multi-column CSVs and you just want one column from each—right? – Wildcard Nov 03 '16 at 17:03
  • The example input/output is really bad. Are the values actually the same for the first and second field? Are there ever more than two fields in the input files? I'm thinking paste -d, /home/masi/Documents/CSV/P{101,118,201}C1.csv | awk -F, -v OFS=, '{print $2, $4, $6}' but if there is ever more than two fields in the input files that won't work as expected. – Wildcard Nov 03 '16 at 17:12
  • @Wildcard I opened a thread about the script's validity on Code Review: http://codereview.stackexchange.com/q/146360/122105 I think it would be better to replace the AWK part with something else because it causes this strange problem. – Léo Léopold Hertz 준영 Nov 07 '16 at 06:45

3 Answers


Your example input has the same values in the first and second field for all files (and the same values for all files), which doesn't really help understand the exact use case. After all, if you really want the same value three times and you can get it from any field of any input file, you don't even need to check the other two files. You can just use:

cut -d, -f2 input.csv | paste -d, - - -

Of course this doesn't work for real input, just your example input. (Work on improving your example input/output for this type of question, it helps a lot.)
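
For illustration only (the file name and values below are made up), the pipeline folds the second column of a three-line file into a single comma-separated row:

$ printf '%s\n' a,1 b,2 c,3 > input.csv
$ cut -d, -f2 input.csv | paste -d, - - -
1,2,3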


If we make the assumptions that:

  • You always have exactly three input files
  • Called input1.csv, input2.csv, input3.csv
  • With exactly two columns each
  • And you want the second column from each file

You can do this most easily with a combination of Awk and paste (and shell file globbing):

paste -d, input[123].csv | awk -F, -v OFS=, '{print $2, $4, $6}'

If those assumptions are wrong, blame poor input/output examples. ;)
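
If you also want the column number to come from a Bash parameter, as in the question, here is a minimal sketch under the same assumptions; the script name join_columns.bash and the two-columns-per-file layout are my additions, not part of the answer above:

#!/bin/bash
# hypothetical usage: ./join_columns.bash 2 input1.csv input2.csv input3.csv
col="$1"; shift
paste -d, "$@" | awk -F, -v OFS=, -v col="$col" '{
    # each input file contributes two fields, so column "col" of file i
    # sits at field (i - 1) * 2 + col of the pasted line
    out = ""
    for (i = 0; i * 2 + col <= NF; i++)
        out = (i ? out OFS : "") $(i * 2 + col)
    print out
}'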

Wildcard
  • Can you show me how you make this function because it is the main topic here? - - Etc take second columns of each file. - - I think the outputs are correct but the behaviour with Bash input parameters is my interest, etc with your last command. – Léo Léopold Hertz 준영 Nov 03 '16 at 17:22
  • Assume you have a list of inputs. This is not working as expected paste -d"," ${input[@]} | awk -F, -v OFS=, '{print $2, $4, $6}' > /tmp/testShort.csv. – Léo Léopold Hertz 준영 Nov 03 '16 at 17:30
  • @Masi, just list them individually. paste -d, file1 file2 file3 | ... – Wildcard Nov 03 '16 at 17:32
  • @Masi, how are you setting the input array? If you do files=(file1.csv file2.csv file3.csv) you can follow it up with paste -d, "${files[@]}" | awk -F, -v OFS=, '{print $2, $4, $6}'. – Wildcard Nov 03 '16 at 17:35
  • input array is set like in the loop of the body. - - Is it equivalent to your files? Should I change the datastructure there? If so, how? - - This is the output from your proposal -0.245,, for the second line. Please, try to use Bash input parameters for the task, like I do. – Léo Léopold Hertz 준영 Nov 03 '16 at 17:36

To answer your question as stated, given

$ cat file
a,b,c
d,e,f
g,h,i
j,k,l

and a simple test script

$ cat col.bash
#!/bin/bash

awk -F, -vcol="$1" '{print $col}' file

you can verify that $col indeed references the desired column i.e.

$ ./col.bash 2
b
e
h
k

If that's not working in your case, then there are other factors at play. Regardless, there are far simpler ways of extracting columns from multiple files.
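
Applied to the loop from the question, the same -v technique would look roughly like this (the paths, file naming and the Ecgs array are copied from the question; this is an untested sketch, not the asker's final script):

#!/bin/bash
colN="$1"                     # column number passed to the script, e.g. ./code.bash 2
ids=(101 118 201)
dir="/home/masi/Documents/CSV/"
Ecgs=()
for id in "${ids[@]}"; do
        input="${dir}P${id}C1.csv"
        # hand the shell parameter to awk with -v; inside awk, $colN is then field number colN
        Ecgs+=( "$(awk -F '"*,"*' -v colN="$colN" '{print $colN}' "$input")" )
done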

steeldriver

Using Bash and AWK together in this case is very hard. I could not solve the problem with the solutions proposed here. You will run into many quoting problems with " / ' / ..., so a single tool is necessary here.

Use just gawk as discussed in the thread ECG Bash selection tool.

#!/usr/bin/gawk -f
# https://codereview.stackexchange.com/a/146370/122105

# https://www.gnu.org/software/gawk/manual/html_node/Join-Function.html
@include "join.awk"

BEGIN {
    FS = "\"*,\"*";
    last_row = 0;
}

BEGINFILE {
    rows[0][ARGIND] = gensub(".*P([0-9]*)C.*", "\\1", "g", FILENAME);
}

{
    rows[FNR][ARGIND] = $col;
    if (FNR > last_row) { last_row = FNR; }
}

END {
    for (r = 0; r <= last_row; r++) {
        print join(rows[r], 1, ARGC - 1, ",");
    }
}

Please read the complete answer by 200_success there; it has excellent explanations.
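
For completeness, a hypothetical invocation, assuming the script above is saved as ecg-join.awk: the column number is passed with -v col=... and the CSV files are given as arguments (the @include "join.awk" line relies on join.awk shipping with gawk, which it does on Debian):

gawk -v col=2 -f ecg-join.awk /home/masi/Documents/CSV/P{101,118,201}C1.csv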