1

I have 20 excel files having 6x6 rows and columns, where the first row and column are string headers. Each of these files has 4 same columns and row headers except 1 different. I want to know how I can modify them so that the the columns and rows with the same header in each file will have the same order, while the different row and column will always be the last one.

So for example:

If a.csv looks like this:

    a   b   d   c   x
a   1   2   3   5   3
b   2   2   5   5   2
d   2   3   4   4   6
c   5   5   6   6   5
x   3   1   6   7   9

and b.csv looks like this:

    d   c   b   a   y
d   2   3   6   5   3
c   5   2   6   6   5
b   6   4   2   3   4
a   6   4   4   6   2
y   5   3   6   7   9

and c.csv looks like this:

    a   c   d   b   z
a   3   3   5   5   2
c   5   4   6   6   1
d   7   4   5   7   2
b   3   2   6   6   7
z   5   3   6   4   7

and so on.... for all the 20 files

This is how I want them to look:

a.csv:

    a   b   c   d   x
a   1   2   5   3   3
b   2   2   5   5   2
c   5   5   6   6   5
d   2   3   4   4   6
x   3   1   7   6   9

b.csv:

    a   b   c   d   y
a   6   4   4   6   2
b   3   2   4   6   4
c   6   6   2   5   5
d   5   6   3   2   3
y   7   6   3   5   9

same with c.csv and the rest of the excel files.

Tak
  • 529
  • Are the data columns TAB delimited or by spaces? (Specifically in the first line of each file.) – Janis Apr 13 '15 at 09:41
  • The data are inserted in the excel cells by default, I didn't add any spaces or delimiters. – Tak Apr 13 '15 at 09:47
  • My comment had this goal to understand: Is the first line: TAB a TAB b TAB c ..., or is it SPACES a SPACES b SPACES c .... AFAICT, excel can create the former syntax, which would have advantages, since then every line would have 6 elements, where the first element of each first line would be the empty string. (In case of SPACES we had to differenciate two cases; the header line with five fields, and the data lines with 6 fields). Also formatting would be easier in case of TABs. – Janis Apr 13 '15 at 10:04
  • Oh okay, no the first cell (1,1) is a string lets say "data" just any string (no spaces in it), sorry for not including this in my example. – Tak Apr 13 '15 at 11:02
  • I fear my point is still not answered. In Excel the first cell may stay empty, okay. My point is; how is that exported. If it's exported as TAB-separated or semicolon-separated, or similar, that's fine. But if it's separated in a way that leading spaces are there, you can't in the textfile (in the general case) distinguish an empty data(0,0) field from a non-existing data(0,0) field; the former will lead to an interpretation of there being only 5 fields in the text file, the latter of being 6 fields, where the first one is empty. - I will propose a solution below and we can discuss it then. – Janis Apr 13 '15 at 13:03
  • okay, I exported them from Matlab, they are a Matlab Table. – Tak Apr 13 '15 at 13:08
  • I don't know about the Matlab export/output format(s). You can post the output of od -c a.csv to be sure about the format. – Janis Apr 13 '15 at 13:11

1 Answers1

3

For any fixed ordering per call (a, b, c, d, in your example) you can use for example an awk program like this to accomplish the task:

awk -v order=',a,b,c,d' '

BEGIN {
    OFS = FS = "\t"
    n = split(order,ord,",")
    for (i=1; i<=n; i++) ind[ord[i]] = i
}
FNR==1 {
    for (i=2; i<=n; i++) prm[ind[$i]] = i
}
{
    out[$1] = sprintf("%s", $1)
    for (i=2; i<=n; i++) {
        out[$1] = out[$1] sprintf("%s%s", OFS, $prm[i])
    }
    out[$1] = out[$1] sprintf("%s%s", OFS, $NF)
}
ENDFILE {
    for (i=1; i<=n; i++) print out[ord[i]]
    print out[$1] ORS
}

' a.csv b.csv c.csv

Note: This code assumes a recent version (4.x) of GNU awk because of the ENDFILE condition. (If that is not available the solution requires an adjustment.)

Note 2: If you want to process the files individually (one data file per awk call) you can replace ENDFILE by END (which will run also in older awks).

The results for your sample data are:

    a   b   c   d   x
a   1   2   5   3   3
b   2   2   5   5   2
c   5   5   6   6   5
d   2   3   4   4   6
x   3   1   7   6   9

    a   b   c   d   y
a   6   4   4   6   2
b   3   2   4   6   4
c   6   6   2   5   5
d   5   6   3   2   3
y   7   6   3   5   9

    a   b   c   d   z
a   3   5   3   5   2
b   3   6   2   6   7
c   5   6   4   6   1
d   7   7   4   5   2
z   5   4   3   6   7
Janis
  • 14,222