I want to merge different columns within my file that share the same column header. The file looks like this and can be tab-separated or something else:
AaBbN CcDdEeN FfN AaBbN FfN
1 5 4
3 1 2
2 NA 1
1 3
3 2
NA 4
So there are numbers or the string "NA" in the fields. The result would look like this:
AaBbN CcDdEeN FfN
1 5 4
3 1 2
2 NA 1
1 3
3 2
NA 4
There are a lot of columns that are not ordered, so the title headers would need to be read automatically instead of manually specifying every single one. There are also a lot of empty fields. I've been looking into the paste
and join
commands to do the job. Especially join
seems to do what I need, except it works with separate files, whereas my columns are within the same file.
So I tried to separate the columns into separate files and then combine them with join
. I used an awk
command that I derived from here:
awk ' { for( i = 1; i <= NF; i++ ) printf( "%s\n", $(i) ) >i ".txt"; } ' file.txt
which gives me separate columns, but here I ran into the first problem. All columns with empty space between the header and data were not processed correctly. Instead, only the column header was present in these files.
My second problem is with join
: When I try to merge files back again, I get errors because the input is not sorted, which is of course impossible to do. Any sorting would destroy the relationship I am looking after.
So here I am at a dead end. Is there a more convenient way to merge the columns directly within a file?
Edit:
AdminBees solution comes closest to solving the problem, but the result is not quite right. Here is the result of the awk script applied to the example above. I made sure that all entries are tab separated with sed -i "s/[[:space:]]/ /g"
(tab inserted with CTRL+V and TAB).
AaBbN CcDdEeN FfN FfN
1 5 4
3 1 2
2 NA 1
1
3
NA
awk
, otherwise more effort will be needed ... – AdminBee Apr 09 '20 at 08:47