2

I have input data with 5 columns separated by tabs:

Class1,Class2 info1 info2 info3 info4
Class3 info1a info2a info3a info4a
Class4,Class5 info1b info2b1,info2b2 info3b info4b

When the first column contains values separated by commas, I would like to split them into separate rows, each with the same information from columns 2-5. For example:

Class1 info1 info2 info3 info4
Class2 info1 info2 info3 info4
Class3 info1a info2a info3a info4a
Class4 info1b info2b1,info2b2 info3b info4b
Class5 info1b info2b1,info2b2 info3b info4b

I have no idea how to do that. Any suggestions?

Alex

3 Answers

1

awk solution:

awk '$1~/.+,.+/{ n=split($1,a,","); $1=""; sub(/^ */,"",$0);
     for(i=1;i<=n;i++) print a[i],$0; next }1' file

The output:

Class1 info1 info2 info3 info4
Class2 info1 info2 info3 info4
Class3 info1a info2a info3a info4a
Class4 info1b info2b1,info2b2 info3b info4b
Class5 info1b info2b1,info2b2 info3b info4b

  • $1~/.+,.+/ - process the line only if the 1st column contains comma-separated items
  • split($1,a,",") - split the 1st column into an array of items (a)
  • It worked, but sometimes I have more than 2 variables in the first column (i.e. Class1,Class2,Class3,Class4,Class5...). Do I need to consider that as well? It looks like this command works only if I have 2 variables separated by a comma in the first column, right? – Alex Jun 08 '17 at 13:56
  • @Paul, ok, see my update, it'll work for "more than 2 items" within the 1st column – RomanPerekhrest Jun 08 '17 at 14:10
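
One thing to be aware of with the command above: assigning to $1 makes awk rebuild the record with its default output separator (a single space), so the split rows come out space-separated instead of tab-separated. If tabs need to be preserved, something along these lines might work. This is only a sketch, assuming the input is strictly tab-delimited; split_first_column is a made-up helper name, not part of the original answer.

```shell
# Sketch: expand comma-separated values in column 1 into one row each,
# keeping tabs between columns. split_first_column is a hypothetical name.
split_first_column() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        n = split($1, items, ",")   # n = number of items in column 1
        if (n == 0) { print; next } # empty first column: pass the line through
        for (i = 1; i <= n; i++) {
            $1 = items[i]           # swap in one item; record is rebuilt with OFS
            print
        }
    }' "$@"
}

# Example run on two of the rows from the question
printf 'Class1,Class2\tinfo1\tinfo2\tinfo3\tinfo4\nClass3\tinfo1a\tinfo2a\tinfo3a\tinfo4a\n' |
    split_first_column
```

Because the loop runs up to the count returned by split, it handles any number of items in the first column, and commas in the other columns are left alone.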
1
perl -F'\t' -lane '$,="\t";
   print $_, @F for split /,/, splice @F, 0, 1;
' yourfile

Results

Class1  info1   info2   info3   info4
Class2  info1   info2   info3   info4
Class3  info1a  info2a  info3a  info4a
Class4  info1b  info2b1,info2b2 info3b  info4b
Class5  info1b  info2b1,info2b2 info3b  info4b
1

POSIX sed

TAB=$(printf \\t) NL=$(printf \\nn | sed -e '$!s/$/\\/')
sed -e "s/^\([^,${TAB}]*\),\([^${TAB}]*\)\(.*\)/\1\3${NL%?}\2\3/;P;D" yourfile

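We first define TAB and NL variables that can be used inside the double-quoted sed command, because the escape sequences \t (needed on the lhs) and \n (needed on the rhs) of the s/// command are not available under POSIX sed. The P;D at the end prints the first line of the pattern space, deletes it, and restarts the cycle on the remainder, so any number of comma-separated items in the first column is handled.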
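
To see what the two variables hold, here is a small standalone sketch using the same definitions as above. The sample inputs (a,b and x, tab, y) are made up for illustration only.

```shell
# TAB holds a literal tab; command substitution strips only trailing newlines.
TAB=$(printf \\t)
# printf emits a newline followed by "n"; sed appends a backslash to every
# line except the last, so NL is: backslash, newline, "n". The trailing "n"
# keeps $(...) from stripping the newline; ${NL%?} drops it at the point of use.
NL=$(printf \\nn | sed -e '$!s/$/\\/')

# Replace the first comma with a newline (rhs becomes backslash + newline)
printf 'a,b\n' | sed -e "s/,/${NL%?}/"
# Replace the first tab with a colon (lhs contains a literal tab)
printf 'x\ty\n' | sed -e "s/${TAB}/:/"
```

The first command prints a and b on separate lines, the second prints x:y.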


Results

Class1  info1   info2   info3   info4
Class2  info1   info2   info3   info4
Class3  info1a  info2a  info3a  info4a
Class4  info1b  info2b1,info2b2 info3b  info4b
Class5  info1b  info2b1,info2b2 info3b  info4b