I have a file with the following format, each column separated by tabs:
C1 C2 C3
a b,c d
e f,g,h i
j k l
...
Now I need to have the number of lines according to the number of values separated by commas (if that's the case) in the 2nd column. The lines must have one of the values and not the others. The result would be this:
C1 C2 C3
a b d
a c d
e f i
e g i
e h i
j k l
...
...
As this is due to work asap, I've just made a don't do this at home script, reading line by line with a while
, due to my lack of skills in awk
, or not exploring other possible solutions with other tools. The script is as follows:
I'm revising the script in the meantime
# DON'T DO THIS AT HOME SCRIPT
> duplicados.txt
while IFS= read -r line; do
# get the value of the column of interest
cues="$(echo "$line" | awk -F'\t' '{ print $18 }')"
# if the column has commas then it has multiple values
if [[ "$cues" =~ , ]]; then
# count the commas
c=$(printf "%s" "$cues" | sed 's/[^,]*//g' | wc -c)
# loop according to the number of commas
for i in $(seq $(($c + 1))); do
# get each value of the column of interest according to the position
cue="$(echo "$cues" | awk -F',' -v c=$i '{ print $c; ++c }')"
# save the line to a file substituting the whole column for the value
echo "$line" | sed "s;$cues;$cue;" >> duplicados.txt
done
continue
fi
# save the single value lines
echo "$line" >> duplicados.txt
done < inmuebles.txt
With this I get the desired result (as far as I can tell). As you can imagine the script is slow and very ineficient. How could I do this with awk
or other tools?
A sample of the real data is like this, being the column of interest the number 18:
1409233 UNION VIAMONTE Estatal Provincial DGEP 3321 VIAMONTE -33.7447365;-63.0997115 Rural Aglomerado 140273900 140273900-ESCUELA NICOLAS AVELLANEDA
1402961 UNION SAN MARCOS SUD Estatal Provincial DGEA, DGEI, DGEP 3029, 3311, Z11 SAN MARCOS SUD -32.629557;-62.483976 / -32.6302699949582;-62.4824499999125 / -32.632417;-62.484932 Urbano 140049404, 140164000, 140170100, 140173100 140049404-C.E.N.M.A. N° 201 ANEXO SEDE SAN MARCOS SUD, 140164000-C.E.N.P.A. N° 13 CASA DE LA CULTURA(DOC:BERSANO), 140170100-ESCUELA HIPOLITO BUCHARDO, 140173100-J.DE INF. HIPOLITO BUCHARDO
1402960 UNION SAN ANTONIO DE LITIN Estatal Provincial DGEA, DGEI, DGETyFP 3029, TZONAXI, Z11 SAN ANTONIO DE LITIN 3601300101020009 360102097366 0250347 SI / SI -32.212126;-62.635999 / -32.2122558;-62.6360432 / -32.2131931096409;-62.6291815804363 Rural Aglomerado 140049401, 140313000, 140313300, 140483400, 140499800 140049401-C.E.N.M.A. N° 201 ANEXO SAN ANTONIO DE LITIN, 140313000-I.P.E.A. Nº 214. MANUEL BELGRANO, 140313300-J.DE INF. PABLO A. PIZZURNO, 140483400-C.E.N.P.A. DE SAN ANTONIO DE LITIN, 140499800-C.E.N.P.A. B DE SAN ANTONIO DE LITIN