Split a line and then rearrange it

Question

I have a file with 130 fields separated by semicolons. I want to rearrange them in some fashion.

Consider the below example:

File sample.txt:

1;2;3;4;8;5;6;7;9;10;11;
11;12;13;14;18;15;16;17;19;20;21;

Required output (file req_op.txt):

1;2;3;4;5;6;7;8;9;10;11;
11;12;13;14;15;16;17;18;19;20;21;

Notice that the eighth element is misplaced. All I want to do is put each line back in order. The problem is that there are 121 fields, so I cannot use a concise AWK command to do this text manipulation in a single line for the whole file.

I have tried the script below and it works. Can you suggest a more efficient or more readable solution? Please also explain your solution.

Each field can contain numbers or strings, possibly with spaces or characters such as $ and #.

#!/bin/bash
file="sample.txt"
while read -r line
do
array=($(echo "$line" | sed 's/;/ /g'))

printf -v first '%s;' "${array[@]:0:4}"
printf -v last '%s;' "${array[@]:8:12}"
printf -v second '%s;' "${array[@]:5:3}"
printf -v third '%s;' "${array[@]:4:1}"

echo "${first}${second}${third}${last}" >> req_op.txt

done < $file

The actual number of fields:

Input:

1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|95|96|97|98|99|100|101|102|103|104|105|106|107|108|109|110|111|112|113|114|115|116|117|118|119|120|121|122|123|124|125|126|127|128|129|130|131|132|133|134|135|136|137|143|138|139|140|141|142|144|145|146|147|148|149|150|151|152|153|154|155|156|157|158|159|160|161|162

output:

1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16|17|18|19|20|21|22|23|24|25|26|27|28|29|30|31|32|33|34|35|36|37|38|39|40|41|42|43|44|45|46|47|48|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74|75|76|77|78|79|80|81|82|83|84|85|86|87|88|89|90|91|92|93|94|95|96|97|98|99|100|101|102|103|104|105|106|107|108|109|110|111|112|113|114|115|116|117|118|119|120|121|122|123|124|125|126|127|128|129|130|131|132|133|134|135|136|137|138|139|140|141|142|143|144|145|146|147|148|149|150|151|152|153|154|155|156|157|158|159|160|161|162

I modified the sed command shared by @Quasímodo, and now it's working as expected.

sed -E 's~(([^\|]*\|){137})([^\|]*\|)(([^\|]*\|){5})~\1\4\3~' sample.txt

Use awk. See why-is-using-a-shell-loop-to-process-text-considered-bad-practice — Ed Morton, Jul 27 '20 at 00:16
I wonder... is it always the 8th element, and only that, which is out of sequence, and is it always in the 5th position? Please ignore this comment if that is always the case. Otherwise, consider something like perl -F';' -lne 'print join ";", sort { $a <=> $b } @F' sample.txt — , Jul 27 '20 at 05:00
Yes, only one element is out of position, and its position is fixed. — Sasuke Uchiha, Jul 27 '20 at 05:44
@SasukeUchiha I know you said awk but would you be open to a solution written in (SQL-like) query syntax? Specifically C#? — bit, Jul 27 '20 at 14:27
@SasukeUchiha I earnestly suggest that you roll back your question to revision number 3. That was the last version of the question that was still a single question. Now we have two questions to deal with, after three answers have been given. That is especially complicated because, small though the change looks, | is a regex metacharacter, so it would require a complete rework of all the answers. It's totally OK to open another question if you cannot adapt the answers to your real need. And don't worry, we all were new here once. — Quasímodo, Jul 27 '20 at 16:04

Quasímodo · Answer 1 · 2021-03-03T16:36:21.583

With Awk

awk 'BEGIN{FS=OFS=";"}{$8=$8 FS $5;$5=RS;sub(RS FS,"");print}' sample.txt > req_op.txt

Expanded version, with comments:

awk '
  BEGIN{FS=OFS=";"} #Sets input (FS) and output (OFS) field separators
  {                 #For each line
    $8=$8 FS $5     #Append the 5th field after the 8th field
    $5=RS           #Put a newline (the record separator) in the 5th field
    sub(RS FS,"")   #Remove the newline and its following FS
    print           #Print the resulting line
  }     
' sample.txt > req_op.txt

Why was the record separator (in your case, a newline) chosen to temporarily replace the 5th field? Because it is the one string that is guaranteed not to appear inside a record. sub(RS FS,"") is therefore certain to remove exactly the 5th field and its separator, even if there is an empty field somewhere else in the line.

If you don't understand the sub line, remove it and see what happens with the output.
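To see the placeholder trick outside of Awk, the three steps can be mimicked in Python. This is only an illustrative sketch, not part of the Awk solution: the function name `move_fifth_after_eighth` is made up for this example, and a newline stands in for RS on the assumption that each record is a single line passed without its trailing newline.

```python
def move_fifth_after_eighth(line, sep=";", placeholder="\n"):
    """Mimic the Awk steps; `placeholder` plays the role of RS.

    Assumes `line` has at least 8 fields and does not contain `placeholder`.
    """
    fields = line.split(sep)
    fields[7] = fields[7] + sep + fields[4]  # $8 = $8 FS $5
    fields[4] = placeholder                  # $5 = RS
    joined = sep.join(fields)                # Awk rebuilds the record with OFS
    # sub(RS FS, ""): drop the placeholder and the separator that follows it
    return joined.replace(placeholder + sep, "", 1)


print(move_fifth_after_eighth("1;2;3;4;8;5;6;7;9;10;11;"))
# 1;2;3;4;5;6;7;8;9;10;11;
print(move_fifth_after_eighth("a;;c;d;X;;g;h;i;"))
# a;;c;d;;g;h;X;i;
```

The second call has empty fields next to the moved one. Because the placeholder cannot occur anywhere else in the line, only the intended field is removed.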

With Sed

With extended regex:

sed -E 's|(([^;]*;){4})([^;]*;)(([^;]*;){3})|\1\4\3|' sample.txt > req_op.txt

With basic regex, POSIX compliant, it is basically the same as above, but every (){} needs to be escaped (sigh!):
```
sed 's|\(\([^;]*;\)\{4\}\)\([^;]*;\)\(\([^;]*;\)\{3\}\)|\1\4\3|' sample.txt > req_op.txt
```

s is the substitution command of sed. The character that follows it is the delimiter (I chose |). It delimits the regex slot, the replacement slot and the flags slot (which is empty in this case).

Some elements of the regex explained:

[^;]*;: Zero or more occurrences of any character except semicolon, followed by a semicolon.
([^;]*;){4}: The above expression is in a capture group and it should be repeated exactly 4 times.
(([^;]*;){4}): The above expression is in the outer capture group and is reproduced by \1 in the replacement expression; the inner capture group would be referenced by \2.

So, what happens in the first line 1;2;3;4;8;5;6;7;9;10;11; is

\1 gets 1;2;3;4;
\3 gets 8;
\4 gets 5;6;7;

and they are reordered as \1\4\3.
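The group numbering can be checked with Python's `re` module, which accepts the same extended-regex pattern and the same `\1\4\3` replacement. This is a sketch for inspecting the groups, not a replacement for the sed command; the names `PATTERN` and `reorder` are chosen here for illustration.

```python
import re

# Same pattern as the sed -E command: groups are numbered by opening parenthesis.
PATTERN = re.compile(r"(([^;]*;){4})([^;]*;)(([^;]*;){3})")


def reorder(line):
    """Swap the 5th field with the three fields after it (first match only)."""
    return PATTERN.sub(r"\1\4\3", line, count=1)


match = PATTERN.match("1;2;3;4;8;5;6;7;9;10;11;")
print(match.group(1))  # 1;2;3;4;
print(match.group(3))  # 8;
print(match.group(4))  # 5;6;7;
print(reorder("1;2;3;4;8;5;6;7;9;10;11;"))  # 1;2;3;4;5;6;7;8;9;10;11;
```

`count=1` mirrors sed's default of replacing only the first match on each line.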

For more on back-references, read Using \1 to keep part of the pattern (that webpage is a nice sed tutorial by the way).

thank you for the explanation. Yes I do have empty fields. I will test it once. — Sasuke Uchiha, Jul 27 '20 at 05:40

score 6 · Answer 2 · answered Jul 26 '20 at 20:14


With Perl:

$ perl -F';' -lne 'splice @F, 7, 0, (splice @F, 4, 1); print join ";", @F' sample.txt 
1;2;3;4;5;6;7;8;9;10;11
11;12;13;14;15;16;17;18;19;20;21

See for example Splice to slice and dice arrays in Perl
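For readers more at home in Python, the nested splice corresponds roughly to a `pop` followed by an `insert`. This is a sketch under that analogy, with a hypothetical helper name `move_field` and zero-based indices as in the Perl one-liner.

```python
def move_field(line, src, dst, sep=";"):
    """Remove the field at index `src` and re-insert it at index `dst`.

    `dst` is interpreted after the removal, like the outer splice in Perl.
    """
    fields = line.split(sep)
    fields.insert(dst, fields.pop(src))
    return sep.join(fields)


print(move_field("1;2;3;4;8;5;6;7;9;10;11;", 4, 7))
# 1;2;3;4;5;6;7;8;9;10;11;
```

Unlike the Perl output above, this keeps the trailing semicolon, because Python's `str.split` preserves the empty field after it.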


steeldriver


score 2 · Accepted Answer · answered Jul 27 '20 at 16:04


With perl, you can also do:

perl -F';' -lape '$_ = join ";", @F[0..3,5..7,4,8..10]' sample

Or for your actual input:

perl -F'[|]' -lape '$_ = join "|", @F[0..136,138..142,137,143..161]' input
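The same index-list idea can be sketched in Python, which may help when checking that the ranges cover every field exactly once. The helper `reorder_by_index` and the two `*_ORDER` names are invented for this example; the indices are copied from the two Perl slices above.

```python
def reorder_by_index(line, order, sep=";"):
    """Return the fields of `line` in the sequence given by `order` (zero-based)."""
    fields = line.split(sep)
    return sep.join(fields[i] for i in order)


# Equivalent of @F[0..3,5..7,4,8..10]
SAMPLE_ORDER = [*range(0, 4), *range(5, 8), 4, *range(8, 11)]
# Equivalent of @F[0..136,138..142,137,143..161]
ACTUAL_ORDER = [*range(0, 137), *range(138, 143), 137, *range(143, 162)]

print(reorder_by_index("1;2;3;4;8;5;6;7;9;10;11;", SAMPLE_ORDER))
# 1;2;3;4;5;6;7;8;9;10;11

# Sanity check: the index list is a permutation of all 162 positions.
print(sorted(ACTUAL_ORDER) == list(range(162)))  # True
```

As with the Perl version, only the listed fields are printed, so the empty field after a trailing separator is dropped.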


Stéphane Chazelas


very neat. Thank you – Sasuke Uchiha Jul 28 '20 at 05:01

score 0 · Answer 4 · answered Jul 30 '20 at 18:52

Python

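#!/usr/bin/env python3
# Sorts the fields of each line numerically.
# Assumes every non-empty field is an integer.
with open('filename') as infile:
    for line in infile:
        # Split on ";" and skip empty fields (e.g. the one after the trailing ";")
        numbers = sorted(int(f) for f in line.strip().split(';') if f.strip())
        # Re-join, ending every field with ";" as in the input
        print(''.join(f'{n};' for n in numbers))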

output

1;2;3;4;5;6;7;8;9;10;11;
11;12;13;14;15;16;17;18;19;20;21;
