
I am trying to read a file in Linux. As soon as a "&" line is encountered, I want to write what has been read so far to another file, send that file to another folder, and then continue reading the original file until the next "&", and so on.

Input XML file:

<Document>
<tag1>
<tag2>
</Document>
&
<Document>
<tag3>
<tag4>
</Document>
&
<Document>
<tag5>
<tag6>
</Document>

My code snippet:

while IFS= read -r line;do
     if [["$line" =="$delimeter"]];then
         echo "$line" | sed "s/delimeter.*//">> "$output_file"
         cp "$output_file" "$TARGET_FOLDER" 
         break
     else
         echo "$line" >> "$output_file"
     fi
done < "$input_file" 

However, the code produces the entire file as output instead of splitting it at each occurrence of the delimiter. Could someone point out where I'm going wrong?

Expected output: the first <Document> to </Document> section (up to the &) is written to the output file, which is copied to TARGET_FOLDER. Then the next <Document> to </Document> section is written and copied, and so on.

Thank you for your help!

Kusalananda

2 Answers

Sounds like a job for csplit:

mkdir -p target &&
  csplit -f target/output. your-file '/^&$/' '{*}'

That would create target/output.00, target/output.01... files, splitting on lines that consist solely of &.
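
Note that csplit keeps each matching & line as the first line of the piece that follows it. Here is a minimal sketch, assuming GNU csplit (the --suppress-matched option is a GNU extension) and a made-up sample your-file, that drops the delimiter lines while splitting:

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical sample input modelled on the question.
printf '%s\n' '<Document>' '<tag1>' '</Document>' '&' \
              '<Document>' '<tag3>' '</Document>' '&' \
              '<Document>' '<tag5>' '</Document>' > your-file

# -s silences the byte counts; --suppress-matched (GNU only) omits the & lines.
mkdir -p target &&
  csplit -s --suppress-matched -f target/output. your-file '/^&$/' '{*}'

ls target
```

Without --suppress-matched, the same effect could presumably be had by filtering each piece afterwards with the grep -vx '&' shown further down.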

If you just want one target/output file with the & lines removed, then that's just:

grep -vx '&' < your-file > target/output

Or, if each piece should go to an output file in its own target.xx directory:

csplit -f '' -b target.%02d/output your-file '/^&$/' '{*}'

Though note that the target.00..target.n directories must exist beforehand.

In any case, you don't want to use a shell loop to process text.

  • Hi, thanks for your help, the first suggestion from you is only generating one output file - output.00 – python6 Oct 11 '23 at 05:51
  • @python6 then the delimiter line likely contains not only &. Maybe it has whitespace or invisible characters around it such as a CR character if it comes from the Microsoft world, and you'd need to adapt the regexp (/^&$/) accordingly (like /^[[:space:]]*&[[:space:]]*$/ to allow any amount of whitespace including CR characters on either side of the &). – Stéphane Chazelas Oct 11 '23 at 05:54
  • great, works now – python6 Oct 11 '23 at 06:04
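
For anyone hitting the same single-file result described in the comments above, here is a sketch of how one might look for stray characters and then split with the more tolerant regexp. The sample file with CR LF line endings is made up for illustration:

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical sample with DOS line endings (CR LF).
printf '%s\r\n' '<Document>' '<tag1>' '</Document>' '&' \
                '<Document>' '<tag3>' '</Document>' > your-file

# Make invisible characters visible: a CR shows up as \r before the $ end marker.
sed -n l your-file

# Allow any whitespace, including CR, on either side of the delimiter.
mkdir -p target &&
  csplit -s -f target/output. your-file '/^[[:space:]]*&[[:space:]]*$/' '{*}'

ls target
```

The CR characters stay in the output pieces; removing them would be a separate step.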

With awk:

awk 'BEGIN{RS="&"}{print $0 > (++c ".xml")}' file.xml
ls -ltr
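
Note that RS="&" splits at every & character, so an entity such as &amp; inside a document would also start a new file, and the newline that follows each delimiter ends up at the start of the next record. Here is a sketch of a line-based variant, assuming the delimiter is a line consisting only of &. The sample file.xml content and the variable names out and n are made up for illustration:

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical sample; the first document contains an &amp; entity on purpose.
printf '%s\n' '<Document>' '<a>x &amp; y</a>' '</Document>' '&' \
              '<Document>' '<tag3>' '</Document>' > file.xml

awk '
  BEGIN { n = 1; out = n ".xml" }
  /^&$/ { close(out); out = (++n) ".xml"; next }  # delimiter line: switch to the next file
        { print > out }                           # any other line: write to the current file
' file.xml

ls -ltr
```

Closing each file before moving on keeps the number of simultaneously open files at one, which may matter for inputs with many documents.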