I have a lot of text files of transcripts. I have cleaned them to an extent. The last bit of cleaning is the following.

Certain *.txt files contain text like this:

Gary: I said something.
Larry: I said something else.
Mr. John: I said this. And maybe this
and I also said this.
Laura: did i say anything.

I need it like this.

Gary: I said something.
Larry: I said something else.
Mr. John: I said this. And maybe this and I also said this.
Laura: did i say anything.

I want to move any line not containing a colon (:) onto the end of the previous line. In the end, I want each line to hold one character's dialogue, ending with a newline.

I looked at this question but couldn't figure out what to do. I am open to any tool: sed, awk, Python, bash, or Perl.

4 Answers

With sed, you could append the next line to the pattern space, check whether the appended portion (from the added newline to the end of the pattern space) contains only non-colon characters, and if so, replace that last newline with a space:

sed -e :a -e '$!N; s/\n\([^:]*\)$/ \1/;ta' -e 'P;D' file.txt
Gary: I said something.
Larry: I said something else.
Mr. John: I said this. And maybe this and I also said this.
Laura: did i say anything.
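
Since the question also mentions Python, here is a rough sketch of the same idea using a regular expression on the whole file contents. The function name join_continuations and the in-memory sample are made up for illustration. It assumes, as the question does, that a colon anywhere in a line marks a new speaker, so a continuation line containing something like a time ("10:30") would not be joined. Unlike the sed command, it leaves empty lines alone.

```python
import re

# Matches a newline only when the line after it is non-empty and has no colon.
# Assumption: a colon anywhere in a line marks the start of a new speaker.
CONTINUATION_BREAK = re.compile(r"\n(?=[^:\n]+(?:\n|$))")


def join_continuations(text: str) -> str:
    """Replace each newline that precedes a colon-free line with a space."""
    return CONTINUATION_BREAK.sub(" ", text)


# Hypothetical sample, copied from the question.
sample = (
    "Gary: I said something.\n"
    "Larry: I said something else.\n"
    "Mr. John: I said this. And maybe this\n"
    "and I also said this.\n"
    "Laura: did i say anything.\n"
)

print(join_continuations(sample), end="")
```

For real files, you would read the text with open(...).read() and write the result back out; for very large transcripts a line-by-line approach may be preferable to loading everything at once.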
steeldriver

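How about awk? It keeps a copy of the previous line in LAST. If the current line contains no colon (NF == 1, with ":" as the field separator), it appends the current line to LAST so both are printed in one go. $0 is then set to the empty string so that the merged line is not remembered and printed a second time.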

awk -F: 'NF == 1 {LAST = LAST " " $0; $0 = ""}; LAST {print LAST}; {LAST = $0} END {print LAST}' file
Gary: I said something.
Larry: I said something else.
Mr. John: I said this. And maybe this and I also said this.
Laura: did i say anything.
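
The same "remember the previous line" idea can be sketched in Python as a generator. This is only an illustration: merge_lines and pending are names invented here, not part of the awk answer. One deliberate difference is that this sketch keeps accumulating, so several continuation lines in a row all end up on the same dialogue line, whereas the awk one-liner, as far as I can tell, resets LAST after one join. Blank lines are dropped, which is an assumption about what is wanted.

```python
from typing import Iterable, Iterator, Optional


def merge_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield one dialogue line per speaker turn, without trailing newlines."""
    pending: Optional[str] = None  # line being built up, like LAST in the awk script
    for raw in lines:
        line = raw.rstrip("\n")
        if ":" in line or pending is None:
            # A colon starts a new turn: emit the finished one first.
            if pending is not None:
                yield pending
            pending = line
        elif line.strip():
            # No colon: treat as a continuation of the current turn.
            pending = f"{pending} {line.strip()}"
        # Blank continuation lines are skipped (an assumption of this sketch).
    if pending is not None:
        yield pending


# Hypothetical sample, copied from the question.
sample = [
    "Gary: I said something.\n",
    "Larry: I said something else.\n",
    "Mr. John: I said this. And maybe this\n",
    "and I also said this.\n",
    "Laura: did i say anything.\n",
]

for merged in merge_lines(sample):
    print(merged)
```

Because it consumes any iterable of lines, an open file object can be passed in directly, so the whole transcript never has to be held in memory.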
RudiC

Another awk attempt:

BEGIN{RS=":";ORS=":"; # use ":", ie. change of speaker, to recognise end of record
      FS="\n"}        # OFS is still " ", so newlines in input converted to spaces in output
!$NF { ORS="" }       # detect last line (no next speaker) and don't append a :
NF>1 {$NF = "\n" $NF} # restore the newline before the speaker's name
{print}               # print the result
JigglyNaga
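sed -e '
   # On a line containing a colon, append the next input line (unless this is the last line).
   /:/{$!N;}
   # If the appended line has no colon, join it by turning the newline into a space.
   /\n.*:/!s/\n/ /
   # Print up to the first newline, delete that part, and rerun on the remainder.
   P;D
' file.txt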

Gary: I said something.
Larry: I said something else.
Mr. John: I said this. And maybe this and I also said this.
Laura: did i say anything.