
I have this line of code that reads a text file line by line.

The text file is sometimes generated by a Windows user, sometimes by a Unix user. Therefore, sometimes I see \r\n at the end of the line and sometimes I see only \n.

I want my script to be able to deal with both scenarios and read each line separately, regardless of whether the line break is \r, \n, \r\n or \n\r.

while read -r textFileLines; do ... something ...; done < text_file.txt

This code works with \n\r (LF CR) at the end of each line, but does NOT work when I have \r\n at the end of the line!

TEST

  • Create a new text file using Notepad++ v7.5.4

    [screenshot of test_text.txt open in Notepad++]

  • while read -r LINE; do echo "$LINE"; done < /cygdrive/d/test_text.txt

  • output in Terminal:

    first_line
    second_line
    third_string
    

Why isn't the fourth_output line shown?

muru
vivoru

4 Answers

If some of your files are DOS text files and some are Unix text files, your script could pass all data through dos2unix:

dos2unix <filename |
while IFS= read -r stuff; do
   # do things with "$stuff"
done
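A minimal sketch of the same idea, assuming bash and that it is acceptable to fall back to tr -d '\r' when dos2unix is not installed. The function name to_unix is made up for illustration, and the fallback is cruder than dos2unix: it deletes every CR, not only those that come right before an LF.

```shell
#!/usr/bin/env bash
# to_unix (hypothetical helper): copy stdin to stdout with CRLF turned into LF.
to_unix() {
    if command -v dos2unix >/dev/null 2>&1; then
        dos2unix              # no file arguments: filters stdin to stdout
    else
        tr -d '\r'            # fallback: removes every CR in the input
    fi
}

# Usage: a two-line CRLF input comes out as <foo> and <bar>.
printf 'foo\r\nbar\r\n' | to_unix | while IFS= read -r stuff; do
    printf '<%s>\n' "$stuff"
done
```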

Unix text files would be unmodified by this.

To additionally cope with Mac line breaks, I believe you should be able to do

dos2unix <filename | mac2unix |
while IFS= read -r stuff; do
   # do things with "$stuff"
done

The last line is not output by your read loop because it is not terminated by a newline, and is therefore not a line at all.

To detect whether a file has no terminating newline on the last line, and add one if it hasn't, in bash:

# $(...) drops a trailing newline, so the result is empty if the file
# already ends in one (or is empty)
if [ -n "$(tail -c 1 filename)" ]; then
    printf '\n' >>filename
fi
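To check the test on a scratch file, here is a sketch assuming bash and mktemp; the function name ensure_trailing_newline is made up. The command substitution drops a trailing newline, so it comes back empty exactly when the file is empty or already ends in one. That is why the test uses -n rather than comparing against $'\n', which could never match.

```shell
#!/usr/bin/env bash
# ensure_trailing_newline (hypothetical helper): append "\n" to file $1
# only if its last byte is something other than a newline.
ensure_trailing_newline() {
    if [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >>"$1"
    fi
}

tmp=$(mktemp)
printf 'first_line\nfourth_output' >"$tmp"   # last line unterminated
ensure_trailing_newline "$tmp"
ensure_trailing_newline "$tmp"               # no-op: file now ends in "\n"
od -c "$tmp"                                 # last byte shown is \n
rm -f -- "$tmp"
```

Running it twice is harmless, so it is safe to call on every file before the read loop.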


Kusalananda
  • This still does not output the last line (fourth_output) in the example text file I have provided! – vivoru Jul 17 '18 at 14:17
  • @vivoru That has nothing to do with the \r etc. read will only read complete lines. What you have there is an unterminated line. The file is therefore not a text file at all. What you could do is to always add a newline with printf '\n' >>file. – Kusalananda Jul 17 '18 at 14:21
  • @vivoru Or see https://unix.stackexchange.com/questions/31947 – Kusalananda Jul 17 '18 at 14:24

Why isn't the fourth_output line shown?

In your image, the file is missing the newline at the end of the last line. read returns true only if it reads the delimiter (newline), and since that's not there at the end of the last line, read returns false, your loop ends, and the last incomplete line is not printed.
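A small demonstration of that behaviour in bash, using two made-up lines named after the ones in the question. The second read hits end-of-file before finding a newline, so it returns 1, but it still stores what it read in the variable.

```shell
#!/usr/bin/env bash
# read_demo (hypothetical helper): show read's exit status for a complete
# line and for an unterminated final fragment.
read_demo() {
    printf 'third_string\nfourth_output' | {
        IFS= read -r line; echo "status=$? line=<$line>"
        IFS= read -r line; echo "status=$? line=<$line>"
    }
}

read_demo
# status=0 line=<third_string>
# status=1 line=<fourth_output>
```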

This has nothing to do with the carriage returns; the behaviour is the same with plain LF line endings if the last line is missing its LF.

Here, file1 has two lines with CRLF line endings:

$ cat -A file1
foo^M$
bar^M$
$ while read x ; do echo "<$x>"; done < file1
>foo
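>bar

(The > shows up in place of the < because the CR left at the end of $x moves the cursor back to the start of the line.)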

file2 is missing the line ending on the second line:

$ cat -A file2 ; echo
foo^M$
bar
$ while read x ; do echo "<$x>"; done < file2
>foo
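If you want to reproduce this, the two files can be rebuilt with printf. The byte contents below are inferred from the cat -A output, and count_reads is a throwaway helper name; it counts how many times the plain loop body runs.

```shell
#!/usr/bin/env bash
# count_reads (hypothetical helper): how many times "while read" runs its body.
count_reads() {
    local n=0 x
    while read -r x; do
        n=$((n + 1))
    done <"$1"
    echo "$n"
}

dir=$(mktemp -d)
printf 'foo\r\nbar\r\n' >"$dir/file1"   # two CRLF-terminated lines
printf 'foo\r\nbar' >"$dir/file2"       # second line has no line ending
echo "file1: $(count_reads "$dir/file1")"   # file1: 2
echo "file2: $(count_reads "$dir/file2")"   # file2: 1
rm -r -- "$dir"
```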

If you want to have the loop also process the final line fragment, you'll have to check if the read variable contains any data when read itself returns failure:

$ while read -r x || [ "$x" ] ; do echo "<$x>"; done < file2
>foo
<bar>

If you want to get rid of the CR, you can remove it within the loop, with e.g. x=${x%$'\r'}; (in Bash/ksh/zsh), or preprocess the file with tr -d '\r' or dos2unix or such.
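Putting both parts together, here is a sketch for bash (ksh and zsh also accept $'\r'). read_lines is a hypothetical name, and wrapping each line in <...> is only there to make the result visible. It keeps the unterminated last line and drops one CR from the end of each line, so CRLF and LF files print the same; a CR at the start of a line (the \n\r case) is left alone.

```shell
#!/usr/bin/env bash
# read_lines (hypothetical helper): print every line of file $1, including
# an unterminated last line, with one trailing CR removed.
read_lines() {
    local x
    while IFS= read -r x || [ -n "$x" ]; do
        x=${x%$'\r'}          # drop a CR left over from a CRLF ending
        printf '<%s>\n' "$x"
    done <"$1"
}

tmp=$(mktemp)
printf 'foo\r\nbar' >"$tmp"   # CRLF line, then an unterminated last line
read_lines "$tmp"             # prints <foo> and <bar>
rm -f -- "$tmp"
```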

ilkkachu

There are dedicated tools for this. The most common one, which converts \r\n line endings to \n, is dos2unix.

If it isn't available on your system, you can use one of the following commands to do something similar to your textFileLines variable:

awk (needs an awk that accepts a multi-character RS, such as GNU awk)

$ echo "$textFileLines" | awk 1 RS='\r\n' ORS=

sed 1 (GNU sed, which understands \r)

$ echo "$textFileLines" | sed -e 's/\r//g'

sed 2 (bash's $'...' quoting hands sed a literal CR)

$ echo "$textFileLines" | sed $'s/\r//'

tr

$ echo "$textFileLines" | tr -d '\r'

There are of course many other ways to do this; these are just a few of the more common ones.
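A quick way to compare the commands on a value like one read from a CRLF file. This assumes bash and GNU sed; the value is made up, and printf '%s\n' is used instead of echo only to sidestep echo's portability quirks. Each variant should print <some text>.

```shell
#!/usr/bin/env bash
# A value as it would arrive from a CRLF file: the text plus a trailing CR.
textFileLines=$'some text\r'

via_tr=$(printf '%s\n' "$textFileLines" | tr -d '\r')
via_sed1=$(printf '%s\n' "$textFileLines" | sed -e 's/\r//g')   # GNU sed
via_sed2=$(printf '%s\n' "$textFileLines" | sed $'s/\r//')      # literal CR

printf '<%s>\n' "$via_tr" "$via_sed1" "$via_sed2"
```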


slm

Execute:

$ [ -n "$(tail -c1 infile)" ] && echo >> infile
$ sed 's/\r$\|^\r//g;s/\r/\n/g' infile | while IFS= read -r line
> do echo "$line" ; done
DOS       line
second     DOS
old  mac   line
new  mac   line
end\n\rreverse
linux      line
new linux  line
no  end   line

All issues solved.


Description:

To correct the missing last newline use:

[ -n "$(tail -c1 infile)" ] && echo >> infile

This adds a trailing newline only if one is missing, so a correctly terminated file is left unchanged.

Then, you could convert

  • \r\n (DOS style) to \n (just remove a \r at the end of the line)
  • \n\r (invalid DOS style?) to one \n (remove \r at start of line)
  • and then (with pairs corrected) convert \r (classic Mac OS) to \n

in just one call of (GNU) sed with:

sed 's/\r$\|^\r//g;s/\r/\n/g' infile
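To try both steps without the original infile, here is a sketch assuming GNU sed and bash; normalize_eol and the sample contents are made up. It builds a small file with one line ending of each kind and no final newline, then reads the normalized output line by line.

```shell
#!/usr/bin/env bash
# normalize_eol (hypothetical helper): print file $1 with every line ending
# turned into LF. Like the answer, it first appends a missing final newline
# to the file itself, so the file is modified in place.
normalize_eol() {
    if [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >>"$1"
    fi
    # Drop a CR at the end (CRLF) or start (LFCR) of a line, then turn any
    # remaining CR (classic Mac OS) into a newline. Needs GNU sed.
    sed 's/\r$\|^\r//g;s/\r/\n/g' "$1"
}

tmp=$(mktemp)
printf 'dos\r\nlf\nmac\rrev\n\rlast' >"$tmp"   # CRLF, LF, CR, LFCR, no final EOL
normalize_eol "$tmp" | while IFS= read -r line; do
    printf '<%s>\n' "$line"
done
# <dos>
# <lf>
# <mac>
# <rev>
# <last>
rm -f -- "$tmp"
```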

If the text file is like this test file:

$ cat infile
DOS       line
second     DOS
new  mac   line
end\n\rreverse
linux      line
new linux  line
no  end   line

$ cat -A infile
DOS       line^M$
second     DOS^M$
old  mac   line^Mnew  mac   line$
end\n\rreverse$
^Mlinux      line$
new linux  line$
no  end   line

$ od -An -tc infile
   D   O   S                               l   i   n   e  \r  \n
   s   e   c   o   n   d                       D   O   S  \r  \n
   o   l   d           m   a   c               l   i   n   e  \r
   n   e   w           m   a   c               l   i   n   e  \n
   e   n   d   \   n   \   r   r   e   v   e   r   s   e  \n  \r
   l   i   n   u   x                           l   i   n   e  \n
   n   e   w       l   i   n   u   x           l   i   n   e  \n
   n   o           e   n   d               l   i   n   e