
I am loading a file into variables. The problem is that the file is formatted for Windows, I believe, so I get a ^M at the end of each line.

How do I remove it when the value is already in the variable? I am aware that I can fix the source in vi (I use OS X, by the way), but I can't modify the original file, only read it, so I have to remove the ^M from the variable.

From my understanding, \n is not the same as ^M, so the tr command won't work.

EDIT

It seems the question was not clear, so here is a clarification.

I do parse the file line by line; each line has two values separated by a tab, and at the end of each line there is a ^M. It looks like this:

value1    value2^M
value3    value4^M
value5    value6^M
value7    value8^M

My workflow is pretty straightforward: the txt file contains what you see above, and the loop splits the fields and gets the values for each line. When I print the second value it has the ^M, which I would like to remove.

while IFS=$'\t' read -r -a line
do
    Type1="${line[0]}"
    Type2="${line[1]}"
done < "$TXTFILE"

This means that when I print Type1 it is fine, but the Type2 variable contains the ^M. I tried tr and it didn't work; I also tried sed to remove the last character of the variable, and that didn't work either. I hope this clarifies my question. Thanks

rataplan

4 Answers


^M is a carriage return (CR), which can be specified as \r for tr or within $'…'. \n specifies a line feed (LF), which is ^J. A Unix line ending is LF, and a Windows line separator is the two-character sequence CR-LF, so Windows text files viewed under a Unix system such as Linux or macOS look like they have ^M at the end of each line, except on the last line, which is missing its final newline.
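To see that ^M really is the single byte \r (hex 0d), you can dump a CRLF-terminated line with od. This is a minimal sketch, assuming bash (for the $'…' quoting); the sample line and the variable name crlf_line are made up for illustration:

```shell
#!/usr/bin/env bash
# A made-up CRLF-terminated line, as it might come from a Windows text file.
crlf_line=$'value1\tvalue2\r\n'

# od -c shows the CR as \r and the LF as \n; these are the ^M and ^J
# that vi or a terminal would display.
printf '%s' "$crlf_line" | od -c

# tr -d '\r' deletes every CR and leaves the LF line ending in place.
printf '%s' "$crlf_line" | tr -d '\r' | od -c
```

In the second dump the \r is gone and only the \n remains, which is why tr -d '\r' (not tr on \n) is the right tool here.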

You can remove carriage returns from a file using tr:

tr -d '\r' <somefile.txt >somefile.txt.new && mv somefile.txt.new somefile.txt

or more simply with dos2unix.
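Since tr is a filter, it can also clean the data on the way into the loop without touching the file on disk. A minimal sketch, assuming bash (process substitution and arrays); the function name print_pairs and the temporary demo file are hypothetical:

```shell
#!/usr/bin/env bash
# Print each tab-separated pair as "first|second", reading a possibly
# CRLF-terminated file without modifying it.
print_pairs() {
    local file=$1 Type1 Type2
    local -a line
    while IFS=$'\t' read -r -a line
    do
        Type1="${line[0]}"
        Type2="${line[1]}"
        printf '%s|%s\n' "$Type1" "$Type2"
    done < <(tr -d '\r' < "$file")   # tr only reads the file
}

# Demo on a made-up temporary file standing in for "$TXTFILE".
TXTFILE=$(mktemp)
printf 'value1\tvalue2\r\nvalue3\tvalue4\r\n' > "$TXTFILE"
print_pairs "$TXTFILE"
```

The original file still contains its CRs afterwards; only the stream fed to read is cleaned.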

To avoid modifying the files, you can check each line when you read it and strip CR at the end of a line. For example, if you're using read to parse tab-separated values, then strip CR at the end of the last field. The parameter expansion ${VAR%$'\r'} yields the value of VAR minus a trailing CR, and yields the value of VAR if it doesn't end with CR.
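The behaviour of ${VAR%$'\r'} can be checked on its own. A small sketch, assuming bash; the variable names are made up. Note that it only strips a CR at the very end, so a CR in the middle of a value is left alone:

```shell
#!/usr/bin/env bash
with_cr=$'value2\r'
without_cr='value2'
mid_cr=$'val\rue2'

# ${VAR%$'\r'} removes one trailing CR if present and is a no-op otherwise.
stripped1=${with_cr%$'\r'}
stripped2=${without_cr%$'\r'}
stripped3=${mid_cr%$'\r'}    # unchanged: the CR is not at the end

# Lengths: 7 before stripping, 6 after; the no-CR value stays at 6.
printf '%s\n' "${#with_cr}" "${#stripped1}" "${#stripped2}"
```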

while IFS=$'\t' read -r -a line
do
    line[$((${#line[@]}-1))]="${line[$((${#line[@]}-1))]%$'\r'}"
    Type1="${line[0]}"
    Type2="${line[1]}"
done < "$TXTFILE"
  • Thanks for clarifying that \r is in fact the same as ^M; the solution works – rataplan Nov 27 '16 at 05:25
  • Note that OS/X comes with a very old version of bash, so I don't expect it would support line[-1] for which you need bash-4.3 or above. It comes with zsh that supports it (and has for decades), but note that in zsh, the first element is $line[1], not $line[0] (unless in ksh emulation). With older bash, you can always use line[${#line[@]}-1] – Stéphane Chazelas Nov 27 '16 at 10:59
  • +1 for the background explanation, but I think jlliagre's solution is much better, and much simpler. – Wildcard Nov 27 '16 at 16:33

Here is the simplest way to fix your script: add "carriage return" as an internal field separator for the read command:

while IFS=$'\t\r' read -r -a line
do
  Type1="${line[0]}"
  Type2="${line[1]}"
done < "$TXTFILE"
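A quick way to confirm that the trailing CR no longer ends up in Type2 is to feed the loop some CRLF data and print the fields with %q, which would show a leftover CR as $'…\r'. A sketch assuming bash; the inline sample data stands in for "$TXTFILE":

```shell
#!/usr/bin/env bash
# CR is a non-whitespace IFS character here, so the CR at the end of the
# line acts as a trailing delimiter and does not create an extra field.
while IFS=$'\t\r' read -r -a line
do
    Type1="${line[0]}"
    Type2="${line[1]}"
    # %q makes a leftover CR visible; here it prints plain value2/value4.
    printf '%s %q\n' "$Type1" "$Type2"
done < <(printf 'value1\tvalue2\r\nvalue3\tvalue4\r\n')
```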
jlliagre

Use (for short strings):

${var//$'\015'}

Example:

$ var=$'This is a test of a CR (\r) character'
$ echo "${var//$'\r'}"
This is a test of a CR () character

For longer strings you may need sed or awk.
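For a longer, multi-line value, a sketch of the external-tool route could look like this (assuming bash; the sample value is made up). tr deletes every CR, while the sed version only deletes a CR at the end of each line:

```shell
#!/usr/bin/env bash
# A made-up multi-line value with CRLF endings.
var=$'line one\r\nline two\r\nline three\r'

# Delete every CR.
no_cr_tr=$(printf '%s' "$var" | tr -d '\r')

# Delete only a CR at the end of each line. The $'...' quoting makes bash
# pass a literal CR to sed, so the pattern does not depend on sed itself
# understanding the \r escape.
no_cr_sed=$(printf '%s' "$var" | sed $'s/\r$//')

printf '%s\n' "$no_cr_tr"
```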


A more generally useful way is to convert the content of "DOS" files, which have no content marker other than their CR+LF line endings (in contrast to the LF-only endings used on Linux).

On Ubuntu, first (and only once) do

sudo apt install dos2unix

then use it as shown below, here with od to verify the output:

$ dos2unix < "$TXTFILE" | od -t x1z -w17
0000000 76 61 6c 75 65 31 20 20 20 20 76 61 6c 75 65 32 0a  >value1    value2.<
0000021 76 61 6c 75 65 33 20 20 20 20 76 61 6c 75 65 34 0a  >value3    value4.<
0000042 76 61 6c 75 65 35 20 20 20 20 76 61 6c 75 65 36 0a  >value5    value6.<
0000063 76 61 6c 75 65 37 20 20 20 20 76 61 6c 75 65 38 0a  >value7    value8.<
0000104

$ cat "$TXTFILE" | od -t x1z -w18
0000000 76 61 6c 75 65 31 20 20 20 20 76 61 6c 75 65 32 0d 0a  >value1    value2..<
0000022 76 61 6c 75 65 33 20 20 20 20 76 61 6c 75 65 34 0d 0a  >value3    value4..<
0000044 76 61 6c 75 65 35 20 20 20 20 76 61 6c 75 65 36 0d 0a  >value5    value6..<
0000066 76 61 6c 75 65 37 20 20 20 20 76 61 6c 75 65 38 0d 0a  >value7    value8..<
0000110

This will translate not only the line endings but also other special characters, depending on the options passed to dos2unix or its counterpart unix2dos (which gets installed at the same time).
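Before converting anything, you may want to check whether a file actually has CRLF endings; in the od dumps these appear as 0d 0a. A lighter check is to count lines ending in CR with grep. A sketch assuming bash; the function name count_crlf_lines and the demo file are hypothetical:

```shell
#!/usr/bin/env bash
# Count the lines of a file that end in CR (i.e. have CRLF line endings).
count_crlf_lines() {
    # $'\r$' hands grep a literal CR followed by the end-of-line anchor.
    grep -c $'\r$' "$1"
}

# Made-up demo file: two CRLF lines and one LF-only line.
TXTFILE=$(mktemp)
printf 'value1\tvalue2\r\nvalue3\tvalue4\r\nvalue5\tvalue6\n' > "$TXTFILE"
count_crlf_lines "$TXTFILE"
```

A non-zero count means the file needs one of the CR-stripping approaches above.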

Hannu
  • isn't dos2unix there by default? – phuclv Nov 27 '16 at 04:51
  • I clearly specified that I know how to do it via file modification; plus, I am not using Linux but OS X, so to install dos2unix I would have to use brew – rataplan Nov 27 '16 at 05:23
  • dos2unix isn't restricted to modifying FILES; it is "a filter" and can be used in pipes, just like tr. It should also be preferred over tr, as it handles charsets on a higher level, not only single-byte codes. – Hannu Nov 27 '16 at 09:10
  • @Lưu Vĩnh Phúc, I'm using Ubuntu 16.04 and have a quite fresh install, and I had to install it. – Hannu Nov 27 '16 at 09:11