67

I have a text file named links.txt which looks like this

link1
link2
link3

I want to loop through this file line by line and perform an operation on every line. I know I can do this with a while loop, but since I am learning, I wanted to try a for loop. I used command substitution like this

a=$(cat links.txt)

Then used the loop like this

for i in $a; do ###something###;done

Also I can do something like this

for i in $(cat links.txt); do ###something###; done

My question: when I stored the output of cat in the variable a, the newline characters between link1, link2 and link3 were removed and replaced by spaces

echo $a

outputs

link1 link2 link3

and then I used the for loop. Is a newline always replaced by a space when we do a command substitution?

Regards

John WH Smith
  • 15,880
user3138373
  • 2,559

5 Answers

68

Newlines get lost whenever an expansion is left unquoted, because the shell then splits the result into words on whitespace, newlines included. To keep them, quote the expansion every time you use it:

$ a="$(cat links.txt)"
$ echo "$a"
link1
link2
link3

Because I used quotes whenever I was manipulating the data, the shell never split it into words, and the newline characters (\n) were preserved. If you forget the quotes at some point, the newlines will be lost.

The very same behaviour will occur if you use your loop on lines containing spaces. For instance, given the following file...

mypath1/file with spaces.txt
mypath2/filewithoutspaces.txt

The output will depend on whether or not you use quotes:

$ for i in $(cat links.txt); do echo $i; done
mypath1/file
with
spaces.txt
mypath2/filewithoutspaces.txt

$ for i in "$(cat links.txt)"; do echo "$i"; done
mypath1/file with spaces.txt
mypath2/filewithoutspaces.txt
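
One caveat about the quoted form: the output looks right, but the loop body runs only once, with the entire file in $i. The sketch below counts iterations to show this. It recreates the sample file in a temporary location; the names tmp, quoted_count and unquoted_count are made up for the illustration.

```shell
# Hypothetical sample file, created only for this demonstration.
tmp=$(mktemp)
printf '%s\n' 'mypath1/file with spaces.txt' 'mypath2/filewithoutspaces.txt' > "$tmp"

quoted_count=0
for i in "$(cat "$tmp")"; do
    # Runs once: "$i" holds both lines, newline included.
    quoted_count=$((quoted_count + 1))
done

unquoted_count=0
for i in $(cat "$tmp"); do
    # Runs once per whitespace-separated word.
    unquoted_count=$((unquoted_count + 1))
done

echo "quoted: $quoted_count, unquoted: $unquoted_count"   # quoted: 1, unquoted: 4
rm -f "$tmp"
```

So quoting alone preserves the newlines but does not give one iteration per line.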

Now, if you don't want to use quotes, there is a special shell variable which can be used to change the shell field separator (IFS). If you set this separator to the newline character, you will get rid of most problems.

$ IFS=$'\n'; for i in $(cat links.txt); do echo $i; done
mypath1/file with spaces.txt
mypath2/filewithoutspaces.txt
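
Note that IFS=$'\n'; on its own line changes IFS for the rest of the shell session. A minimal sketch of one way to limit the change, assuming IFS was set (not unset) beforehand, is to save and restore it. The variable names old_ifs and line_count are only illustrative, and the literal newline inside single quotes is a portable alternative to $'\n'.

```shell
# Hypothetical sample file, created only for this demonstration.
tmp=$(mktemp)
printf '%s\n' 'mypath1/file with spaces.txt' 'mypath2/filewithoutspaces.txt' > "$tmp"

old_ifs=$IFS      # remember the current separator (assumes IFS is set)
IFS='
'
set -f            # also disable globbing while splitting on newlines

line_count=0
for i in $(cat "$tmp"); do
    line_count=$((line_count + 1))
    printf '%s\n' "$i"
done

set +f            # re-enable globbing
IFS=$old_ifs      # restore the previous separator
rm -f "$tmp"
```

With the separator restricted to the newline, the loop above runs twice, once per line of the sample file.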

For the sake of completeness, here is another example, which does not rely on command output substitution. After some time, I found out that this method was considered more reliable by most users due to the very behaviour of the read utility.

$ cat links.txt | while IFS= read -r i; do echo "$i"; done

Here is an excerpt from read's man page:

The read utility shall read a single line from standard input.

Since read gets its input line by line, you're sure it won't break whenever a space shows up. Just pass it the output of cat through a pipe, and it'll iterate over your lines just fine.

Edit: I can see from other answers and comments that people are quite reluctant when it comes to the use of cat. As jasonwryan said in his comment, a more proper way to read a file in shell is to use stream redirection (<), as you can see in val0x00ff's answer here. However, since the question isn't "how to read/process a file in shell programming", my answer focuses more on the quotes behaviour, and not the rest.
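
Beyond style, there is one practical difference between the pipe and the redirection that a short sketch can show: in bash with default settings, every part of a pipeline runs in a subshell, so variables set inside the piped loop are gone afterwards. The counters piped and redirected are hypothetical names for this illustration.

```shell
# Hypothetical sample file, created only for this demonstration.
tmp=$(mktemp)
printf '%s\n' link1 link2 link3 > "$tmp"

piped=0
cat "$tmp" | while IFS= read -r i; do piped=$((piped + 1)); done
# In bash (without the lastpipe option) the loop ran in a subshell,
# so the increments are not visible here.

redirected=0
while IFS= read -r i; do redirected=$((redirected + 1)); done < "$tmp"
# No pipeline, so the loop ran in the current shell and the count is kept.

echo "piped: $piped, redirected: $redirected"
rm -f "$tmp"
```

Some shells (ksh, zsh) run the last pipeline element in the current shell, so the piped count can differ there; the redirected count is 3 either way.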

John WH Smith
  • 15,880
  • Also let's say I am not using quotes, then when I am applying the for loop, is it implicit that variable i will hold the value as the first file until it reaches a space which tells it that first file ends?? – user3138373 Oct 27 '14 at 21:16
  • 2
    With all due respect to John WH Smith, I'm not sure who is upvoting the answer. for i in $(cat ..) is wrong. See the comment of jasonwryan; that is how you read lines from a file. cat(1) is used to concatenate multiple files together. It should NOT be used to feed file data to processes. There are far better ways to achieve this. The application might take a file as argument (eg. grep ^foo file); or you might want to use file redirection (eg. read line < file). – Valentin Bajrami Oct 28 '14 at 09:51
  • @val0x00ff - it's not wrong because you say it is, certainly. what is wrong about it? – mikeserv Oct 28 '14 at 09:54
  • 1
    @val0x00ff I used cat because that is what the OP was using in his question ;) The question isn't really about "how to read a file", but "why are newlines lost". As far as I'm concerned, I would always use read, which is why I edited my answer afterwards to add this solution. I understand that cat shouldn't be used to read a single file, but since it isn't the main topic, I didn't spend too much time on it. – John WH Smith Oct 28 '14 at 10:47
  • 1
    @mikeserv - it treats data as code, which is generally considered wrong in any language. Or, any language except bash, apparently. – geirha Oct 28 '14 at 12:53
  • @geirha - this is not a true statement at all. It delimits fields on specified delimiters. If it is such an unpopular behavior, how is it awk is so ubiquitous? From the POSIX rationale: If the IFS variable is unset or is \s \n \t, the operation is equivalent to the way the System V shell splits words. Using characters outside the \s \n \t set yields the KornShell behavior, where each of the non- \s \n \t is significant. This behavior .. was taken from the way the original awk handled field splitting. – mikeserv Oct 28 '14 at 13:05
  • @mikeserv - "Take the data in this file and split it into words based on the characters in IFS, then for each of those words that happen to contain glob characters, attempt to replace those words with matching filenames". That certainly doesn't sound like treating data as data. – geirha Oct 28 '14 at 13:09
  • Yes - @geirha - globbing is a problem. That is a very excellent point. This is why the shell offers the set -f option. You can either expand filenames with set +f or not do with set -f. I specifically address that in my own answer here. And, as far as I can tell, it's the only one here that mentions it. – mikeserv Oct 28 '14 at 13:11
  • https://stackoverflow.com/questions/613572/capturing-multiple-line-output-into-a-bash-variable – sancho.s ReinstateMonicaCellio Sep 14 '18 at 12:39
  • IFS=$'\n'; is needed to loop properly; the quotes alone will not iterate line by line without it (Bash 4+) – Mike Q Mar 13 '19 at 17:53
44

The newlines were lost, because the shell had performed field splitting after command substitution.

From the POSIX Command Substitution section:

The shell shall expand the command substitution by executing command in a subshell environment (see Shell Execution Environment) and replacing the command substitution (the text of command plus the enclosing "$()" or backquotes) with the standard output of the command, removing sequences of one or more <newline> characters at the end of the substitution. Embedded <newline> characters before the end of the output shall not be removed; however, they may be treated as field delimiters and eliminated during field splitting, depending on the value of IFS and quoting that is in effect. If the output contains any null bytes, the behavior is unspecified.
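
The two halves of that paragraph can be checked with a small sketch: trailing newlines are stripped by the substitution itself, while embedded ones survive as long as the result is not field-split. The variable names are made up, and the trailing-x trick is a common workaround rather than anything the standard prescribes.

```shell
# Three trailing newlines are removed; the embedded one is kept.
out=$(printf 'link1\nlink2\n\n\n')
length=${#out}
printf '%s\n' "$length"        # 11, i.e. "link1" + newline + "link2"

# Workaround sketch: append a sentinel character so the newlines are no
# longer trailing, then strip the sentinel afterwards.
kept=$(printf 'link1\nlink2\n\n\n'; printf x)
kept=${kept%x}
kept_length=${#kept}
printf '%s\n' "$kept_length"   # 14, all three trailing newlines preserved
```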

Default IFS value (at least in bash):

$ printf '%q\n' "$IFS"
$' \t\n'

In your case, you neither set IFS nor used double quotes, so the newline characters were eliminated during field splitting.

You can preserve newlines, for example by setting IFS to empty:

$ IFS=
$ a=$(cat links.txt)
$ echo "$a"
link1
link2
link3
cuonglm
  • 153,898
  • 1
    Note: If you use printf instead of echo you avoid the IFS issue entirely – Oly Dungey Sep 26 '19 at 10:22
  • 2
    @OliverDungey it's not about echo or printf, it's about the double quotes around "$a". The original question uses a for loop; that's where field splitting occurs after command substitution. – cuonglm Sep 27 '19 at 04:16
7

To add my emphasis, for loops iterate over words. If your file is:

one two
three four

Then this will emit four lines:

for word in $(cat file); do echo "$word"; done

To iterate over the lines of a file, do this:

while IFS= read -r line; do
    # do something with "$line" <-- quoted almost always
done < file
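
One edge case with this loop, sketched below under the assumption that the input file does not end with a newline: read returns a non-zero status on the final partial line, so the loop would skip it. A common idiom is to add a test for a non-empty variable. The file and the names count and last are hypothetical.

```shell
# Hypothetical sample file whose last line has no terminating newline.
tmp=$(mktemp)
printf 'one two\nthree four' > "$tmp"

count=0
while IFS= read -r line || [ -n "$line" ]; do
    # The || test keeps the final, unterminated line from being dropped.
    count=$((count + 1))
    last=$line
done < "$tmp"

printf '%s lines, last: %s\n' "$count" "$last"   # 2 lines, last: three four
rm -f "$tmp"
```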
glenn jackman
  • 85,964
  • for loops iterate over arguments. If you do IFS=\n; for word in `cat file`; do echo "$word"; done you'll get two loops and two lines printed. $IFS applies globally all of the time in much the same way as it does to read - except that the read/\newline relationship is pretty special. – mikeserv Oct 28 '14 at 04:45
  • 1
    I upvoted the second explanation (while IFS) because it shows yet another way of feeding lines from a file. Again, @mikeserv: for word in $(cat file) is wrong and should not be used in bash scripts or in any other form. Let me emphasise once again: Never do this: for x in $(command) or `command` or $var. for-in is used for iterating arguments, not (output) strings. Instead, use a glob (eg. *.txt), arrays (eg. "${names[@]}") or a while-read loop (eg. while read -r line). See http://mywiki.wooledge.org/BashPitfalls#pf1 and http://mywiki.wooledge.org/DontReadLinesWithFor – Valentin Bajrami Oct 28 '14 at 11:20
  • @val0x00ff - I don't think you understand - $IFS is about arguments. Specifically, $IFS splits fields into arguments - that's its job. There are potential problems with that approach - but they are handled as easily as set -f; IFS=$delimiter - that's all you need to do. For example, you could do the very slow while read -r line thing or you could do set -f; IFS=\n; set -- $(cat file). If you did that you'd get an array of the file's non-blank lines each intact in $1 $2 $3... "$@". The wooledge wiki is typically an awful source of information - you should try to wean off of it. – mikeserv Oct 28 '14 at 11:33
  • @val0x00ff - I should temper that last statement - the Dontreadlineswithfor thing is actually mostly correct - and there is an excellent point there made about memory requirements. What they don't mention there is that bash's horrible $IFS handling is not typical to most other shells. In any case, any file small enough to warrant a while read loop is no threat to your total RAM however you should choose to map it. If it is of any consequential size, then you should probably be looking to sed or similar for processing - the shell is not often a good solution in that case. – mikeserv Oct 28 '14 at 11:45
  • 1
    To properly set IFS to a newline, use ANSI-C quoting: IFS=$'\n' -- this (IFS=\n) sets IFS to the letter "n" – glenn jackman Oct 28 '14 at 15:17
  • That's true - it's a pain to represent a newline in a comment so I sometimes fudge it. The quoting method you mention is not the one I usually use - I just type single-quote, ctrl+v,ctrl+j,single-quote. – mikeserv Oct 29 '14 at 00:14
4

You can use read from bash. Also have a look at mapfile.

while read -r link
do
    printf '%s\n' "$link"
done < links.txt

Or using mapfile

mapfile -t myarray < links.txt
for link in "${myarray[@]}"; do printf '%s\n' "$link"; done
-2

The newlines are replaced with spaces because that's how echo works - it concatenates its arguments on spaces. The unquoted expansion is first split into separate arguments at the field delimiters, and echo then joins those arguments with a single space. In truth, you can iterate with for over anything you want, but you have to specify the field delimiter first:

string=abababababababababababa IFS=a
for c in $string
do printf %s "$c"
done

OUTPUT

bbbbbbbbbbb

But this isn't behavior unique to a for loop - this happens for any field split expansion:

printf %s $string
bbbbbbbbbbb

For example, if you want to print only the first 10 bytes from any non-blank line in a file...

###original:
first "line"
<second>"line"
<second>"line"
<second>line and so on%
(IFS='
'; printf %.10s\\n $(cat file))
###output
first "lin
<second>"l
<second>"l
<second>li

There is a reason I specify non-blank above - the \newline is one of three special bytes in $IFS. While everything else will get you an empty argument when 2 or more occur in succession, any sequence of spaces, tabs, or newlines can only ever evaluate to a single field.

And so for example:

(IFS=0;printf 'ten lines!%s\n' $(printf "%010d"))

ten lines!
ten lines!
ten lines!
ten lines!
ten lines!
ten lines!
ten lines!
ten lines!
ten lines!
ten lines!

But...

(IFS=\ ;printf 'one line%s\n' $(printf "%010s"))
one line

In both cases printf prints 10 filler characters - in the first case it prints 10 zeroes and in the second 10 spaces. In the first case every 0 generates a null field and the second printf gets 10 empty arguments for each of which it writes its format string, but all of the spaces printed in the second case amount to nothing at all.
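
The same contrast can be shown by counting fields directly. This is only a sketch with made-up sample strings; count_fields is a hypothetical helper that reports how many arguments it received, and each test runs in a command-substitution subshell so the IFS change does not leak out.

```shell
# Hypothetical helper: print the number of arguments it was given.
count_fields() { echo "$#"; }

# Non-whitespace delimiter: the two adjacent colons enclose an empty field.
colon_fields=$(IFS=:; s='a::b'; count_fields $s)

# Whitespace delimiter: the run of two spaces counts as a single separator.
space_fields=$(IFS=' '; s='a  b'; count_fields $s)

echo "colon: $colon_fields, space: $space_fields"   # colon: 3, space: 2
```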

You should note that this is not the only kind of field generation that the shell will do with unquoted expansions - by default it will also glob. Doing things like:

for line in $(cat file)

Can lead to very unexpected results because there is a very real chance that some of those lines will contain shell globs that match real files - and suddenly $line doesn't refer to a line of input any more but instead an on-disk filename.

If you plan to use $IFS for any splitting it is always a good idea to:

set -f

...first, which will instruct the shell not to glob while you do. When you are through you can re-enable it with set +f.
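
A minimal sketch of the difference, using a throwaway directory and hypothetical file names: the file list contains a glob pattern as data, and the unquoted expansion turns it into matching filenames unless globbing is disabled. Here set -f is issued inside a subshell, which is one way to avoid having to remember set +f afterwards.

```shell
# Hypothetical helper: print the number of arguments it was given.
count_args() { echo "$#"; }

# Throwaway directory with two files and a list holding a glob as data.
dir=$(mktemp -d)
touch "$dir/a.txt" "$dir/b.txt"
printf '%s\n' '*.txt' > "$dir/list"

# Globbing enabled: the single line *.txt expands to a.txt and b.txt.
globbed=$(cd "$dir" && count_args $(cat list))

# Globbing disabled inside the subshell: the line stays as one literal word.
literal=$(cd "$dir" && set -f && count_args $(cat list))

echo "globbed: $globbed, literal: $literal"   # globbed: 2, literal: 1
rm -rf "$dir"
```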

mikeserv
  • 58,310
  • 2
    I think The newlines are replaced with spaces because that's how echo works seems to be wrong. – cuonglm Oct 28 '14 at 03:18
  • 2
    @cuonglm - it could be clearer, but the \newlines are replaced with field delimiters, and echo replaces the field delimiters with spaces - it concatenates its arguments on spaces. That's how echo works. – mikeserv Oct 28 '14 at 04:29