
Why doesn't this for loop work like I expect it to?

Here is my script file (I am running zsh on a Mac with macOS Ventura 13.2.1):

#!/bin/zsh

for user in "$(cat $1)"; do website="myWebSite.org/@/" url=$website$user echo "here is the url: $url" done

Here is my data file; it has four user names, one per line:

cat userList.txt 
user_1
user_2
user_3
user_4

I expected (and want) the output to be:

myWebSite.org/@/user_1
myWebSite.org/@/user_2
myWebSite.org/@/user_3
myWebSite.org/@/user_4

Instead, here is the output I get:

./findusers userList.txt 
here is the url: myWebSite.org/@/user_1
user_2
user_3
user_4

I've googled for hours and cannot find anything even close to this type of problem. It's almost like the shell is operating on some file I don't see; I don't get why the echo command inside the for loop only executes once. Is this some version issue or stale files on my system? Any help appreciated; coding in zsh is not my day job. Thanks!

  • Don't use a loop for processing data; in this case, use special tools for that, e.g. try: awk '{print "myWebSite.org/@/"$1}' userList.txt – Edgar Magallon Mar 10 '23 at 02:31
  • If you want to use the for loop, then remove the quotes from "$(cat $1)": for user in $(cat "$1"); but as I said, it's not recommended to use a loop – Edgar Magallon Mar 10 '23 at 02:36
  • thanks! I will check those out. Of course I don't want to "print" the result, I want to use it along with curl to load a webpage...but I am guessing this is possible by perhaps: url = awk '{print "myWebsite.org/@/"$1"}'. ??? sorry, I am not experienced with this and I don't do it very often. thx – michel b Mar 10 '23 at 02:46
  • The basic problem here is that for iterates over words rather than lines; see "Why you don't read lines with for" (it's in the BashFAQ/BashPitfalls, but applies reasonably well to zsh as well). Use while IFS= read -r user; do ... done <$1 instead. – Gordon Davisson Mar 10 '23 at 06:23
  • Thanks! More excellent resources and responses! To be honest, I started out with the while IFS=, but then got so confused I tried switching to a for loop to see if it made any difference. The for loop is where I ended up after trying for too much time. I plan to consume all of the responses here and make things better while learning. Appreciate the reply, and everyone's time to help out! – michel b Mar 10 '23 at 15:17

3 Answers

3

You need to split that command substitution.

for user in $(<$1)

With $(...) left unquoted, the result is split on the characters of $IFS: space, tab, newline and NUL by default¹. Here the Korn-style $(<file) operator is used rather than $(cat -- $1) as an optimisation.
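
Applied to the script from the question, that gives something like this sketch (keeping the question's URL prefix and message):

#!/bin/zsh

website="myWebSite.org/@/"
for user in $(<$1); do
  url=$website$user
  echo "here is the url: $url"
done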

To split on newline (aka linefeed) only, either do the same but after IFS=$'\n', or use the f parameter expansion flag (short for ps:\n:):

for user in ${(f)"$(<$1)"}

Note the quotes to prevent IFS-splitting, and then the f flag to split on newline.
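
For example, with a hypothetical demo.txt whose first line contains a space:

printf 'user 1\nuser_2\n' > demo.txt
for user in ${(f)"$(<demo.txt)"}; do
  print -r -- "<$user>"
done

This prints <user 1> and <user_2>: the line containing the space stays one element, where plain IFS-splitting would have broken it in two.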

You could also use a while read loop:

while IFS= read -ru3 user; do
  ...
done 3< $1

One difference from the previous approaches is that it won't skip empty lines.

It will also skip any characters after the last newline, but such an incomplete last line is not allowed in text files anyway.

It avoids storing the whole file in memory, but on the other hand the file is read one byte at a time, as each read must make sure it does not read past the newline character that delimits the line.
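
Applied to the question's script, that might look like:

#!/bin/zsh

website="myWebSite.org/@/"
while IFS= read -ru3 user; do
  url=$website$user
  echo "here is the url: $url"
done 3< $1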

With:

for user in "${(f@)$(<$1)}"

Or:

IFS=$'\n\n'
for user in $(<$1)

Empty lines are preserved, except for trailing ones, as command substitution strips all trailing newline characters. (In the IFS case, the doubled newline makes zsh treat it as a non-whitespace separator, so runs of newlines are not collapsed into one.)
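
For example, with a hypothetical demo.txt containing an empty middle line:

printf 'a\n\nb\n' > demo.txt
for line in "${(f@)$(<demo.txt)}"; do
  print -r -- "<$line>"
done

This prints <a>, <>, and <b>, the empty line showing up as an empty element.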

To read all lines into an array, also accounting for empty lines and for the non-line made of any bytes after the last newline, and then loop over it, things become quite awkward; you could use a helper function:

lines() {
  local ret
  # append a "." so the command substitution cannot strip trailing
  # newlines; quoted (@f) splits on newline, keeping empty lines
  reply=( "${(@f)$(cat -- "$@"; ret=$?; echo .; exit $ret)}" )
  ret=$? # cat's exit status, passed through the substitution
  # strip the appended "."; if nothing remains, the unquoted empty
  # word is elided and the last element removed, otherwise the bytes
  # after the last newline are kept as a final element
  reply[-1]=( ${reply[-1]%.} )
  return $ret
}
lines myfile &&
  for line in "$reply[@]"; do
    something with "$line"
  done

Also note that echo should be avoided for outputting arbitrary data (though in the case of zsh, you can actually use echo -E - $data); it's better to use printf '%s\n' "$data" as in any other shell, or print -r -- "$data" as in the Korn shell.
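
For example, with a value that happens to look like echo options and escapes:

data='-n foo\there'
echo $data               # zsh's echo takes -n as an option and expands \t
printf '%s\n' "$data"    # prints the value verbatim
print -r -- "$data"      # also verbatim: -r disables escapes, -- ends options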


¹ note that contrary to other POSIX-like shells such as bash, zsh by default doesn't have that misfeature whereby the result is further subject to globbing, so you don't need set -o noglob there as you would in sh/bash/ksh for instance.

2

You say (in comments) that you want to take each line in your input file, prepend a URL, and use it in calls to curl.

The arguably best way of doing this is to compose a configuration file for curl with several url lines of the form

url = http://some/url

This file is then passed to a single invocation of curl.

To do this:

curl --config <( sed 's|^|url = http://example.com/|' file )
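
For instance, if file held the question's four user names, the generated configuration passed to curl would be:

url = http://example.com/user_1
url = http://example.com/user_2
url = http://example.com/user_3
url = http://example.com/user_4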

Should you want to save the output from accessing each URL, say to a file called line.out (where line is the line read from your file), you just need to insert an output statement for each URL.

The following is using GNU sed:

curl --config <( sed 's|.*|url = http://example.com/&\noutput = &.out|' file )

Or, using awk instead:

curl --config <( awk '{ printf "url = http://example.com/%s\noutput = %s.out\n", $0, $0 }' file )
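
In either variant, each input line such as user_1 becomes a pair of lines in the generated configuration:

url = http://example.com/user_1
output = user_1.out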

Note that these last two commands assume that the lines in the input file contain simple words. If the strings contain absolute or relative pathnames, or patterns special to the curl utility, they may have to be sanitised first.

  • Or `LC_ALL=C sed 's/["\\]/\\&/g; s|.*|url "http://example.com/&"\noutput "&.out"|'` to avoid problems with lines containing characters special in the syntax of curl's config. Sanitising the input may be a good idea to avoid writing files outside of the current working directory. Also beware of potential problems if the input may contain NUL bytes. – Stéphane Chazelas Mar 10 '23 at 13:37
  • With curl 7.68.0 at least, I find that the exit status is non-zero only if the last URL can't be downloaded or the last output file can't be opened. – Stéphane Chazelas Mar 10 '23 at 13:44
  • Thank you for the detailed and advanced response! This one goes way over my head, but your time is much appreciated. In the end, I want to create a url that includes a username from the file containing the users, get the webpage at the url, and look to see if a particular word (text) exists on the webpage. If so, indicate as such (write to a file). If not, move on to the next username in the file. Once they showed me how to fix my original for-loop, I was able to accomplish what I wanted, albeit in a crude fashion. With your answer I will try to educate myself, I appreciate your time and info. – michel b Mar 10 '23 at 15:53
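
A rough sketch of that full task (hypothetical word to search for and output file, using the while read approach from the comments) might be:

#!/bin/zsh

word="someWord"                # hypothetical: the text to look for
while IFS= read -r user; do
  # fetch the page for this user and check whether it contains the word
  if curl -fs -- "https://myWebSite.org/@/$user" | grep -q -- "$word"; then
    print -r -- "$user" >> found_users.txt
  fi
done < userList.txt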
-4

I also had problems with for loops reading files, and for that I use a while loop instead:

FILE="some_items.txt"
LINES=$(cat $FILE|wc -l)
INDEX=0

while [ $INDEX -lt $LINES ]; do
  LN=$((INDEX + 1))
  ITEM="$(cat $FILE | head -n $LN | tail -n 1)"
  echo "current item is $ITEM"
  INDEX=$((INDEX + 1))
done

This works in zsh, bash, and sh.

  • This is extremely inefficient, since it reads and re-reads the beginning of the file over and over again, as many times as there are lines in the file. This makes it take quadratic time, i.e. proportional to the square of the number of lines in the file. It also creates three processes per iteration, which is slow as well. See "Counting lines or enumerating line numbers so I can loop over them - why is this an anti-pattern?" – Gordon Davisson Mar 10 '23 at 06:26
  • @GordonDavisson it may not be efficient but it works – カルロサグ Mar 10 '23 at 06:43
  • @GordonDavisson Everything in shell, bash, zsh, whatever is slow AF; it doesn't make much sense to make it more readable or faster, and of course it's horrible. But also, don't take the example in a literal sense: if you have a large job to do per line you should obviously use sleep, etc... – カルロサグ Mar 10 '23 at 06:53
  • Sure, the shells might be slow, esp. Bash, but launching external programs uselessly and re-reading the same data over and over again is going to make it even slower, probably by orders of magnitude. Why not just use while read when that's actually a shell builtin and doesn't require starting piles of new processes? – ilkkachu Mar 10 '23 at 07:20