-5

I noticed that the ">>" operator doesn't work as expected in my script and I don't know why. I have a script like this:

for file in $(ls folder)
do
  echo $file >> text.txt
done

In the folder there are 91 elements, but only the first 87 were appended to text.txt. I can't figure out what is wrong with this code; can anybody help me understand, please?

EDIT

The script I wrote above is very simplified, and I understand it doesn't give a clear picture of the situation. So, here are more details:

My folder contains 91 csv files, each with two columns: a name and a value. For every file I need to check whether the value is greater than 2.500 or equal to 0.000. If either of these two conditions is true, I append the file name and its value to a txt file that collects my discarded files; otherwise I append them to a csv file that collects my chosen files. The code that checks the value works well, but when I use >> to append the results to the txt or csv file, the last four results aren't appended and I can't understand why.

for file in $(ls folder)
do
    value=$(cat path/$file | awk -F, '{print $2}')
    discard=$(awk -v num1="$value" 'BEGIN { if (num1 > 2.500) print 1; else if (num1 == 0.000) print 0; else print 2 }')
    if [[ $discard -eq 1 || $discard -eq 0 ]]
    then
        echo ""$file" has value="$value"" >> path/discard.txt
        rm path/"$file"
    else
        echo ""$file",$value" >> path/selected.csv
        rm path/"$file"
    fi
done

This is a more complete version of my script.

EDIT 2

I corrected my script to fix the issues you found in it. Still the same problem. To be clearer: the files in my folder are csv files automatically generated by a program, and each contains only one row with 2 columns: an ID and a float value. They are all very similar, so the problem isn't there, also because I can see from the terminal that the script recognizes and processes them correctly. I still don't know why append doesn't put the last four files into the txt file.

for file in folder/*
do
    value=$(cat "$file" | awk -F, '{print $2}')
    discard=$(awk -v num1="$value" 'BEGIN { if (num1 > 2.500) print 1; else if (num1 == 0.000) print 0; else print 2 }')
    if [[ "$discard" -eq 1 || "$discard" -eq 0 ]]
    then
        echo ""$file" has value="$value"" >> path/discard.txt
        rm -- "$file"
    else
        echo ""$file",$value" >> path/selected.csv
        rm -- "$file"
    fi
done

By the way, if I reduce the number of files, it works perfectly.
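One way to check the data for hidden characters (e.g. Windows-style \r line endings) is to make every byte visible; sample.csv below is a made-up demo file, not one of my real ones:

```shell
# Create a demo file with a DOS line ending, to show what to look for
printf 'id42,1.234\r\n' > sample.csv

# GNU cat -A marks line ends with $ and shows a stray CR as ^M
cat -A sample.csv

# od -c spells out every byte, so \r and \n appear literally
od -c sample.csv
```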

  • I edited my question to make it more complete – BelBillo007 Sep 22 '23 at 14:12
  • @KamilMaciorowski thank you for the link, it contains a lot of useful information. Unfortunately, none of this resolved my problem. – BelBillo007 Sep 22 '23 at 14:20
  • @KamilMaciorowski Also useless use of ls – Vilinkameni Sep 22 '23 at 15:16
  • @BelBillo007 Related: https://unix.stackexchange.com/questions/128985/why-not-parse-ls-and-what-to-do-instead – Vilinkameni Sep 22 '23 at 15:17
  • @KamilMaciorowski I tried to eliminate the ls command from my code, but the appending process still present the same issue I describe in my question – BelBillo007 Sep 22 '23 at 15:30
  • @Vilinkameni as I wrote on the last comment, I tried to eliminate ls command, but the issue is still there. The problem is not ls, is the append operator that doesn't work as it should – BelBillo007 Sep 22 '23 at 15:32
  • The append operator is fine. The problem won't be there. Can you give us a minimal reproducible example? Something we can run on our machines to test? Try with just one or two files and show that it doesn't work. – terdon Sep 22 '23 at 15:37
  • Not only is parsing the output of ls problematic, many other things can influence what echo does. The command echo is unreliable and depends not only on what is in the variable file, but also on the flavor of Unix and the shell used itself. – Vilinkameni Sep 22 '23 at 15:37
  • The last four? What are the values there? What happens if you try that in a directory with just the four? Can you post the files that exhibit the issue? Better yet, what does cat -A file or od -c file print for those files? – ilkkachu Sep 22 '23 at 19:27
  • While it's true that you don't need ls there, the issues from using it would show up as error messages when the script would try to open files with wrong names. Similarly for echo, it should be easy to notice if the output was corrupt (and the issue would really only come if the data contains backslashes, or if you had filenames like -e foo, with the space.) – ilkkachu Sep 22 '23 at 19:31
  • In any case, when your code doesn't work, the way to debug it is to break it into smaller pieces, until you find the smallest possible piece that doesn't work. Then fix that (or post about it online). We can't really do that for you here, since we don't have your data. 91 files is a bit too much to post for someone else to look at, but not really too outlandish to check manually if you want to figure out what goes wrong. – ilkkachu Sep 22 '23 at 19:35
  • for file in folder/* and double quote your variables when you use them – Chris Davies Sep 22 '23 at 20:11
  • "Reducing the number of files, it works perfectly by the way." -- So, how did you reduce the number of files? I mean, which files did you remove? Because I suppose you're not saying that if you create 91 identical files, e.g. with for i in {1..91}; do echo "1,2" > folder/file$i.txt; done, then it doesn't work, but with 87 identical ones, it does? Or are you? (I just tried the exact script you posted with that exact command, it works ok and produces 91 lines of output.) I would still suggest looking very closely at the actual data. Or heck, just post the problematic set of files somewhere. – ilkkachu Sep 24 '23 at 17:43
  • I can't figure out what is wrong: You can! For debugging, put a set -x into your script and analyze the trace. This would also help you provide a simple reproducible example. – user1934428 Sep 26 '23 at 06:45
  • Whatever the problem is, if reducing the number of input files or changing the output file suffix makes it go away then you are just hiding the problem, not solving it, and it will come back to bite you sooner or later, probably sooner. Debug and fix it now while it's easy for you to reproduce. – Ed Morton Sep 27 '23 at 13:31

2 Answers

3

There are two main issues here:

  1. Never do for file in $(ls). This is also known as bash pitfall number 1. It is fragile and, as you have seen, breaks on even slightly strange file names. Worse, it isn't needed at all; it only makes the code more fragile and more complicated than it needs to be. You can just do for file in *.

  2. Always quote your variables. This is doubly important when dealing with file names, which can contain any character except \0 and /. If you don't quote a variable, the shell will word-split and glob-expand its value, so if it contains a space, the shell will see two values instead of one.

To illustrate, consider a directory with these two files:

$ touch 'a file2' 'file1 *'
$ ls -l
total 0
-rw-r--r-- 1 terdon terdon 0 Sep 22 17:39 'a file2'
-rw-r--r-- 1 terdon terdon 0 Sep 22 17:39 'file1 *'

Now, if we try your original code, we get:

$ for file in $(ls); do echo $file; done
a
file2
file1
a file2
file1 a file2 file1 *

This is because, since you are not quoting, the result of $(ls) is word-split and glob-expanded, so the loop iterates over a, file2, file1, a file2 and file1 *. The unquoted echo $file then expands the * in that last entry once more, which prints everything in the directory again. However, if you do it the right way, it all works:

$ for file in *; do echo "$file"; done
a file2
file1 *
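Incidentally, the echo quoting in the question, echo ""$file" has value="$value"", does not do what it looks like: each "" pair closes and reopens the quoting, so $file and $value actually end up outside the quotes. Using printf with a fixed format string avoids that; a quick demonstration (the values are made up):

```shell
file='a b'
value='1.234'

# The original pattern: "" closes/reopens quoting, leaving $file unquoted
echo ""$file" has value="$value""
# prints: a b has value=1.234

# Safer: let printf substitute the values into a single-quoted format
printf '"%s" has value="%s"\n' "$file" "$value"
# prints: "a b" has value="1.234"
```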

Here's what should be a working version of your script (test it first, since I cannot see your data):

for file in folder/*
do
    value=$(awk -F, '{print $2}' "$file")
    discard=$(awk -v num1="$value" 'BEGIN { if (num1 > 2.500) print 1; else if (num1 == 0.000) print 0; else print 2 }')
    if [[ $discard -eq 1 || $discard -eq 0 ]]
    then
        printf '"%s" has value="%s"\n' "$file" "$value" >> path/discard.txt
        rm -- "$file"
    else
        printf '"%s",%s\n' "$file" "$value" >> path/selected.csv
        rm -- "$file"
    fi
done
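As a side note, the two awk calls can be folded into one: awk can extract the value and perform the numeric test in the same pass, reporting the verdict through its exit status. A sketch of that idea (untested against your real data; the first lines only create throwaway demo files so the snippet is self-contained, and it assumes each file holds a single id,value line):

```shell
# Demo setup with made-up data (replace with your real folder)
mkdir -p folder path
printf 'a,3.100\n' > folder/f1.csv   # > 2.500  -> discard
printf 'b,0.000\n' > folder/f2.csv   # == 0     -> discard
printf 'c,1.200\n' > folder/f3.csv   # in range -> keep

for file in folder/*
do
    # One awk call extracts the value and does the test in a single pass:
    # it prints the value and exits 0 to discard, 1 to keep
    if value=$(awk -F, 'NR == 1 { print $2; exit !($2 > 2.500 || $2 == 0) }' "$file")
    then
        printf '"%s" has value="%s"\n' "$file" "$value" >> path/discard.txt
    else
        printf '"%s",%s\n' "$file" "$value" >> path/selected.csv
    fi
    rm -- "$file"
done
```

The exit !( … ) trick makes awk exit 0 exactly when the value should be discarded, so the shell can branch on the command's status directly instead of comparing a printed flag.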

terdon
  • using one awk instance to fetch a value, and then another to compare it, is still silly, though. Plus, the way it's written, it looks like it'll break if the file has multiple lines. – ilkkachu Sep 23 '23 at 07:46
  • Thank you for your suggestion. I've corrected all the issues, but the problem is still there. I don't know why, but append doesn't work well. – BelBillo007 Sep 24 '23 at 12:31
  • @BelBillo007 it is really extremely unlikely that the problem is >>. That would mean you, and only you, have hit a bug in one of the oldest, most stable and basic parts of the shell. Are the csv files generated by a tool running on Windows, perhaps? Also, what happens if you add echo "V: $value : $discard" before the if? Do you see the values you expect to see? – terdon Sep 25 '23 at 08:28
-3

I just tried using a CSV file instead of a TXT file for the discarded elements. Now the script works well.

for file in folder/*
do
    value=$(cat "$file" | awk -F, '{print $2}')
    discard=$(awk -v num1="$value" 'BEGIN { if (num1 > 2.500) print 1; else if (num1 == 0.000) print 0; else print 2 }')
    if [[ "$discard" -eq 1 || "$discard" -eq 0 ]]
    then
        echo ""$file" has value="$value"" >> path/discard.csv
        rm -- "$file"
    else
        echo ""$file",$value" >> path/selected.csv
        rm -- "$file"
    fi
done

Thank you all for the help and the advice!

  • This seems to be mostly a copy of the code in terdon's answer, but with broken quoting that leaves some variables unquoted in the calls to echo. – Kusalananda Sep 26 '23 at 14:23
  • But this doesn't tell us what the issue was. Only that you stabbed in the dark and something happened to hit. There's very little reason the filename itself should matter. Not much joy of learning there for anyone, but I suppose if it works for you now, you're not too interested in digging into it further. – ilkkachu Sep 26 '23 at 17:11