1

I know this question has been asked and answered before. I have tried the code, but I do not get the correct output.

I have 2 folders, vanila1 and vanila2. Each has 400 files, and the file names are the same in both:

ls vanila1
MB.2613.007_0021.ED4_KS1A29-7_338_all
MB.2613.007_0022.ED9_SD2A27-1_180_all
MB.2613.007_14.ED14_IA2A35-2_310_all

ls vanila2
MB.2613.007_0021.ED4_KS1A29-7_338_all
MB.2613.007_0022.ED9_SD2A27-1_180_all
MB.2613.007_14.ED14_IA2A35-2_310_all

I want to combine files with identical names and I am using this:

ls vanila1 | while read FILE; do
  cat vanila1/"$FILE" vanila2/"$FILE" >> all_combined/"$FILE"
done

I do not get the correct output: the number of lines in each combined file is more than the sum of the lines in file 1 and file 2. Am I doing something wrong?

peterh
  • 9,731
Anna1364
  • 1,026

2 Answers

1

I have a hunch that you may have run your loop more than once, and since you use the >> redirection operator, which appends data, your result files grow every time.
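To see why repeated runs would inflate the line count, here is a minimal sketch. It works in a scratch directory with a made-up file name (sample_all, holding 2 and 3 lines; these are assumptions for illustration, not your data) and runs an appending loop twice, counting lines after each run:

```shell
# Work in a scratch directory so that no real data is touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir vanila1 vanila2 all_combined

# Hypothetical sample data: 2 lines in one file, 3 in the other.
printf 'a\nb\n' >vanila1/sample_all
printf 'c\nd\ne\n' >vanila2/sample_all

# Same idea as the loop in the question: append with >>.
combine_append() {
    for name in vanila1/*; do
        base_name=${name##*/}
        cat "$name" "vanila2/$base_name" >>"all_combined/$base_name"
    done
}

combine_append
first_run=$(wc -l <all_combined/sample_all)

combine_append
second_run=$(wc -l <all_combined/sample_all)

# Normalise the counts (some wc implementations pad with spaces).
first_run=$((first_run))
second_run=$((second_run))

printf 'after first run:  %d lines\n' "$first_run"
printf 'after second run: %d lines\n' "$second_run"
```

With this sample data the combined file holds 5 lines after the first run and 10 after the second, which is the symptom described in the question.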

Instead (and here I'm avoiding using ls too, see the discussion in "Why *not* parse `ls`?" for reasons):

for name in vanila1/*; do
    base_name=${name##*/}

    if [ -f "vanila2/$base_name" ]; then
        cat "$name" "vanila2/$base_name" >"all_combined/$base_name"
    else
        printf 'No file in vanila2 corresponds to "%s"\n' "$name" >&2
    fi
done

The parameter expansion ${name##*/} transforms a pathname like vanila1/MB.2613.007_0021.ED4_KS1A29-7_338_all into just MB.2613.007_0021.ED4_KS1A29-7_338_all, i.e. it removes everything up to and including the last / (what remains is the filename component of the pathname, or "the basename"). It may be replaced by $(basename "$name").
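As a small illustration of the two ways of getting the filename component, the following sketch compares ${name##*/} with the basename utility. The variable names other than name are made up for the example, and the trailing-slash case is an extra assumption that does not occur in the loop above:

```shell
name='vanila1/MB.2613.007_0021.ED4_KS1A29-7_338_all'

# Remove the longest prefix matching "*/", i.e. everything up to the last slash.
from_expansion=${name##*/}

# The same result using the external basename utility.
from_basename=$(basename "$name")

# The complementary expansion: remove the shortest suffix matching "/*".
dir_part=${name%/*}

printf 'expansion: %s\n' "$from_expansion"
printf 'basename:  %s\n' "$from_basename"
printf 'directory: %s\n' "$dir_part"

# Assumed edge case: a pathname that ends in a slash.
# The expansion leaves an empty string, while basename strips the
# trailing slash first and returns the last component.
with_slash='vanila1/'
exp_trailing=${with_slash##*/}
base_trailing=$(basename "$with_slash")

printf 'trailing slash, expansion: "%s"\n' "$exp_trailing"
printf 'trailing slash, basename:  "%s"\n' "$base_trailing"
```

For pathnames produced by a pattern like vanila1/* the two approaches agree. The expansion is done by the shell itself, so it avoids starting an external process for each of the 400 files.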

If there is a file in vanila2 corresponding to the name picked up from vanila1, the two are concatenated and put into the all_combined directory. If not, there is a diagnostic message about this fact.

By using > rather than >>, any existing file in all_combined with the same name will be replaced rather than appended to.


If you have other files or directories in vanila1, then you may want to modify the pattern vanila1/* in the loop to something that matches only the files that you are interested in, for example vanila1/*_all or similar.
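A quick way to check what a pattern would pick up before running the real loop is to count its matches. This sketch uses a scratch directory with invented names (MB.0001_all, MB.0002_all, README.txt and a subdirectory old_runs are assumptions for illustration only):

```shell
# Scratch directory with two data files, one unrelated file and a subdirectory.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p vanila1/old_runs
touch vanila1/MB.0001_all vanila1/MB.0002_all vanila1/README.txt

# Expand each pattern into the positional parameters and count the matches.
set -- vanila1/*
all_count=$#

set -- vanila1/*_all
narrow_count=$#

printf 'vanila1/* matches %d names\n' "$all_count"
printf 'vanila1/*_all matches %d names\n' "$narrow_count"
```

Here the broad pattern matches 4 names (including README.txt and the old_runs directory), while the narrower one matches only the 2 data files. Note that a pattern matching nothing is left unexpanded by default, so a count of 1 may mean "no match" rather than "one file".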

Kusalananda
  • 333,661
  • thanks so much for the code and very helpful explanations on it. There is one little question in your code that I could not understand as I am learning programming: if [ -f "vanila2/$base_name" ], what does this part do? I mean -f? – Anna1364 Feb 15 '18 at 20:15
  • @Anna1364 The test [ -f "filename" ] will be true if there exists a file whose name is filename. In the code I am using a similar test to check whether vanila2/$base_name corresponds to an existing file. The -f test checks specifically for an existing regular file, while other tests like -d check whether the given name is that of an existing directory. See man test. – Kusalananda Feb 15 '18 at 20:55
-1

So you have files with identical names in two directories, and where both files are present you wish to concatenate them?

for file in dir1/*; do
   otherfile="$(basename "$file")"
   if [[ -r dir2/"${otherfile}" ]]; then
       cat "$file" dir2/"$otherfile" >> combined/"$otherfile"
   fi
done
DopeGhoti
  • 76,081
  • 1
    Your answer doesn't address the questioner's main point, which was to figure out why his/her result is seemingly longer than the sum of its parts. – user1404316 Feb 14 '18 at 19:53