
I have a text file with two columns and more than 300,000 rows. The format is as below:

Filename1.txt Num1
Filename2.txt Num2
Filename3.txt Num3

I want to copy all the filenames for which the corresponding Numx is greater than 50 and less than 200 into a different file.

Once I copy those file names into a different file, I want to copy all of those files into a different folder.

How do I do that?

Innocent

3 Answers


If you want, you can do the comparison and the copying in one step with awk:

awk '$2>50 && $2<200 {system("cp -- "$1" /path/to/destination/")}' file.txt

This assumes you want to copy the files into /path/to/destination/; change that path to suit your needs.

  • $2>50 && $2<200 does the required comparison

  • if a line matches, the cp operation ({system("cp -- "$1" /path/to/destination/")}) is executed via awk's system() function
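
With more than 300,000 rows it can be worth doing a dry run before copying anything. A minimal sketch (using print instead of system(), with the same placeholder destination) simply prints the cp commands that would be executed:

awk '$2>50 && $2<200 {print "cp --", $1, "/path/to/destination/"}' file.txt

Once that output looks right, switch back to the system() form above.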

heemayl

Let's consider this test file:

$ cat file
Filename1.txt 49
Filename2.txt 72
Filename3.txt 189
Filename4.txt 203

To select only those files for which the second column is greater than or equal to 50 and also less than or equal to 200:

$ awk '$2>=50 && $2<=200 { print $1}' file
Filename2.txt
Filename3.txt

To put those file names in a new file at some path:

awk '$2>=50 && $2<=200 { print $1}' file >/path/to/newfile

Copying the selected files

Assuming that the numbers are integers, try:

while read fname num; do [ "$num" -ge 50 ] && [ "$num" -le 200 ] && cp -- "$fname" /some/path/ ; done <file

Or, for those who prefer their code spread over multiple lines:

while read fname num
do
   [ "$num" -ge 50 ] && [ "$num" -le 200 ] && cp -- "$fname" /some/path/
done <file
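
If you have already written the selected names to /path/to/newfile as in the previous step, a similar loop (a sketch; the paths are the same placeholders as above) can copy from that list instead:

while read -r fname
do
   cp -- "$fname" /some/path/
done </path/to/newfile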
John1024

The question's tags suggest there is interest in an answer that uses regular expressions. The question also indicates that the input data file is large, so I assume that performance is a consideration.

I also assume that, since the input file contains one filename per line, there are no (pathological) filenames containing newline characters.

The other answers effectively spawn a cp process for every file, which causes an unnecessary performance hit. Instead, we can use xargs to call cp with as many filenames as will fit on each command line.

sed -rn 's/ (5[1-9]|[6-9].|1..)$//p' input.txt | tr '\n' '\0' | xargs -0 cp -t /destdir

The sed command uses a regular expression to match numbers in the open interval (50, 200), i.e. strictly greater than 50 and strictly less than 200. Using regular expressions for numerical inequalities is not always the most elegant approach, but in this case the required expression is fairly straightforward.
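
As a quick sanity check (borrowing the sample rows from the earlier answer, and assuming GNU sed), the expression keeps exactly the rows whose numbers lie strictly between 50 and 200:

$ printf '%s\n' 'Filename1.txt 49' 'Filename2.txt 72' 'Filename3.txt 189' 'Filename4.txt 203' | sed -rn 's/ (5[1-9]|[6-9].|1..)$//p'
Filename2.txt
Filename3.txt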

We are assuming that the filenames contain no newlines, but they may contain other unhelpful characters, such as spaces. xargs will handle this correctly if given \0-delimited data, so we use tr to convert all newlines to null characters.
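
To convince yourself that a filename containing a space survives the null-delimited hand-off as a single argument, here is a small check (a sketch; the filename is hypothetical and a standalone printf stands in for cp):

$ printf 'my file.txt\n' | tr '\n' '\0' | xargs -0 printf '[%s]\n'
[my file.txt]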

The above assumes the GNU versions of sed and xargs. If you instead have the BSD versions (e.g. on OS X), the command is slightly different:

sed -En 's/ (5[1-9]|[6-9].|1..)$//p' input.txt | tr '\n' '\0' | xargs -0 -J {} cp {} /destdir

These commands spawn exactly one copy each of sed, tr and xargs. There will likely be more than one invocation of cp, but each one will copy multiple files: xargs tries to fill up each cp command line to achieve efficient utilisation. This should provide a significant performance improvement over the other answers when the input data is large.