1

I am trying to run a Python script on hundreds of files using a for loop, as follows. The process is very slow. Is there a way to make it faster? I am using HPC machines with GPUs.

#!/usr/bin/env bash

FILES="/directory_with_files/*.ply"
for i in $FILES; do
    python3 function.py -file $i --odir /output_directory --verbose
done

Sher
  • 113
  • 2
    This won't affect your speed, but you have a couple of bad practices there: i) don't use CAPS for your variable names in shell/bash scripts; by convention, global environment variables are capitalized, and if your variables are also in caps that can lead to naming collisions and weird bugs; ii) you are not quoting your variable, which opens you up to other dangers, but you don't even need the variable here: for i in /directory_with_files/*.ply is fine. – terdon Nov 01 '22 at 16:14
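
For reference, a minimal sketch of the questioner's loop with both of those points applied (lowercase variable dropped entirely, glob used directly, loop variable quoted); the paths are the ones from the question:

#!/usr/bin/env bash

# Loop over the glob directly; no intermediate variable needed.
for f in /directory_with_files/*.ply; do
    # Quote the variable so filenames containing spaces are handled safely.
    python3 function.py -file "$f" --odir /output_directory --verbose
done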

1 Answer

1

A GPU won't help you unless the program itself already uses it, but my go-to tool for this is GNU Parallel. It has many flags, but I guess your command would look something like this:

$ find /directory_with_files -name '*.ply' | parallel "python3 function.py -file {} --odir /output_directory --verbose"

It will run as many jobs in parallel as it deems proper, which should be one per core by default.
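
If you want to control that yourself, the -j/--jobs option sets the number of simultaneous jobs, and --dry-run only prints the commands without running them, which is handy for checking the command line first. The job count below is only an example:

$ find /directory_with_files -name '*.ply' | parallel --dry-run "python3 function.py -file {} --odir /output_directory --verbose"
$ find /directory_with_files -name '*.ply' | parallel -j 16 "python3 function.py -file {} --odir /output_directory --verbose"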

Rumour has it that it should be possible to run remote jobs using this over SSH as well, but I have never tried it.
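
If you want to try that, the relevant option is -S (--sshlogin). A rough sketch, assuming passwordless SSH to two hypothetical hosts node1 and node2 and a filesystem shared by all nodes (typical on HPC clusters), so that both the input files and /output_directory are visible everywhere:

$ find /directory_with_files -name '*.ply' | parallel -S node1,node2 "python3 function.py -file {} --odir /output_directory --verbose"

Without a shared filesystem you would also need --transferfile {} so that each input file is copied to the remote host before its job runs.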

pipe
  • 920