How many threads should be used to process a million files? How would you justify your answer? This is a question from an OS exam from last year and I'm curious what you think. I think that 10,000 threads, each processing 100 files, would be a good ratio.
- How many CPUs are in the server? How long does processing an individual file take? How much memory does processing a file consume? How much memory is in the server? – steve Jun 21 '20 at 20:03
- If you add some reasoning for choosing 10,000 to your question, it would make it easier to discuss your choices. Is there any extra detail in the question we don't have, for example whether the files are in one directory, one partition, one device? – Paul_Pedant Jun 21 '20 at 20:26
- The exam question is only the first one I asked. This is why I am confused about what ratio to choose. – Andrei Gabor Jun 21 '20 at 21:21
- But please share your reasoning with us. You will not learn anything if everyone just gives numbers, as you do, without spending a single word on why they chose them. – planetmaker Jun 22 '20 at 07:52
- One, up to the number of CPUs/cores, if you use I/O Completion Ports (proactor pattern). With such an open-ended question it's likely that the idea is that you reason about what circumstances warrant which number of threads ... – 0xC0000022L Jun 22 '20 at 08:08
1 Answer
Usually I/O is the limit. It does not make sense to have so many threads that most of them are just waiting for I/O.
You might define the optimum so that all n CPU cores are working full time and I/O is at 100% utilization. The optimum number of threads is then given by the ratio of the time it takes to process a file to the time it takes to read the input and write the output.
Examples:
- If it takes longer to read and write a file than to process it then one thread would be enough. It may make sense to have a second thread / process to ensure that there are always I/O requests available. That second thread should run at idle I/O priority, though.
- If processing a file takes ten times as long as the I/O for this file then ten threads would be the optimum.
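
The two examples above can be turned into a small calculation. This is only a rough sketch: the function name `estimate_threads` and its parameters are made up for illustration, the per-file timings would have to be measured on the real system, and capping the result at the number of cores is an added assumption (more compute threads than cores would not make processing any faster).

```python
import math
import os


def estimate_threads(cpu_seconds_per_file, io_seconds_per_file, cores=None):
    """Estimate a worker thread count from the processing-to-I/O time ratio.

    Hypothetical helper: both timings are assumed to be measured averages
    for a single file on the target machine.
    """
    if cpu_seconds_per_file < 0 or io_seconds_per_file <= 0:
        raise ValueError("timings must be non-negative, and I/O time must be > 0")
    if cores is None:
        # os.cpu_count() can return None, so fall back to a single core.
        cores = os.cpu_count() or 1
    ratio = cpu_seconds_per_file / io_seconds_per_file
    # At least one thread; never more than the cores available (assumption).
    return max(1, min(cores, math.ceil(ratio)))


# I/O takes longer than processing: one thread is enough.
print(estimate_threads(0.5, 1.0, cores=16))
# Processing takes ten times as long as the I/O: ten threads.
print(estimate_threads(10.0, 1.0, cores=16))
```

With only four cores, the second case would be capped at four threads, because the CPU becomes the limit before the I/O does.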

Hauke Laging
- Does your first bullet mean to read "If it takes longer to read and write a file than to process it then..."? – user1717828 Jun 22 '20 at 01:51
- @user1717828 The text was changed several times before publishing. That probably introduced this error. Thanks for pointing it out. – Hauke Laging Jun 22 '20 at 07:51