1

How many threads should be used to process a million files? How would you justify your answer? This is a question from last year's OS exam and I'm curious what you think. I think that 10,000 threads, each processing 100 files, would be a good ratio.

  • 3
    How many CPUs in the server? How long does processing of an individual file take? How much memory does the act of processing a file consume? How much memory in the server? – steve Jun 21 '20 at 20:03
  • 1
    If you add some reasoning for choosing 10,000 to your question, it would make it easier to discuss your choices. Is there any extra detail in the question we don't have -- for example whether the files are in one directory, one partition, one device? – Paul_Pedant Jun 21 '20 at 20:26
  • The question is only the first one I asked. This is why I am confused about which ratio to choose. – Andrei Gabor Jun 21 '20 at 21:21
  • But please share your reasoning with us. Nobody learns anything if people just give numbers, as you did, without a single word on why they chose them. – planetmaker Jun 22 '20 at 07:52
  • One up to the number of CPUs/cores, if you use I/O Completion Ports (proactor pattern). With such an open-ended question it's likely that the idea is that you reason about what circumstances warrant which number of threads ... – 0xC0000022L Jun 22 '20 at 08:08

1 Answer

2

Usually I/O is the limiting factor. It does not make sense to have so many threads that most of them are just waiting for I/O.

You might define the optimum ratio so that n CPU cores are working full time and I/O is at 100%. The optimum number of threads is then defined by the ratio of the time it takes to process a file to the time it takes to read the input and write the output.

Examples:

  • If it takes longer to read and write a file than to process it then one thread would be enough. It may make sense to have a second thread / process to ensure that there are always I/O requests available. That second thread should run at idle I/O priority, though.
  • If processing a file takes ten times as long as the I/O for this file then ten threads would be the optimum.
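
As a rough illustration of the rule of thumb above, here is a minimal Python sketch. It assumes you have already measured the average processing time and the average I/O time per file; the names `estimate_threads`, `process_all` and the `process_file` callback are made up for this example and are not part of the question.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def estimate_threads(process_seconds: float, io_seconds: float) -> int:
    """Rule of thumb: threads ~ processing time / I/O time, but at least 1."""
    if process_seconds < 0 or io_seconds <= 0:
        raise ValueError("process_seconds must be >= 0 and io_seconds must be > 0")
    return max(1, math.ceil(process_seconds / io_seconds))


def process_all(paths, process_file, workers):
    """Run process_file (a hypothetical callback) over paths with a fixed-size pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_file, paths))


if __name__ == "__main__":
    # I/O takes longer than processing: one thread is enough.
    print(estimate_threads(0.5, 2.0))   # 1
    # Processing takes ten times as long as the I/O: ten threads.
    print(estimate_threads(10.0, 1.0))  # 10
```

The estimate is only as good as the measured timings, and this sketch ignores the number of cores and the memory needed per file, which the comments on the question also ask about.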
Hauke Laging
  • 90,279
  • 3
    Does your first bullet mean to read If it takes longer to read and write a file than to process it then...? – user1717828 Jun 22 '20 at 01:51
  • @user1717828 The text was changed several times before publishing. That probably introduced this error. Thanks for pointing it out. – Hauke Laging Jun 22 '20 at 07:51