
As of November 2010, Linux is used on 459 out of the 500 supercomputers of the TOP500. Refer to the table via Internet Archive.

What are the reasons behind this massive use of Linux in the supercomputer space?

orftz
  • And 19 more are Unix and 16 mixed, leaving 1 BSD and 5 Windoze :) – Caleb Jun 04 '11 at 21:57
  • 'Why is air commonly used for breathing?' I am amazed that anyone would want to build a supercomputer and then put Windows on it. What are the reasons for that? A really big Excel spreadsheet? Millions of layers in Photoshop? Quickly scanning a pron collection with Norton Anti-virus? Playing Crysis with all options on? – Mathew Jun 05 '11 at 01:37
  • @Mathew Probably that last one. – Maxpm Jun 05 '11 at 02:07
  • What I find most curious is... what's up with that BSD computer? – Ishpeck Jun 05 '11 at 04:19
  • Microsoft charges per core for Windows - the cost to license it for a supercomputer would be tremendous. – Kyle Cronin Jun 05 '11 at 05:57
  • @Ishpeck: What's wrong with BSD? It's got a better security record than Linux, at least. Less hardware support, though. – Billy ONeal Jun 05 '11 at 06:09
  • @Billy ONeal: Yeah, I run OpenBSD at home. It's a great system. It's just curious that there's only one... I expected the BSD count to be either 20% or nothing. – Ishpeck Jun 05 '11 at 16:27
  • @op, your link is dead. And as of 2017 it's 99%. https://itsfoss.com/linux-supercomputers-2017/ – john-jones Nov 28 '17 at 21:35
  • As of 2019 the statistic is 100%. From the same source as the 2010 numbers in the question, 500 out of 500 of the top supercomputers are now Linux-based. – Caleb Nov 04 '19 at 20:23

4 Answers

40
  • Linux has wide support for lots of different hardware architectures and platforms from tiny embedded boards to massive computing arrays. While other good kernels are available, the coverage and quality of hardware drivers available for Linux far surpass any other platform.
  • The Linux kernel source is open and can easily be modified to run on various custom platforms. For any vendor creating a new piece of hardware, providing Linux drivers is one of the easiest ways to make it accessible. They don't have to work from scratch because they can modify existing drivers for similar pieces of hardware and build on their success.
  • Some of the other OS candidates rack up per-CPU licensing fees, which become prohibitive at the supercomputer level.
  • Since Linux has been used by everybody in this space before, it has the best support and the widest selection of available software packages and libraries.
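To put the per-CPU licensing point in perspective, here is a back-of-the-envelope sketch. The fee and core count below are purely hypothetical, chosen only to show how a per-core charge scales:

```python
# Back-of-the-envelope: per-core licensing cost at supercomputer scale.
# Both numbers are hypothetical, for illustration only.
cores = 100_000       # core count in the range of a large TOP500 machine
fee_per_core = 50     # hypothetical license fee, dollars per core
total = cores * fee_per_core
print(f"Hypothetical license cost: ${total:,}")
```

Even at a modest per-core price, the total lands in the millions before a single job has run, while the Linux kernel costs nothing per node.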
Caleb
  • Are you sure about all these reasons or are you guessing? If you are guessing, at least indicate so. Either that, or mention where you got this info, or even provide links if you have any. – tshepang Jun 04 '11 at 22:37
  • Caleb's assertions are fairly self-evident. Here's what IBM said about using Linux on their BlueGene supercomputers, which backs up at least the openness-of-the-kernel reason. – Andrew Lambert Jun 04 '11 at 23:23
  • Yeah, the open source and driver base is probably what really sets Linux apart from everything else. There are plenty of capable open source kernels out there -- but without the large base of hardware support. I see this as the principal reason to go with a Linux kernel. – Ishpeck Jun 05 '11 at 04:21
  • Also, over the last years Linux has been carefully optimized for supercomputers by IBM and others. That BlueGene article is from 2002. – starblue Jun 05 '11 at 08:49
  • Linux also gained a lot of its supercomputer capability from the integration of SGI's NUMALink technology via the MIPS and Itanium2 architecture trees. One of the first large multiprocessor systems booted with Linux was a 32-CPU Origin 2000. It was later booted on a 128-CPU Origin 2000, and held that record for over two years. Source. – Kumba Jun 05 '11 at 22:52
  • This is a good answer, but the whole Q could use an update (cf. "Summit"). I only miss that "supervisor" keyword. – Nov 04 '19 at 14:27
  • @rastafile The only real update since 2010 is that the stats on Linux usage in the top 500 supercomputers have gone from 459 out of 500 to 500 out of 500 (2019). – Caleb Nov 04 '19 at 20:21
  • @Caleb Yes... I have been studying RISC a bit lately, on wikichip.org etc., and seen a time-lapse video of how they built "Summit", the 256 racks. It is not a cluster; it is really one ("Red Hat") kernel running it. It just broke that exaflop world record. THAT is a "thread-ripper" machine... "Mellanox" plays an important role in this supervisor/interconnect thing. Linux is "just" the flexible, scalable OS to run on top - the thread-maker. – Nov 04 '19 at 20:34
20

I work in the HPC industry.

If you're asking why most people today use Linux on their cluster, it's what you listed in your question: more than 90% of the biggest clusters run Linux. It's the de-facto standard - almost any cluster library, tool or application is ready to run on Linux. It is more work to set up a cluster using any other operating system.

If you're asking how Linux became the de-facto standard, then Caleb has the answers ;)

i_grok
14

For almost any question of the form: "Why is x the predominant choice in the y market segment?" the answers cluster around two factors.

At some critical juncture during the emergence and growth of that market segment or niche the product in question had some advantages in cost and features which encouraged its adoption by a critical mass. Once that critical mass has been achieved then all of the ancillary products for that segment will support it and all of the key personnel in that industry/niche will be familiar with it as the premier choice.

At some point back in the '90s Donald Becker released some code and information regarding the Beowulf cluster that he and Thomas Sterling had built for a project at NASA. This used commodity hardware, running Linux and incorporating the MPI (message passing interface) and PVM (parallel virtual machine) libraries for distribution of computational tasks across a network of nodes.
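The programming model those libraries introduced - a coordinator scattering work units to processes on many nodes and gathering the partial results back - can be sketched with nothing but the Python standard library. This is an analogy to the MPI/PVM master-worker pattern, not actual MPI; the workers here are local processes standing in for cluster nodes:

```python
# Master/worker message passing in the spirit of MPI/PVM:
# scatter work to worker processes, compute, gather, reduce.
from multiprocessing import Pool

def partial_sum(chunk):
    """Work unit executed on a 'node' (here: a local process)."""
    lo, hi = chunk
    return sum(range(lo, hi))

def distributed_sum(n, workers=4):
    """Split [0, n) into chunks, farm them out, reduce the results."""
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], n)   # last chunk absorbs the remainder
    with Pool(workers) as pool:       # the 'cluster' of worker processes
        partials = pool.map(partial_sum, chunks)  # scatter + compute
    return sum(partials)              # gather/reduce on the master

if __name__ == "__main__":
    print(distributed_sum(1_000_000))
```

On a Beowulf cluster the same shape of program would use MPI send/receive calls over the network instead of a local process pool, but the decomposition of the problem is identical.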

At the time the alternatives required much more expensive hardware (mostly Sun workstations), had proprietary software licensing with per-node or per-CPU costs, and typically were closed source or had significant closed source components.

Thus Linux had advantages in all three of these factors. That Becker released some code and documentation (and did so under a cool name) gave Linux a tremendous boost in credibility for that sort of supercomputing application. (That it was used by a project at NASA was also a huge boost to its credibility).

From there colleges and universities picked up the approach for their own labs. Within a couple years after that an entire generation of scientists were familiar with Beowulf clusters and a wide array of tools were readily available to support many applications across them.

6

One more reason. In the old days there was no Linux and no Windows for serious work; there were UNIX and VMS (MS-DOS and the like were not contenders, as they lacked too many features), and maybe a few less-known things like Lisp machines...

Of those, only UNIX-derived platforms survived, and Linux was a cheap alternative to the proprietary UNIX-like OSes: more-or-less compatible, open source and free. This made it possible to reuse scientific software that was written before Linux.

liori