
Given that R, Python and many other open source libraries used for stats have better support in Linux than Windows/OSX (rPy comes to mind), I find it odd that no one has asked this question before. So I do now:

Which Linux distribution do people doing stats, data analysis, or machine learning prefer or recommend?

P.S.: I feel a bit embarrassed asking that, since by using Python's and R's inbuilt package management I should theoretically not be experiencing any conflicts with the base system. :P

dmvianna
  • 407
  • possible duplicate: http://unix.stackexchange.com/questions/76059/best-linux-distribution-for-scientific-computing – slm May 21 '13 at 13:43
  • Take a look at the duplicate link I just posted and at the distros listed in my answer, http://unix.stackexchange.com/a/76081/7453, which includes several good suggestions for scientific/mathematical distros. – slm May 21 '13 at 13:44
  • Thank you, @slm. Your link shows which distros market themselves as 'scientific'. However, I posted this question to see what people actually use. It may or may not be the same distros (I suppose they aren't). – dmvianna May 21 '13 at 13:49
  • 3
    @dmvianna There is no "best GNU/Linux distro for a given real task". The best distro is always the one that you (or the person who will set it up for you) know better than the others. Otherwise it's just a waste of time, and technology for technology's sake instead of technology for the task. – rush May 21 '13 at 14:01
  • 1
    @dmvianna It would be non-constructive as such because I have used Debian, Ubuntu, RedHat, Fedora for this exact purpose with success. The other ones are listed in the other question, so given proper kernel tuning and hardware configuration you might be able to do this with almost all distributions. – Karlson May 21 '13 at 14:01
  • @dmvianna I've tried RedHat, Ubuntu and Mint MATE. Since you talked about Python & Ruby (the same things I work with these days), I felt more comfortable with Ubuntu for scientific programming. I could install Python packages easily through Ubuntu, and it was fast and flexible when dealing with large amounts of data. But you should definitely consider your hardware. – Vynylyn May 27 '15 at 11:23

1 Answer

6

I think what you'll find is that the underlying distro doesn't matter, especially if you're using R and Python.

Typically people manage their own version of Python using virtualenv or virtualenvwrapper and install the various packages they want into that, rather than trying to coexist with the distro's Python.
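As a minimal sketch of that workflow, the snippet below creates an isolated environment using Python's standard-library `venv` module. That is a stand-in for virtualenv, which is not assumed to be installed here. The helper name `create_env` and the directory name `stats-env` are made up for illustration, and `with_pip=False` is chosen only so the example runs quickly and offline.

```python
import os
import sys
import tempfile
import venv


def create_env(base_dir, name="stats-env"):
    """Create an isolated Python environment and return the path to its interpreter."""
    env_dir = os.path.join(base_dir, name)  # "stats-env" is a hypothetical name
    # with_pip=False keeps creation fast and offline; use with_pip=True for real work.
    venv.create(env_dir, with_pip=False)
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return os.path.join(env_dir, bin_dir, exe)


if __name__ == "__main__":
    # A temporary directory is used so the demo cleans up after itself.
    with tempfile.TemporaryDirectory() as tmp:
        python_path = create_env(tmp)
        print(python_path, os.path.exists(python_path))
```

In everyday use you would create the environment somewhere under your home directory instead of a temporary one, activate it, and install packages into it with pip.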

Most of the programming languages like Perl, Python, Ruby, and R provide this management layer now. Ruby has rvm, Perl has perlbrew, and R has Renv.

In addition, they provide their own package management layer for installing the various libraries and tools systematically, so the distro is really of no importance with respect to these types of tools.

Examples

On my laptop right now I have several versions of Ruby installed:

$ rvm list

rvm rubies

   ruby-1.9.2-head [ x86_64 ]
   jruby-1.5.6 [ amd64-java ]
   ruby-1.9.2-p290 [ x86_64 ]
=> ruby-1.9.2-p180 [ x86_64 ]
   ree-1.8.7-2011.03 [ x86_64 ]

I'm currently set up to use ruby-1.9.2-p180:

$ which ruby
~/.rvm/rubies/ruby-1.9.2-p180/bin/ruby

This version has several gems (libraries) installed with it as well:

$ gem list|head -10
abstract (1.0.0)
actionmailer (3.0.10, 3.0.5)
actionpack (3.0.10, 3.0.5)
activemodel (3.0.10, 3.0.5)
activerecord (3.0.10, 3.0.5)
activeresource (3.0.10, 3.0.5)
activesupport (3.0.10, 3.0.5)
akami (1.2.0)
albino (1.3.3)
anemone (0.7.2)

Most of the management layers provide the same features as this. Here's perlbrew for example:

$ perlbrew list
  local (5.14.0)
* perl-5.14.0

$ which perl
~/apps/perl5/perlbrew/perls/perl-5.14.0/bin/perl

Python & R are no different. The advantage to managing the environment this way is that my installations are all maintained in my home directory, so I can move them from machine to machine and keep them with my work, rather than waste my time managing the distro itself for these resources.
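For Python, a rough equivalent of the `which ruby` and `which perl` checks above can be done from inside the interpreter. This is only a sketch: the function name `describe_interpreter` is hypothetical, and the `sys.base_prefix` comparison assumes an environment created by `venv` or a recent virtualenv release.

```python
import sys


def describe_interpreter():
    """Report where the running Python lives and whether it is inside a virtual environment."""
    # Inside a venv/virtualenv, sys.prefix points at the environment, while
    # sys.base_prefix points at the installation it was created from.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_virtualenv": sys.prefix != base_prefix,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If `executable` points somewhere under your home directory, you are running your own managed Python and not the distro's.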

slm
  • 369,824