I want to create 100 virtual servers. They will be used for testing, so they should be easy to create and destroy.
- They must be accessible through SSH from another physical machine (I provide the public ssh-key)
- They must have their own IP-address and be accessible from another physical host as
ssh I.P.n.o
e.g. ssh 10.0.0.99
(IPv4 or IPv6, private address space OK, port-forwarding is not - so this may involve setting up a bridge)
- They must have basic UNIX tools installed (preferably a full distro)
- They must have /proc/cpuinfo, a root user, and a network card (this is probably only relevant if the machine is not fully virtualized)
- Added bonus if they can be made to run an X server that can be connected to remotely (using VNC or similar)
What is the fastest way (wall clock time) to do this given:
- The host system runs Ubuntu 20.04 and has plenty of RAM and CPU
- The LAN has a DHCP-server (it is also OK to use a predefined IP-range)
- I do not care which Free virtualization technology is used (Containerization is also OK if the other requirements are met)
and what are the actual commands I should run/files I should create?
I have the feeling that, given the right technology, this is a 50-line job that can be set up in minutes.
The few lines can probably be split into a few bash functions:
install() {
# Install needed software once
}
setup() {
# Configure the virtual servers
}
start() {
# Start the virtual servers
# After this it is possible to do:
# ssh 10.0.0.99
# from another physical server
}
stop() {
# Stop the virtual servers
# Afterwards there are no running processes on the host server,
# and it is no longer possible to do:
# ssh 10.0.0.99
# from another physical server
# The host server returns to the state before running `start`
}
destroy() {
# Remove the setup
# After this the host server returns to the state before running `setup`
}
Background
For developing GNU Parallel I need an easy way to test running on 100 machines in parallel.
For other projects it would also be handy to be able to create a bunch of virtual machines, test some race conditions and then destroy the machines again.
In other words: This is not for a production environment and security is not an issue.
Docker
Based on @danielleontiev's notes below:
install() {
# Install needed software once
sudo apt -y install docker.io
sudo groupadd docker
sudo usermod -aG docker $USER
# Logout and login if you were not in group 'docker' before
docker run hello-world
}
setup() {
# Configure the virtual servers
mkdir -p my-ubuntu/ ssh/
cp ~/.ssh/id_rsa.pub ssh/
cat ssh/*.pub > my-ubuntu/authorized_keys
cat >my-ubuntu/Dockerfile <<EOF
FROM ubuntu:bionic
RUN apt update && \
    apt install -y openssh-server
RUN mkdir /root/.ssh
COPY authorized_keys /root/.ssh/authorized_keys
# Run a blocking command which prevents the container from exiting immediately after start
CMD service ssh start && tail -f /dev/null
EOF
docker build my-ubuntu -t my-ubuntu
}
start() {
# start container number x..y
servers_min=$1
servers_max=$2
testssh() {
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/known root@"$1" echo "'$1'" '`uptime`'
}
export -f testssh
setup_bridge() {
# OMG why is this so hard
# Default interface must have IP-addr removed
# bridge must have IP-addr + routing copied from $dif, so it takes over default interface
# Why on earth could we not just: brctl addif dock0 $dif - and be done?
default_interface=$(ip -4 route ls | grep default | grep -Po '(?<=dev )(\S+)')
dif=$default_interface
gw=$(ip -4 route ls | grep default | grep -Po '(?<=via )(\S+)')
dif_ip=$(ip -4 route ls | grep default | grep -Po '(?<=src )(\S+)')
echo Add bridge
docker network create --driver bridge --subnet=172.20.0.0/16 --opt com.docker.network.bridge.name=dock0 net0
# $dif must be up, but with no ip addr
sudo ip addr flush dev $dif
sudo brctl addif dock0 $dif
sudo ifconfig dock0:ext $dif_ip
sudo route add -net 0.0.0.0 gw $gw
}
# Start the containers
startone() {
id=$1
net=$2
docker run -d --rm --name ubuntu-$id-$net --network $net my-ubuntu
docker inspect ubuntu-$id-$net
}
export -f startone
setup_bridge
echo Start containers
seq $servers_min $servers_max | parallel startone {} net0 |
# After this it is possible to do:
# ssh 10.0.0.99
# from another physical server
perl -nE '/"IPAddress": "(\S+)"/ and not $seen{$1}++ and say $1' |
# Keep a list of the IP addresses in /tmp/ipaddr
tee /tmp/ipaddr |
parallel testssh
docker ps
route -n
}
stop() {
# Stop the virtual servers
# Afterwards there are no running processes on the host server,
# and it is no longer possible to do:
# ssh 10.0.0.99
# from another physical server
# The host server returns to the state before running start
echo Stop containers
docker ps -q | parallel docker stop {} |
perl -pe '$|=1; s/^............\n$/./'
echo
echo If any containers are remaining it is an error
docker ps
# Take down bridge
docker network ls | grep bridge | grep net | awk '{print $1}' | sudo parallel docker network rm
# Re-establish default interface
# ($default_interface and $dif_ip must still be set from when start ran in this shell)
dif=$default_interface
sudo ifconfig $dif $dif_ip
# Routing takes a while to be updated
sleep 2
route -n
}
destroy() {
# Remove the setup
# After this the host server returns to the state before running setup
rm -rf my-ubuntu/
docker rmi my-ubuntu
}
full() {
install
setup
start
stop
destroy
}
$ time full
real 2m21.611s
user 0m47.337s
sys 0m31.882s
This takes up 7 GB of RAM in total for 100 running virtual servers, so you do not even need plenty of RAM to do this.
It scales up to 1024 servers, after which the Docker bridge complains (probably because each bridge device can have at most 1024 ports).
The script can be adapted to run 6000 containers (see: Run > 1024 docker containers), but at 6055 it blocks (https://serverfault.com/questions/1091520/docker-blocks-when-running-multiple-containers).
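A hedged sketch of how the >1024 workaround in the linked post works: since the port limit is per bridge, the containers are spread over several bridge networks. The loop below only prints the docker network create commands it would run; the subnets and network names are made up:

```shell
# Generate one `docker network create` command per bridge (printed, not executed)
for n in 0 1 2; do
  echo "docker network create --driver bridge --subnet=172.$((20 + n)).0.0/16 net$n"
done
```

Each container is then started with --network net$((id / 1000)) or similar, so no single bridge exceeds 1024 ports.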
Vagrant
Based on @Martin's notes below:
install() {
# Install needed software once
sudo apt install -y vagrant virtualbox
}
setup() {
# Configure the virtual servers
mkdir -p ssh/
cp ~/.ssh/id_rsa.pub ssh/
cat ssh/*.pub > authorized_keys
cat >Vagrantfile <<'EOF'
Vagrant.configure("2") do |config|
config.vm.box = "debian/buster64"
(1..100).each do |i|
config.vm.define "vm%d" % i do |node|
node.vm.hostname = "vm%d" % i
node.vm.network "public_network", ip: "192.168.1.%d" % (100+i)
end
end
config.vm.provision "shell" do |s|
ssh_pub_key = File.readlines("authorized_keys").first.strip
s.inline = <<-SHELL
mkdir /root/.ssh
echo #{ssh_pub_key} >> /home/vagrant/.ssh/authorized_keys
echo #{ssh_pub_key} >> /root/.ssh/authorized_keys
apt-get update
apt-get install -y parallel
SHELL
end
end
EOF
}
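The (1..100) loop in the Vagrantfile gives vmN the address 192.168.1.(100+N), i.e. vm1..vm100 map to 192.168.1.101..192.168.1.200. The mapping is easy to print (purely illustrative):

```shell
# Print the vm-name -> IP mapping used in the Vagrantfile
for i in 1 2 100; do
  echo "vm$i -> 192.168.1.$((100 + i))"
done
# prints:
# vm1 -> 192.168.1.101
# vm2 -> 192.168.1.102
# vm100 -> 192.168.1.200
```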
start() {
testssh() {
ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null root@"$1" echo "'$1'" '`uptime`'
}
export -f testssh
# Start the virtual servers
seq 100 | parallel --lb vagrant up vm{}
# After this it is possible to do:
# ssh 192.168.1.111
# from another physical server
parallel testssh ::: 192.168.1.{101..200}
}
stop() {
# Stop the virtual servers
# Afterwards there are no running processes on the host server,
# and it is no longer possible to do:
# ssh 192.168.1.111
# from another physical server
# The host server returns to the state before running start
seq 100 | parallel vagrant halt vm{}
}
destroy() {
# Remove the setup
# After this the host server returns to the state before running setup
seq 100 | parallel vagrant destroy -f vm{}
rm -r Vagrantfile .vagrant/
}
full() {
install
setup
start
stop
destroy
}
start
gives a lot of warnings like:
NOTE: Gem::Specification.default_specifications_dir is deprecated; use Gem.default_specifications_dir instead. It will be removed on or after 2020-02-01.
stop
gives this warning:
NOTE: Gem::Specification.default_specifications_dir is deprecated; use Gem.default_specifications_dir instead. It will be removed on or after 2020-02-01.
Gem::Specification.default_specifications_dir called from /usr/share/rubygems-integration/all/gems/vagrant-2.2.6/lib/vagrant/bundler.rb:428.
NOTE: Gem::Specification.default_specifications_dir is deprecated; use Gem.default_specifications_dir instead. It will be removed on or after 2020-02-01.
Gem::Specification.default_specifications_dir called from /usr/share/rubygems-integration/all/gems/vagrant-2.2.6/lib/vagrant/bundler.rb:428.
/usr/share/rubygems-integration/all/gems/vagrant-2.2.6/plugins/kernel_v2/config/vm.rb:354: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/usr/share/rubygems-integration/all/gems/vagrant-2.2.6/plugins/kernel_v2/config/vm_provisioner.rb:92: warning: The called method `add_config' is defined here
/usr/share/rubygems-integration/all/gems/vagrant-2.2.6/lib/vagrant/errors.rb:103: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/usr/share/rubygems-integration/all/gems/i18n-1.8.2/lib/i18n.rb:195: warning: The called method `t' is defined here
Each virtual machine takes up 0.5 GB of RAM on the host system.
It is much slower to start than the Docker containers above. The big difference is that the Vagrant machines do not have to run the same kernel as the host: they are complete virtual machines.
Comments
- replicas in docker-compose: that's a one-liner once the configuration file is written, both for creation and deletion. Pick any distro image you like. You can run Xvfb (but that will use up more memory, and I don't know how much "plenty of RAM" is). But both for GNU parallel and race condition tests you don't have to. I'll leave it as an exercise for you to figure out the actual commands. – dirkt Jun 04 '20 at 16:21
- systemd-nspawn? It should suit the job well and also be fairly simple and fast to set up on an Ubuntu 20.04 host – LL3 Jun 07 '20 at 13:59
- vagrant up (spin up VMs serially)? – Martin Konrad Jun 10 '20 at 05:05
- --parallel for the Virtualbox provider. I guess that makes the Virtualbox approach less useful for you. – Martin Konrad Jun 14 '20 at 14:47