13

I want to create 100 virtual servers. They will be used for testing, so they should be easy to create and destroy.

  • They must be accessible through SSH from another physical machine (I provide the public ssh-key)
  • They must have their own IP address and be accessible from another physical host by plain ssh to that address, e.g. ssh 10.0.0.99 (IPv4 or IPv6, private address space OK; port-forwarding is not - so this may involve setting up a bridge)
  • They must have basic UNIX tools installed (preferably a full distro)
  • They must have /proc/cpuinfo, a root user, and a network card (this is probably only relevant if the machine is not fully virtualized)
  • Added bonus if they can be made to run an X server that can be connected to remotely (using VNC or similar)

What is the fastest way (wall clock time) to do this given:

  • The host system runs Ubuntu 20.04 and has plenty of RAM and CPU
  • The LAN has a DHCP-server (it is also OK to use a predefined IP-range)
  • I do not care which Free virtualization technology is used (Containerization is also OK if the other requirements are met)

and what are the actual commands I should run/files I should create?

I have the feeling that given the right technology this is a 50 line job that can be set up in minutes.

The few lines can probably be split into a few bash functions:

install() {
  # Install needed software once
}
setup() {
  # Configure the virtual servers
}
start() {
  # Start the virtual servers
  # After this it is possible to do:
  #   ssh 10.0.0.99
  # from another physical server
}
stop() {
  # Stop the virtual servers
  # After this there are no running processes on the host server
  # and it is no longer possible to do:
  #   ssh 10.0.0.99
  # from another physical server
  # The host server returns to the state before running `start`
}
destroy() {
  # Remove the setup
  # After this the host server returns to the state before running `setup`
}

Background

For developing GNU Parallel I need an easy way to test running on 100 machines in parallel.

For other projects it would also be handy to be able to create a bunch of virtual machines, test some race conditions and then destroy the machines again.

In other words: This is not for a production environment and security is not an issue.

Docker

Based on @danielleontiev's notes below:

install() {
    # Install needed software once
    sudo apt -y install docker.io
    sudo groupadd docker
    sudo usermod -aG docker $USER
    # Logout and login if you were not in group 'docker' before
    docker run hello-world
}

setup() {
    # Configure the virtual servers
    mkdir -p my-ubuntu/ ssh/
    cp ~/.ssh/id_rsa.pub ssh/
    cat ssh/*.pub > my-ubuntu/authorized_keys
    cat >my-ubuntu/Dockerfile <<EOF
FROM ubuntu:bionic
RUN apt update && apt install -y openssh-server
RUN mkdir /root/.ssh
COPY authorized_keys /root/.ssh/authorized_keys
# Run a blocking command which prevents the container from exiting immediately after start.
CMD service ssh start && tail -f /dev/null
EOF
    docker build my-ubuntu -t my-ubuntu
}

start() {
    # Start container number $1..$2
    servers_min=$1
    servers_max=$2

    testssh() {
        ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/tmp/known root@"$1" echo "'$1'" '`uptime`'
    }
    export -f testssh

    setup_bridge() {
        # OMG why is this so hard
        # Default interface must have IP-addr removed
        # bridge must have IP-addr + routing copied from $dif, so it takes over default interface
        # Why on earth could we not just: brctl addif dock0 $dif - and be done?
        default_interface=$(ip -4 route ls | grep default | grep -Po '(?<=dev )(\S+)')
        dif=$default_interface
        gw=$(ip -4 route ls | grep default | grep -Po '(?<=via )(\S+)')
        dif_ip=$(ip -4 route ls | grep default | grep -Po '(?<=src )(\S+)')
        echo Add bridge
        docker network create --driver bridge --subnet=172.20.0.0/16 \
            --opt com.docker.network.bridge.name=dock0 net0
        # $dif must be up, but with no IP address
        sudo ip addr flush dev $dif
        sudo brctl addif dock0 $dif
        sudo ifconfig dock0:ext $dif_ip
        sudo route add -net 0.0.0.0 gw $gw
    }

    # Start the containers
    startone() {
        id=$1
        net=$2
        docker run -d --rm --name ubuntu-$id-$net --network $net my-ubuntu
        docker inspect ubuntu-$id-$net
    }
    export -f startone

    setup_bridge
    echo Start containers
    seq $servers_min $servers_max | parallel startone {} net0 |
        # After this it is possible to do:
        #   ssh 10.0.0.99
        # from another physical server
        perl -nE '/"IPAddress": "(\S+)"/ and not $seen{$1}++ and say $1' |
        # Keep a list of the IP addresses in /tmp/ipaddr
        tee /tmp/ipaddr |
        parallel testssh
    docker ps
    route -n
}

stop() {
    # Stop the virtual servers
    # After this there are no running processes on the host server
    # and it is no longer possible to do:
    #   ssh 10.0.0.99
    # from another physical server
    # The host server returns to the state before running `start`
    echo Stop containers
    docker ps -q | parallel docker stop {} |
        perl -pe '$|=1; s/^............\n$/./'
    echo
    echo If any containers are remaining it is an error
    docker ps
    # Take down bridge
    docker network ls | grep bridge | grep net | awk '{print $1}' |
        sudo parallel docker network rm
    # Re-establish default interface
    dif=$default_interface
    sudo ifconfig $dif $dif_ip
    # Routing takes a while to be updated
    sleep 2
    route -n
}

destroy() {
    # Remove the setup
    # After this the host server returns to the state before running `setup`
    rm -rf my-ubuntu/
    docker rmi my-ubuntu
}

full() {
    install
    setup
    start 1 100
    stop
    destroy
}

$ time full

real    2m21.611s
user    0m47.337s
sys     0m31.882s

This takes up 7 GB RAM in total for running 100 virtual servers. So you do not even need to have plenty of RAM to do this.

It scales up to 1024 servers, after which the Docker bridge complains (probably because each bridge device can have at most 1024 ports).

The script can be adapted to run 6000 containers (Run > 1024 docker containers), but at 6055 it blocks (https://serverfault.com/questions/1091520/docker-blocks-when-running-multiple-containers).

Vagrant

Based on @Martin's notes below:

install() {
    # Install needed software once
    sudo apt install -y vagrant virtualbox
}
setup() {
    # Configure the virtual servers
    mkdir -p ssh/
    cp ~/.ssh/id_rsa.pub ssh/
    cat ssh/*.pub > authorized_keys
    cat >Vagrantfile <<'EOF'
Vagrant.configure("2") do |config|
  config.vm.box = "debian/buster64"
  (1..100).each do |i|
    config.vm.define "vm%d" % i do |node|
      node.vm.hostname = "vm%d" % i
      node.vm.network "public_network", ip: "192.168.1.%d" % (100+i)
    end
  end

  config.vm.provision "shell" do |s|
    ssh_pub_key = File.readlines("authorized_keys").first.strip
    s.inline = <<-SHELL
      mkdir /root/.ssh
      echo #{ssh_pub_key} >> /home/vagrant/.ssh/authorized_keys
      echo #{ssh_pub_key} >> /root/.ssh/authorized_keys
      apt-get update
      apt-get install -y parallel
    SHELL
  end
end
EOF
}
start() {
    testssh() {
        ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null root@"$1" echo "'$1'" 'uptime'
    }
    export -f testssh
    # Start the virtual servers
    seq 100 | parallel --lb vagrant up vm{}
    # After this it is possible to do:
    #   ssh 192.168.1.111
    # from another physical server
    parallel testssh ::: 192.168.1.{101..200}
}
stop() {
    # Stop the virtual servers
    # After this there are no running processes on the host server
    # and it is no longer possible to do:
    #   ssh 192.168.1.111
    # from another physical server
    # The host server returns to the state before running `start`
    seq 100 | parallel vagrant halt vm{}
}
destroy() {
    # Remove the setup
    # After this the host server returns to the state before running `setup`
    seq 100 | parallel vagrant destroy -f vm{}
    rm -r Vagrantfile .vagrant/
}

full() {
    install
    setup
    start
    stop
    destroy
}

start gives a lot of warnings:

NOTE: Gem::Specification.default_specifications_dir is deprecated; use Gem.default_specifications_dir instead. It will be removed on or after 2020-02-01.

stop gives these warnings:

NOTE: Gem::Specification.default_specifications_dir is deprecated; use Gem.default_specifications_dir instead. It will be removed on or after 2020-02-01.
Gem::Specification.default_specifications_dir called from /usr/share/rubygems-integration/all/gems/vagrant-2.2.6/lib/vagrant/bundler.rb:428.
NOTE: Gem::Specification.default_specifications_dir is deprecated; use Gem.default_specifications_dir instead. It will be removed on or after 2020-02-01.
Gem::Specification.default_specifications_dir called from /usr/share/rubygems-integration/all/gems/vagrant-2.2.6/lib/vagrant/bundler.rb:428.
/usr/share/rubygems-integration/all/gems/vagrant-2.2.6/plugins/kernel_v2/config/vm.rb:354: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/usr/share/rubygems-integration/all/gems/vagrant-2.2.6/plugins/kernel_v2/config/vm_provisioner.rb:92: warning: The called method `add_config' is defined here
/usr/share/rubygems-integration/all/gems/vagrant-2.2.6/lib/vagrant/errors.rb:103: warning: Using the last argument as keyword parameters is deprecated; maybe ** should be added to the call
/usr/share/rubygems-integration/all/gems/i18n-1.8.2/lib/i18n.rb:195: warning: The called method `t' is defined here

Each virtual machine takes up 0.5 GB of RAM on the host system.

It is much slower to start than the Docker machines above. The big difference is that the Vagrant-machines do not have to run the same kernel as the host, but are complete virtual machines.

  • I have avoided it being opinion based by putting in the measuring stick: fastest. So we can objectively test if the solution given is the fastest. – Ole Tange Jun 04 '20 at 16:05
  • First thing I'd try is to spin up 50 docker containers; with replicas in docker-compose that's a one-liner once the configuration file is written, both for creation and deletion. Pick any distro image you like. You can run Xvfb (but that will use up more memory, and I don't know how much "plenty of RAM" is). But both for GNU parallel and race condition tests you don't have to. I'll leave it as an exercise for you to figure out the actual commands. – dirkt Jun 04 '20 at 16:21
  • @dirkt Plenty literally means plenty. – Ole Tange Jun 04 '20 at 16:34
  • I would implement this using Ansible. Sorry, I did not get time for a full answer. – dirdi Jun 06 '20 at 16:33
  • have you considered systemd-nspawn? it should suit the job well and also be fairly simple and fast to setup on a ubuntu 20.04 host – LL3 Jun 07 '20 at 13:59
  • @LL3 I am open to any solution that meets the requirements. So feel free to post an answer. – Ole Tange Jun 07 '20 at 17:03
  • @Oletange superlatives like "fastest" beg for opinion based answers where all your other metrics like CPU, RAM, and the exact job these will run remain undefined. Stick with a word like "fast" and make it clear whether you mean less human effort or less CPU time. – Philip Couling Jun 07 '20 at 22:22
  • @PhilipCouling Fastest is what is measurable in milliseconds (wall clock time). Given an answer (which hopefully will be around 50 lines of code) I will run that and time it. "Fast" would OTOH make it opinion based. – Ole Tange Jun 07 '20 at 22:33
  • @oletange you missed my point. Other requirements and limitations will affect the outcome. Benchmarks are notorious for having different outcomes for different people because they run subtly different tests. Time also plays a part. Later versions of software have different performance characteristics. Also people rarely know every piece of software on the shelf. OTOH "fast" is a subjective term that is easy enough to understand in context... You are spinning up 100 VMs; you don't want to wait an hour. Indeed it's no worse than the word "plenty". – Philip Couling Jun 08 '20 at 05:48
  • The answers below propose different approaches. The Docker approach should start faster when running on Linux whereas the VM approach might offer additional flexibility (with minimal modifications you could for example use it to run 1,000 VMs on AWS). Refer to Should I use Vagrant or Docker for creating an isolated environment? to read some advice from the authors of Docker and Vagrant. – Martin Konrad Jun 09 '20 at 03:11
  • @OleTange the errors you're getting when running Vagrant don't look right. Do you also see them when you run vagrant up (spin up VMs serially)? – Martin Konrad Jun 10 '20 at 05:05
  • @OleTange I looked into the issues you ran into when spinning up the VMs in parallel. It seems to me like VirtualBox doesn't support spinning up VMs in parallel, which would explain why Vagrant doesn't implement the --parallel flag for the VirtualBox provider. I guess that makes the VirtualBox approach less useful for you. – Martin Konrad Jun 14 '20 at 14:47

5 Answers

10

I think docker meets your requirements.

1) Install docker (https://docs.docker.com/engine/install/). Make sure you are done with the Linux post-installation steps (https://docs.docker.com/engine/install/linux-postinstall/)

2) I assume you have the following directory structure:

.
└── my-ubuntu
    ├── Dockerfile
    └── id_rsa.pub

1 directory, 2 files

id_rsa.pub is your public key; the Dockerfile is discussed below

3) First, we are going to build the docker image. It's like a template for the containers that we are going to run. Each container is a kind of materialization of our image.

4) To build the image we need a template. That is the Dockerfile:

FROM ubuntu:bionic
RUN apt update && \
    apt install -y openssh-server
RUN mkdir /root/.ssh
COPY id_rsa.pub /root/.ssh/authorized_keys

CMD service ssh start && tail -f /dev/null

  • FROM ubuntu:bionic defines our base image. You can find base images for Arch, Debian, Alpine, Ubuntu, etc. on hub.docker.com
  • The apt install part installs the ssh server
  • COPY <src> <dest> copies our public key to the place where it will live in the container
  • Here you could add more RUN statements to do additional things: install software, create files, etc...
  • The last line is tricky. The first part starts the ssh server when we start the container, which is obvious, but the second part is important - it runs a blocking command which prevents the container from exiting immediately after start.

5) docker build my-ubuntu -t my-ubuntu - builds the image. The output of this command:

Sending build context to Docker daemon  3.584kB
Step 1/5 : FROM ubuntu:bionic
 ---> c3c304cb4f22
Step 2/5 : RUN apt update &&     apt install -y openssh-server
 ---> Using cache
 ---> 40c56d549c0e
Step 3/5 : RUN mkdir /root/.ssh
 ---> Using cache
 ---> c50d8b614b21
Step 4/5 : COPY id_rsa.pub /root/.ssh/authorized_keys
 ---> Using cache
 ---> 34d1cf4e9f69
Step 5/5 : CMD service ssh start && tail -f /dev/null
 ---> Using cache
 ---> a442db47bf6b
Successfully built a442db47bf6b
Successfully tagged my-ubuntu:latest

6) Let's run my-ubuntu (once again, my-ubuntu is the name of the image), starting a container named my-ubuntu-1, which is derived from the my-ubuntu image:

docker run -d --rm --name my-ubuntu-1 my-ubuntu

Options:

  • -d daemonize, to run the container in the background
  • --rm to erase the container after it stops. This can be important because when you deal with a lot of containers they can quickly pollute your HDD.
  • --name the name for the container
  • my-ubuntu the image we start from

7) The container is running. docker ps can prove this:

CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                NAMES
ee6bc20fd820        my-ubuntu           "/bin/sh -c 'service…"   5 minutes ago       Up 5 minutes         my-ubuntu-1

8) To execute a command in the container run:

docker exec -it my-ubuntu-1 bash - to get into the container's bash. Any other command can be given instead.
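For example, to run a single command non-interactively instead of opening a shell:

docker exec my-ubuntu-1 uname -a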

9) If running commands the way above is not enough, do docker inspect my-ubuntu-1 and grep for the IPAddress field. For me it's 172.17.0.2.

ssh root@172.17.0.2
Welcome to Ubuntu 18.04.4 LTS (GNU/Linux 5.6.15-arch1-1 x86_64)

10) To stop container: docker stop my-ubuntu-1

11) Now it is possible to run 100 containers:

#!/bin/bash

for i in $(seq 1 100); do
    docker run -d --rm --name my-ubuntu-$i my-ubuntu
done
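If GNU parallel is available on the host, the same containers can also be started concurrently rather than one at a time, which should be noticeably faster (a quick sketch):

seq 1 100 | parallel docker run -d --rm --name my-ubuntu-{} my-ubuntu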

My docker ps:

... and so on ...
ee2ccce7f642        my-ubuntu           "/bin/sh -c 'service…"   46 seconds ago      Up 45 seconds                            my-ubuntu-20
9fb0bfb0d6ec        my-ubuntu           "/bin/sh -c 'service…"   47 seconds ago      Up 45 seconds                            my-ubuntu-19
ee636409a8f8        my-ubuntu           "/bin/sh -c 'service…"   47 seconds ago      Up 46 seconds                            my-ubuntu-18
9c146ca30c9b        my-ubuntu           "/bin/sh -c 'service…"   48 seconds ago      Up 46 seconds                            my-ubuntu-17
2dbda323d57c        my-ubuntu           "/bin/sh -c 'service…"   48 seconds ago      Up 47 seconds                            my-ubuntu-16
3c349f1ff11a        my-ubuntu           "/bin/sh -c 'service…"   49 seconds ago      Up 47 seconds                            my-ubuntu-15
19741651df12        my-ubuntu           "/bin/sh -c 'service…"   49 seconds ago      Up 48 seconds                            my-ubuntu-14
7a39aaf669ba        my-ubuntu           "/bin/sh -c 'service…"   50 seconds ago      Up 48 seconds                            my-ubuntu-13
8c8261b92137        my-ubuntu           "/bin/sh -c 'service…"   50 seconds ago      Up 49 seconds                            my-ubuntu-12
f8eec379ee9c        my-ubuntu           "/bin/sh -c 'service…"   51 seconds ago      Up 49 seconds                            my-ubuntu-11
128894393dcd        my-ubuntu           "/bin/sh -c 'service…"   51 seconds ago      Up 50 seconds                            my-ubuntu-10
81944fdde768        my-ubuntu           "/bin/sh -c 'service…"   52 seconds ago      Up 50 seconds                            my-ubuntu-9
cfa7c259426a        my-ubuntu           "/bin/sh -c 'service…"   52 seconds ago      Up 51 seconds                            my-ubuntu-8
bff538085a3a        my-ubuntu           "/bin/sh -c 'service…"   52 seconds ago      Up 51 seconds                            my-ubuntu-7
1a50a64eb82c        my-ubuntu           "/bin/sh -c 'service…"   53 seconds ago      Up 51 seconds                            my-ubuntu-6
88c2e538e578        my-ubuntu           "/bin/sh -c 'service…"   53 seconds ago      Up 52 seconds                            my-ubuntu-5
1d10f232e7b6        my-ubuntu           "/bin/sh -c 'service…"   54 seconds ago      Up 52 seconds                            my-ubuntu-4
e827296b00ac        my-ubuntu           "/bin/sh -c 'service…"   54 seconds ago      Up 53 seconds                            my-ubuntu-3
91fce445b706        my-ubuntu           "/bin/sh -c 'service…"   55 seconds ago      Up 53 seconds                            my-ubuntu-2
54c70789d1ff        my-ubuntu           "/bin/sh -c 'service…"   2 minutes ago       Up 2 minutes         my-ubuntu-1

I can do e.g. docker inspect my-ubuntu-15, get its IP and connect to it over ssh, or use docker exec.

It is possible to ping containers from containers (install iputils-ping to reproduce):

root@5cacaf03bf89:~# ping 172.17.0.2 
PING 172.17.0.2 (172.17.0.2) 56(84) bytes of data.
64 bytes from 172.17.0.2: icmp_seq=1 ttl=64 time=1.19 ms
64 bytes from 172.17.0.2: icmp_seq=2 ttl=64 time=0.158 ms
64 bytes from 172.17.0.2: icmp_seq=3 ttl=64 time=0.160 ms
^C
--- 172.17.0.2 ping statistics ---

N.B. running containers from bash is a quick solution. If you would like a more scalable approach, consider using Kubernetes or Swarm
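As a rough sketch of the docker-compose route (assuming docker-compose is installed and the my-ubuntu image from above has been built; the file and service names are just examples), scaling to 100 replicas is close to a one-liner:

cat > docker-compose.yml <<'EOF'
version: "3"
services:
  my-ubuntu:
    image: my-ubuntu
EOF
docker-compose up -d --scale my-ubuntu=100   # start 100 replicas
docker-compose down                          # stop and remove them again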

P.S. Useful commands:

  • docker ps
  • docker stats
  • docker container ls
  • docker image ls

  • docker stop $(docker ps -aq) - stops all running containers

Also, follow the basics from docs.docker.com - it's an hour well spent for a better experience working with containers

Additional:

The base image in the example is a really minimal image. It does not have a DE or even xorg. You could install these manually (adding packages to the RUN apt install ... section) or use an image that already has the software you need. Quick googling gives me this (https://github.com/fcwu/docker-ubuntu-vnc-desktop). I have never tried it, but I think it should work. If you definitely need VNC access I could try to play around a bit and add the info to the answer
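As an untested sketch of that idea, the Dockerfile from step 4 could be extended with Xvfb and x11vnc (the packages xvfb, x11vnc and xterm come from the Ubuntu repositories; the display number and resolution are arbitrary choices):

FROM ubuntu:bionic
RUN apt update && \
    apt install -y openssh-server xvfb x11vnc xterm
RUN mkdir /root/.ssh
COPY id_rsa.pub /root/.ssh/authorized_keys

# Start sshd, a virtual-framebuffer X server on display :1 and a VNC server on port 5900
# (-ac disables X access control, which is acceptable here since security is not a concern)
CMD service ssh start && \
    (Xvfb :1 -screen 0 1280x800x24 -ac &) && \
    sleep 1 && \
    x11vnc -display :1 -forever -nopw -bg && \
    tail -f /dev/null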

Exposing to local network:

This one may be tricky. I am sure it can be done with some obscure port forwarding, but the straightforward solution is to change the run script as follows:

#!/bin/bash

for i in $(seq 1 100); do
    docker run -d --rm -p $((10000 + i)):22 --name my-ubuntu-$i my-ubuntu
done

After that you would be able to access your containers via the host machine's IP:

ssh root@localhost -p 10001
The authenticity of host '[localhost]:10001 ([::1]:10001)' can't be established.
ECDSA key fingerprint is SHA256:erW9kguSvn1k84VzKHrHefdnK04YFg8eE6QEH33HmPY.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '[localhost]:10001' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 18.04.4 LTS (GNU/Linux 5.6.15-arch1-1 x86_64)
  • This is close to an answer. I can see, I have not made it clear, that port-forwarding is not enough: The virtual server must be accessible from another physical machine on its IP address without a port number (e.g. ssh 10.0.0.99). – Ole Tange Jun 08 '20 at 06:17
  • I will try to do something with ports later today – danielleontiev Jun 08 '20 at 07:32
  • docker network create --driver=bridge --ip-range=10.0.0.0/24 --subnet=10.0.0.0/16 --aux-address='ip1=10.0.0.1' -o "com.docker.network.bridge.name=br0" br0 , then add --network=br0 to the docker run command and find a way to either FORWARD with iptables or attach a network card to your br0 bridge – Benji over_9000 'benchonaut' Jun 09 '20 at 10:50
  • I have some problems with my local network and in fact I am only able to test on my machine. Maybe @oletange will try it – danielleontiev Jun 09 '20 at 11:09
  • @BenjiBear Can you adapt the precise commands given the host has IP 192.168.1.31 on interface eno1? – Ole Tange Jun 09 '20 at 16:57
  • @danielleontiev docker defaults to using 172.x.x.x. How can I ask it to use 192.168.1.100-200, so they can be bridged onto the lan that uses 192.168.1.x? – Ole Tange Jun 09 '20 at 16:59
  • @OleTange Try to ask another question on "How to give a docker container an IP on local network?", because I cannot adapt BenjiBear's command -- it does not work for me – danielleontiev Jun 09 '20 at 17:48
  • docker network create --driver=bridge --ip-range=192.168.1.100/25 --subnet=192.168.1.0/24 --aux-address='ip1=192.168.1.32' -o "com.docker.network.bridge.name=br0" , then brctl addif br0 eno1 , the docker side will have the ip 1.32 , no guarantee, you are better off modifying /etc/network/interfaces and make the bridge your default interface as described here https://developer.ibm.com/recipes/tutorials/bridge-the-docker-containers-to-external-network/ – Benji over_9000 'benchonaut' Jun 09 '20 at 18:00
5
  • create a virtual network

    ( either with virtualbox

    or by using docker , e.g.: docker network create --driver=bridge --ip-range=10.0.190.0/24 --subnet=10.0.0.0/16 --aux-address='ip1=10.0.190.1' --aux-address='ip2=10.0.190.2' --aux-address='ip3=10.0.190.3' -o "com.docker.network.bridge.name=br0" br0 )

  • if you want virtualbox/kvm :

    prepare a pxe/http server and a distribution like SLAX or Alpine Linux; with slax and savechanges you could build a system with all software prepackaged. On the other hand it will be much overhead, but with tools like Cluster SSH you can trigger your commands simultaneously by running

    cssh root@10.0.190.{04..254} -p 22

  • when using docker: attach all containers to the named network, either via docker-compose or manually (see the sketch below); you could also modify the CMD to run dropbear if you want to have ssh access
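A rough sketch of the docker variant of these points, reusing the my-ubuntu image built in the answer above (subnet and names are only examples; for other physical machines to reach the containers directly, the LAN-facing interface still has to be enslaved to br0, e.g. brctl addif br0 eno1 as discussed in the comments):

docker network create --driver=bridge --ip-range=10.0.190.0/24 --subnet=10.0.0.0/16 \
    --aux-address='ip1=10.0.190.1' -o "com.docker.network.bridge.name=br0" br0
for i in $(seq 1 100); do
    docker run -d --rm --network=br0 --name my-ubuntu-$i my-ubuntu
done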

  • This does not seem to be a complete answer, but mostly ideas for an answer. I am looking for a complete answer. A complete answer would include all the actual commands with complete config files to run on a stock Ubuntu 20.04. – Ole Tange Jun 07 '20 at 22:28
  • i am really sorry that i am not going to do your homework – Bash Stack Jun 08 '20 at 10:33
  • @BashStack, lol. If I knew how to, I wouldn't mind doing some of Ole's "homework" so he could put the time to better use maintaining GNU Parallel – iruvar Jun 08 '20 at 17:25
  • @OleTange @iruvar since @Ole found a way with docker: you can create a network named br0 (the physical bridge will have the same name) with the command above, and then just modify your current version in the post to have "docker run ... --network=br0" (or however you named it) – Bash Stack Jun 08 '20 at 19:31
5

You could use Vagrant for spinning up your test environments. Once you have written a Vagrantfile defining the distro to run, the network configuration, etc., you can bring up the machines by running vagrant up <vmname> or just vagrant up to fire all of them up. Vagrant supports various virtualization providers including VirtualBox, VMware, KVM, AWS, Docker, ... Vagrant is able to spin up development environments quickly since it leverages pre-built "box" files rather than installing each system from scratch.

At the same time Vagrant allows you to run custom provisioning for each VM using Ansible, Puppet, Chef, CFEngine or simply a short shell script. You can mix and match different distributions in the same Vagrantfile. SSH access is set up automatically. You can get access to a machine by running vagrant ssh <vmname>. Synced folders make it easy to bring files from your host system into your test environments.


Here are the steps in detail:

  1. Download and install Vagrant and your favorite virtualization provider:

    $ sudo apt install -y vagrant virtualbox
    
  2. Create Vagrantfile with the following content:

    Vagrant.configure("2") do |config|
      config.vm.box = "debian/buster64"
      (1..100).each do |i|
        config.vm.define "vm%03d" % i do |node|
          node.vm.hostname = "vm%03d" % i
          node.vm.network "public_network", ip: "192.168.1.%d" % (99 + i)
        end
      end
    
      config.vm.provision "shell" do |s|
        ssh_pub_key = File.readlines("#{Dir.home}/.ssh/id_rsa.pub").first.strip
        s.inline = <<-SHELL
          mkdir /root/.ssh
          echo #{ssh_pub_key} >> /home/vagrant/.ssh/authorized_keys
          echo #{ssh_pub_key} >> /root/.ssh/authorized_keys
          apt-get update
          apt-get install -y parallel
        SHELL
      end
    end
    
  3. Spin up the VMs:

    $ parallel vagrant up ::: vm{001..100}
    
  4. SSH to the VMs: The Vagrant way (using the key generated by Vagrant):

    $ vagrant ssh vm001
    

    Using your own key (which we installed into the VMs during the provisioning phase):

    $ ssh vagrant@<IP>
    

    Or to get root access:

    $ ssh root@<IP>
    
  5. You can suspend the VMs by running vagrant suspend and bring them up a few days later to continue testing (vagrant up). If you have many test environments but only limited disk space you can destroy some VMs and recreate them later.

  6. Destroy the VMs and delete the configuration:

    vagrant destroy -f
    rm -rf Vagrantfile .vagrant
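To check from another machine on the LAN that all 100 VMs are reachable over SSH (a sketch that assumes the static IPs 192.168.1.100-199 from the Vagrantfile above and GNU parallel on the client):

    $ parallel -j0 ssh -o StrictHostKeyChecking=no root@192.168.1.{} uptime ::: {100..199}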
    
  • This does not seem to be a complete answer, but mostly ideas for an answer. I am looking for a complete answer. A complete answer would include all the actual commands with complete config files to run on a stock Ubuntu 20.04. vagrant ssh <vmname> does not seem to meet the criteria of being accessible from a different physical machine using plain ssh. – Ole Tange Jun 07 '20 at 22:30
  • I found some time to outline the steps for setting this up. With the configuration above you can also get direct SSH access from a different machine. The "Vagrant way" would be to let Vagrant set up SSH and the keys and then run vagrant ssh-config to get an SSH config snippet + a private key you can use to get access from outside. – Martin Konrad Jun 09 '20 at 02:45
  • This is getting closer to an answer. I have updated my question with how I interpret your notes. But I still cannot access the virtual machines on their IP-address from a remote machine. I hope you will update your answer showing where I got you wrong. – Ole Tange Jun 09 '20 at 06:29
  • I was expecting your DHCP/DNS to take care of that based on the host name. I updated my answer to use static IPs. – Martin Konrad Jun 10 '20 at 06:02
1

This might be a job well suited for systemd-nspawn containers, except for the X server (unless xvfb is enough), and here I have made a couple of complete scripts including basic network connectivity to the LAN.

I've made them along the lines of your skeleton script, and they are tailored for maximum speed of setup.

The first script builds containers based on an Ubuntu 20.04 image, providing the same tools as in your docker attempt, as it seems you are happy with those for your use case. On a single-CPU Xeon Silver 4114 2.20GHz (10 cores + HT) with 32GB of RAM, this script completes a full run from install to destroy of 100 containers in ~35 seconds, with a RAM occupation of ~600MB.

The second script builds containers that more closely resemble a true VM, with a fuller Ubuntu 20.04 distro comprising its own systemd and the typical service daemons like cron, rsyslog, etc. This completes in under 3 minutes, with an occupation of about 3.3GB for 100 "machines".

In both cases the great majority of time is spent in the setup phase, downloading/bootstrapping the image template etc.


First script, "docker-like" experience:

#!/bin/bash --
# vim: ts=4 noet

install() {
    [ -e /etc/radvd.conf ] || cat > /etc/radvd.conf <<EOF
interface bogus {
    IgnoreIfMissing on;
};
EOF
    apt -y install systemd-container debootstrap wget radvd
}

setup() { mkdir -p "$machines"

# Fetch Ubuntu 20.04 basic system
#debootstrap focal &quot;$machines/$tmpl&quot; # &lt;-- either this, or the below wget + tar + mount
wget -P &quot;$machines&quot; https://partner-images.canonical.com/core/focal/current/ubuntu-focal-core-cloudimg-amd64-root.tar.gz
mkdir -p &quot;$machines/$tmpl&quot;
tar -C &quot;$machines/$tmpl&quot; -xzf &quot;$machines/ubuntu-focal-core-cloudimg-amd64-root.tar.gz&quot;
mount --bind /etc/resolv.conf &quot;$machines/$tmpl/etc/resolv.conf&quot;

# Put our ssh pubkeys
mkdir -p &quot;$machines/$tmpl/root/.ssh&quot;
(shopt -s failglob; : ~/.ssh/*.pub) 2&gt;/dev/null \
    &amp;&amp; cat ~/.ssh/*.pub &gt; &quot;$machines/$tmpl/root/.ssh/authorized_keys&quot;
# Let nspawn use our parameterized hostname
rm -f &quot;$machines/$tmpl/etc/hostname&quot;
# Allow apt to function in chroot without complaints
mount -o bind,slave,unbindable /dev &quot;$machines/$tmpl/dev&quot;
mount -o bind,slave,unbindable /dev/pts &quot;$machines/$tmpl/dev/pts&quot;
export DEBIAN_FRONTEND=noninteractive LANG=C.UTF-8
chroot &quot;$machines/$tmpl&quot; sh -c 'apt-get update &amp;&amp; apt-get install -y --no-install-recommends apt-utils'
# No init-scripts are to be run while in chroot
cat &gt;&gt; &quot;$machines/$tmpl/usr/sbin/policy-rc.d&quot; &lt;&lt;'EOF'

#!/bin/sh -- exit 101 EOF chmod +x "$machines/$tmpl/usr/sbin/policy-rc.d" # Install additional packages for the use case chroot "$machines/$tmpl" apt-get install -y --no-install-recommends
bash-completion iproute2 vim iputils-ping
openssh-server # Uncomment these to allow root in, with password "let-me-in"

echo 'PermitRootLogin yes' > "$machines/$tmpl/etc/ssh/sshd_config.d/allow-root-with-password.conf" \

&& chroot "$machines/$tmpl" chpasswd <<<'root:let-me-in'

umount -l &quot;$machines/$tmpl/dev/pts&quot; &quot;$machines/$tmpl/dev&quot; &quot;$machines/$tmpl/etc/resolv.conf&quot;

}

start() {
    # Connect to physical LAN by building a temporary bridge over the specified physical interface
    # Of course this is not required if the interface facing the LAN is already a bridge interface,
    # in which case you can just use that as "$mybr" and skip this pipeline
    # TODO: check on possible "$mybr" existence, and/or being already a bridge, and/or enslaving of "$intf" already in place
    # NOTE: be careful how the interface in "$intf" is named, as here it is used in sed's regex
    ip -o -b - <<EOF | awk '{print "route list " $4}' | ip -b - | sed "s/^/route replace /;s/ $intf / $mybr /g" | ip -b -
link add $mybr type bridge
link set $mybr up
link set $intf master $mybr
addr show $intf up
EOF

    # Advertise a temporary private IPv6 network in LAN
    ipv6pfx='fddf:' # this arbitrary pfx is not properly compliant, but very handy for quick use in simple LANs
    cat >> /etc/radvd.conf <<EOF
### $tmpl
interface $mybr {
    AdvSendAdvert on;
    prefix $ipv6pfx:/64 {
        AdvValidLifetime 7200;
        AdvPreferredLifetime 3600;
    };
};
###
EOF
    systemctl start radvd

    for i in $(seq "$vmnum"); do
        # Spawn containers that don't persist on disk
        systemd-run --unit="$tmpl-mini-$i" --service-type=notify \
            systemd-nspawn --notify-ready=no --register=no --keep-unit --kill-signal=RTMIN+3 \
                -M "${tmpl:0:8}$i" \
                -D "$machines/$tmpl" --read-only --link-journal no \
                --overlay +/etc::/etc --overlay +/var::/var \
                --network-bridge="$mybr" \
                --as-pid2 sh -c 'ip link set host0 up && ip addr add '"$ipv6pfx:$i/64"' dev host0 && mkdir -p /run/sshd && exec /usr/sbin/sshd -D' \
                & # Run in bg and wait later; this way we allow systemd's parallel spawning
                # Below is a --as-pid2 alternative for using dhcp, but beware bombing on LAN's dhcp server
                #--as-pid2 sh -c 'udhcpc -fbi host0; mkdir -p /run/sshd && exec /usr/sbin/sshd -D' \
    done
    wait
}

stop() {
    systemctl stop "$tmpl-mini-*"
    systemctl stop radvd
    ip link del "$mybr" 2>/dev/null
    netplan apply
    sed -i "/^### $tmpl/,/^###$/d" /etc/radvd.conf
}

destroy() {
    rm -rf "$machines/$tmpl"
    rm -f "$machines/ubuntu-focal-core-cloudimg-amd64-root.tar.gz"
}

: "${machines:=/var/lib/machines}" # default location for systemd-nspawn containers : "${vmnum:=100}" # how many containers to spawn : "${intf:?specify the physical interface facing the LAN to connect to}" : "${tmpl:?specify directory basename under $machines to store the containers' OS template into}" : "${mybr:=$tmpl-br}" # the temporary bridge to LAN will be named this

install
setup
start
stop
destroy

Once you have spawn "docker-like" containers you can handle them through systemctl. They are all spawn as systemd services named <template-name>-mini-<number>.

You may enter a shell into any one of them either through ssh or via nsenter -at <pid-of-any-process-belonging-to-a-specific-container>
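For example, to run a command on all of them from another host on the LAN (a sketch that assumes the default vmnum=100 and that the other host has picked up an fddf::/64 address from the radvd advertisement):

seq 100 | parallel -j0 ssh -o StrictHostKeyChecking=no root@fddf::{} uptime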


Second script, "vm-like" experience:

#!/bin/bash --
# vim: ts=4 noet

install() {
    [ -e /etc/radvd.conf ] || cat > /etc/radvd.conf <<EOF || return
interface bogus {
    IgnoreIfMissing on;
};
EOF
    apt -y install systemd-container debootstrap radvd || return
}

setup() { mkdir -p "$machines/$tmpl" || return # Fetch Ubuntu 20.04 base system debootstrap focal "$machines/$tmpl" || return

# Allow apt to function in chroot without complaints
trap &quot;umount -l $machines/$tmpl/dev/pts&quot; RETURN
mount -o bind,slave,unbindable /dev/pts &quot;$machines/$tmpl/dev/pts&quot; || return
# Put our ssh pubkeys
mkdir -p &quot;$machines/$tmpl/root/.ssh&quot; || return
(shopt -s failglob; : ~/.ssh/*.pub) 2&gt;/dev/null \
    &amp;&amp; { cat ~/.ssh/*.pub &gt; &quot;$machines/$tmpl/root/.ssh/authorized_keys&quot; || return; }
# Let nspawn use our parameterized hostname
rm -f &quot;$machines/$tmpl/etc/hostname&quot; || return
# Enable container's systemd-networkd, it blends automatically with host's systemd-networkd
chroot &quot;$machines/$tmpl&quot; systemctl enable systemd-networkd || return
# Make provision for static addresses passed along at start time (see start phase below)
cat &gt; &quot;$machines/$tmpl/etc/networkd-dispatcher/carrier.d/$tmpl-static-addrs.sh&quot; &lt;&lt;'EOF' || return

#!/bin/bash -- [ -n "$static_ipaddrs" ] && printf 'addr add %s dev host0\n' ${static_ipaddrs//,/ } | ip -b - EOF chmod +x "$machines/$tmpl/etc/networkd-dispatcher/carrier.d/$tmpl-static-addrs.sh" || return # Uncomment this to mind about updates and security

printf 'deb http://%s.ubuntu.com/ubuntu/ focal-%s main\n' \

archive updates security security \

>> "$machines/$tmpl/etc/apt/sources.list" || return

# Uncomment this to consider [uni|multi]verse packages

sed -i 's/$/ universe multiverse' "$machines/$tmpl/etc/apt/sources.list" || return

export DEBIAN_FRONTEND=noninteractive LANG=C.UTF-8
chroot &quot;$machines/$tmpl&quot; apt-get update || return
# To upgrade or not to upgrade? that is the question..
#chroot &quot;$machines/$tmpl&quot; apt-get -y upgrade || return
# Install additional packages for the use case
chroot &quot;$machines/$tmpl&quot; apt-get install -y --no-install-recommends \
        bash-completion \
        openssh-server \
    || return
# Uncomment these to allow root in, with password &quot;let-me-in&quot;

echo 'PermitRootLogin yes' > "$machines/$tmpl/etc/ssh/sshd_config.d/allow-root-with-password.conf" || return

chroot "$machines/$tmpl" chpasswd <<<'root:let-me-in' || return

}

start() {
    # For full-system modes we need inotify limits greater than default even for just a bunch of containers
    (( (prev_max_inst = $(sysctl -n fs.inotify.max_user_instances)) < 10*vmnum )) \
        && { sysctl fs.inotify.max_user_instances=$((10*vmnum)) || return 1; }
    (( (prev_max_wd = $(sysctl -n fs.inotify.max_user_watches)) < 40*vmnum )) \
        && { sysctl fs.inotify.max_user_watches=$((40*vmnum)) || return 1; }
    [ -s "$machines/prev_inotifys" ] || declare -p ${!prev_max_*} > "$machines/prev_inotifys"

    # Connect to physical LAN by building a temporary bridge over the specified physical interface
    # Of course this is not required if the interface facing the LAN is already a bridge interface,
    # in which case you can just use that as "$mybr" and skip this pipeline
    # TODO: check on possible "$mybr" existence, and/or being already a bridge, and/or enslaving of "$intf" already in place
    # NOTE: be careful how the interface in "$intf" is named, as here it is used in sed's regex
    ip -o -b - <<EOF | awk '{print "route list " $4}' | ip -b - | sed "s/^/route replace /;s/ $intf / $mybr /g" | ip -b -
link add $mybr type bridge
link set $mybr up
link set $intf master $mybr
addr show $intf up
EOF

    # Advertise a temporary private IPv6 network in LAN
    ipv6pfx='fddf:' # this arbitrary pfx is not properly compliant, but very handy for quick use in simple LANs
    cat >> /etc/radvd.conf <<EOF || return
### $tmpl
interface $mybr {
    AdvSendAdvert on;
    prefix $ipv6pfx:/64 {
        AdvValidLifetime 7200;
        AdvPreferredLifetime 3600;
    };
};
###
EOF
    systemctl start radvd

    for i in $(seq "$vmnum"); do
        # Spawn containers that don't persist on disk
        systemd-run --unit="$tmpl-full-$i" --service-type=notify \
            systemd-nspawn --notify-ready=yes -b \
                -M "${tmpl:0:8}$i" \
                -D "$machines/$tmpl" --read-only --link-journal no \
                --overlay +/etc::/etc --overlay +/var::/var \
                --network-bridge="$mybr" \
                --capability=all --drop-capability=CAP_SYS_MODULE \
                "systemd.setenv=static_ipaddrs=$ipv6pfx:$i/64" \
                & # Run in bg and wait later; this way we allow systemd's parallel spawning
                # All capabilities allowed and no users isolation provide an experience which is
                # closer to a true vm (though with less security)
                # The comma separated list of static addresses will be set by our script in networkd-dispatcher
    done
    wait
}

stop() { systemctl stop "machine-$tmpl" "$tmpl-full-" systemctl stop radvd ip link del "$mybr" 2>/dev/null netplan apply sed -i "/^### $tmpl/,/^###$/d" /etc/radvd.conf # restore previous inotify limits source "$machines/prev_inotifys" || return rm -f "$machines/prev_inotifys" (( prev_max_wd > 0 )) && sysctl fs.inotify.max_user_watches="$prev_max_wd" (( prev_max_inst > 0 )) && sysctl fs.inotify.max_user_instances="$prev_max_inst" }

destroy() {
    rm -rf "$machines/$tmpl"
}

: "${machines:=/var/lib/machines}" # default location for systemd-nspawn machines : "${vmnum:=100}" # how many containers to spawn : "${intf:?specify the physical interface facing the LAN to connect to}" : "${tmpl:?specify directory basename under $machines to store the containers' OS template into}" : "${mybr:=$tmpl-br}" # the temporary bridge will be named this

install || exit
setup || { destroy; exit 1; }
start || { stop; exit 1; }
stop
destroy

Once you have spawn "vm-like" containers, on the host you can also use either machinectl and systemctl to handle them. Examples:

  • machinectl shell <container-name> provides a handy way to get a shell into a specific container
  • machinectl alone, or also systemctl list-machines, provide the list of running containers
  • machinectl poweroff <container-name>, or also systemctl stop machine-<container-name> stops a container (you can also do poweroff from a shell inside the container)

For both scripts I went with IPv6 connectivity as it has native features for host autoconfiguration. If all your hosts on the LAN are friendly IPv6 citizens they will self-configure a temporary address on the fddf::/64 network initiated on the fly by my scripts and advertised to the entire LAN (and of course shared with all containers).

Such "fddf::/64" IPv6 prefix is entirely arbitrary, and falls in the official prefix allocated by IANA for private networks. I've chosen it very handy so that from any host on your LAN you can just do ssh root@fddf::<vm-number>.

However it is not exactly compliant with how those prefixes should be generated, and should you wish to generate a compliant private prefix please read RFC 4193, particularly section 3.2.2.

In any case, said "vm-number" is from 1 to whatever number of guests you'll spawn, and I've left them decimal in an IPv6 hextet so you have room for up to 9999 addresses.

Of course you may also use IPv4 addresses, and here you have two options:

  • static addresses: just add them to the command line that spawns the containers (see comments there, and the sketch after this list); however you will have to implement a way to pre-compute such addresses as per your needs
  • dhcp: the "docker-like" script has a commented line for enabling dhcp, the "vm-like" already does it on its own accord as per Ubuntu 20.04 systemd's default behavior
0

I would suggest self-hosted GitLab. It has Kubernetes and Docker integrated out of the box and will allow you to automate pretty much everything you describe needing to do.