Munin deployment crash course

Generally speaking, server and service monitoring is a good thing. You can't fix problems you don't know about. I've put off setting up any sort of monitoring for my growing collection of personal infrastructure, and I finally got sufficiently bored and distracted to do something about it.

I wound up using Munin for the task. In this day and age, the usual tool of choice tends to be Prometheus, especially when paired with Grafana for pretty graphs. However, the machines which I want to monitor are in a variety of locations and on different providers' networks, and it wasn't immediately clear to me how to add authentication and transport security to metrics gathering in Prometheus. Munin, on the other hand, has out-of-the-box support for SSH tunneling. I'm also familiar with Munin from experience at one $DAYJOB, where I incidentally got to see various scaling limitations in Munin's data storage backend which become apparent when you're monitoring on the order of thousands of hosts. Hopefully my personal network won't get that out of hand.

Architecture

There are two parts to a Munin deployment: the "master" and the "nodes". The master is the host which gathers and consumes metrics data; the nodes are the hosts which produce it. Each node runs a daemon which listens on the network and invokes a series of preconfigured monitoring scripts (which generate metrics data) when the master connects. The master, in turn, runs a script out of cron(8) which connects to each node, gathers data, stores it in an on-disk time series database, and then generates HTML files and graphs from this data for human consumption.
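Under the hood, the node side of this exchange is a simple line-based text protocol, which makes it easy to poke at by hand. A session against a local munin-node looks roughly like this (the hostname and plugin names here are illustrative and will differ on your systems):

```
$ nc localhost 4949
# munin node at webserver.example.com
list
cpu df load memory swap uptime
fetch load
load.value 0.15
.
quit
```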

I set up a dedicated VM for my Munin master, and every node which I want to monitor needs the node software installed and configured. The following setup assumes that everything runs Debian, which is (almost) true in my case, but won't be elsewhere.

Munin master setup

# apt-get install munin

Edit /etc/munin/munin.conf and comment out the localhost.localdomain host configuration.

As I'm using SSH as a transport for metrics gathering, there's some SSH client setup required in order to make that work. Generate a keypair for the munin system user, under which the metrics gathering cronjob runs:

# cd /etc/munin
# ssh-keygen -t ed25519 -f ssh_id_ed25519 -N '' -C 'munin.example.com access key'
# chown munin:munin ssh_id_ed25519 ssh_id_ed25519.pub

Then add the following configuration to /etc/ssh/ssh_config. The ControlPersist and Compression configuration options are adapted from the Munin SSH transport documentation to allow reuse of existing SSH connections and transport data compression.

Match localuser=munin
    ControlMaster auto
    ControlPath /run/munin/ssh.%C.sock
    ControlPersist 360
    TCPKeepAlive yes
    ServerAliveInterval 60
    Compression yes
    IdentityFile /etc/munin/ssh_id_ed25519

I'm using Nginx to serve the static Munin-generated HTML files and graphs, and fcgiwrap for running the CGI script which powers the dynamic zoom page.

# apt-get install nginx-light fcgiwrap

My Nginx virtual host for serving Munin content looks a bit like this, based on the examples given in the Munin documentation; obtaining a Let's Encrypt certificate for securing the HTTP endpoint with TLS and setting up credentials for HTTP basic authentication is left as an exercise for the reader.

server {
        listen 443 ssl default_server;
        listen [::]:443 ssl default_server;

        ssl_certificate /etc/letsencrypt/live/munin.example.com/fullchain.pem;
        ssl_certificate_key /etc/letsencrypt/live/munin.example.com/privkey.pem;

        server_name munin.example.com;
        auth_basic "Munin";
        auth_basic_user_file /etc/munin/htpasswd;

        root /var/cache/munin;

        location /munin/static/ {
                alias /etc/munin/static/;
                expires modified +1w;
        }

        location ^~ /munin-cgi/munin-cgi-graph/ {
                fastcgi_split_path_info ^(/munin-cgi/munin-cgi-graph)(.*);
                fastcgi_param PATH_INFO $fastcgi_path_info;
                fastcgi_param SCRIPT_FILENAME /usr/lib/munin/cgi/munin-cgi-graph;
                fastcgi_pass unix:/run/fcgiwrap.socket;
                include fastcgi_params;
        }

        location /munin/ {
                alias /var/cache/munin/www/;
                expires modified 310s;
        }

        location = / {
                rewrite ^/$ /munin/ redirect;
        }
}
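For the HTTP basic authentication credentials mentioned above, one quick way to populate the htpasswd file is with openssl (htpasswd from the apache2-utils package works equally well); the username, password, and output path below are placeholders:

```shell
# Create an htpasswd entry using the APR1 scheme, which nginx understands.
# On the master, the output belongs in /etc/munin/htpasswd, readable by
# the user nginx runs as.
printf 'admin:%s\n' "$(openssl passwd -apr1 'changeme')" > htpasswd
```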

fcgiwrap needs to have its out-of-the-box systemd unit configuration overridden so that it's run as the munin user instead of www-data:

# cd /etc/systemd/system
# mkdir fcgiwrap.service.d
# cat > fcgiwrap.service.d/override.conf <<EOF
> [Service]
> User=munin
> Group=munin
> EOF
# systemctl daemon-reload
# systemctl restart fcgiwrap

For each node from which I want to gather metrics data, I need to tell Munin how to reach it; for example:

# cd /etc/munin/munin-conf.d
# cat > webserver.conf <<EOF
> [webserver.example.com]
>     address ssh://munin-access@webserver.example.com -W localhost:4949
> EOF

Munin's ssh:// URL scheme gives the user and hostname to connect to, with a command to run as the request path (e.g. ssh://user@example.com/bin/true), and with any options to pass to the ssh(1) command line provided afterwards. When Munin invokes ssh(1), it expects to communicate with the munin-node daemon on the remote system over ssh(1)'s standard input and output. Here I'm using the -W option to ssh(1) to request that the client's standard input and output streams be redirected to the named hostname and port, which is the standard Munin port (4949) over the loopback interface.
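Those trailing options make it possible to accommodate less usual setups. For instance, a hypothetical node whose sshd listens on a nonstandard port could be configured with an extra -p option:

```
[backupserver.example.com]
    address ssh://munin-access@backupserver.example.com -W localhost:4949 -p 2222
```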

Munin node setup

This configuration needs to be performed on every host which I want to monitor using Munin.

# apt-get install munin-node

Edit /etc/munin/munin-node.conf, replace the host * line with host 127.0.0.1, and then restart the munin-node service.
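After that edit, the relevant parts of /etc/munin/munin-node.conf look something like the following (the allow lines are part of Debian's shipped defaults), so the daemon only accepts connections arriving over the loopback interface, i.e. via the SSH tunnel:

```
# Listen only on the loopback interface; remote access goes via SSH.
host 127.0.0.1
port 4949
allow ^127\.0\.0\.1$
allow ^::1$
```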

I have a dedicated user on each machine for handling remote connections to the local munin-node instance, which I set up like this:

# useradd -r -m -s /bin/bash -d /srv/munin-access munin-access
# su -l munin-access
$ mkdir -m 0700 .ssh
$ cat > .ssh/authorized_keys <<EOF
> restrict,command="/bin/false",port-forwarding,permitopen="localhost:4949" ssh-ed25519 AAAC...
> EOF

This uses authorized_keys access controls (described in detail in sshd(8)) to ensure that remote systems logging in as the local munin-access user can only request a port forwarding to the Munin port on the loopback interface, and are not permitted to do anything else.
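The effect is easy to check from the master: a plain login attempt as munin-access should get nothing (the forced /bin/false command runs and the connection closes), while the -W forwarding used by Munin still works. Roughly, with output abridged:

```
$ ssh -i /etc/munin/ssh_id_ed25519 munin-access@webserver.example.com
Connection to webserver.example.com closed.
$ ssh -i /etc/munin/ssh_id_ed25519 munin-access@webserver.example.com -W localhost:4949
# munin node at webserver.example.com
```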

And that's it

One of the useful things which is enabled out of the box on Debian systems is a Munin graph for traffic and error rates on each node's network interfaces. If you set up munin-node on your home broadband router (assuming it runs Debian, of course), then you'll get a nice traffic graph on the ISP uplink interface which has sustained peaks whenever you spend time on long video-conference calls with friends, as I discovered.

