Solving a minor ZFS cron mystery

(Alternatively: non-standard system configurations have non-standard and possibly surprising behaviour.)

My laptop runs Debian, like most of my other computers. Unlike most of my other machines, my laptop’s root filesystem is ZFS. The reasons for this are mostly social, but a convenient side-effect has been the ability to perform filesystem-level snapshots, which provide a point-in-time looking glass into the past state of a particular filesystem (or dataset, really, to use the ZFS nomenclature).

There’s a handy script called zfs-auto-snapshot, intended to be called from cron(8), which can be used to take and maintain periodic snapshots of ZFS datasets. It’s packaged in Debian, which is really convenient: apt-get install zfs-auto-snapshot, and the system will then keep a history of four 15-minutely snapshots, twenty-four hourly snapshots, thirty-one daily snapshots, eight weekly snapshots, and twelve monthly snapshots (by default every dataset in the system gets snapped, but this is configurable on a per-dataset basis). This has proven useful time and time again in situations where I delete something, close my laptop for the night, and come back to it the following morning only to realise I still need the deleted file. There is some overhead in disk space usage, however, as anything which gets caught in a snapshot might hang around for a while, but thus far that seems like a reasonable compromise.

Earlier on today, I was attempting to restore a dotfile in my home directory used by a particular shell script, as I’d made some local modifications to the script which had caused it to corrupt its data file. None of the past day’s hourly snapshots had an uncorrupted version, and to my surprise, when I went to look for the previous day’s daily snapshot, I found that no such snapshot existed.

Listing all the daily snapshots of my home directory’s dataset gave me the following:

molly on flywheel ~> zfs list -t snapshot tank/home/molly | grep daily | sort
tank/home/molly@zfs-auto-snap_daily-2020-02-21-1430        0B      -     4.63G  -
tank/home/molly@zfs-auto-snap_daily-2020-02-22-1124     87.3M      -     4.68G  -
tank/home/molly@zfs-auto-snap_daily-2020-02-23-1256     41.9M      -     4.74G  -
tank/home/molly@zfs-auto-snap_daily-2020-02-24-0638     40.9M      -     4.73G  -
tank/home/molly@zfs-auto-snap_daily-2020-02-25-0405     45.7M      -     4.73G  -
tank/home/molly@zfs-auto-snap_daily-2020-02-26-0939     51.3M      -     4.75G  -
tank/home/molly@zfs-auto-snap_daily-2020-02-27-1421     58.8M      -     4.81G  -
tank/home/molly@zfs-auto-snap_daily-2020-02-28-1859        0B      -     4.88G  -
tank/home/molly@zfs-auto-snap_daily-2020-02-29-1005     39.6M      -     4.87G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-01-1306     61.0M      -     4.87G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-02-0957     49.8M      -     4.89G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-03-1025     38.3M      -     4.89G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-04-1051     49.0M      -     4.89G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-07-1340      616K      -     4.83G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-08-1029     55.2M      -     4.83G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-09-0954     63.1M      -     4.83G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-10-1051     41.8M      -     4.83G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-11-1045     52.7M      -     4.85G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-12-1136     58.5M      -     4.81G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-13-1801     62.5M      -     4.83G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-14-1230      374K      -     4.81G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-15-1138     48.3M      -     4.81G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-18-1534      111M      -     4.83G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-27-1522      678K      -     4.94G  -
tank/home/molly@zfs-auto-snap_daily-2020-04-01-1031      124M      -     4.95G  -
tank/home/molly@zfs-auto-snap_daily-2020-04-13-1007      632K      -     5.01G  -
tank/home/molly@zfs-auto-snap_daily-2020-04-30-2112        0B      -     6.32G  -
tank/home/molly@zfs-auto-snap_daily-2020-05-02-1552     88.0M      -     6.28G  -
tank/home/molly@zfs-auto-snap_daily-2020-05-22-2311        0B      -     6.30G  -
tank/home/molly@zfs-auto-snap_daily-2020-05-22-2340     31.5M      -     6.30G  -
tank/home/molly@zfs-auto-snap_daily-2020-05-28-1124     1.61M      -     5.99G  -

Some time around the 15th of March, regular daily snapshots stopped happening for some reason: after that date there are only a handful, sometimes as much as twenty days apart. This was obviously undesirable.
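Spotting gaps like these by eye is easy to get wrong, so here is a minimal sketch of a gap finder in POSIX sh. It assumes GNU date (for date -d) and the zfs-auto-snap_daily-YYYY-MM-DD-HHMM naming visible in the listing above; the function name find_gaps is hypothetical.

```shell
# find_gaps: read zfs-auto-snapshot names on stdin and report every gap of
# more than one day between consecutive daily snapshots.
# Assumes GNU date (for -d) and names containing _daily-YYYY-MM-DD-HHMM.
find_gaps() {
    prev=""
    # Keep only daily snapshots, reduce each to its YYYY-MM-DD date, and
    # collapse days that have more than one snapshot.
    sed -n 's/.*@zfs-auto-snap_daily-\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)-[0-9]\{4\}.*/\1/p' |
    sort -u |
    while read -r day; do
        if [ -n "$prev" ]; then
            # Whole days since the epoch, taken at UTC midnight so that
            # daylight-saving changes cannot skew the difference.
            d1=$(( $(date -u -d "$prev" +%s) / 86400 ))
            d2=$(( $(date -u -d "$day" +%s) / 86400 ))
            if [ $((d2 - d1)) -gt 1 ]; then
                echo "$prev -> $day ($((d2 - d1 - 1)) day(s) missing)"
            fi
        fi
        prev=$day
    done
}
```

Piping the same command as above into it (zfs list -t snapshot tank/home/molly | find_gaps) would report lines such as 2020-03-18 -> 2020-03-27 (8 day(s) missing); the trailing size columns are simply ignored by the sed expression.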

I then took a look at the cron(8) configuration on my laptop, to try and determine whether the snapshotting script was being called correctly. The Debian package installs scripts which invoke zfs-auto-snapshot in various cron.<time period> directories in /etc, such as /etc/cron.daily; executable files in that directory will be run by cron once a day. I’m not sure whether these directories are a Debianism or not, but they’re handled by the system crontab(5); /etc/crontab contains the following lines:

17 *    * * *   root    cd / && run-parts --report /etc/cron.hourly
25 6    * * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
47 6    * * 7   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
52 6    1 * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )

The important thing to note here is that if anacron(8) is installed, then cron(8) doesn’t actually handle the daily, weekly and monthly directories (despite their having “cron” in their names); only cron.hourly is run unconditionally. (It took me a bit too long to realise that lines in my laptop’s logs which look like CRON[2086304]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )) do not mean that the commands following the test were actually executed…)
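To make the short-circuit concrete, here is a small model of one of those crontab lines. The names run_parts_sketch and cron_line are hypothetical, and the run-parts stand-in is deliberately simplified (no --report handling); its name filter follows Debian run-parts’ default rule that file names may only contain letters, digits, underscores and hyphens.

```shell
# A simplified model of one /etc/crontab line such as
#   test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
# with the anacron path and the directory turned into parameters.

# run_parts_sketch DIR: run each executable regular file in DIR in the shell's
# glob order, skipping names containing anything other than letters, digits,
# '_' and '-' (Debian run-parts' default naming rule). Not the real run-parts.
run_parts_sketch() {
    for f in "$1"/*; do
        name=${f##*/}
        case $name in
            *[!A-Za-z0-9_-]*) continue ;;   # e.g. "backup.sh" is ignored
        esac
        if [ -f "$f" ] && [ -x "$f" ]; then
            "$f"
        fi
    done
}

# cron_line ANACRON DIR: '||' runs its right-hand side only when 'test -x'
# fails, so if ANACRON exists and is executable, nothing in DIR is run,
# even though cron still logs the whole command line.
cron_line() {
    test -x "$1" || ( cd / && run_parts_sketch "$2" )
}
```

With an executable path as its first argument, cron_line returns successfully without touching the directory at all, which is exactly what those CRON log lines were hiding.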

anacron(8) is a bit like cron(8), except that it’s intended for systems which aren’t on twenty-four hours a day and so may miss periodic tasks while powered off (e.g. late-night cronjobs on machines which are usually only powered on during working hours). anacron(8) stores a timestamp for each job it runs, so that it can later work out when that job is overdue for another run, and then run it.
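That bookkeeping boils down to a date comparison. The following is a simplified model, not anacron’s actual code: it assumes GNU date, and takes the last-run date in the YYYYMMDD form that anacron keeps in its per-job timestamp files (under /var/spool/anacron on Debian). The function name is_overdue is hypothetical.

```shell
# is_overdue LAST PERIOD [TODAY]
# Succeeds if at least PERIOD days separate LAST and TODAY (both YYYYMMDD;
# TODAY defaults to the current local date). A simplified model of the check
# anacron makes per job, not its real implementation. Assumes GNU date.
is_overdue() {
    last=$1
    period=$2
    today=${3:-$(date +%Y%m%d)}
    # Compare at UTC midnight so the day count is always a whole number.
    last_s=$(date -u -d "$last" +%s)
    today_s=$(date -u -d "$today" +%s)
    [ $(( (today_s - last_s) / 86400 )) -ge "$period" ]
}
```

For example, is_overdue "$(cat /var/spool/anacron/cron.daily)" 1 && echo "cron.daily is due" asks the same question anacron asks of the daily job; note that nothing in this check depends on the time of day, only on how many calendar days have passed.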

If one peruses /etc/anacrontab, one can see that anacron(8) is indeed responsible for managing the execution of daily, weekly and monthly tasks (with a comment confirming that it takes priority over cron(8)):

# These replace cron's entries
1   5   cron.daily  run-parts --report /etc/cron.daily
7   10  cron.weekly run-parts --report /etc/cron.weekly
@monthly    15  cron.monthly    run-parts --report /etc/cron.monthly
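Each job line has four whitespace-separated fields: the period in days (or a named period such as @monthly), a delay in minutes before the job starts, a job identifier (which also names the job’s timestamp file), and the command. As a reading aid, here is a small sketch that labels those fields; describe_anacrontab is a hypothetical helper, and anacron of course does its own parsing.

```shell
# describe_anacrontab FILE: label the four fields of each job line in an
# anacrontab(5)-style file. A reading aid only; anacron does its own parsing.
describe_anacrontab() {
    # Drop comments, blank lines and environment settings such as SHELL=...
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' -e '^[A-Za-z_][A-Za-z0-9_]*=' "$1" |
    while read -r period delay job cmd; do
        # period: days between runs (or a name such as @monthly)
        # delay:  minutes to wait after anacron starts before running the job
        # job:    identifier, also the name of the job's timestamp file
        # cmd:    everything else on the line
        echo "job=$job period=$period delay=${delay}min cmd=$cmd"
    done
}
```

Run against the three lines above, it reports that cron.daily has a period of one day and a five-minute delay, which matters later: the job runs a few minutes after anacron itself starts, not at a fixed time of day.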

One crucial point here is that anacron(8) itself is not a persistent daemon. My laptop is configured to run it once at system boot time, but thereafter anacron(8) is invoked periodically by cron(8), this time from /etc/cron.d/anacron:

# /etc/cron.d/anacron: crontab entries for the anacron package

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

30 7-23 * * *   root    [ -x /etc/init.d/anacron ] && if [ ! -d /run/systemd/system ]; then /usr/sbin/invoke-rc.d anacron start >/dev/null; fi

And here we have the smoking gun. This entry says: if the System V-style init script is installed, and there’s no evidence of systemd running on the system, start anacron(8) through that script using invoke-rc.d. The problem is that my laptop runs neither sysvinit nor systemd (it uses a different init and a few other bits and pieces instead). Attempting to invoke the script manually results in the following, without anacron(8) being executed:

root@flywheel:~# /usr/sbin/invoke-rc.d anacron start
invoke-rc.d: could not determine current runlevel
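The decision that cron entry makes can be modelled with its two hard-coded paths (/etc/init.d/anacron and /run/systemd/system) turned into parameters. anacron_cron_branch is a hypothetical helper for illustration; it only reports which branch would be taken and runs nothing.

```shell
# anacron_cron_branch INIT_SCRIPT SYSTEMD_DIR
# Mirrors the condition in /etc/cron.d/anacron and prints which branch
# the entry would take. Illustration only: it does not start anything.
anacron_cron_branch() {
    init_script=$1
    systemd_dir=$2
    if [ ! -x "$init_script" ]; then
        echo "nothing: no executable init script"
    elif [ -d "$systemd_dir" ]; then
        echo "nothing: systemd is running"
    else
        # The only branch that does anything, and it relies on invoke-rc.d
        # being able to work out the current runlevel.
        echo "invoke-rc.d anacron start"
    fi
}
```

With the init script installed and no /run/systemd/system directory, as on this laptop, the entry always takes the invoke-rc.d branch, which is precisely the call that fails above.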

So this cronjob never worked in the first place. The next question, of course, is why there are any daily, weekly or monthly snapshots at all, and why they stopped happening regularly after the 15th of March.

The answer is (indirectly) Covid-19. Prior to the 15th, I would turn my laptop off at night when I was asleep, and then turn it on when I started using it the following day. Remember: my laptop is configured to run anacron(8) once on every boot, so on the first boot of each day anacron(8) found cron.daily overdue and ran it after the five-minute delay given in /etc/anacrontab, which took that day’s snapshot. This also explains why the timestamps in the snapshot names are irregular: my sleep cycle wasn’t very consistent around that time, and occasionally I wouldn’t need my laptop until the evening.

Around the 15th, for reasons of the pandemic, I moved to a different flat, and stopped regularly going outside (first by choice and then by regulation), so my laptop became more of a static workstation, which I’d leave running overnight for several days at a time. You never get your boot-time ZFS snapshots if you don’t reboot.

All of this is fixed by amending /etc/cron.d/anacron to invoke anacron(8) directly, rather than through invoke-rc.d.
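One possible amendment, sketched as a shell function that writes the file. The replacement line is an assumption rather than the author’s exact change: it keeps the original schedule, guards on the anacron binary instead of the init script, and uses anacron’s -s option (run jobs serially). The optional argument of the hypothetical install_anacron_cron lets you write to a scratch path first and inspect the result.

```shell
# install_anacron_cron [TARGET]: write a /etc/cron.d/anacron that starts
# anacron directly, whatever the init system. A sketch of one possible fix;
# TARGET defaults to the real path (writing there needs root).
install_anacron_cron() {
    target=${1:-/etc/cron.d/anacron}
    cat > "$target" <<'EOF'
# /etc/cron.d/anacron: crontab entries for the anacron package
# (amended: run anacron directly, independent of the init system)

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

30 7-23 * * *   root    [ -x /usr/sbin/anacron ] && /usr/sbin/anacron -s
EOF
}
```

Calling anacron every hour like this is harmless, because it only runs a job once that job’s period has elapsed since its timestamp; most invocations find nothing due and exit.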

Non-standard system configurations have non-standard and possibly surprising behaviour. And Unix arcana is arcane.