Solving a minor ZFS cron mystery
(Alternatively: non-standard system configurations have non-standard and possibly surprising behaviour.)
My laptop runs Debian, like most of my other computers. Unlike most of my other machines, my laptop's root filesystem is ZFS. The reasons for this are mostly social, but a convenient side-effect has been the ability to perform filesystem-level snapshots, which provide a point-in-time looking glass into the past state of a particular filesystem (or dataset, really, to use the ZFS nomenclature).
There's a handy script called zfs-auto-snapshot, intended to be called out of cron(8), which can be used to take and maintain periodic snapshots of ZFS datasets. It's packaged in Debian, which is really convenient: apt-get install zfs-auto-snapshot, and the system will then keep a history of four 15-minutely snapshots, twenty-four hourly snapshots, thirty-one daily snapshots, eight weekly snapshots, and twelve monthly snapshots (by default every dataset in the system gets snapped, but this is configurable on a per-dataset basis). This has proven useful time and time again in situations where I delete something, close my laptop for the night, and come back to it the following morning only to realise I still need the file which was deleted. There is some overhead in disk space usage, as anything which gets caught in a snapshot might hang around for a while, but thus far that seems like a reasonable compromise.
Earlier on today, I was attempting to restore a dotfile in my home directory used by this particular shell script, as I'd made some local modifications to the script which had caused it to corrupt its data file. None of the past day's hourly snapshots had a non-corrupted version, and to my surprise, when I went to look for the previous day's daily snapshot, I found that no such snapshot existed.
Listing all the daily snapshots of my home directory's dataset gave me the following:
molly on flywheel ~> zfs list -t snapshot tank/home/molly | grep daily | sort
tank/home/molly@zfs-auto-snap_daily-2020-02-21-1430     0B      -  4.63G  -
tank/home/molly@zfs-auto-snap_daily-2020-02-22-1124  87.3M      -  4.68G  -
tank/home/molly@zfs-auto-snap_daily-2020-02-23-1256  41.9M      -  4.74G  -
tank/home/molly@zfs-auto-snap_daily-2020-02-24-0638  40.9M      -  4.73G  -
tank/home/molly@zfs-auto-snap_daily-2020-02-25-0405  45.7M      -  4.73G  -
tank/home/molly@zfs-auto-snap_daily-2020-02-26-0939  51.3M      -  4.75G  -
tank/home/molly@zfs-auto-snap_daily-2020-02-27-1421  58.8M      -  4.81G  -
tank/home/molly@zfs-auto-snap_daily-2020-02-28-1859     0B      -  4.88G  -
tank/home/molly@zfs-auto-snap_daily-2020-02-29-1005  39.6M      -  4.87G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-01-1306  61.0M      -  4.87G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-02-0957  49.8M      -  4.89G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-03-1025  38.3M      -  4.89G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-04-1051  49.0M      -  4.89G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-07-1340   616K      -  4.83G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-08-1029  55.2M      -  4.83G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-09-0954  63.1M      -  4.83G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-10-1051  41.8M      -  4.83G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-11-1045  52.7M      -  4.85G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-12-1136  58.5M      -  4.81G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-13-1801  62.5M      -  4.83G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-14-1230   374K      -  4.81G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-15-1138  48.3M      -  4.81G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-18-1534   111M      -  4.83G  -
tank/home/molly@zfs-auto-snap_daily-2020-03-27-1522   678K      -  4.94G  -
tank/home/molly@zfs-auto-snap_daily-2020-04-01-1031   124M      -  4.95G  -
tank/home/molly@zfs-auto-snap_daily-2020-04-13-1007   632K      -  5.01G  -
tank/home/molly@zfs-auto-snap_daily-2020-04-30-2112     0B      -  6.32G  -
tank/home/molly@zfs-auto-snap_daily-2020-05-02-1552  88.0M      -  6.28G  -
tank/home/molly@zfs-auto-snap_daily-2020-05-22-2311     0B      -  6.30G  -
tank/home/molly@zfs-auto-snap_daily-2020-05-22-2340  31.5M      -  6.30G  -
tank/home/molly@zfs-auto-snap_daily-2020-05-28-1124  1.61M      -  5.99G  -
Some time around the 15th of March, regular daily snapshots stopped happening for some reason. This was obviously undesirable.
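Spotting the holes in a listing like that by eye is tedious, so here is a rough sketch of a filter that does it. It assumes GNU date (for the -d option) and the zfs-auto-snap_daily-YYYY-MM-DD-HHMM naming shown in the listing; the function name snapshot_gaps is made up for illustration.

```sh
#!/bin/sh
# snapshot_gaps: read `zfs list` output (or bare snapshot names) on stdin
# and print each pair of consecutive daily snapshots that are more than
# one calendar day apart.
# Assumptions: GNU date, and the zfs-auto-snap_daily-YYYY-MM-DD-HHMM names.
snapshot_gaps() {
    prev_day=''
    prev_epoch=''
    grep -o 'zfs-auto-snap_daily-[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}' |
    sed 's/.*daily-//' |
    sort -u |
    while read -r day; do
        # Midnight UTC of the snapshot's date, in seconds since the epoch.
        epoch=$(date -u -d "$day" +%s)
        if [ -n "$prev_epoch" ]; then
            gap=$(( (epoch - prev_epoch) / 86400 ))
            if [ "$gap" -gt 1 ]; then
                echo "$prev_day -> $day ($gap days)"
            fi
        fi
        prev_day=$day
        prev_epoch=$epoch
    done
}
```

It would be used as zfs list -t snapshot tank/home/molly | snapshot_gaps. Snapshots taken on the same date are collapsed into one, so only missing days are reported.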
I then took a look at the cron(8) configuration on my laptop, to try and determine whether the snapshotting script was getting called correctly. The Debian package installs scripts which invoke zfs-auto-snapshot in various cron.<time period> directories, /etc/cron.daily among them. Executable files in that directory will be run once a day. I'm not sure whether these directories are a Debianism or not, but they're handled by the system crontab: /etc/crontab contains the following lines:
17 * * * * root cd / && run-parts --report /etc/cron.hourly
25 6 * * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
47 6 * * 7 root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
52 6 1 * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )
The important thing to note here is that if anacron(8) is installed, then cron(8) actually doesn't handle the daily, weekly and monthly directories (despite their having "cron" in their names). (It took me a bit too long to realise that the presence of lines in my laptop's logs which look like CRON: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )) does not mean that the commands following the test are actually executed...)
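The short-circuit is easy to reproduce outside cron. In this sketch the function daily_line is a made-up stand-in for the crontab entry, with its argument playing the part of /usr/sbin/anacron and an echo in place of the real run-parts call:

```sh
#!/bin/sh
# daily_line: same shape as the cron.daily crontab entry, but with the
# path under test passed as $1 and run-parts replaced by an echo.
daily_line() {
    test -x "$1" || ( cd / && echo "run-parts would run here" )
}

# An executable exists at the path: the test succeeds, so the right-hand
# side of || is skipped and nothing is printed.
daily_line /bin/sh

# Nothing exists at the path: the test fails, so the fallback runs.
daily_line /nonexistent/anacron
```

In both cases cron would log the whole command line as having been run, which is exactly what makes the log entries misleading.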
anacron(8) is a bit like cron(8), except that it's intended for systems which aren't always on twenty-four hours a day, which may miss periodic tasks due to being powered off (e.g. late-night cronjobs on machines which are only usually powered on during working hours). anacron(8) stores timestamps for each job it runs, so that in the future it can work out when each job is overdue for another run, and then perform the job appropriately.
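The timestamp check amounts to simple date arithmetic. The following is only an illustration of the idea, not anacron's actual implementation: is_overdue is a hypothetical helper, dates are given as YYYY-MM-DD, and GNU date is assumed.

```sh
#!/bin/sh
# is_overdue LAST_RUN PERIOD_DAYS [TODAY]
# Succeeds if at least PERIOD_DAYS whole days lie between LAST_RUN and
# TODAY (which defaults to the current date). Assumes GNU date.
is_overdue() {
    last=$(date -u -d "$1" +%s)
    now=$(date -u -d "${3:-$(date -u +%F)}" +%s)
    [ $(( (now - last) / 86400 )) -ge "$2" ]
}

# A daily job last run on the 15th is overdue by the 18th.
if is_overdue 2020-03-15 1 2020-03-18; then
    echo "cron.daily is overdue"
fi
```

The point is that the decision depends only on the stored date and the job's period, so it doesn't matter whether the machine was off in between; what does matter is that something actually runs the check.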
If one peruses /etc/anacrontab, one can see that it is indeed responsible for managing the execution of daily, weekly and monthly tasks (with a comment confirming that it takes priority over cron(8)'s entries):
# These replace cron's entries
1 5 cron.daily run-parts --report /etc/cron.daily
7 10 cron.weekly run-parts --report /etc/cron.weekly
@monthly 15 cron.monthly run-parts --report /etc/cron.monthly
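Each of those entries has four fields: the period in days (or a name such as @monthly), a delay in minutes before the job starts, a job identifier, and the command. A small sketch that spells this out; describe_anacrontab is a made-up name, and the parsing is deliberately naive (it only skips blank lines, comments and VAR=value assignments):

```sh
#!/bin/sh
# describe_anacrontab: read anacrontab-style lines on stdin and print a
# plain-language description of each job entry.
describe_anacrontab() {
    while read -r period delay id cmd; do
        case $period in
            ''|'#'*|*=*) continue ;;                 # blank, comment, assignment
            @*) every=${period#@} ;;                 # e.g. @monthly -> monthly
            *)  every="every $period day(s)" ;;
        esac
        echo "$id: $every, after a ${delay}-minute delay: $cmd"
    done
}

describe_anacrontab <<'EOF'
# These replace cron's entries
1 5 cron.daily run-parts --report /etc/cron.daily
7 10 cron.weekly run-parts --report /etc/cron.weekly
@monthly 15 cron.monthly run-parts --report /etc/cron.monthly
EOF
```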
One crucial point here is that anacron(8) itself is not a persistent daemon. My laptop is configured to run it once at system boot time, but thereafter anacron(8) is invoked periodically by cron(8), this time from /etc/cron.d/anacron:
# /etc/cron.d/anacron: crontab entries for the anacron package

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

30 7-23 * * * root [ -x /etc/init.d/anacron ] && if [ ! -d /run/systemd/system ]; \
then /usr/sbin/invoke-rc.d anacron start >/dev/null; fi
And here we have the smoking gun. If the machine has the System V-style RC script installed, and there's no evidence of systemd running on the system, invoke anacron(8) through the script. The problem is that my laptop runs neither a System V-style init nor systemd (it runs this and a few other bits and pieces instead). Attempting to invoke the script manually results in the following, without anacron(8) being executed:
root@flywheel:~# /usr/sbin/invoke-rc.d anacron start
invoke-rc.d: could not determine current runlevel
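To see which branch of that crontab entry a given machine would take, its conditions can be pulled out into a small function. This is a sketch with a made-up name (which_branch); the two paths default to the ones in the entry, but can be overridden to try out other scenarios without touching the real system.

```sh
#!/bin/sh
# which_branch [RC_SCRIPT] [SYSTEMD_DIR]
# Report what the /etc/cron.d/anacron entry would do, using the same two
# checks it makes: is the RC script executable, and does the systemd
# runtime directory exist?
which_branch() {
    rc_script=${1:-/etc/init.d/anacron}
    systemd_dir=${2:-/run/systemd/system}
    if [ ! -x "$rc_script" ]; then
        echo "no RC script: the entry does nothing"
    elif [ -d "$systemd_dir" ]; then
        echo "systemd present: the entry does nothing"
    else
        echo "would run: invoke-rc.d anacron start"
    fi
}

which_branch
```

On a machine like the one described here, the last branch is the one taken, and invoke-rc.d then fails for want of a runlevel.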
So this cronjob never worked in the first place. The next questions are, of course: why are there any daily/monthly/weekly snapshots at all, and why did they stop happening regularly after the 15th of March?
The answer is (indirectly) Covid-19. Prior to the 15th, I would turn my laptop off at night when I was asleep, and then turn it on when I started using it the following day. Remember: my laptop is configured to run anacron(8) manually once on every boot. And the timestamps in the snapshot names are not regular, which matches the fact that my sleep cycle wasn't very consistent around that time, and occasionally I wouldn't need my laptop until the evenings.
Around the 15th, for reasons of the pandemic, I moved to a different flat, and stopped regularly going outside (first by choice and then by regulation), so my laptop became more of a static workstation, which I'd leave running overnight for several days at a time. You never get your boot-time ZFS snapshots if you don't reboot.
This is all fixed if /etc/cron.d/anacron is amended to invoke anacron(8) directly.
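A sketch of what such an amendment might look like, wrapped in a function that prints the new file contents so it can be inspected before being installed. It assumes anacron lives at /usr/sbin/anacron (as the system crontab suggests) and uses its -s option to serialise job execution; the function name is made up, and this is not necessarily the exact change made on the laptop in question.

```sh
#!/bin/sh
# amended_anacron_crontab: print a hypothetical replacement for
# /etc/cron.d/anacron which calls anacron directly instead of going
# through invoke-rc.d. Nothing is written to disk.
amended_anacron_crontab() {
    cat <<'EOF'
# /etc/cron.d/anacron: crontab entries for the anacron package

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

30 7-23 * * * root test -x /usr/sbin/anacron && /usr/sbin/anacron -s >/dev/null
EOF
}
```

Running amended_anacron_crontab > /etc/cron.d/anacron as root would install it. The schedule is left as it was, so anacron gets a chance to catch up on overdue jobs every hour between 07:30 and 23:30.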
Non-standard system configurations have non-standard and possibly surprising behaviour. And Unix arcana is arcane.