Solving a minor ZFS cron mystery
(Alternatively: non-standard system configurations have non-standard and possibly surprising behaviour.)
My laptop runs Debian, like most of my other computers. Unlike most of my other machines, my laptop’s root filesystem is ZFS. The reasons for this are mostly social, but a convenient side-effect has been the ability to perform filesystem-level snapshots, which provide a point-in-time looking glass into the past state of a particular filesystem (or dataset, really, to use the ZFS nomenclature).
There’s a handy script called zfs-auto-snapshot, intended to be called out of cron(8), which can be used to take and maintain periodic snapshots of ZFS datasets. It’s packaged in Debian, which is really convenient: apt-get install zfs-auto-snapshot, and the system will then keep a history of four 15-minutely snapshots, twenty-four hourly snapshots, thirty-one daily snapshots, eight weekly snapshots, and twelve monthly snapshots (by default every dataset in the system gets snapped, but this is configurable on a per-dataset basis). This has proven useful time and time again in situations where I delete something, close my laptop for the night, and then come back to it the following morning and realise I still need the file which was deleted. There is some overhead in disk space usage, as anything which gets caught in a snapshot might hang around for a while, but thus far that seems like a reasonable compromise.
Earlier today, I was attempting to restore a dotfile in my home directory used by a particular shell script, as I’d made some local modifications to the script which had caused it to corrupt its data file. None of the past day’s hourly snapshots had a non-corrupted version, and to my surprise, when I went to look for the previous day’s daily snapshot, I found that no such snapshot existed.
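For context, hunting for an older copy of a file boils down to looking through the .zfs/snapshot directory that ZFS exposes at the root of each mounted dataset. A minimal sketch of that search follows; the function name is made up for illustration, and the mountpoint and file name in the usage note are assumptions rather than my actual paths.

```shell
#!/bin/sh
# list_snapshot_copies MOUNTPOINT RELPATH
# Print the path of every snapshot copy of a file. MOUNTPOINT is where the
# dataset is mounted; RELPATH is the file's path relative to that mountpoint.
# (Hypothetical helper, not part of zfs-auto-snapshot.)
list_snapshot_copies() {
    mountpoint=$1
    relpath=$2
    for copy in "$mountpoint"/.zfs/snapshot/*/"$relpath"; do
        # Skip the unexpanded glob and any snapshot which lacks the file.
        [ -e "$copy" ] || continue
        printf '%s\n' "$copy"
    done
}
```

Something like list_snapshot_copies /home/molly .some-dotfile (both arguments being placeholders) would then print one line per snapshot which still holds the file, and any of those can be copied back with cp.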
Listing all the daily snapshots of my home directory’s dataset gave me the following:
molly on flywheel ~> zfs list -t snapshot tank/home/molly | grep daily | sort
tank/home/molly@zfs-auto-snap_daily-2020-02-21-1430 0B - 4.63G -
tank/home/molly@zfs-auto-snap_daily-2020-02-22-1124 87.3M - 4.68G -
tank/home/molly@zfs-auto-snap_daily-2020-02-23-1256 41.9M - 4.74G -
tank/home/molly@zfs-auto-snap_daily-2020-02-24-0638 40.9M - 4.73G -
tank/home/molly@zfs-auto-snap_daily-2020-02-25-0405 45.7M - 4.73G -
tank/home/molly@zfs-auto-snap_daily-2020-02-26-0939 51.3M - 4.75G -
tank/home/molly@zfs-auto-snap_daily-2020-02-27-1421 58.8M - 4.81G -
tank/home/molly@zfs-auto-snap_daily-2020-02-28-1859 0B - 4.88G -
tank/home/molly@zfs-auto-snap_daily-2020-02-29-1005 39.6M - 4.87G -
tank/home/molly@zfs-auto-snap_daily-2020-03-01-1306 61.0M - 4.87G -
tank/home/molly@zfs-auto-snap_daily-2020-03-02-0957 49.8M - 4.89G -
tank/home/molly@zfs-auto-snap_daily-2020-03-03-1025 38.3M - 4.89G -
tank/home/molly@zfs-auto-snap_daily-2020-03-04-1051 49.0M - 4.89G -
tank/home/molly@zfs-auto-snap_daily-2020-03-07-1340 616K - 4.83G -
tank/home/molly@zfs-auto-snap_daily-2020-03-08-1029 55.2M - 4.83G -
tank/home/molly@zfs-auto-snap_daily-2020-03-09-0954 63.1M - 4.83G -
tank/home/molly@zfs-auto-snap_daily-2020-03-10-1051 41.8M - 4.83G -
tank/home/molly@zfs-auto-snap_daily-2020-03-11-1045 52.7M - 4.85G -
tank/home/molly@zfs-auto-snap_daily-2020-03-12-1136 58.5M - 4.81G -
tank/home/molly@zfs-auto-snap_daily-2020-03-13-1801 62.5M - 4.83G -
tank/home/molly@zfs-auto-snap_daily-2020-03-14-1230 374K - 4.81G -
tank/home/molly@zfs-auto-snap_daily-2020-03-15-1138 48.3M - 4.81G -
tank/home/molly@zfs-auto-snap_daily-2020-03-18-1534 111M - 4.83G -
tank/home/molly@zfs-auto-snap_daily-2020-03-27-1522 678K - 4.94G -
tank/home/molly@zfs-auto-snap_daily-2020-04-01-1031 124M - 4.95G -
tank/home/molly@zfs-auto-snap_daily-2020-04-13-1007 632K - 5.01G -
tank/home/molly@zfs-auto-snap_daily-2020-04-30-2112 0B - 6.32G -
tank/home/molly@zfs-auto-snap_daily-2020-05-02-1552 88.0M - 6.28G -
tank/home/molly@zfs-auto-snap_daily-2020-05-22-2311 0B - 6.30G -
tank/home/molly@zfs-auto-snap_daily-2020-05-22-2340 31.5M - 6.30G -
tank/home/molly@zfs-auto-snap_daily-2020-05-28-1124 1.61M - 5.99G -
Some time around the 15th of March, regular daily snapshots stopped happening for some reason. This was obviously undesirable.
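Spotting the gaps by eye works for a listing this short, but it can also be done mechanically. The following is a rough sketch which reads snapshot names on standard input and reports any gap of more than one day between consecutive daily snapshots. It assumes GNU date (for the -d option) and the zfs-auto-snap_daily-YYYY-MM-DD-HHMM naming seen in the listing; the function name is invented for this example.

```shell
#!/bin/sh
# find_daily_gaps: read snapshot names on stdin, print gaps longer than a day.
# Assumes GNU date and the naming scheme shown in the listing above.
find_daily_gaps() {
    prev_day=""
    prev_epoch=""
    # Pull the YYYY-MM-DD part out of each daily snapshot name; sort -u
    # collapses days which have more than one snapshot.
    sed -n 's/.*@zfs-auto-snap_daily-\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)-[0-9]\{4\}.*/\1/p' |
    sort -u |
    while read -r day; do
        # Work in UTC so that DST changes cannot skew the day count.
        epoch=$(date -u -d "$day" +%s) || continue
        if [ -n "$prev_epoch" ]; then
            gap=$(( (epoch - prev_epoch) / 86400 ))
            if [ "$gap" -gt 1 ]; then
                echo "$prev_day -> $day: $gap days"
            fi
        fi
        prev_day=$day
        prev_epoch=$epoch
    done
}
```

It could be fed with zfs list -H -o name -t snapshot tank/home/molly | find_daily_gaps, where -H drops the header line and -o name limits the output to the snapshot names.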
I then took a look at the cron(8) configuration on my laptop, to try and determine whether the snapshotting script was getting called correctly. The Debian package installs scripts which invoke zfs-auto-snapshot in various cron.<time period> directories in /etc, e.g. /etc/cron.daily. In the case of the latter, executable files in that directory will be run by cron once a day. I’m not sure whether these directories are a Debianism or not, but they’re handled by the system crontab(5); /etc/crontab contains the following lines:
17 * * * * root cd / && run-parts --report /etc/cron.hourly
25 6 * * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
47 6 * * 7 root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
52 6 1 * * root test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )
The important thing to note here is that if anacron(8) is installed, then cron(8) actually doesn’t handle the daily, weekly and monthly directories (despite their having “cron” in their names). (It took me a bit too long to realise that the presence of lines in my laptop’s logs which look like CRON[2086304]: (root) CMD (test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )) does not mean that the commands following the test are actually executed…)
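The short-circuit behaviour which tripped me up can be reproduced in isolation. This is only an illustration of the shape of the crontab entry: the function name is made up, the path being probed is a parameter, and an echo stands in for the real run-parts call.

```shell
#!/bin/sh
# run_unless_executable PATH
# Same structure as the /etc/crontab entries: cron logs and runs the whole
# command line, but the part after || only executes when the test fails.
run_unless_executable() {
    probe=$1
    test -x "$probe" || ( cd / && echo "run-parts would run here" )
}
```

Calling run_unless_executable /usr/sbin/anacron on a machine with anacron(8) installed prints nothing at all, even though the command as a whole ran (and would have been logged by cron); calling it with a path which doesn’t exist prints the message.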
anacron(8) is a bit like cron(8), except that it’s intended for systems which aren’t always on twenty-four hours a day, which may miss periodic tasks due to being powered off (e.g. late-night cronjobs on machines which are only usually powered on during working hours). anacron(8) stores timestamps for each job it runs, so that in the future it can work out when each job is overdue for another run, and then perform the job appropriately.
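The overdue check is simple enough to sketch. As far as I understand it, anacron(8) keeps one timestamp file per job holding the date of the last run as YYYYMMDD; the sketch below assumes that format, assumes GNU date, and uses invented function names. It is a model of the idea, not anacron’s actual code.

```shell
#!/bin/sh
# to_iso YYYYMMDD: rewrite a compact date as YYYY-MM-DD.
to_iso() {
    printf '%s\n' "$1" | sed 's/^\(....\)\(..\)\(..\)$/\1-\2-\3/'
}

# is_overdue LAST_RUN PERIOD_DAYS TODAY
# Succeed if at least PERIOD_DAYS days separate LAST_RUN from TODAY
# (both given as YYYYMMDD). Returns 2 if a date cannot be parsed.
is_overdue() {
    last_epoch=$(date -u -d "$(to_iso "$1")" +%s) || return 2
    today_epoch=$(date -u -d "$(to_iso "$3")" +%s) || return 2
    [ $(( (today_epoch - last_epoch) / 86400 )) -ge "$2" ]
}
```

On Debian the timestamps should live under /var/spool/anacron (an assumption about the default spool directory), so is_overdue "$(cat /var/spool/anacron/cron.daily)" 1 "$(date +%Y%m%d)" would say whether the daily job is due.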
If one peruses /etc/anacrontab, one can see that it is indeed responsible for managing the execution of daily, weekly and monthly tasks (with a comment confirming that it takes priority over cron(8)):
# These replace cron's entries
1 5 cron.daily run-parts --report /etc/cron.daily
7 10 cron.weekly run-parts --report /etc/cron.weekly
@monthly 15 cron.monthly run-parts --report /etc/cron.monthly
One crucial point here is that anacron(8) itself is not a persistent daemon. My laptop is configured to run it once at system boot time, but thereafter anacron(8) is invoked periodically by cron(8), this time from /etc/cron.d/anacron:
# /etc/cron.d/anacron: crontab entries for the anacron package
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
30 7-23 * * * root [ -x /etc/init.d/anacron ] && if [ ! -d /run/systemd/system ]; \
then /usr/sbin/invoke-rc.d anacron start >/dev/null; fi
And here we have the smoking gun. If the machine has the System V-style RC script installed, and there’s no evidence of systemd running on the system, the cronjob invokes anacron(8) through the script. The problem is that my laptop runs neither sysvinit nor systemd (it runs this and a few other bits and pieces instead). Attempting to invoke the script manually results in the following, without anacron(8) being executed:
root@flywheel:~# /usr/sbin/invoke-rc.d anacron start
invoke-rc.d: could not determine current runlevel
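To make the decision in that cron entry explicit, here is the same logic unrolled into a function which only reports which branch would be taken. The function name and messages are invented, and the two paths are parameters purely so that the branches can be exercised on any machine.

```shell
#!/bin/sh
# anacron_cron_guard INIT_SCRIPT SYSTEMD_DIR
# Report what the /etc/cron.d/anacron entry would do, without doing it.
anacron_cron_guard() {
    init_script=$1
    systemd_dir=$2
    if [ ! -x "$init_script" ]; then
        echo "skip: no init script"
    elif [ -d "$systemd_dir" ]; then
        echo "skip: systemd detected"
    else
        # The only branch which starts anacron, and it goes via invoke-rc.d.
        echo "invoke-rc.d anacron start"
    fi
}
```

On my laptop, anacron_cron_guard /etc/init.d/anacron /run/systemd/system lands in the last branch, which is exactly the invoke-rc.d call that then fails as shown above.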
So this cronjob never worked in the first place. The next question is of course why there are any daily/weekly/monthly snapshots at all, and why they stopped happening regularly after the 15th of March.
The answer is (indirectly) Covid-19. Prior to the 15th, I would turn my laptop off at night when I was asleep, and then turn it on when I started using it the following day. Remember: my laptop is configured to run anacron(8) manually once on every boot. And the timestamps in the snapshot names are not regular, which matches the fact that my sleep cycle wasn’t very consistent around that time, and occasionally I wouldn’t need my laptop until the evenings.
Around the 15th, for reasons of the pandemic, I moved to a different flat, and stopped regularly going outside (first by choice and then by regulation), so my laptop became more of a static workstation, which I’d leave running overnight for several days at a time. You never get your boot-time ZFS snapshots if you don’t reboot.
This is all fixed if /etc/cron.d/anacron is amended to invoke anacron(8) directly.
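One possible shape for the amended file is sketched below, wrapped in a function which prints it so that it can be reviewed before being installed. The exact command line is an assumption on my part: the -s flag asks anacron(8) to serialise job execution, and the -x guard simply avoids errors if the package is ever removed. The function name is made up.

```shell
#!/bin/sh
# print_amended_anacron_crontab: emit a candidate /etc/cron.d/anacron which
# calls anacron directly instead of going through invoke-rc.d.
# (A sketch under the assumptions stated above; adjust to taste.)
print_amended_anacron_crontab() {
    cat <<'EOF'
# /etc/cron.d/anacron: crontab entries for the anacron package
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
30 7-23 * * * root [ -x /usr/sbin/anacron ] && /usr/sbin/anacron -s
EOF
}
```

The schedule is left unchanged, so anacron(8) still gets a chance to run every hour between 07:30 and 23:30, and its own timestamps decide whether anything is actually due.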
Non-standard system configurations have non-standard and possibly surprising behaviour. And Unix arcana is arcane.