Install Arch Linux on ZFS

From ArchWiki

This article details the steps required to install Arch Linux onto a ZFS root filesystem.

Warning: Blindly copying and pasting from this article will not work. Take the time to understand the boot process and what happens when the pool and datasets are created.

Installation

To install Arch Linux on ZFS, you need to boot an archiso system that includes the ZFS module.

Get ZFS module on archiso system

A script is available to install and load the ZFS module on a running archiso system. It should work on any archiso version.

See eoli3n/archiso-zfs.

Embedding ZFS module into custom archiso

To build a custom archiso, see the ZFS article.

Partition the destination drive

Review Partitioning for information on determining the partition table type to use for ZFS. ZFS supports GPT and MBR partition tables.

ZFS manages its own partitions, so only a basic partition table scheme is required. The partition that will contain the ZFS filesystem should be of the type bf00, or "Solaris Root".

Drives larger than 2 TB require a GPT partition table. GRUB on a BIOS/GPT configuration requires a small (1-2 MiB) BIOS boot partition in which to embed its boot code image.

Depending upon your machine's firmware or your choice of boot mode, booting may or may not require an EFI partition. On a BIOS machine (or a UEFI machine booting in legacy mode), an EFI partition is not required. Consult Arch boot process#Boot loader for more information.

Partition scheme

Here is an example of a basic partition scheme that could be employed for your ZFS root install on a BIOS/MBR installation using GRUB:

Part     Size   Type
----     ----   -------------------------
   1     XXXG   Solaris Root (bf00)

Using GRUB on a BIOS machine (or a UEFI machine in legacy boot mode) with a GPT partition table:

Part     Size   Type
----     ----   -------------------------
   1       2M   BIOS boot partition (ef02)
   2     XXXG   Solaris Root (bf00)

Another example, this time using a UEFI-specific bootloader (such as rEFInd) with a GPT partition table:

Part     Size   Type
----     ----   -------------------------
   1     100M   EFI boot partition (ef00)
   2     XXXG   Solaris Root (bf00)

ZFS does not support swap files. If you require a swap partition, see ZFS#Swap volume for creating a swap ZVOL.

Tip: Bootloaders with support for ZFS are described in #Install and configure the bootloader.
Warning: Several GRUB bugs (bug #42861, zfsonlinux/grub/issues/5) complicate installing it on ZFS partitions. See #Install and configure the bootloader for a workaround.

Example parted commands

Here are some example commands to partition a drive for the second scenario above, i.e. using BIOS/legacy boot mode with a GPT partition table and a (slightly more than) 1 MB BIOS boot partition for GRUB:

# parted /dev/sdx
(parted)mklabel gpt
(parted)mkpart non-fs 0% 2
(parted)mkpart primary 2 100%
(parted)set 1 bios_grub on
(parted)set 2 boot on
(parted)quit

You can achieve the above in a single command like so:

# parted --script /dev/sdx mklabel gpt mkpart non-fs 0% 2 mkpart primary 2 100% set 1 bios_grub on set 2 boot on

If you are creating an EFI partition then that should have the boot flag set instead of the root partition.

Format the destination disk

If you have opted for a boot partition as well as any other non-ZFS system partitions then format them. Do not do anything to the Solaris partition nor to the BIOS boot partition. ZFS will manage the first, and your bootloader the second.

Setup the ZFS filesystem

Warning: Do not use '-' in the names of your datasets. (see this "feature")

First, make sure the ZFS modules are loaded:

# modprobe zfs

Create the root zpool

Create your pool and set all default dataset options. All datasets created on the zpool will inherit every -O option set at pool creation. The default options are detailed in Debian Buster Root on ZFS, Step 2: Disk Formatting.

Note: Use -o ashift=9 for disks with a 512 byte physical sector size or -o ashift=12 for disks with a 4096 byte physical sector size. See lsblk -S -o NAME,PHY-SEC to get the physical sector size of each SCSI/SATA disk. Remove -S if you want the same value from all devices.
Warning: Keep in mind that most modern devices use a 4096 byte physical sector size, even though some report 512. This is especially true for SSDs. Selecting ashift=9 on a 4096 byte sector size (even if it reports 512) will incur a performance penalty. Selecting ashift=12 on a 512 byte sector size may incur a capacity penalty, but no performance penalty. If in doubt, for a modern drive, err on the side of ashift=12, or research your particular device for the appropriate value. Refer to OpenZFS issue #967 for a related discussion, and OpenZFS issue #2497 for a consequence of a higher ashift value.
# zpool create -f -o ashift=12         \
             -O acltype=posixacl       \
             -O relatime=on            \
             -O xattr=sa               \
             -O dnodesize=legacy       \
             -O normalization=formD    \
             -O mountpoint=none        \
             -O canmount=off           \
             -O devices=off            \
             -R /mnt                   \
             zroot /dev/disk/by-id/id-to-partition-partx
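
The ashift rule in the note above amounts to taking the base-2 logarithm of the physical sector size. The following is a minimal sketch of that calculation, assuming sector sizes that are powers of two. The helper name ashift_for_sector is hypothetical and is not part of ZFS.

```shell
# Hypothetical helper (not part of ZFS): print the ashift value, i.e.
# log2 of the physical sector size in bytes given as $1.
ashift_for_sector() {
    size=$1
    shift_bits=0
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        shift_bits=$((shift_bits + 1))
    done
    echo "$shift_bits"
}

# Possible usage with the size reported for a single disk:
# ashift_for_sector $(lsblk -d -n -o PHY-SEC /dev/sdx)
```

Because some drives report 512 bytes while using 4096 byte sectors, the result is best treated as a lower bound rather than a final answer.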

Compression and native encryption

This will enable compression and native encryption by default on all datasets:

# zpool create -f -o ashift=12         \
             -O acltype=posixacl       \
             -O relatime=on            \
             -O xattr=sa               \
             -O dnodesize=legacy       \
             -O normalization=formD    \
             -O mountpoint=none        \
             -O canmount=off           \
             -O devices=off            \
             -R /mnt                   \
             -O compression=lz4        \
             -O encryption=aes-256-gcm \
             -O keyformat=passphrase   \
             -O keylocation=prompt     \
             zroot /dev/disk/by-id/id-to-partition-partx
Warning:
  • Always use id names when working with ZFS, otherwise import errors will occur.
  • GRUB users should keep in mind that the zpool-create command normally enables all features, some of which may not be supported by GRUB. See: ZFS#GRUB-compatible pool creation.
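
To find the id name that belongs to a given device node, the symbolic links in /dev/disk/by-id can be resolved and compared. This is a rough sketch only. The helper name by_id_links is hypothetical, and the optional second argument exists purely so that the function can be tried against another directory.

```shell
# Hypothetical helper: list the persistent names that resolve to a device.
# $1 = device node (e.g. /dev/sdx2)
# $2 = directory holding the links (defaults to /dev/disk/by-id)
by_id_links() {
    target=$(readlink -f "$1")
    dir=${2:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip dangling links and the unexpanded pattern of an empty directory.
        [ -e "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            echo "$link"
        fi
    done
}

# Possible usage:
# by_id_links /dev/sdx2
```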

Create your datasets

Instead of using conventional disk partitions, ZFS has the concept of datasets to manage your storage. Unlike disk partitions, datasets have no fixed size and allow for different attributes, such as compression, to be applied per dataset. Normal ZFS datasets are mounted automatically by ZFS whilst legacy datasets are required to be mounted using fstab or with the traditional mount command.

One of the most useful features of ZFS is boot environments. Boot environments allow you to create a bootable snapshot of your system that you can revert to at any time simply by rebooting into that boot environment. This can make doing system updates much safer and is also incredibly useful for developing and testing software. In order to be able to use a boot environment manager such as beadm, zectlAUR (systemd-boot), or zedenvAUR (GRUB) to manage boot environments, your datasets must be configured properly. The key points are that you split your data directories (such as /home) into datasets that are distinct from your system datasets, and that you do not place data in the root of the pool, as it cannot be moved afterwards.

You should always create a dataset for at least your root filesystem, and in nearly all cases you will also want /home to be in a separate dataset. You may decide you want your logs to persist over boot environments. If you are running any software that stores data outside of /home (as is the case for database servers), you should structure your datasets so that the data directories of that software are separated out from the root dataset.

With these example commands, we will create a basic boot environment compatible configuration comprising just root and /home datasets. They inherit the default options set at zpool creation.

# zfs create -o mountpoint=none zroot/data
# zfs create -o mountpoint=none zroot/ROOT
# zfs create -o mountpoint=/ -o canmount=noauto zroot/ROOT/default
# zfs create -o mountpoint=/home zroot/data/home

You can also create your root dataset without setting its mountpoint to /, since GRUB will mount it at / anyway. This makes it possible to boot into an older version of the root filesystem simply by cloning it and adding it as a GRUB menuentry. In that case, create the dataset with the following command:

# zfs create -o mountpoint=/roots/default zroot/ROOT/default

You can store /root in your zroot/data/home dataset.

# zfs create -o mountpoint=/root zroot/data/home/root

You will need to enable some options for datasets which hold specific directories:

Options required by specific directories

Directory          Dataset option     Details
---------          --------------     -------------------------
/                  canmount=noauto
/var/log/journal   acltype=posixacl   Systemd#systemd-tmpfiles-setup.service fails to start at boot

System datasets

To create datasets for system directories, use canmount=off.

For some examples, please read Debian-Buster-Root-on-ZFS#step-3-system-installation

# zfs create -o mountpoint=/var -o canmount=off     zroot/var
# zfs create                                        zroot/var/log
# zfs create -o mountpoint=/var/lib -o canmount=off zroot/var/lib
# zfs create                                        zroot/var/lib/libvirt
# zfs create                                        zroot/var/lib/docker

Export/Import your pools

To validate your configurations, export then reimport all your zpools.

Warning: Do not skip this, otherwise you will be required to use -f when importing your pools. This unloads the imported pool.
Note: This might fail if you added a swap partition. You need to turn it off with the swapoff command.
# zpool export zroot
# zpool import -d /dev/disk/by-id -R /mnt zroot -N
Note: -d is not the actual device ID, but the /dev/disk/by-id directory containing the symbolic links.

If this command fails and you are asked to import your pool via its numeric ID, run zpool import to find out the ID of your pool then use a command such as:

# zpool import 9876543212345678910 -R /mnt zroot
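
The numeric ID can also be picked out of the zpool import listing with a small filter. The sketch below assumes that the listing labels its fields as "pool:" and "id:", which is how zpool import typically prints them. The helper name pool_id_from_listing is hypothetical.

```shell
# Hypothetical helper: read `zpool import` output on stdin and print the
# numeric ID listed for the pool named in $1.
# Assumes "pool:" and "id:" labels, each followed by the value.
pool_id_from_listing() {
    awk -v want="$1" '
        $1 == "pool:" { current = $2 }
        $1 == "id:" && current == want { print $2 }
    '
}

# Possible usage on a live system:
# zpool import | pool_id_from_listing zroot
```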

If you used native encryption, load the ZFS key.

# zfs load-key zroot

Manually mount your rootfs dataset because it uses canmount=noauto, then mount all other datasets.

# zfs mount zroot/ROOT/default
# zfs mount -a

The ZFS filesystem is now ready to use.

Configure the root filesystem

If you used legacy datasets, they must be listed in /etc/fstab.

Set the bootfs property on the descendant root filesystem so the boot loader knows where to find the operating system.

# zpool set bootfs=zroot/ROOT/default zroot

Be sure to bring the zpool.cache file into your new system. This is required later for the ZFS daemon to start.

# cp /etc/zfs/zpool.cache /mnt/etc/zfs/zpool.cache

If you do not have /etc/zfs/zpool.cache, create it:

# zpool set cachefile=/etc/zfs/zpool.cache zroot

Install and configure Arch Linux

Follow the steps below using the Installation guide. Special considerations for ZFSonLinux are noted where they apply.

  • First mount any legacy or non-ZFS boot or system partitions using the mount command.
  • Install the base system.
  • The procedure described in Installation guide#Fstab is usually overkill for ZFS. ZFS usually mounts its own datasets automatically, so they do not need to be listed in the fstab file unless you created legacy datasets for system directories. To generate the fstab for filesystems, use:
# genfstab -U -p /mnt >> /mnt/etc/fstab
# arch-chroot /mnt
  • Edit the /etc/fstab:
Note:
  • If you chose to create legacy datasets for system directories, keep them in this fstab!
  • Comment out all non-legacy datasets apart from the swap file and the boot/EFI partition. It is a convention to replace the swap's uuid with /dev/zvol/zroot/swap.
  • You need to add the Arch ZFS repository to /etc/pacman.conf, sign its key and install zfs-linux (or zfs-linux-lts if you are running the LTS kernel) within the arch-chroot before you can update the ramdisk with ZFS support.
  • When creating the initial ramdisk, first edit /etc/mkinitcpio.conf and add zfs before filesystems. Also, move the keyboard hook before zfs so you can type in the console if something goes wrong. You may also remove fsck (if you are not using Ext3 or Ext4). Your HOOKS line should look something like this:
HOOKS=(base udev autodetect modconf block keyboard zfs filesystems)
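
The ordering described above (keyboard before zfs, zfs before filesystems) can be checked mechanically before regenerating the initramfs. This is only a sketch. The helper name hooks_order_ok is hypothetical and is not part of mkinitcpio.

```shell
# Hypothetical helper: succeed only if, within the HOOKS line passed as $1,
# "keyboard" comes before "zfs" and "zfs" comes before "filesystems".
hooks_order_ok() {
    echo "$1" | tr -d '()' | sed 's/^HOOKS=//' | awk '{
        # Remember the position of every hook on the line.
        for (i = 1; i <= NF; i++) pos[$i] = i
        ok = ("keyboard" in pos) && ("zfs" in pos) && ("filesystems" in pos) &&
             pos["keyboard"] < pos["zfs"] && pos["zfs"] < pos["filesystems"]
        exit (ok ? 0 : 1)
    }'
}

# Possible usage:
# hooks_order_ok "$(grep '^HOOKS=' /etc/mkinitcpio.conf)" && echo "order looks fine"
```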

When using systemd in the initrd, you need to install mkinitcpio-sd-zfsAUR and add the sd-zfs hook after the systemd hook instead of the zfs hook. Keep in mind that this hook uses different kernel parameters than the default zfs hook, more information can be found at the project page.

Note: sd-zfs does not support native encryption yet, see dasJ/sd-zfs/issues/4.
Note:
  • If you are using a separate dataset for /usr and have followed the instructions below, you must make sure you have the usr hook enabled after zfs, or your system will not boot.
  • When you generate the initramfs, the zpool.cache is copied into the initrd. If you did not generate it before, or needed to regenerate it, remember to regenerate the initramfs again.
  • You can also use a legacy mountpoint and let fstab mount it.

Install and configure the bootloader

Using GRUB for EFI/BIOS

If you use GRUB, you can store your /boot on a zpool. Please read Debian-Buster-Root-on-ZFS#step-3-system-installation.

Install GRUB onto your disk as instructed here: GRUB#BIOS systems or GRUB#UEFI systems. The GRUB manual provides detailed information on manually configuring the software which you can supplement with GRUB and GRUB/Tips and tricks.

bug: broken root pool detection

Because of a known bug, grub-mkconfig will fail to detect the root pool and omit it from /boot/grub/grub.cfg. Until this is fixed, there are two possible workarounds:

  • Workaround A: Modify code for rpool detection in /etc/grub.d/10_linux. Replace
rpool=`${grub_probe} --device ${GRUB_DEVICE} --target=fs_label 2>/dev/null || true`
with
rpool=`zdb -l ${GRUB_DEVICE} | grep " name:" | cut -d\' -f2`
and
LINUX_ROOT_DEVICE="ZFS=${rpool}${bootfs%/}"
with
LINUX_ROOT_DEVICE="zfs:${rpool}${bootfs%/}"
This can usually detect the correct root pool name and write a working path to /boot/grub/grub.cfg any time grub-mkconfig is used.
  • Workaround B: If the above solution cannot detect the correct path, you can hardcode it in /etc/grub.d/10_linux. Replace
linux   ${rel_dirname}/${basename} root=${linux_root_device_thisversion} rw ${args}
with
linux   ${rel_dirname}/${basename} root=zfs:zroot/ROOT/default rw ${args}

error: failed to get canonical path of

grub-mkconfig fails to properly generate entries for systems hosted on ZFS.

# grub-mkconfig -o /boot/grub/grub.cfg
/usr/bin/grub-probe: error: failed to get canonical path of `/dev/bus-Your_Disk_ID-part#'
grub-install: error: failed to get canonical path of `/dev/bus-Your_Disk_ID-part#'

To work around this you must set this environment variable: ZPOOL_VDEV_NAME_PATH=1. For example:

# ZPOOL_VDEV_NAME_PATH=1 grub-mkconfig -o /boot/grub/grub.cfg

error: unknown filesystem

GRUB tools like grub-probe or grub-install may fail with the error unknown filesystem when filesystem detection fails. This may happen due to the filesystem not being supported by GRUB, or in the case of ZFS, unsupported features may be present (refer to ZFS#GRUB-compatible pool creation for appropriate features to include in a boot zpool.)

In order to troubleshoot the error, understand which filesystem it is failing to identify (e.g. run grub-probe on the suspects, like grub-probe / or grub-probe /boot). An example interaction follows:

# grub-probe /boot
zfs

# grub-probe /
grub-probe: error: unknown filesystem.

After identifying the problem filesystem, run grub-probe -vvvv / and scan the output for the filesystem it was expected to identify. In this case, ZFS was expected, but the following output was generated:

# grub-probe -vvvv /
(...)
grub-core/kern/fs.c:56: Detecting zfs...
grub-core/osdep/hostdisk.c:420: opening the device `/dev/sda4' in open_device()
grub-core/fs/zfs/zfs.c:1199: label ok 0
grub-core/osdep/hostdisk.c:399: reusing open device `/dev/sda4'
grub-core/fs/zfs/zfs.c:1014: check 2 passed
grub-core/fs/zfs/zfs.c:1025: check 3 passed
grub-core/fs/zfs/zfs.c:1032: check 4 passed
grub-core/fs/zfs/zfs.c:1042: check 6 passed
grub-core/fs/zfs/zfs.c:1050: check 7 passed
grub-core/fs/zfs/zfs.c:1061: check 8 passed
grub-core/fs/zfs/zfs.c:1071: check 9 passed
grub-core/fs/zfs/zfs.c:1093: check 11 passed
grub-core/fs/zfs/zfs.c:1119: check 10 passed
grub-core/fs/zfs/zfs.c:1135: str=com.delphix:hole_birth
grub-core/fs/zfs/zfs.c:1135: str=com.delphix:embedded_data
grub-core/fs/zfs/zfs.c:1144: check 12 passed (feature flags)
grub-core/fs/zfs/zfs.c:1884: zio_read: E 0: size 4096/4096
(...)
grub-core/osdep/hostdisk.c:399: reusing open device `/dev/sda4'
grub-core/fs/zfs/zfs.c:2117: zap: name = com.delphix:extensible_dataset, value = 18, cd = 0
grub-core/fs/zfs/zfs.c:2117: zap: name = com.datto:bookmark_v2, value = 0, cd = 0
grub-core/fs/zfs/zfs.c:2117: zap: name = com.datto:encryption, value = c, cd = 0 # <------------------
grub-core/kern/fs.c:78: zfs detection failed.  # <----------------------------------------------------
grub-core/kern/fs.c:56: Detecting xfs...
grub-core/fs/xfs.c:931: Reading sb
grub-core/fs/xfs.c:270: Validating superblock
grub-core/kern/fs.c:78: xfs detection failed.
grub-core/kern/fs.c:56: Detecting ufs2...
(...)
grub-core/kern/fs.c:56: Detecting affs...
grub-core/kern/fs.c:78: affs detection failed.
grub-probe: error: unknown filesystem.

This shows that ZFS detection went well until the com.datto:encryption feature was detected. Since ZFS Native Encryption is not supported by GRUB (as of August 2021), detection of ZFS failed. A second, GRUB-compatible zpool may be appropriate to boot into an encrypted system - as of August 2021, this is the recommended approach (refer to the relevant OpenZFS project page).
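
When the verbose log is long, the feature entries can be filtered out of it so that they are easier to compare against the features GRUB supports. The following sketch relies only on the "zap: name = ..., value = ..." line format visible in the output above. The helper name zap_features is hypothetical.

```shell
# Hypothetical helper: read `grub-probe -vvvv` output on stdin and print
# each ZFS feature name together with the value GRUB reported for it.
# Entries with an empty name are skipped.
zap_features() {
    sed -n 's/.*zap: name = \([^, ][^,]*\), value = \([^,]*\),.*/\1 \2/p'
}

# Possible usage (2>&1 in case the verbose log is written to stderr):
# grub-probe -vvvv / 2>&1 | zap_features
```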

A successful execution of grub-probe on a GRUB-compatible zpool looks like this:

# grub-probe -vvvv /boot
(...)
grub-core/osdep/hostdisk.c:399: reusing open device `/dev/sda3'
grub-core/fs/zfs/zfs.c:2117: zap: name = com.delphix:extensible_dataset, value = 0, cd = 0
grub-core/fs/zfs/zfs.c:2117: zap: name = com.delphix:embedded_data, value = 1, cd = 0
grub-core/fs/zfs/zfs.c:2117: zap: name = com.delphix:hole_birth, value = 1, cd = 0
grub-core/fs/zfs/zfs.c:2117: zap: name = org.open-zfs:large_blocks, value = 0, cd = 0
grub-core/fs/zfs/zfs.c:2117: zap: name = org.illumos:lz4_compress, value = 1, cd = 0
grub-core/fs/zfs/zfs.c:2117: zap: name = , value = 0, cd = 0
grub-core/fs/zfs/zfs.c:2117: zap: name = , value = 0, cd = 0
grub-core/fs/zfs/zfs.c:3285: alive
(...)
grub-core/fs/zfs/zfs.c:1906: endian = 1
grub-core/fs/zfs/zfs.c:597: dva=8, 20008
grub-core/fs/zfs/zfs.c:2697: alive
zfs

Booting your kernel and initrd from ZFS

You may skip this section if you have your kernel and initrd on a separate /boot partition using something like ext4 or vfat.

Otherwise, GRUB needs to load your kernel and initrd from a ZFS dataset, and the kernel and initrd paths have to be in the following format:

/dataset/@/actual/path

Example with Arch installed on the root dataset:

/boot/grub/grub.cfg
set timeout=5
set default=0

menuentry "Arch Linux" {
    search -u UUID
    linux /@/boot/vmlinuz-linux zfs=zroot rw
    initrd /@/boot/initramfs-linux.img
}

Example with Arch installed on a nested dataset:

/boot/grub/grub.cfg
set timeout=5
set default=0

menuentry "Arch Linux" {
    search -u UUID
    linux /ROOT/default/@/boot/vmlinuz-linux zfs=zroot/ROOT/default rw
    initrd /ROOT/default/@/boot/initramfs-linux.img
}
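
Both examples above follow the same rule: drop the pool name from the dataset, then append /@ and the path inside the dataset. A minimal sketch of that rule follows. The helper name grub_zfs_path is hypothetical.

```shell
# Hypothetical helper: build the path GRUB expects for a file stored in a
# ZFS dataset.
# $1 = dataset including the pool name, $2 = absolute path inside the dataset
grub_zfs_path() {
    dataset=$1
    path=$2
    case "$dataset" in
        */*) rel="/${dataset#*/}" ;;  # strip the pool name, keep the rest
        *)   rel="" ;;                # file lives in the pool's root dataset
    esac
    echo "$rel/@$path"
}

# Possible usage:
# grub_zfs_path zroot/ROOT/default /boot/vmlinuz-linux
```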

Booting your kernel and initrd from separate boot partition

Example with a separate non-ZFS /boot partition and Arch installed on a nested dataset:

/boot/grub/grub.cfg
set timeout=5
set default=0

menuentry "Arch Linux" {
    search -u UUID
    linux /vmlinuz-linux zfs=zroot/ROOT/default rw
    initrd /initramfs-linux.img
}

Using systemd-boot for EFI only

systemd-boot cannot open ZFS zpools, so you must store your /boot on a separate VFAT or ext4 partition.

Note: To be able to manage your Boot Environments with zectlAUR, follow zectl/docs/plugins/systemdboot.md.

Install the bootloader on your ESP, following Systemd-boot#Installing the EFI boot manager.

Create a boot entry:

/efi/loader/entries/archlinux.conf
title           Arch Linux
linux           vmlinuz-linux
initrd          intel-ucode.img
initrd          initramfs-linux.img
options         zfs=zroot/ROOT/default rw
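
If you keep several root datasets, the entry above can be generated rather than typed by hand. This sketch simply reproduces the entry shown above with the dataset name substituted. The helper name sdboot_entry is hypothetical, and the intel-ucode.img line is carried over from the example and may not apply to your machine.

```shell
# Hypothetical helper: print a systemd-boot entry for the root dataset in $1.
sdboot_entry() {
    cat <<EOF
title           Arch Linux
linux           vmlinuz-linux
initrd          intel-ucode.img
initrd          initramfs-linux.img
options         zfs=$1 rw
EOF
}

# Possible usage:
# sdboot_entry zroot/ROOT/default > /efi/loader/entries/archlinux.conf
```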

Using rEFInd for UEFI

Use EFISTUB and rEFInd for the UEFI boot loader. The kernel parameters in refind_linux.conf for ZFS should include zfs=bootfs or zfs=zroot so the system can boot from ZFS. The root and rootfstype parameters are not needed.

Configure systemd ZFS mounts

For your system to be able to reboot without issues, you need to enable the zfs.target to auto mount the pools and set the hostid.

Note: The instructions in this section assume you are still in arch-chroot.

For each pool you want automatically mounted, execute:

# zpool set cachefile=/etc/zfs/zpool.cache <pool>

Enable zfs.target

In order to mount ZFS pools automatically on boot, you need to enable zfs-import-cache.service, zfs-mount.service and zfs-import.target.
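
One way to enable these units together with zfs.target is a single systemctl enable call. The sketch below wraps it in a hypothetical run helper so that the command can be previewed with DRY_RUN=1 before it is executed. Neither run nor enable_zfs_units is part of ZFS or systemd.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Enable the units named in this section in one call.
enable_zfs_units() {
    run systemctl enable zfs-import-cache.service zfs-mount.service zfs-import.target zfs.target
}

# Possible usage inside the arch-chroot:
# DRY_RUN=1 enable_zfs_units   # preview only
# enable_zfs_units             # actually enable the units
```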

When running ZFS on root, the machine's hostid will not be available at the time of mounting the root filesystem. There are two solutions to this. The first is to place your spl hostid in the kernel parameters in your boot loader, for example by adding spl.spl_hostid=0x00bab10c. Use the hostid command to get your number.
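
The kernel parameter can be assembled from the hostid output instead of being copied by hand. A minimal sketch, in which the helper name spl_hostid_param is hypothetical:

```shell
# Hypothetical helper: format a hostid as the spl.spl_hostid kernel parameter.
# With no argument it falls back to the output of the hostid command.
spl_hostid_param() {
    id=${1:-$(hostid)}
    echo "spl.spl_hostid=0x$id"
}

# Possible usage:
# spl_hostid_param
```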

The other, and suggested, solution is to make sure that there is a hostid in /etc/hostid, and then regenerate the initramfs image which will copy the hostid into the initramfs image. To write the hostid file safely you need to use the zgenhostid command.

To use the libc-generated hostid (recommended):

# zgenhostid $(hostid)

To use a custom hostid (must be hexadecimal and 8 characters long):

# zgenhostid deadbeef
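
A custom value can be checked against the stated constraint (hexadecimal, 8 characters long) before it is handed to zgenhostid. This is a sketch only. The helper name is_valid_hostid is hypothetical and is not part of ZFS.

```shell
# Hypothetical helper: succeed only if $1 is exactly 8 hexadecimal characters.
is_valid_hostid() {
    case "$1" in
        *[!0-9a-fA-F]*) return 1 ;;   # contains a non-hexadecimal character
    esac
    [ "${#1}" -eq 8 ]
}

# Possible usage:
# is_valid_hostid deadbeef && zgenhostid deadbeef
```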

To let the tool generate a hostid:

# zgenhostid

Do not forget to regenerate your image using mkinitcpio.

Unmount and restart

We are almost done!

# umount /mnt/boot (if you have a legacy boot partition)
# zfs umount -a
# zpool export zroot

Now reboot.

Warning: If you do not properly export the zpool, the pool will refuse to import in the ramdisk environment and you will be stuck at the busybox terminal.

Loading password from USB-Stick

It is possible to store the password on a USB stick and load it at boot.

Save the password in the first bytes of the USB stick:

# dd if=your_password_file bs=32 count=1 of=/dev/disk/by-id/usb_stick

To create the encrypted dataset, you can either use the previously described method with a password prompt or pipe the password in with dd:

# dd if=/dev/disk/by-id/usb_stick bs=32 count=1 | zfs create -o encryption=on -o keyformat=passphrase zroot/ROOT

The next step is modifying the zfs hook. By default, zfs prompts for the password. You have to change it so that the password is piped in with dd from your USB stick. To do so, modify /usr/lib/initcpio/hooks/zfs and change the line:

# ! eval zfs load-key "${encryptionroot}"; do

to:

# ! eval dd if=/dev/disk/by-id/usb_stick bs=32 count=1 | zfs load-key "${encryptionroot}"; do

Because you are modifying the zfs hook, do not forget to regenerate your image using mkinitcpio. ZFS should now load the password from your USB stick on boot.

Troubleshooting

System fails to boot due to: cannot import zroot: no such pool available

You can try the following steps and see if they can help.

  • Use the kernel modules from the archzfs repo instead of the dkms version. You can go back to the dkms version after a successful boot.
  • Remove the /etc/zfs/zpool.cache and run:
# zpool set cachefile=none zroot
  • Remove the /etc/hostid.
  • Rebuild your initramfs.

See also