ZFS

From David's Wiki

How to use ZFS:

Background

There are a few concepts to understand:

  • A zpool is a storage pool made of one or more vdevs. Data is striped across the vdevs, so losing any one vdev loses the pool.
  • vdevs are groups of drives, usually in raidz (or raidz2, raidz3) or mirror.
  • datasets are filesystems stored on a zpool, similar to partitions.
  • A zvol is a virtual block device on a zpool without a filesystem.

Usage

# Create a zpool with a mirror vdev.
# ashift is a pool property (-o); compression is a filesystem property (-O).
zpool create -f -o ashift=12 -O compression=zstd $zpool_name mirror \
  ata-diskA \
  ata-diskB

# Create a dataset.
zfs create -o encryption=aes-256-gcm -o keyformat=passphrase $zpool_name/$dataset_name
Notes
  • Always refer to disks by their id under /dev/disk/by-id/, since names like /dev/sda can change between boots.
    • E.g. /dev/disk/by-id/ata-diskA

Alerts

First, set up postfix to send emails.
Then set up ZED notifications.

Automatic Scrubs

By default, ZFS on Ubuntu automatically scrubs every month.

Automatic Snapshots

See sanoid

zfs list -t snapshot

Caching

ZFS has two read caches:

  • ARC - this is enabled by default and can grow to about half of your memory. The memory is released when the system comes under memory pressure.
  • L2ARC - you can enable additional caching by adding an L2ARC drive for ARC to overflow to.

For writes:

  • SLOG - a separate log device, typically an SSD-backed mirror, that holds the ZFS intent log (ZIL). It only helps synchronous writes.

In general, Intel Optane SSDs are a good choice for L2ARC and SLOG devices, as they are supposed to last longer and have lower latency than regular SSDs.
A 16GB Optane stick can be had for ~$12.

ARC

arc_summary or arcstat will show you the memory used by ARC. This does not appear in htop.

If you want to reduce arc memory usage, you can set limits by creating /etc/modprobe.d/zfs.conf:

/etc/modprobe.d/zfs.conf
# Set max ARC size: 4 GiB == 4294967296 bytes
options zfs zfs_arc_max=4294967296
# Set min ARC size: 1 GiB == 1073741824 bytes
options zfs zfs_arc_min=1073741824
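
The byte values are easy to mistype, so here is a minimal sketch that generates the two lines from sizes given in GiB. The helper names gib_to_bytes and arc_conf are made up for illustration and are not part of ZFS.

```shell
# Convert a size in GiB to bytes (hypothetical helper).
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the lines for /etc/modprobe.d/zfs.conf.
# $1 = max ARC size in GiB, $2 = min ARC size in GiB.
arc_conf() {
  printf 'options zfs zfs_arc_max=%s\n' "$(gib_to_bytes "$1")"
  printf 'options zfs zfs_arc_min=%s\n' "$(gib_to_bytes "$2")"
}

# Example: 4 GiB max, 1 GiB min.
arc_conf 4 1
```

To try a limit without rebooting, the same byte value can be written as root to /sys/module/zfs/parameters/zfs_arc_max. If the zfs module is loaded from the initramfs, the initramfs probably needs to be regenerated (update-initramfs -u on Ubuntu) before the file takes effect at boot.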

L2ARC

L2ARC costs about 80 bytes of RAM per cached record. Historically this was 320 bytes, but now the overhead is mostly negligible.
At the default record size of 128K, 1 GiB holds 8192 records, requiring approx. 640 KiB of memory.
At 4K record size, you will need approx. 20 MiB of RAM per GiB of L2ARC.
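
The estimate is just (L2ARC size / record size) × header size. A small sketch of that arithmetic, assuming the ~80 bytes per record quoted above (the function name l2arc_header_bytes is made up for illustration):

```shell
# Estimate the RAM used by L2ARC headers.
# $1 = L2ARC size in bytes, $2 = record size in bytes,
# $3 = header bytes per record (assumed 80 if omitted).
l2arc_header_bytes() {
  l2arc_bytes=$1
  record_bytes=$2
  header_bytes=${3:-80}
  echo $(( l2arc_bytes / record_bytes * header_bytes ))
}

GiB=$(( 1024 * 1024 * 1024 ))
l2arc_header_bytes "$GiB" $(( 128 * 1024 ))  # 655360 bytes   = 640 KiB
l2arc_header_bytes "$GiB" $(( 4 * 1024 ))    # 20971520 bytes = 20 MiB
```

Multiply by the size of the cache device in GiB to get the total, e.g. a 16 GiB L2ARC with 128K records needs about 10 MiB of RAM.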

To add an L2ARC device:

sudo zpool add $pool cache $device

SLOG

sudo zpool add $pool log $device
# or
# sudo zpool add $pool log mirror $device1 $device2

Expanding

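Traditionally, you can only expand a pool by adding vdevs or by replacing all drives in a vdev with larger ones. OpenZFS 2.3 and later can also add a single disk to an existing raidz vdev (raidz expansion).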
See [1]

After replacing all drives in a vdev, the extra space is not used until you run: sudo zpool online -e $pool $disk on any one of the new disks. Alternatively, set autoexpand=on on the pool beforehand.
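
A rough sketch of the replace-and-expand workflow. The pool name tank and the disk ids are placeholders, and the function names are made up. RUN defaults to echo so the commands are only printed; set RUN=sudo to actually execute them.

```shell
# Dry run by default: commands are printed, not executed.
RUN=${RUN:-echo}

# Replace one disk and check on the resilver.
# $1 = pool, $2 = old disk id, $3 = new disk id.
replace_disk() {
  # Let the pool grow on its own once every disk in the vdev is larger.
  $RUN zpool set autoexpand=on "$1"
  # Swap the old disk for the new one; this starts a resilver.
  $RUN zpool replace "$1" "$2" "$3"
  # Wait for the resilver to finish before replacing the next disk.
  $RUN zpool status "$1"
}

# Manual expansion, for pools without autoexpand=on.
# $1 = pool, $2 = any one of the new disks.
expand_pool() {
  $RUN zpool online -e "$1" "$2"
}

replace_disk tank ata-oldA ata-newA
expand_pool tank ata-newA
```

Repeat replace_disk for each disk in the vdev, one at a time, so the vdev keeps its redundancy while resilvering.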

Pros and Cons

VS Snapraid + btrfs + mergerfs

Pros
  • ZFS has realtime parity.
  • ZFS can work while degraded.
  • ZFS has snapshots with send and receive.
  • ZFS supports per-dataset encryption.
  • ZFS handles everything in one system, and its parity also covers metadata such as permissions.
Cons
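  • The main con is that ZFS is less expandable.
    • Before OpenZFS 2.3, you can only expand by replacing every drive in a vdev or adding entire vdevs. Newer versions can also add disks to a raidz vdev.
  • If more drives die in a vdev than its redundancy allows, e.g. more than 2 in a raidz2 vdev, you lose all the data in the pool.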

Resources