ZFS

How to use ZFS:

Background

There are three levels to understand: pools, vdevs, and datasets (or zvols).

  • zpools are a JBOD of one or more vdevs.
  • vdevs are groups of drives, typically arranged in raidz (or raidz2, raidz3) or mirror.
  • datasets are filesystems stored on a zpool, similar to partitions.
  • zvols are virtual block devices on a zpool, without a filesystem.

Usage

# Create a zpool with a mirror vdev.
zpool create -f -o ashift=12 -O compression=zstd $zpool_name mirror \
  ata-diskA \
  ata-diskB

# Create a dataset.
zfs create -o encryption=aes-256-gcm -o keyformat=passphrase $zpool_name/$dataset_name
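
After a reboot, the key for an encrypted dataset has to be loaded before it can be mounted. A minimal sketch, assuming the passphrase-encrypted dataset created above:

# Load the passphrase (prompts interactively) and mount the dataset.
sudo zfs load-key $zpool_name/$dataset_name
sudo zfs mount $zpool_name/$dataset_name

# Or load keys for all encrypted datasets at once.
sudo zfs load-key -a
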
Notes
  • You should always reference drives by the stable ids under /dev/disk/by-id/, since /dev/sdX names can change between boots
    • E.g. /dev/disk/by-id/ata-diskA
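
For example, to find the stable ids and reference them explicitly (the device names below are hypothetical):

# List stable device ids and see which /dev/sdX they currently point to.
ls -l /dev/disk/by-id/

# Reference drives by their full by-id paths when creating a pool.
sudo zpool create $zpool_name mirror /dev/disk/by-id/ata-diskA /dev/disk/by-id/ata-diskB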

Alerts

First set up postfix to send emails (https://askubuntu.com/questions/1332219/send-email-via-gmail-without-other-mail-server-with-postfix).
Then set up ZED email notifications (https://askubuntu.com/questions/770540/enable-zfs-zed-email-notifications-on-16-04).
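
ZED reads its notification settings from /etc/zfs/zed.d/zed.rc. A minimal sketch of the relevant options (the address is a placeholder, and postfix is assumed to already deliver mail):

# /etc/zfs/zed.d/zed.rc
ZED_EMAIL_ADDR="you@example.com"    # where alerts are sent
ZED_NOTIFY_INTERVAL_SECS=3600       # rate-limit repeated notifications
ZED_NOTIFY_VERBOSE=1                # also notify when scrubs/resilvers finish cleanly

# Restart the daemon so the changes take effect.
sudo systemctl restart zfs-zed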

Automatic Scrubs

By default, ZFS on Ubuntu automatically scrubs every pool once a month.
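
You can also start a scrub by hand and watch its progress:

# Start a scrub of the pool.
sudo zpool scrub $pool

# Show scrub progress and overall pool health.
zpool status $pool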

Automatic Snapshots

See sanoid (https://github.com/jimsalterjrs/sanoid) for automatic snapshot management. To list existing snapshots:

zfs list -t snapshot
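
Snapshots can also be managed by hand; a short sketch with hypothetical dataset and snapshot names:

# Take a snapshot of a dataset.
sudo zfs snapshot $zpool_name/$dataset_name@before-upgrade

# Roll the dataset back to that snapshot (discards newer changes).
sudo zfs rollback $zpool_name/$dataset_name@before-upgrade

# Replicate a snapshot to another pool with send/receive.
sudo zfs send $zpool_name/$dataset_name@before-upgrade | sudo zfs receive $backup_pool/$dataset_name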

Caching

ZFS has two read caches:

  • ARC - enabled by default and uses up to half of your system memory. This memory is released when the system comes under memory pressure.
  • L2ARC - you can enable additional caching by adding an L2ARC drive for ARC to overflow to.

For writes:

  • SLOG - a separate log device, typically an SSD-backed mirror, that holds the ZFS intent log (ZIL) for synchronous writes.

In general, an Intel Optane SSD is a good choice for caching, as they are supposed to last longer and have lower latency than typical NAND SSDs.
A 16GB Optane stick can be had for ~$12.

ARC

arc_summary or arcstat will show the memory used by the ARC. This memory does not appear as process memory in htop.

If you want to reduce ARC memory usage, you can set limits by creating /etc/modprobe.d/zfs.conf:

/etc/modprobe.d/zfs.conf
# Set max ARC size => 4 GiB == 4294967296 bytes
options zfs zfs_arc_max=4294967296
# Set min ARC size => 1 GiB == 1073741824 bytes
options zfs zfs_arc_min=1073741824
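
The modprobe options only apply when the zfs module is loaded, so on Ubuntu you typically rebuild the initramfs and reboot. The limit can also be changed at runtime through the module parameter; a sketch:

# Apply a 4 GiB limit immediately, without rebooting.
echo 4294967296 | sudo tee /sys/module/zfs/parameters/zfs_arc_max

# Make the modprobe options persist across reboots (Ubuntu).
sudo update-initramfs -u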

L2ARC

L2ARC headers cost about 80 bytes of ARC memory per record. Historically this was 320 bytes, but it is now mostly negligible.
At the default 128K record size, 1 GiB of L2ARC holds 8192 records, requiring roughly 640 KiB of memory.
At a 4K record size, you will need roughly 20 MiB of RAM per GiB of L2ARC.

To add an L2ARC device:

sudo zpool add $pool cache $device
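
To confirm the cache device is attached and being used:

# Per-device I/O statistics, including the cache device.
zpool iostat -v $pool

# arc_summary also reports L2ARC size and hit rate.
arc_summary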

SLOG

sudo zpool add $pool log $device
# or
# sudo zpool add $pool log mirror $device1 $device2
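
Note that only synchronous writes go through the SLOG. The log device appears under a separate "logs" section in the pool status:

zpool status $pool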

Expanding

You can only expand by adding vdevs or replacing all drives in a vdev with larger ones.
See https://docs.oracle.com/cd/E19253-01/819-5461/githb/index.html

After replacing all drives in a vdev, you need to run sudo zpool online -e $pool $disk on one of the disks in that vdev.
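
A sketch of growing a mirror vdev by swapping in larger drives one at a time (device names are hypothetical):

# Optionally let the pool grow automatically once every drive has been replaced.
sudo zpool set autoexpand=on $pool

# Replace each old drive with a larger one, waiting for the resilver to finish in between.
sudo zpool replace $pool ata-oldA ata-newA
zpool status $pool   # wait until resilvering completes
sudo zpool replace $pool ata-oldB ata-newB

# If autoexpand was off, expand into the new space manually.
sudo zpool online -e $pool ata-newA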

Pros and Cons

VS Snapraid + btrfs + mergerfs

Pros
  • ZFS has realtime parity.
  • ZFS can work while degraded.
  • ZFS snapshots with send and receive.
  • ZFS has per-dataset encryption.
  • ZFS handles everything in one integrated system, including parity over metadata such as permissions.
Cons
  • The main con is that ZFS is less expandable.
    • You can only expand by replacing every drive in a vdev or by adding entire vdevs.
  • If too many drives in a single vdev die (e.g. more than 2 in a raidz2), the entire pool is lost.

Resources