Data Hoarding

From David's Wiki

How to do data hoarding.

My personal server setup is:

  • 6x USB hard drives.
  • LUKS encryption + btrfs on each drive.
  • SnapRAID + Mergerfs with 4 data drives and 2 parities.

Disks

  • In general, get WD easystore disks from Best Buy when they're on sale.
    • $15 per TB is great pricing for 8TB+ drives. When on sale, 12TB runs $180 and 14TB runs $200. You can also find $250 14TB bare drives on eBay.
    • WD owns Hitachi Global Storage Technologies (HGST). All new HGST drives are WD drives.
  • Avoid SMR drives, which have worse performance and reliability. Their write performance is especially poor, so rebuilding a parity drive in a RAID array can take multiple days.
  • Avoid lower-end Seagate drives (e.g. rosewood).
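
To compare deals against the $15 per TB target, you can compute the price per TB directly. This is a minimal sketch using the prices quoted above; `price_per_tb` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/bash
# Print the price per terabyte for a drive, rounded to two decimals.
# Arguments: price in dollars, capacity in TB.
price_per_tb() {
  awk -v price="$1" -v tb="$2" 'BEGIN { printf "%.2f\n", price / tb }'
}

price_per_tb 180 12   # 12TB easystore on sale
price_per_tb 200 14   # 14TB easystore on sale
price_per_tb 250 14   # 14TB bare drive on eBay
```

The sale prices work out to $15.00 and about $14.29 per TB, while the eBay bare drive comes to about $17.86 per TB.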

USB Disks

Using USB disks is not best practice.
However, if you're avoiding buying an actual server, you can also use them as-is with a few caveats:

  • The included USB enclosures are typically fanless. This is fine if you let the drives spin down when idle. If you want to keep them spinning all the time, you can ls them every few minutes, but you should then add a fan.
keep_drives_alive.sh
#!/bin/bash

DRIVES=(
  /media/veracrypt1
  /media/veracrypt2
  /media/veracrypt3
  /media/veracrypt4
  /media/veracrypt5
  /media/veracrypt6
  /media/veracrypt7
  /media/veracrypt8
  /media/veracrypt9
)

# List each mount point so the drive sees regular activity.
for i in "${DRIVES[@]}"
do
        ls "$i" > /dev/null
done

You can run this in a cron job every minute:

* * * * * /home/david/bin/keep_drives_alive.sh > /dev/null 2>&1
  • SMART data may or may not be available. You can try using -d sat with smartctl.

If you decide to shuck the drive and you're using an older power supply, you may need to cover the drive's 3.3V power pin with tape so that it spins up. See instructables.

Do not use Molex-to-SATA adapters.

Operating Systems

If you're not building a dedicated storage server, I recommend using a Linux distro such as Ubuntu or Debian. This allows you to use your NAS for other purposes without virtualizing a Linux kernel.

For a dedicated storage server, TrueNAS Core (formerly FreeNAS) is a very popular FreeBSD based OS. Many people use it because it comes with a web interface for managing ZFS arrays.

If you're not using ZFS, make sure you format all drives with ext4, XFS, or btrfs.
Using NTFS on Linux will lead to performance and permissions issues.

Below are some NAS-focused operating systems. I haven't tried any of them personally.

HDD Testing

You should test all hard drives, both new and old, before adding them to your array.

  • See S.M.A.R.T. to view SMART data and run SMART tests on Linux.
export DRIVE_PATH=<your drive>

# For USB enclosures, you may need to append "-d sat" to each smartctl command.

# Start a short SMART self-test
smartctl -t short "${DRIVE_PATH}"

# Start a long SMART self-test
smartctl -t long "${DRIVE_PATH}"

# Print all SMART information, including attributes and self-test results
smartctl -a "${DRIVE_PATH}"

# Run a random read/write test.
# WARNING: this writes directly to the device and destroys any data on it.
sudo fio --filename="${DRIVE_PATH}" --name=randwrite --ioengine=sync --iodepth=1 --rw=randrw --rwmixread=50 --rwmixwrite=50 --bs=4k --direct=0 --numjobs=8 --size=300G --runtime=7200 --group_reporting
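
After a test run, it can help to pull a single attribute out of the SMART table so you can compare it before and after. This is a rough sketch, assuming the usual ATA attribute table printed by `smartctl -A`, where the second column is the attribute name and the tenth column starts the raw value. `smart_raw_value` is a hypothetical helper name.

```shell
#!/bin/bash
# Read `smartctl -A` output on stdin and print the raw value of one attribute.
# Assumes the ATA attribute table layout: column 2 is the attribute name and
# column 10 is the start of RAW_VALUE.
smart_raw_value() {
  awk -v name="$1" '$2 == name { print $10 }'
}

# Example usage (needs root and a real drive, so it is left commented out):
# sudo smartctl -A "${DRIVE_PATH}" | smart_raw_value Reallocated_Sector_Ct
```

Attributes such as Reallocated_Sector_Ct and Current_Pending_Sector are commonly watched; a raw value that climbs during testing is a reason to be suspicious of the drive.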

Encryption

I just use LUKS for full disk encryption of every individual disk.
You can also use LUKS with ZFS.

  • Archwiki: dm-crypt/Encrypting a non-root file system for full-disk or partition encryption. Most robust but only on Linux.
    • See also LUKS.
    • This is slightly more technical to set up because there is no GUI.
  • VeraCrypt for full-disk, partition, or container based encryption. Works across operating systems and has a nice GUI.
  • Rclone for file-based encryption. No GUI but has a nice clean CLI.
  • Cryptomator is another good choice for local file-based encryption.

See https://www.privacytools.io/software/encryption-tools/ for more options.
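
For the LUKS + btrfs setup described above, the steps on a single disk look roughly like the following. This sketch only prints the commands instead of running them, since `cryptsetup luksFormat` wipes the target device. The device `/dev/sdX`, the mapper name `data1`, the mount point, and the helper name `luks_btrfs_plan` are all placeholders I made up for illustration.

```shell
#!/bin/bash
# Print (do not run) the commands to set up LUKS + btrfs on one disk.
# Arguments: block device, mapper name, mount point (all hypothetical here).
luks_btrfs_plan() {
  local dev="$1" name="$2" mnt="$3"
  cat <<EOF
cryptsetup luksFormat ${dev}
cryptsetup open ${dev} ${name}
mkfs.btrfs /dev/mapper/${name}
mkdir -p ${mnt}
mount /dev/mapper/${name} ${mnt}
EOF
}

luks_btrfs_plan /dev/sdX data1 /mnt/data1
```

Review the printed commands, then run them as root one at a time once you are sure the device name is correct.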

Parity

It is inevitable that one of your drives will eventually fail. Parity protects you from single drive failures, but it is not a replacement for off-site backups.
In general, I strongly recommend against hardware RAID and Intel RST RAID, despite them being the most popular.
When RAID cards or motherboards fail, your data becomes difficult to recover.
I personally only use SnapRAID, though ZFS has a good reputation as well.
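
To see why parity can rebuild a lost drive, here is a toy example of single parity using XOR on three one-byte "drives". It only illustrates the general idea; the values are made up, and it is not how SnapRAID or ZFS actually implement their parity.

```shell
#!/bin/bash
# Three hypothetical one-byte "data drives".
d1=0xA5; d2=0x3C; d3=0x0F

# Parity is the XOR of all data values.
parity=$(( d1 ^ d2 ^ d3 ))

# Pretend d2 is lost: XOR the parity with the surviving values to rebuild it.
rebuilt=$(( parity ^ d1 ^ d3 ))

printf 'parity=0x%02X rebuilt=0x%02X\n' "$parity" "$rebuilt"
# prints: parity=0x96 rebuilt=0x3C
```

The rebuilt value matches the lost `d2`. If two values were lost at once, a single parity would not be enough, which is why a second parity drive helps.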

Notes
  • Because SnapRAID is not real-time, you should exclude the data of frequently writing applications such as MariaDB or GitLab. You can snapshot them separately to a folder that is covered by parity.
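
A SnapRAID configuration for 4 data drives and 2 parities might look like the sketch below, which just prints a candidate `snapraid.conf`. Every mount point and the two excluded directories are hypothetical, so adjust them to your own layout. SnapRAID wants at least one content file per parity plus one, hence the three `content` lines.

```shell
#!/bin/bash
# Print a minimal snapraid.conf for 4 data drives and 2 parity drives.
# All mount points below are hypothetical; adjust them to your own layout.
snapraid_conf() {
  cat <<'EOF'
parity /mnt/parity1/snapraid.parity
2-parity /mnt/parity2/snapraid.2-parity
content /var/snapraid/snapraid.content
content /mnt/data1/snapraid.content
content /mnt/data2/snapraid.content
data d1 /mnt/data1/
data d2 /mnt/data2/
data d3 /mnt/data3/
data d4 /mnt/data4/
# Keep frequently changing application data out of parity.
exclude /mariadb/
exclude /gitlab/
EOF
}

snapraid_conf
```

Once the output is saved as `/etc/snapraid.conf`, run `snapraid sync` after adding or changing files and `snapraid scrub` periodically to check for silent corruption.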

Backup

  • rsync for local or ssh backups. Note that rsync does not handle moves or renames gracefully.
  • rclone for cloud backups
  • borgbackup for deduplicating backups over local or remote connections.
    • Deduplication allows for graceful handling of renames, moves, and incremental backups.
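
A backup job combining rsync and borgbackup could be sketched as follows. It assumes borg 1.x command syntax and a repository that was already created with `borg init`. The paths, the archive prefix `data-`, the retention numbers, and the helper names `run` and `backup` are placeholders of mine. By default it only echoes the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/bash
# Echo commands by default; set DRY_RUN=0 to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Arguments: source directory, borg repository path (both hypothetical).
backup() {
  local src="$1" repo="$2"
  # Mirror to a second local disk; --delete removes files gone from the source.
  run rsync -a --delete "${src}/" /mnt/backup/mirror/
  # Deduplicated, timestamped archive; {now} is expanded by borg itself.
  run borg create --stats "${repo}::data-{now}" "${src}"
  # Thin out old archives.
  run borg prune --keep-daily 7 --keep-weekly 4 "${repo}"
}

backup /mnt/storage /mnt/backup/borg
```

Because borg deduplicates chunks, a renamed or moved file costs almost no extra space in the next archive, whereas the rsync mirror would copy it again.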

Union File Systems

Union file systems automatically manage data across multiple drives. Mergerfs does so with existing filesystems.
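
As a rough illustration, pooling four already-formatted data drives with mergerfs comes down to one mount command listing the branches separated by colons. The sketch below only builds and prints that command. The mount points and the helper name `mergerfs_cmd` are hypothetical, and `category.create=mfs` (create new files on the branch with the most free space) is just one policy choice.

```shell
#!/bin/bash
# Build a mergerfs mount command that pools several branches into one mount.
# Arguments: pool mount point, then one or more branch directories.
mergerfs_cmd() {
  local pool="$1"; shift
  local IFS=:
  # "$*" joins the remaining arguments with the first character of IFS.
  echo "mergerfs -o allow_other,category.create=mfs $* ${pool}"
}

mergerfs_cmd /mnt/storage /mnt/data1 /mnt/data2 /mnt/data3 /mnt/data4
```

Each branch stays a normal filesystem, so any single drive can still be mounted and read on its own if the pool is taken apart.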

Front-ends

  • Nextcloud is a file management application. It allows you to add an SFTP link as an external storage.
    • Alternatively, you can also move the Nextcloud directory to your array but then it'll be slightly more difficult to access using other methods.
  • Jellyfin is a front-end to stream media from your server, similar to Plex. It is a fork of Emby, a formerly open-source media solution.
  • Seafile is a fast file storage application and protocol.

Closed-source/Windows Apps

I do not use a Windows NAS. If you want to though, most of the above also runs on Windows. Note: I have not tried the applications below.

Here is a list of popular closed-source/Windows apps:

  • Storage Spaces comes with Windows.
  • Stablebit DrivePool is like mergerfs but costs $30. They also have an rclone-esque software which costs $40.
  • Unraid is a very popular Linux-based alternative to TrueNAS/ZFS. It has the expandability benefits of SnapRAID but runs in real-time. It costs $60-130.

More Resources