Data Hoarding

From David's Wiki

How to do data hoarding.

My personal server setup is:

  • 6x USB hard drives.
  • LUKS full-disk encryption with EXT4 on each drive.
  • SnapRAID + Mergerfs with 4 data drives and 2 parities.
  • SFTP to mount or transfer files.
  • borgbackup to back up other drives to my SnapRAID array and to back up the whole array to a separate custom offsite setup.


Hard Drives

  • In general, get WD easystore disks from Best Buy when they're on sale.
    • $15 per TB is great pricing for 8TB+ drives. When on sale, 12TB runs $180 and 14TB runs $200. You can also find $250 14TB bare drives on eBay.
    • WD owns Hitachi Global Storage Technologies (HGST). All new HGST drives are WD drives.
  • Avoid SMR drives, which have worse performance and reliability. In particular, their write performance is very poor, so rebuilding a parity drive in a RAID array can take multiple days.
  • Avoid lower-end Seagate drives (e.g. Rosewood).
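
The per-terabyte figures in the pricing bullet above can be checked quickly from the shell. This is only a small sketch; `price_per_tb` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# price_per_tb PRICE_USD CAPACITY_TB
# Hypothetical helper: prints dollars per terabyte, rounded to two decimals.
price_per_tb() {
    awk -v price="$1" -v tb="$2" 'BEGIN { printf "%.2f\n", price / tb }'
}

price_per_tb 180 12   # 12TB easystore on sale
price_per_tb 200 14   # 14TB easystore on sale
price_per_tb 250 14   # 14TB bare drive on eBay
```

The two sale prices come in at or under the $15 per TB mark, while the bare eBay drive is a bit above it.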

USB Disks

Usually people like to shuck these and put them in a PC or multi-bay HDD enclosure.
You can also use them as-is but there are a few caveats:

  • The included USB enclosures are typically fanless. This is fine if you let the drives spin down when idle. If you want to keep them spinning, you can ls them every few minutes, but then you should add a fan.


# DRIVES is a bash array of mount points, e.g. DRIVES=(/mnt/disk1 /mnt/disk2)
for i in "${DRIVES[@]}"; do
        ls "$i"
done

You can run this in a cron job every minute:

* * * * * /home/david/bin/  > /dev/null 2>&1
  • SMART data may or may not be available. You can try using -d sat with smartctl.

If you decide to shuck the drive and you're using an older power supply, you may need to tape over the 3.3V pin on the drive's SATA power connector. See instructables.

Do not use Molex to SATA adapters.

Operating Systems

I recommend using a Linux distro such as Ubuntu or Debian. This allows you to use your NAS for other purposes without virtualizing a Linux kernel. The main reason to consider a NAS-focused distro is to get a nice GUI for management.

TrueNAS Core (formerly FreeNAS) is a very popular FreeBSD-based OS. Many people use it because it comes with a web interface for managing ZFS arrays. However, these days you can also use ZFS on Linux.

If not using ZFS, make sure you format all drives using EXT4, XFS, or BTRFS.
Using NTFS will lead to performance issues (fragmentation) and permissions issues.

Below are some NAS-focused operating systems. I haven't tried any of them personally.

  • TrueNAS Core (Formerly, FreeNAS)
  • Unraid - A paid NAS OS based on Linux. This is popular among the virtualization community since it supports PCIe passthrough.
  • Openmediavault - A Debian-based distro with a GUI for managing your data.

HDD Testing

You should test all hard drives, both new and old, before adding them to your array.

  • See S.M.A.R.T. to view SMART data and run SMART tests on Linux.
export DRIVE_PATH=<your drive>  # e.g. /dev/sdX

# For drives in USB enclosures, append -d sat to the smartctl commands below if needed.

# Start a short SMART self-test
smartctl -t short "${DRIVE_PATH}"

# Start a long SMART self-test
smartctl -t long "${DRIVE_PATH}"

# Check all SMART attributes
smartctl -a "${DRIVE_PATH}"

# Do a random read/write test (--runtime=7200 caps it at two hours).
# WARNING: this writes directly to the device and destroys any data on it.
sudo fio --filename="${DRIVE_PATH}" --name=randwrite --ioengine=sync --iodepth=1 --rw=randrw --rwmixread=50 --rwmixwrite=50 --bs=4k --direct=0 --numjobs=8 --size=300G --runtime=7200 --group_reporting
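
Once the self-tests above have finished, the overall health verdict can be checked in a script. The sketch below assumes an ATA/SATA drive, where `smartctl -H` prints an "overall-health self-assessment" line; `smart_health_ok` is a hypothetical helper name, and SCSI/SAS drives report health with different wording that this does not handle.

```shell
#!/bin/sh
# smart_health_ok: reads `smartctl -H` output on stdin and succeeds only if
# the ATA overall-health line reports PASSED (assumption: ATA/SATA drive).
smart_health_ok() {
    grep -q 'overall-health self-assessment test result: PASSED'
}

# Usage (needs root and a real drive), for example:
#   smartctl -H "${DRIVE_PATH}" | smart_health_ok && echo "healthy"
#   smartctl -l selftest "${DRIVE_PATH}"   # log of the short/long self-tests
```

A PASSED verdict only means the drive does not consider itself failing, so it is still worth reading the attributes from `smartctl -a` before trusting a drive.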


Encryption

I just use LUKS for full-disk encryption of every individual disk.
You can also use LUKS with ZFS.

  • ArchWiki: dm-crypt/Encrypting a non-root file system for full-disk or partition encryption. Most robust but only on Linux.
    • See also LUKS.
    • This is slightly more technical to set up since there is no GUI.
  • VeraCrypt for full-disk, partition, or container-based encryption. Works across operating systems and has a nice GUI.
  • Rclone for file-based encryption. No GUI but has a clean CLI.
  • Cryptomator is another good choice for local file-based encryption.
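
For the LUKS-on-each-disk approach, the usual sequence is roughly format, open, make a filesystem, mount. The sketch below prints the commands by default instead of running them. The device `/dev/sdX`, the mapper name `data1`, and the mount point `/mnt/data1` are placeholders, and `run` and `luks_setup` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: encrypt one data disk with LUKS and put ext4 on top.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them as root.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# luks_setup DEVICE NAME MOUNTPOINT
# WARNING: luksFormat destroys all data on DEVICE.
luks_setup() {
    dev="$1"; name="$2"; mnt="$3"
    run cryptsetup luksFormat "$dev"       # create the LUKS header (asks for a passphrase)
    run cryptsetup open "$dev" "$name"     # unlock as /dev/mapper/NAME
    run mkfs.ext4 "/dev/mapper/$name"      # format the unlocked device
    run mkdir -p "$mnt"
    run mount "/dev/mapper/$name" "$mnt"
}

luks_setup /dev/sdX data1 /mnt/data1
```

After a reboot only the `cryptsetup open` and `mount` steps need to be repeated.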

See for more options.


RAID and Parity

It is inevitable that one of your drives will eventually fail. Parity protects you from single drive failures, but it is not a replacement for off-site backups.
In general, I strongly recommend against hardware RAID and Intel RST RAID, despite them being the most popular.
When RAID cards or motherboards fail, your data becomes difficult to recover.
I personally only use SnapRAID, though ZFS has a good reputation as well.

  • Because SnapRAID computes parity only when you run a sync rather than in real time, you should exclude the data of constantly changing applications such as MariaDB or GitLab. You can snapshot them separately to a folder that is covered by parity.
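
As a rough illustration of a 4 data + 2 parity layout with application data excluded, the sketch below prints an example snapraid.conf. Every mount point and the excluded directory names (`/mariadb/`, `/gitlab/`) are placeholders rather than my actual paths, and `snapraid_conf` is a hypothetical helper name.

```shell
#!/bin/sh
# Prints an example snapraid.conf for 4 data disks and 2 parity disks.
# All paths and excluded directory names below are placeholders.
snapraid_conf() {
    cat <<'EOF'
parity   /mnt/parity1/snapraid.parity
2-parity /mnt/parity2/snapraid.2-parity
content  /var/snapraid.content
content  /mnt/disk1/.snapraid.content
content  /mnt/disk2/.snapraid.content
data d1  /mnt/disk1/
data d2  /mnt/disk2/
data d3  /mnt/disk3/
data d4  /mnt/disk4/
exclude  /mariadb/
exclude  /gitlab/
EOF
}

snapraid_conf   # review the output, then save it as your snapraid.conf
```

With a config like this in place, `snapraid sync` updates parity and `snapraid scrub` checks existing data against it. Anything written since the last sync is not protected, which is why live database directories are a poor fit.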


Backups

  • rsync for local or SSH backups. Note that rsync does not handle moves or renames gracefully: it treats a moved file as a deletion plus a new file and transfers it again.
  • rclone for cloud backups.
  • borgbackup for deduplicating backups over local or remote connections.
    • Deduplication allows for graceful handling of renames, moves, and incremental backups.
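
A borg backup run usually comes down to one `create` and one `prune`. The sketch below assumes the borg 1.x command syntax and prints the commands by default. The repository path, source directory, and retention counts are made-up examples, and `borg_backup` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch of a borg 1.x backup run. BORG defaults to "echo borg", so the
# commands are only printed; set BORG=borg to actually run them.
BORG="${BORG:-echo borg}"

# borg_backup REPO SOURCE_DIR  (both are placeholders you supply)
borg_backup() {
    repo="$1"; src="$2"
    # One archive per run; borg itself expands {hostname} and {now}.
    $BORG create --stats "${repo}::{hostname}-{now}" "$src"
    # Thin out old archives; the retention counts are arbitrary examples.
    $BORG prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 "$repo"
}

borg_backup /mnt/storage/backups/laptop /home/user
```

The repository has to exist first, e.g. via `borg init --encryption=repokey REPO`. Because chunks are deduplicated across archives, a renamed or moved file adds almost nothing to the next archive.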

Union File Systems

Union file systems present multiple drives as a single directory tree and decide which drive each file lands on. Mergerfs does this on top of existing filesystems, so each drive remains readable on its own.
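
As an illustration, mergerfs takes its branches as one colon-separated list followed by the pool mount point. The sketch below builds that list and prints a mount command instead of running it. The `/mnt/disk*` and `/mnt/storage` paths are placeholders, `join_branches` is a hypothetical helper name, and the `category.create=mfs` policy (place new files on the branch with the most free space) is just one reasonable choice.

```shell
#!/bin/sh
# join_branches DIR...: joins directories with ":" as mergerfs expects.
join_branches() {
    old_ifs="$IFS"
    IFS=":"
    joined="$*"        # "$*" joins the arguments with the first character of IFS
    IFS="$old_ifs"
    echo "$joined"
}

branches="$(join_branches /mnt/disk1 /mnt/disk2 /mnt/disk3 /mnt/disk4)"

# Printed rather than executed; run it as root once the paths exist.
echo mergerfs -o allow_other,category.create=mfs "$branches" /mnt/storage
```

Only the data drives go into the pool. The parity drives stay outside it.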


Applications

  • Nextcloud is a file management application. It allows you to add an SFTP link as an external storage.
    • Alternatively, you can also move the nextcloud directory to your array but then it'll be slightly more difficult to access using other methods.
  • Jellyfin is a front-end to stream media from your server, similar to Plex.

Closed-source Windows Apps

I would not recommend creating a Windows NAS. If you want to, though, most of the above also runs on Windows. Note: I have not tried the applications below.

Here is a list of popular closed-source apps:

More Resources