Difference between revisions of "Data Hoarding"

From David's Wiki

Revision as of 22:53, 25 July 2020

How to do data hoarding.

My personal server setup is:

  • 5x USB hard drives.
  • LUKS full-disk encryption with EXT4 on each drive.
  • SnapRAID with 4 data drives and 2 parities.
  • Mergerfs on the data drives with default values (mfs).
  • SFTP to mount or transfer files.

Disks

  • In general, get WD easystore disks from Best Buy when they're on sale.
    • $15 per TB is great pricing for 8TB+ drives. When on sale, 12TB runs $180 and 14TB runs $200. You can also find $250 14TB bare drives on eBay.
    • WD owns Hitachi Global Storage Technologies (HGST). All new HGST drives are WD drives.
  • Avoid SMR drives, which have worse performance and reliability. Specifically, their sustained write performance is very poor, so rebuilding a parity drive in a RAID array can take multiple days.
  • Avoid lower-end Seagate drives (e.g. Rosewood).
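
To compare a deal against the $15 per TB figure above, the arithmetic can be scripted. This is a minimal sketch; price_per_tb is a hypothetical helper name, and the inputs are the prices quoted in the list.

```shell
# Print the price per terabyte for a drive, rounded to two decimals.
# Usage: price_per_tb PRICE_USD CAPACITY_TB
price_per_tb() {
    awk -v price="$1" -v tb="$2" 'BEGIN { printf "%.2f\n", price / tb }'
}

# Prices quoted above: 12TB at $180, 14TB at $200, 14TB bare drive at $250.
price_per_tb 180 12
price_per_tb 200 14
price_per_tb 250 14
```

This prints 15.00, 14.29, and 17.86, so the bare eBay drive costs more per TB than either sale price.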

USB Disks

Usually people like to shuck these and put them in a PC or multi-bay HDD enclosure.
You can also use them as-is but there are a few caveats:

  • The included USB enclosures are typically fanless. This is fine if you let the drives spin down when idle. If you want to keep them spinning all the time, you can ls them every few minutes, but in that case you should add a fan.
keep_drives_alive.sh
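#!/bin/bash
# List each mounted drive so that it does not spin down.

# Mount points of the drives to keep awake.
DRIVES=(
  /media/veracrypt1
  /media/veracrypt2
  /media/veracrypt3
  /media/veracrypt4
  /media/veracrypt5
  /media/veracrypt6
  /media/veracrypt7
  /media/veracrypt8
  /media/veracrypt9
)

for drive in "${DRIVES[@]}"
do
        # Quote the path so mount points containing spaces still work.
        ls "$drive"
done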

You can run this in a cron job every minute:

* * * * * /home/david/bin/keep_drives_alive.sh > /dev/null 2>&1
  • SMART data may or may not be available. You can try using -d sat with smartctl.
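
As a sketch of the -d sat fallback, the function below tries a plain health query first and retries with SAT passthrough if that fails. smart_health is a hypothetical helper name, and /dev/sdX stands in for your actual device. Whether SAT works at all depends on the USB bridge in the enclosure.

```shell
# Print SMART health for a device, retrying with SAT passthrough if the
# plain query fails (as it often does behind a USB bridge).
# Usage: smart_health /dev/sdX   (usually needs root)
smart_health() {
    local dev="$1" out
    if out=$(smartctl -H "$dev" 2>&1); then
        # The plain query worked, so show its output.
        printf '%s\n' "$out"
    else
        # Ask smartctl to treat the device as SCSI-to-ATA translation.
        smartctl -H -d sat "$dev"
    fi
}
```

Note that smartctl also exits non-zero when a drive reports that it is failing, so this sketch will run the query a second time in that case.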

If you decide to shuck the drive and you're using an older power supply, you may need to tape over the 3.3V pins on the drive's SATA power connector before it will spin up. See instructables.

Do not use Molex to SATA adapters. There is a saying among the Slickdeals community: molex to sata, lose all your data.

Operating Systems

I recommend using a Linux distro such as Ubuntu or Debian. This allows you to use your NAS for other purposes without virtualizing a Linux kernel. The main reason to consider a NAS-focused distro is to get a nice GUI for management.

TrueNAS Core (formerly FreeNAS) is a very popular FreeBSD-based OS. Many people use it because it comes with a web interface for managing ZFS arrays. However, these days you can also use ZFS on Linux.

If not using ZFS, make sure you format all drives using EXT4, XFS, BTRFS, or similar. Using NTFS will lead to performance issues (fragmentation) and permissions issues.

Below are some NAS-focused operating systems. I haven't tried any of them personally.

  • FreeNAS / TrueNAS Core
  • Unraid - A paid NAS OS based on Linux. This is popular among the virtualization community since it supports PCIe passthrough.
  • Openmediavault - A popular Debian-based distro with a GUI.

HDD Testing

You should test all hard drives, both new and old, before adding them to your array.
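
One common way to do this (not necessarily the only one) is a destructive badblocks pass followed by the drive's own extended SMART self-test. The sketch below assumes the disk holds no data you care about. burn_in is a hypothetical helper name, and the 4096-byte block size is an assumption chosen to suit large drives.

```shell
# Burn-in sketch for a disk that holds NO data you care about.
# WARNING: badblocks -w overwrites the entire device.
# Usage: burn_in /dev/sdX   (run as root; can take days on large drives)
burn_in() {
    local dev="$1"
    # Destructive write-and-verify pass over every block, with progress.
    # Stop here if badblocks itself fails to run.
    badblocks -wsv -b 4096 "$dev" || return 1
    # Queue the drive's extended self-test; it runs in the background.
    smartctl -t long "$dev"
}
```

Once the self-test finishes, smartctl -a on the same device shows the result. For USB enclosures you may need to add -d sat to the smartctl call.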

Encryption

I mainly use VeraCrypt for full disk encryption of every individual disk.
You can also use LUKS, which is built into Linux and more widely used.

  • Archwiki: dm-crypt/Encrypting a non-root file system for full-disk or partition encryption. Most robust but only on Linux.
    • This is slightly more technical to set up because there is no GUI.
  • VeraCrypt for full-disk, partition, or container based encryption. Works across operating systems and has a nice GUI.
  • Rclone for file-based encryption. No GUI but has a nice clean CLI.
  • Cryptomator is another good choice for local file-based encryption.

See https://www.privacytools.io/software/encryption-tools/ for more options.
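
For the LUKS route, a minimal sketch of encrypting a whole disk and putting EXT4 inside it might look like the following. luks_ext4_setup is a hypothetical helper name, and /dev/sdX and the mapping name are placeholders. The Archwiki page above covers the options this sketch skips.

```shell
# Encrypt a whole disk with LUKS and create an EXT4 filesystem inside it.
# WARNING: destroys everything on the device.
# Usage: luks_ext4_setup /dev/sdX NAME   (run as root)
luks_ext4_setup() {
    local dev="$1" name="$2"
    # Write the LUKS header; prompts for confirmation and a passphrase.
    cryptsetup luksFormat "$dev" || return 1
    # Unlock the container as /dev/mapper/NAME.
    cryptsetup open "$dev" "$name" || return 1
    # Create the filesystem on the unlocked mapping, not on the raw disk.
    mkfs -t ext4 "/dev/mapper/$name"
}
```

After this, /dev/mapper/NAME can be mounted like any other block device, and cryptsetup close NAME locks it again once it is unmounted.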

Parity

It is inevitable that one of your drives will eventually fail.
In general, I strongly recommend against hardware RAID and Intel RST RAID, despite them being the most popular.
When RAID cards or motherboards fail, your data becomes difficult to recover.
I personally only use SnapRAID.

Notes
  • Because SnapRAID is not real-time, you should exclude the data of constantly changing applications such as MariaDB or GitLab. You can snapshot them separately to a folder that is covered by parity.
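
A sketch of what a matching snapraid.conf could look like for 4 data drives and 2 parities is below, written through a heredoc so it can live in a setup script. All mount points and the excluded directory names are hypothetical, and write_snapraid_conf is a made-up helper name. SnapRAID wants at least one content file per parity plus one, hence three here.

```shell
# Write a sample snapraid.conf to the path given as $1.
# All mount points and excluded directories are hypothetical.
write_snapraid_conf() {
    cat > "$1" <<'EOF'
# Two parity files, each on its own parity drive.
parity /mnt/parity1/snapraid.parity
2-parity /mnt/parity2/snapraid.2-parity

# Content files (the index); keep copies on several disks.
content /mnt/disk1/snapraid.content
content /mnt/disk2/snapraid.content
content /mnt/disk3/snapraid.content

# Data drives.
data d1 /mnt/disk1/
data d2 /mnt/disk2/
data d3 /mnt/disk3/
data d4 /mnt/disk4/

# Skip directories that change constantly, such as live database files.
exclude /mariadb/
exclude /gitlab/
# Dumps of those applications go in a directory that is NOT excluded,
# for example /backups/, so they are still covered by parity.
EOF
}
```

Calling write_snapraid_conf /etc/snapraid.conf and then running snapraid sync would build the first parity. The exclude paths are matched from the root of each data drive.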

Backup

Union File Systems
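
As a sketch of the Mergerfs part of the setup listed at the top of this page, the function below pools several data drives into one mount point using the mfs (most free space) create policy. pool_drives is a hypothetical helper name and the paths in the usage line are placeholders. allow_other is an assumption that other users need access to the pool.

```shell
# Pool several data drives into one mount point with mergerfs.
# Usage: pool_drives MOUNTPOINT BRANCH [BRANCH...]   (run as root)
pool_drives() {
    local target="$1" branches
    shift
    # mergerfs takes its branches as one colon-separated list.
    branches=$(IFS=:; echo "$*")
    # category.create=mfs puts each new file on the branch with the most free space.
    mergerfs -o allow_other,category.create=mfs "$branches" "$target"
}
```

For example, pool_drives /mnt/storage /mnt/disk1 /mnt/disk2 would expose both drives under /mnt/storage. Only the data drives belong in the pool, not the parity drives.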

Front-ends

  • Nextcloud allows you to add an SFTP link as an external storage.
    • Alternatively, you can move the Nextcloud data directory to your array, but then it'll be slightly more difficult to access using other methods.

Closed-source Windows Apps

I would not recommend creating a Windows NAS. If you want to, though, most of the above also runs on Windows. Note: I have not tried the applications below.

Here is a list of popular closed-source apps:

More Resources