ZFS on Proxmox

Creating Storage Pools

I’ve been up and running for a few weeks at the time of this post, and so far my favorite way to set up my disks is in ZFS arrays. I originally thought I would make a FreeNAS VM, pass the SAS HBA through to it, and let it manage the storage, so I could take advantage of the ZFS features I set out to have while also getting the fantastic user and share management that FreeNAS has to offer. I even set this up and had a pool going, but decided that it would rely too heavily on the loopback interface and could create an unnecessary bottleneck, especially since I added 10Gb Ethernet recently. To avoid this bottleneck, I decided to use the ZFS functionality that Proxmox already has, toughen up, and learn how to manage ZFS pools from the command line like a real sysadmin.

The web interface allows you to make a pool quite easily, but it does require some setup before it will show you all of the available disks. My favorite commands for prepping the disks have to be

fdisk -l

fdisk /dev/sda

The first lists the disks and their partition tables, and the second lets you start modifying the partitions on a given disk, in this case deleting them all so the disk is free for Proxmox to use. Once this is done you can easily create a zpool with the disks you have installed and call it a day.
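
If you haven’t cleared a disk’s partitions with fdisk before, the interactive part is short. Here is a rough sketch, with /dev/sdX standing in for a disk whose contents you are certain you no longer need; d deletes a partition (fdisk will ask which partition number if more than one exists) and w writes the changes:

fdisk /dev/sdX
Command (m for help): d
Command (m for help): d
Command (m for help): w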

HOWEVER, if you want to learn, implement caching of any kind, or do any other advanced setup, read the next section to learn how to put together ZFS arrays from the command line.

Using the Command Line

You decided to read on to this section, so I can be confident you are an advanced user, curious, or just hate yourself and your data. Why do I say that? Because doing anything below incorrectly WILL DESTROY ALL OF YOUR DATA. Proceed with caution, be sparing with the -f flag when creating pools, research warning messages your server gives you before proceeding, and make sure you have a complete understanding of what the components of a zpool do before trying to assemble one with commands in a terminal. With that out of the way, let’s get started with my favorite commands for getting my bearings on an unfamiliar disk arrangement.
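
Speaking of being careful: zpool create accepts a -n flag that performs a dry run, printing the pool layout it would build without writing anything to the disks. I find it a worthwhile sanity check before committing. The pool and device names here are only placeholders:

zpool create -n tank mirror sdX sdY

If the printed layout is not what you intended, nothing has been changed and you can simply adjust the command.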

To get detailed information on the disks in the system, their unique identifiers, partitions, and current uses/flags, I use

lsblk -o NAME,PARTUUID,LABEL,PATH,FSTYPE

This will return some nice information we may need later on to create our pool; a sample is below.

NAME                         PARTUUID                             LABEL PATH                             FSTYPE
sda                                                                     /dev/sda
├─sda1                       59a20e15-c8f2-4c0e-9cc3-38e3e814d756       /dev/sda1
├─sda2                       5cbb9c01-f03f-4d43-8978-079b4c5763ff       /dev/sda2                        vfat
└─sda3                       b436dc4b-7174-4bac-aa5d-1562be8444c7       /dev/sda3                        LVM2_member
  ├─pve-swap                                                            /dev/mapper/pve-swap             swap
  ├─pve-root                                                            /dev/mapper/pve-root             ext4
  ├─pve-data_tmeta                                                      /dev/mapper/pve-data_tmeta
  │ └─pve-data-tpool                                                    /dev/mapper/pve-data-tpool
  │   ├─pve-data                                                        /dev/mapper/pve-data
  │   ├─pve-vm--101--disk--0                                            /dev/mapper/pve-vm--101--disk--0
  │   └─pve-vm--100--disk--0                                            /dev/mapper/pve-vm--100--disk--0
  └─pve-data_tdata                                                      /dev/mapper/pve-data_tdata
    └─pve-data-tpool                                                    /dev/mapper/pve-data-tpool
      ├─pve-data                                                        /dev/mapper/pve-data
      ├─pve-vm--101--disk--0                                            /dev/mapper/pve-vm--101--disk--0
      └─pve-vm--100--disk--0                                            /dev/mapper/pve-vm--100--disk--0
sdb                                                                     /dev/sdb
├─sdb1                       416e3896-d11e-3d4d-8a58-ee63c8bbf20c small /dev/sdb1                        zfs_member
└─sdb2                       102d26fe-7cb9-f44e-b0ec-6c29ebb7d860       /dev/sdb2
sdc                                                                     /dev/sdc
├─sdc1                       0006e9ba-11d4-11eb-a2cf-818015bd507b       /dev/sdc1
└─sdc2                       04f7e60e-11d4-11eb-a2cf-818015bd507b       /dev/sdc2
sdd                                                                     /dev/sdd                         zfs_member
├─sdd1                       151d9d04-11d4-11eb-a2cf-818015bd507b       /dev/sdd1                        zfs_member
└─sdd2                       1670a33d-11d4-11eb-a2cf-818015bd507b vTank /dev/sdd2                        zfs_member
sde                                                                     /dev/sde                         zfs_member
├─sde1                       0d387804-11d4-11eb-a2cf-818015bd507b       /dev/sde1                        zfs_member
└─sde2                       0f8a209c-11d4-11eb-a2cf-818015bd507b vTank /dev/sde2                        zfs_member

Using the information here, we can refer to the available partitions by a persistent identifier, which is important because disk labels and paths can change for a number of reasons. For example, “/dev/sda” depends on the initialization order of the disks, so it can change on a reboot or when disks are rearranged, and it should never be used to create a pool. Instead, we should refer to a UUID for the disk or partition we plan to use, seen above in the PARTUUID column. I always use partitions to create pools, since that lets me set aside extra space on the disk and use slightly mismatched drives when creating a pool or replacing a pool member. For instance, you may have seen that SSDs sold as roughly 500GB can actually have 480, 500, or 512GB usable. If I create a 480GB partition, I can recreate that partition on any of these disks and gain some flexibility in my replacements later on; or I could use 500GB and still be able to use the 500 or 512GB disks, you get the idea. FreeNAS does something similar by default, leaving 2GB of extra space to allow for small differences, which is not quite what I just described.
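
If you ever want to double check which PARTUUID currently maps to which device node, the same mapping is exposed as symlinks under /dev/disk, which udev maintains on Debian (and therefore on Proxmox):

ls -l /dev/disk/by-partuuid/

There is also /dev/disk/by-id/ if you prefer identifiers derived from a drive’s model and serial number.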

To make the partitions, I refer back to the first section and select a given disk, say /dev/nvme0n1. I’ll open it with fdisk, then lay out my partition. I plan to use this disk for an l2arc, but I have 192GB of RAM, which is not enough to back a full 1TB of l2arc (the l2arc’s index lives in RAM, so an oversized cache device eats into memory that would otherwise be used by the ARC itself). I will outline the requirements and use cases for l2arc in another page, but for my purposes I only want to partition 500GB, which is still enormous for my setup. So I need to determine how many sectors to allocate in order to get 500GB. One sector is 512 bytes, so I just need to divide the desired 500GB by 512 bytes to get the number of sectors.

This looks like (500*1024*1024*1024)/512 = 1,048,576,000 sectors (strictly speaking that is 500GiB, which is why fdisk reports the partition as 500 GiB). I will be starting at sector 2048, so my last sector will be 2048 + 1,048,576,000 - 1 = 1,048,578,047. Here is a snippet of what it looks like to make this partition.
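
If you would rather let the shell do the arithmetic, the same numbers fall out of two one-liners, assuming 512-byte sectors and a first sector of 2048 as above:

echo $(( 500 * 1024 * 1024 * 1024 / 512 ))
1048576000
echo $(( 2048 + 500 * 1024 * 1024 * 1024 / 512 - 1 ))
1048578047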

root@rex:~# fdisk /dev/nvme0n1

Welcome to fdisk (util-linux 2.33.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): n
Partition number (1-128, default 1): 1
First sector (34-2000409230, default 2048): 2048
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-2000409230, default 2000409230): 1048578047

Created a new partition 1 of type 'Linux filesystem' and of size 500 GiB.
Partition #1 contains a zfs_member signature.

Do you want to remove the signature? [Y]es/[N]o: y

The signature will be removed by a write command.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
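
Before adding the new partition to anything, it is worth confirming its size and grabbing its PARTUUID with the same lsblk approach from earlier, limited to this one device (the device name assumes the session above):

lsblk -o NAME,SIZE,PARTUUID /dev/nvme0n1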

Now I have my partition and I can attach it to a pool. For this I will give an example: I have 4 spare drives that I’ll use to make an array of striped mirrors from the primary partitions I created on them, and then I’ll add the new NVMe partition as an l2arc.

root@rex:~# zpool create small mirror 4dadc559-e757-3249-8ac0-394275f1e30a 183b1cd2-e4ef-234f-8298-e9becfabc5b2 mirror d7992b6e-ce47-ed49-9b6a-a7ccf8da9974 0c2311bb-6316-d940-a6a4-e7a472b92be7
root@rex:~# zpool status small
  pool: small
 state: ONLINE
  scan: none requested
config:

NAME                                      STATE     READ WRITE CKSUM
small                                     ONLINE       0     0     0
  mirror-0                                ONLINE       0     0     0
    4dadc559-e757-3249-8ac0-394275f1e30a  ONLINE       0     0     0
    183b1cd2-e4ef-234f-8298-e9becfabc5b2  ONLINE       0     0     0
  mirror-1                                ONLINE       0     0     0
    d7992b6e-ce47-ed49-9b6a-a7ccf8da9974  ONLINE       0     0     0
    0c2311bb-6316-d940-a6a4-e7a472b92be7  ONLINE       0     0     0

errors: No known data errors
root@rex:~# zpool add small cache 42def480-d99f-b84a-8696-0f64e488d528
root@rex:~# zpool status small
  pool: small
 state: ONLINE
  scan: none requested
config:

NAME                                      STATE     READ WRITE CKSUM
small                                     ONLINE       0     0     0
  mirror-0                                ONLINE       0     0     0
    4dadc559-e757-3249-8ac0-394275f1e30a  ONLINE       0     0     0
    183b1cd2-e4ef-234f-8298-e9becfabc5b2  ONLINE       0     0     0
  mirror-1                                ONLINE       0     0     0
    d7992b6e-ce47-ed49-9b6a-a7ccf8da9974  ONLINE       0     0     0
    0c2311bb-6316-d940-a6a4-e7a472b92be7  ONLINE       0     0     0
cache
  42def480-d99f-b84a-8696-0f64e488d528    ONLINE       0     0     0

errors: No known data errors
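
A nice property of an l2arc is that it only holds copies of data that already lives in the pool, so if I ever change my mind the cache device can be removed at any time without putting the pool at risk. Using the same partition as above, that would look like:

zpool remove small 42def480-d99f-b84a-8696-0f64e488d528

Running zpool status small again should then show the cache section gone.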

Sources

Adding l2arc device:

https://docs.oracle.com/cd/E19253-01/819-5461/givdo/index.html

https://blog.programster.org/zfs-add-l2arc

https://www.truenas.com/community/threads/can-i-detach-and-remove-l2arc-drive.11319/

Commands for adding devices to pool:

https://docs.oracle.com/cd/E19253-01/819-5461/6n7ht6qvk/index.html#gfgaw

Information on disk identification:

https://unix.stackexchange.com/questions/474371/how-do-i-create-a-zpool-using-uuid-or-truly-unique-identifier

https://wiki.archlinux.org/index.php/persistent_block_device_naming

http://www.freebsddiary.org/zfs-with-gpart.php

Information on creating partitions:

https://www.crucial.com.au/blog/2009/11/18/how-to-create-a-new-partition-on-a-linux-server/

Calculator for partition start and end block numbers:

http://www.csgnetwork.com/mediasizecalc.html

Considerations on resource requirements:
General information: