ZFS on Proxmox

November 7, 2020November 27, 2020Ethan

I’ve been up and running a few weeks at the time of this post, and so far my favorite way to set up my disks is in ZFS arrays. I originally thought I would make a FreeNAS VM and use that to manage the storage and pass through the SAS HBA to it so I could take advantage of the ZFS features I set out to have while also having the fantastic user and share management that FreeNAS has to offer. I even set this up and had a pool going, but decided that this would rely too heavily on the loopback interface and could create an unnecessary bottleneck since I added 10gb Ethernet recently. To avoid this bottleneck, I decided to use the ZFS functionality that Proxmox already has, and toughen up and learn how to manage ZFS pools from the command line like a real sysadmin.

The web interface allows you to make a pool quite easily, but does require some set up before it will allow you to see the all of the available disks. My favorite commands for prepping the disks has to be

fdisk list

fdisk /dev/sda

Which will allow you to start modifying the partitions, in this case deleting them all so we can use them for Proxmox. Once this is done you can easily create a zpool with the disks you have installed and call it a day.

HOWEVER, if you want to learn, implement caching of any kind, or do any other advanced set up, read the next section to learn how to put together ZFS arrays from the command line.

You decided to read on to this section, so I can be confident you are an advanced user, curious, or just hate yourself and your data. Why do I say that? Because doing anything below incorrectly WILL DESTROY ALL OF YOUR DATA. Proceed with caution, be sparing with the -f flag when creating pools, research warning messages your server gives you before proceeding, and make sure you have a complete understanding of what the components of a zpool do before trying to assemble one with commands in a terminal. With that out of the way, let’s get started with my favorite commands for getting me bearings on an unfamiliar disk arrangement.

To get detailed information on the disks in the system, their unique identifiers, partitions, and current uses/flags, I use

lsblk -o NAME,PARTUUID,LABEL,PATH,FSTYPE

This will return some nice information we may need later on to create our pool, sample below.

NAME PARTUUID LABEL PATH FSTYPE sda /dev/sda ├─sda1 59a20e15-c8f2-4c0e-9cc3-38e3e814d756 /dev/sda1 ├─sda2 5cbb9c01-f03f-4d43-8978-079b4c5763ff /dev/sda2 vfat └─sda3 b436dc4b-7174-4bac-aa5d-1562be8444c7 /dev/sda3 LVM2_member ├─pve-swap /dev/mapper/pve-swap swap ├─pve-root /dev/mapper/pve-root ext4 ├─pve-data_tmeta /dev/mapper/pve-data_tmeta │ └─pve-data-tpool /dev/mapper/pve-data-tpool │ ├─pve-data /dev/mapper/pve-data │ ├─pve-vm--101--disk--0 /dev/mapper/pve-vm--101--disk--0 │ └─pve-vm--100--disk--0 /dev/mapper/pve-vm--100--disk--0 └─pve-data_tdata /dev/mapper/pve-data_tdata └─pve-data-tpool /dev/mapper/pve-data-tpool ├─pve-data /dev/mapper/pve-data ├─pve-vm--101--disk--0 /dev/mapper/pve-vm--101--disk--0 └─pve-vm--100--disk--0 /dev/mapper/pve-vm--100--disk--0 sdb /dev/sdb ├─sdb1 416e3896-d11e-3d4d-8a58-ee63c8bbf20c small /dev/sdb1 zfs_member └─sdb2 102d26fe-7cb9-f44e-b0ec-6c29ebb7d860 /dev/sdb2 sdc /dev/sdc ├─sdc1 0006e9ba-11d4-11eb-a2cf-818015bd507b /dev/sdc1 └─sdc2 04f7e60e-11d4-11eb-a2cf-818015bd507b /dev/sdc2 sdd /dev/sdd zfs_member ├─sdd1 151d9d04-11d4-11eb-a2cf-818015bd507b /dev/sdd1 zfs_member └─sdd2 1670a33d-11d4-11eb-a2cf-818015bd507b vTank /dev/sdd2 zfs_member sde /dev/sde zfs_member ├─sde1 0d387804-11d4-11eb-a2cf-818015bd507b /dev/sde1 zfs_member └─sde2 0f8a209c-11d4-11eb-a2cf-818015bd507b vTank /dev/sde2 zfs_member

Using the information here, we can properly refer to the partitions available by a persistent identifier, which is important because disk labels and paths can change for a number of reasons. For example, the “/dev/sda” is dependent on initialization order of the disks, so it can change on reboot or rearrangement of the disks, so should never be used to create a pool. For this, we should refer to a UUID for the disk or partition we plan to use, as seen above, labeled PARTUUID. I always use partitions to create pools as it allows me to set aside extra space on the disk so I can use slightly mismatched drives when creating a pool or replacing a pool member. For instance, you may have seen that some 500GB SSDs have slightly more or less than exactly 500GB usable, like 480, 500, or 512. If I create a partition that uses 480, I could make that partition on any of these disks and would earn some flexibility in my replacements later on. Or I could use 500GB and be able to use 500 or 512 disks, you get the idea. FreeNAS does this by default with a 2GB extra space to allow for small differences, not quite what I just described.

To make the partitions, I refer back to the first section and select a given disk, say /dev/nvme0n1. I’ll open it with fdisk, then outline my partition here. I plan to use this disk for an l2arc, but have 192GB of RAM, which is not enough for 1TB of l2arc. I will outline the requirements and use cases for l2arc in another page, but for my purposes, I only want to partition 500GB, which is still enormous for my setup. So I need to determine how many sectors I will need to allocate in order to get 500GB. One sector is 512 Bytes, so I just need to divide 512 Bytes into the desired 500GBytes to get number of sectors.

This looks like (500*1024*1024*1024)/512 = 1,048,576,000 sectors. I will be starting at sector 2048, so my range will be 2048-1,048,578,047. Here is a snippet of what it looks like to make this partition.

root@rex:~# fdisk /dev/nvme0n1

Welcome to fdisk (util-linux 2.33.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Command (m for help): n
Partition number (1-128, default 1): 1
First sector (34-2000409230, default 2048): 2048
Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-2000409230, default 2000409230): 1048578047

Created a new partition 1 of type 'Linux filesystem' and of size 500 GiB.
Partition #1 contains a zfs_member signature.

Do you want to remove the signature? [Y]es/[N]o: y

The signature will be removed by a write command.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

Now I have my partition and I can attach it to a pool, for this I will give an example. I have 4 spare drives that I’ll use to make an array of striped mirrors using the primary partitions I created, then add this drive as an l2arc.

root@rex:~# zpool create small mirror 4dadc559-e757-3249-8ac0-394275f1e30a 183b1cd2-e4ef-234f-8298-e9becfabc5b2 mirror d7992b6e-ce47-ed49-9b6a-a7ccf8da9974 0c2311bb-6316-d940-a6a4-e7a472b92be7
root@rex:~# zpool status small
pool: small
state: ONLINE
scan: none requested
config:

NAME STATE READ WRITE CKSUM
small ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
4dadc559-e757-3249-8ac0-394275f1e30a ONLINE 0 0 0
183b1cd2-e4ef-234f-8298-e9becfabc5b2 ONLINE 0 0 0
mirror-1 ONLINE 0 0 0
d7992b6e-ce47-ed49-9b6a-a7ccf8da9974 ONLINE 0 0 0
0c2311bb-6316-d940-a6a4-e7a472b92be7 ONLINE 0 0 0

errors: No known data errors
root@rex:~# zpool add small cache 42def480-d99f-b84a-8696-0f64e488d528
root@rex:~# zpool status small
pool: small
state: ONLINE
scan: none requested
config:

errors: No known data errors

Using ELK to Analyze Twitter Data

October 11, 2019October 11, 2019Ethan

We are experiencing a novel challenge nowadays, where we have copious amounts of data being generated by each and every internet user every day, but lack adequate resources to take full advantage of that data. Without diving into what it means to "take full advantage" and get into privacy concerns, let us consider some of the good ways we can use data.

Find patterns in public opinion on a topic
Determine what topic is gaining popularity in communities
Predict shifts in markets, politics, or other environments

These are broad enough that they are unlikely to have a negative impact on any persons privacy but have clear advantages to governments, companies, and any interest group. Currently we may see political organizations taking advantage of public data to determine probabilities of certain officials winning an election, or companies determining how a decision was received or is likely to be received by their audience, or an interest group finding like-minded communities to focus their efforts on.

Alright, so data is useful. Now what? Now we need to decide how to make sense of it, and computers are our best bet if we want to tackle any sizable data set. For this exercise, let's take a look at the ELK stack.

What is the ELK stack? The ELK stack is a collection of 3 tools that work together towards the end of making sense of data.

E - Elasticsearch: JSON-based search and analytics engine
L - Logstash: Data collection pipeline
K - Kibana: UI for data visualization

So this gives us a method for collecting data, searching or analyzing the data, and visualizing the data. It seems pretty clear why these get used together more than independently. So Logstash processes whatever data we give it access to based on filters we set, sends that on to Elasticsearch. Then, using Kibana, we can view and analyze our filtered data.

Set up isn't too bad for any of these three in any operating system with the current release (7.2.0 at time of writing) and the developers have what seems to be very thorough documentation of how to get set up.

Installing Elasticsearch
Installing Logstash
Installing Kibana

Preparing to Connect to Twitter

Clearly, in order to analyze Twitter data we need a way to connect to Twitter. To use their API we need to set up a developer account, register an app with them, then generate some keys and tokens that we will need in order to configure Logstash. Doing this does require that you agree not to submit data to governments, not to spy, and not to try to harvest personal information about users. Basic stuff.

Okay, we've got our keys, so now we can refer to Elastics documentations for Logstash's Twitter plug in that we will be making use of and get the following format for our configuration file.

input {
  twitter {
      consumer_key => "consumer_key"
      consumer_secret => "consumer_secret"
      oauth_token => "oauth_token"
      oauth_token_secret => "oauth_token_secret"
      keywords => ["Keyword1","Keyword2"]
      full_tweet => true
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}

We can then launch Logstash from command line and provided the other component were installed and started as the instructions provided by Elastic suggest, we are up and running, collecting data! This plugin, unfortunately, only collects current data from Twitter while it is running. This can be circumvented with additional code and use of premium Twitter APIs, but for many use cases this is more than enough to work with. Also worth noting any API call to Twitter for past Tweets counts towards your relatively limited quota, after which would cost money to use.

With a very basic implementation, we primarily have the ability to view number of Tweets in our dataset, which is based on our keywords set in config, which contain some other keyword, which we can define through Kibana.

In this example, I filtered tweets by keywords "Hurricane" and "Hurricanes" in my configuration file, then assigned several filters to split the results into "sub-buckets" or the different lines seen in the image. To represent the number of Tweets contained certain keywords, I used a Date Histogram, which makes visualizing frequency of data, and in this case, relative frequencies much easier.

To provide a realistic analogue to a real world scenario, I chose many different specific subjects (hurricanes by name) belonging to my broad subject (hurricanes) which allows me to quickly see how many people are talking about which hurricanes and when they are doing it. This particular example only shows a relatively small window of time, but it is easy to see how such a concept scales.

Now, the more interesting idea explored here. Pinpointing associated topics and public sentiment. A good topic to explore in our example of hurricanes is climate change. It is widely believed that climate change is likely to have increased and continue to increase the frequency of catastrophic weather events, such as hurricanes. As a rough way of seeing how many people are talking about this relationship, I added a filter for climate change or global warming, using both because people often use them to refer to the same phenomenon. For this time period, it appears that people who were talking about hurricanes at all also mentioned climate change as much as or more than they did any particular hurricane. In our example, this suggests that people may be more interested in the cause of these hurricanes than the hurricanes themselves!

Okay, so there isn't too much most companies can do with information on who is talking about what hurricanes or why. But, this concept is very transferable. Take for instance the rapidly changing market for processors in the midst of riding competition between Intel and AMD. What are people talking about when they mention Intel? AMD? Do people mention positive or negative emotions or ideas when talking about one or the other? The keywords could be set to include AMD and Intel, then the filter first split it into graphs of each then subdivide each graph into groups of keywords related to public sentiment. The use case is wildly different, but the differences in implementation are trivial.

Hopefully some light has been shed on the general approach a person can take when trying to make use of anonymous personal data as well as why it can be such an important tool to a company or other organization.

Custom Battery Packs

September 28, 2019September 28, 2019Ethan

Why would you need a custom battery pack?

Building a powerwall for your home
Need portable power for tools or entertainment
Backup power for emergencies
Replacing lead-acid UPS batteries
Additional battery for your laptop
Replacing batteries in old devices or tools
Building an electric bike
Repairing solar lights
Any other powered devices

If you are able to provide dimensions, required capacity and power ratings needed for your project or repair, I can design and build a battery pack for you. Depending on your requirements, the pack can be more focused on cost per capacity, volumetric energy density, or simply designed to fit an odd space, like in a roomba or bike triangle.

Regardless of your needs, I am confident I will be able to help accomplish your goals and get your project moving along, just reach out to me for details and we can get started!

Battery Recycling

If you have batteries or battery packs that you are unsure how to dispose of, I am currently accepting drop offs! I am taking old batteries, testing and disassembling them, then discarding unusable cells and components to local centers for recycling. I am only able to accept drop offs or pick up in the Dallas area currently, but if you need assistance feel free to reach out and I can help.

What will I dispose of for you?

Phone and laptop batteries
Automotive batteries
Used solar light batteries
Individual cells from RC devices
Dead packs from electric devices (power tools, hover boards, roombas)
Any other worn out or unusable batteries you have

Disposing of batteries can be a pain, but every bit we keep out of the trash helps the environment and supports a culture of reusing and recycling, so please reach out if you need help!

DIY Mains LED Panel

July 14, 2019September 2, 2019Ethan

To help with taking higher quality pictures, I thought I should improve my lighting in my workspace so I decided to see how hard it would be to make some LED lights that get power from ordinary mains voltage.

I want to start with saying that I would not leave these plugged in or unsupervised as they lack many safety features. I put this together for use in taking photos, so they are only plugged in while supervised, and for that they serve their purpose just fine. If you are interested in putting together a light of your own that can be used for ordinary lighting, it would be best to implement some basic safety features, which we will go over later in this post.

The design for this circuit is quite simple, it is made up of a rectifier, a smoothing capacitor, and a load (the LEDs). To make things a bit better and safer, we add current limiting resistors to save our LEDs, an inrush current limitor to prevent blown breakers, and a capacitor discharging resistor to prevent any unexpected shocks after removing power from the device. Alright, so the general design is out of the way, how should you determine what values to use or how many LEDs you need? Let's start with the LEDs and what we know about basic white LEDs.

White LEDs have a voltage drop of 3-3.7 volts, generally take from 5-150mA, and produce about the equivalent of an ordinary light bulb with every 10 watts of power put in. So to get a decently bright light we want to make an array of LEDs that can handle 10 watts. How much current can the LEDs you use handle? Mine are rated for 150mA so I should use about 10/0.15= ~66 of them, but my board only fit 44, so I did that. It's more to see how well it would work anyways.

In addition to considering total number needed, you'll need to consider how you will arrange them. If you arrange them such that the voltage drop is near your source voltage (~170VDC for rectified 120VAC) your current limiting resistor will be held responsible for dropping a very small voltage, which means if the source voltage fluctuates, the current will fluctuate more. Take for example if your LEDs dropped 150 volts and your smoothing capacitor was able to limit ripple to about 3 volts, the voltage dropped by your current limiting resistor would fluctuate from 17-20 volts, this is considerable but not excessive. If your LEDs dropped 160 volts and your smoothing capacitor was the same, you would be dropping 7-10 volts, so you would have fluctuations of 30%, that is substantial. Because the load is primarily resistive, fluctuations in voltage across the resistor are proportional to fluctuation in current, which for an LED means fluctuation in brightness. More LEDs in series mean less power wasted in the resistor and less waste heat, but also means it will be more sensitive to ripple.

Alright, so we know a bit more about LEDs and the implications of how we decide to arrange them. What value should we use for the current limiting resistor? That depends. For the resistor, take a look at the datasheet for your LEDs, what current do you need through them and how much voltage is dropped by the array at that current? If you use 40 LEDs in series and need 100mA and they drop 3.5 volts each at 100mA then they will drop 40*3.5=140 volts. So we need our ~170VDC to be reduced to 100mA after dropping 140 volts on our lights, so we need to drop 170-140=30 volts across our resistor at 100mA. Time for a little math.

 Recall our formula V=IR 
 So we have 30=0.1*R
 R=30/0.1=300 Ohms

Not too bad, we've got most of our design done. Now we need to decide on a value for our capacitor. We first make sure it's voltage rating is good for at least 170VDC plus some room to accommodate brief surges, so say a minimum of 250 volts. Now what capacity to use? As much as I would like to be able to give you a formula for this, it is much easier to use a tool online alongside the information we gathered above.Come up with what ripple you are comfortable with personally, lets say 5% for the sake of simplicity (also the threshold for changes in brightness that can be perceived). If we go with our design with 40 LEDs, we have a voltage drop of 30 volts at our full 170VDC, and we want that to never drop beneath 30*0.95, that is 95% brightness. In order to do this, our smoothing capacitor needs to be able to prevent the voltage from dropping any more than 1.5 volts, to find out how to achieve this we will turn to an online tool. We have a starting voltage of 170VDC, a load of 100mA at that voltage, and a frequency of 120Hz when using a full bridge rectifier, so we need to be able to sustain a load of 100mA for 83 milliseconds. To get that current, I approximate our load as a 1700 Ohm resistor. But with these requirements, we would need a large 550μF capacitor, so you can either go with that or lower your current, number of LEDs, or increase your ripple tolerance.

See this example plugged in at Must Calculate it's a tool I have used for years now.
Now we have our components, just wire everything according the schematic and add your high value resistor across the capacitor terminals and the project is complete! To make mine a bit safer to handle, I designed and printed an enclosure and coated the exposed components in silicon modified conformal coating to prevent accidental shock. Happy building!