zenodotus280

Deconverging Cairo

SYSLOG_25-W02

1. Syncthing Swap

My primary server is "srv-cairo": a Proxmox hypervisor running a ZFS mirror of 6 TB hard drives in an old Silverstone HTPC case. I added a pair of cheap M.2 NVMe drives to store all the VMs and containers, with the OS on a $20 SATA SSD. I was running Syncthing on the host as the master node since it could access the files directly with no NFS intermediary.
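For context, the data mirror boils down to something like this - a sketch, not the actual command I ran; the device paths are placeholders, and tank is the pool name that shows up again in section 2:

zpool create -o ashift=12 tank mirror /dev/disk/by-id/ata-6TB-DISK-A /dev/disk/by-id/ata-6TB-DISK-B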

In preparation for its replacement (separating the storage from the compute) I turned my attention to reworking this. I set up a dedicated (privileged) LXC for Syncthing, mounted the essential directories, and joined the other Syncthing nodes. It worked okay at first, but I soon discovered that Obsidian does not like accessing files over the network at all. I had a ton of sync conflicts and duplicate files that took hours to fully fix. The solution was simply to make the Obsidian share "Receive Only" for the LXC. My phone, laptop, and desktop all have access to Obsidian through Syncthing, so having the master node be receive-only for that subset of data effectively turns it into a backup copy.
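A minimal sketch of the bind-mount side of that setup - the container ID and directory names are placeholders, not my actual config:

# bind-mount a host directory into the (privileged) Syncthing LXC
pct set 105 -mp0 /tank/obsidian,mp=/srv/obsidian

The receive-only part is handled in Syncthing itself: "Receive Only" in the folder settings of the web UI, which ends up as type="receiveonly" on that folder in config.xml.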

2. ZFS Datasets and NAS Swap

Nesting regular "data" datasets within a top-level dataset is better than putting them directly under the pool root. Here's a good example of why:

dozer/subvol-501-disk-0
dozer/vm-104-disk-0

Proxmox creates its own datasets directly on the pool, so it would be messy to end up with something like:

dozer/Media
dozer/subvol-501-disk-0
dozer/Transcodes
dozer/vm-104-disk-0
dozer/YouTube_DL

I will be storing my own datasets under /AVALON/data/{A..Z} for the new NAS. The overall theme will be that avalon.home.arpa:/AVALON/data replaces srv-cairo.home.arpa:/tank; therefore:

sed -i.bak 's|srv-cairo\.home\.arpa:/tank|avalon.home.arpa:/AVALON/data|g' /etc/fstab
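On the NAS side, that nested layout boils down to a handful of zfs create commands. A minimal sketch, assuming the pool is named AVALON and reusing the dataset names from the example above:

# parent dataset first, then one child dataset per share
zfs create AVALON/data
zfs create AVALON/data/Media
zfs create AVALON/data/Transcodes
zfs create AVALON/data/YouTube_DL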

I will write about the FreeBSD NAS host another time, as it was a considerable undertaking in its own right. Once I had updated the fstabs, I rebooted each container and VM to confirm that the new NFS shares were recognized.
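For illustration, here is what that substitution does to a typical fstab entry, plus a quick way to check the mounts from inside a guest without a full reboot (the mount point and options are placeholders, not my actual entries):

# before
srv-cairo.home.arpa:/tank/Media          /mnt/media   nfs   defaults   0 0
# after
avalon.home.arpa:/AVALON/data/Media      /mnt/media   nfs   defaults   0 0

# remount everything in fstab and confirm the NFS shares are present
mount -a
findmnt -t nfs,nfs4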

3. Building a Proxmox + Ceph Cluster

With the storage offloaded from Cairo, that leaves only the VMs and containers. Before those could be moved I needed to set up my long-awaited solution: a high-availability Ceph cluster with a mesh network. This means I don't even need a switch for the traffic between the three nodes - they are connected directly to one another, and the only single point of failure is the power supply to the house. A UPS doesn't really make sense: if I'm not home the servers will die before I can do anything about it; you'd need a generator backup to actually maintain uptime.

  • Node Setup: Each node got multiple 2.5 GbE NICs so that it is always connected to at least one other node.
  • Routing with OSPFv3: I can pull the Ethernet cables on one node and traffic automatically reroutes through the others! Five or more nodes would get pretty chaotic, though; at that point it would probably be better to have redundant switches using CARP or something. (A config sketch follows this list.)
  • Forming the Cluster: I created a cluster on Node 1, joined from Node 2 and Node 3, and made sure Ceph used IPv6 addresses. Then I installed monitors, managers, and OSDs. The final step was building a Ceph pool, which gave me shared storage for future VMs. (See the command outline after this list.)
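For the routing piece, a minimal frr.conf sketch of what OSPFv3 over the mesh could look like - the interface names and router ID are placeholders, the exact syntax varies slightly between FRR versions, and ospf6d has to be enabled in /etc/frr/daemons:

interface enp2s0
 ipv6 ospf6 area 0.0.0.0
!
interface enp3s0
 ipv6 ospf6 area 0.0.0.0
!
router ospf6
 ospf6 router-id 0.0.0.1
!

And a rough outline of the cluster and Ceph setup from the CLI - not my literal commands; the cluster name, addresses, OSD device, and pool name are placeholders, and subcommands differ slightly between Proxmox versions:

# Node 1: create the cluster, then join from the other two nodes
pvecm create <cluster-name>
pvecm add <node1-address>            # run on Node 2 and Node 3

# Ceph: install on every node, init once with the IPv6 mesh network
pveceph install
pveceph init --network <ipv6-mesh-prefix>/64
pveceph mon create                   # one monitor per node
pveceph mgr create                   # one manager per node
pveceph osd create /dev/nvme0n1      # one per data disk, on each node

# Finally, the shared pool for VM disks
pveceph pool create vm-storage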

This process was also a substantial effort and I will end up writing several posts on the entire thing: the hardware, the cluster with Ceph, the mesh network, and configuring HA along with some caveats.
