zenodotus280

Wrestling with Jellyfin

SYSLOG_25-W05

The Ceph cluster has been working flawlessly (aside from some user error). This week is entirely about getting Jellyfin running across all nodes with hardware transcoding. One of the reasons I wanted identical hardware for all nodes is so that I don't need to treat them any differently - fungible servers. Though, with the RAM issues I had and some more reflection, I think the better setup is an active-standby model with a third, less performant node that can act as a witness, handle some non-essential operations, or serve as a temporary staging/testing area. This is effectively my setup now, since "hydra1" has 24GB rather than the 32GB in "hydra{2,3}".

1. Setting Up Hardware Transcoding

With Intel QuickSync or VA-API acceleration, I can handle high-bitrate video in a variety of codecs (everything except AV1) without pegging my CPU at 100%. Most people recommend keeping a separate 4K library, but I'd like to at least try it and see what all the fuss is about. I don't have a 4K TV, so any 4K content I get will have to be transcoded. Unfortunately...

... hardware transcoding is broken.

I'm going to skip ahead now (I have a more detailed write-up that I'll post separately) to the specific issues and exactly what it took to resolve them:

  1. The "renderD128" device in the container is actually owned by "kvm" as GID 104. But changing the "GID in CT" value to 104 in Proxmox lead to failure.
  2. I removed the "lxc.mount" based on some changes in Proxmox and could at least boot the container.
  3. I reset the GID to 105 in Proxmox and I was now getting "render" for the group for /dev/dri ... but no transcoding yet.
  4. In the Jellyfin logs I found the culprit: "failed to open segment ...". Because I have mounted my transcoding area as a separate disk the permissions got reset when I migrated to the cluster for some reason. chmod 777 /srv/transcodes and I was back in business!
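
For reference, this is roughly what the working configuration boils down to. A minimal sketch assuming a recent Proxmox (8.2+) that supports the devN passthrough entries; the CT ID is a placeholder, and /srv/transcodes is my separate transcode mount:

    # /etc/pve/lxc/200.conf (200 is a placeholder CT ID)
    # Added via the Proxmox UI as a Device Passthrough with "GID in CT" = 105 ("render" in the container)
    dev0: /dev/dri/renderD128,gid=105

    # Inside the container: make the separately mounted transcode area writable again
    chmod 777 /srv/transcodes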

In short, hardware transcoding is easier than ever with the previously linked GitHub discussion. Most tutorials are out of date and more complicated than needed. I added complexity by separating my transcodes onto their own disk (to keep my backups reasonable), but otherwise this would have been a matter of changing a single value in Proxmox from 104 to 105, with no changes in the container itself.
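
To double-check that the container can actually see and use the device, a quick VA-API probe from inside the container is enough. A sketch assuming vainfo (from libva-utils, or the copy bundled with jellyfin-ffmpeg) is available:

    # Inside the Jellyfin container
    ls -l /dev/dri/                                      # renderD128 should be group "render"
    vainfo --display drm --device /dev/dri/renderD128    # lists the supported VA-API profiles/entrypoints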

Inter-Node Migration Testing

If this were a VM I could try a live migration. But it's an LXC, so I'll just have to rely on its portability. Unfortunately: TASK ERROR: Device /dev/dri/renderD128 does not exist on hydra3.

Which is not true: the device exists there with identical permissions. When I ran the migration again it worked without issue. Migrating to hydra1 threw the same error, but again succeeded on the second attempt.

Lastly, I moved it back to hydra3 and it booted on the first try. I let it run for a bit before migrating it back to hydra2 without shutting it down.
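
For the record, the migrations themselves were nothing fancy - roughly the following, with a placeholder CT ID. Since LXCs can't truly live-migrate, --restart does the stop/move/start dance for you:

    # Run on the source node; 200 is a placeholder CT ID
    pct migrate 200 hydra3 --restart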

All clear! Everything working as hoped.

Created a backup and rebooted the node just to do a final verification. All clear.
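
The backup itself was just a one-off vzdump to the PBS storage; a sketch with placeholder CT ID and storage name:

    # Snapshot-mode backup of the container to the PBS storage (ID and name are placeholders)
    vzdump 200 --storage pbs --mode snapshot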

2. Trickplay and BIF Files

Jellyfin’s Trickplay feature (preview thumbnails when you scrub through the timeline) is no longer a separate plugin but built-in. Generating these from .BIF files took hours, and my container ballooned in size. After letting it finish, I cleaned up leftover .BIF files, dropping the disk usage back down. It was pretty simple to switch over:

  1. enable trickplay (under Server>Playback>Trickplay)
  2. enable per-library
  3. configure settings
  4. convert .BIF to native format
  5. monitor the increasing size of the JF boot disk... it hit 43GB. Successfully converted 2170/2170 attempted .BIF files!
  6. I'm going to skip verifying the conversion was successful. I can re-generate later if needed.
  7. Delete the leftover .BIF files and watch the boot disk usage drop back to ~29 GB (a cleanup sketch follows this list). Successfully deleted 2220/2220 .BIF files!
  8. uninstall plug-in and remove repo
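
The .BIF cleanup in step 7 was just a filesystem sweep. A sketch assuming the old files live under Jellyfin's data directory - the path is an assumption, since the old plugin could also be configured to store them next to the media, so do a dry run first:

    # Dry run: count what would be removed (path is an assumption; check your install)
    find /var/lib/jellyfin -iname '*.bif' | wc -l

    # Delete and confirm the disk usage drops
    find /var/lib/jellyfin -iname '*.bif' -delete
    du -sh /var/lib/jellyfin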

I also tried to remove the IntroSkipper plugin and its repo since it was listed as incompatible, but it seems stuck and I can't get rid of it. It doesn't hurt anything, though.

I manually ran the Generate Trickplay Images task and it took about two hours to complete.

3. Quotas on Proxmox Backup Storage

My PBS datastore had no quota set and I maxed it out, which broke my backups and left me with no easy way to recover. I was able to delete backups, but PBS waits a full 24 hours before garbage collection actually frees that space. It was faster to just start over with a new datastore since I didn't have a long history of backups - this was the virtualized instance of PBS, not the hardware one.
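
The fix going forward is a hard cap on the datastore plus letting garbage collection do the cleanup. A sketch assuming the datastore sits on its own ZFS dataset (pool, dataset, and datastore names are placeholders); note that GC only reclaims chunks that haven't been touched for a bit over 24 hours, which is the wait I ran into:

    # Cap the ZFS dataset backing the datastore (ZFS-backed PBS is an assumption here)
    zfs set quota=500G rpool/pbs-datastore

    # After deleting backup snapshots, kick off garbage collection on the datastore
    proxmox-backup-manager garbage-collection start mydatastore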

Thoughts? Leave a comment