My homelab setup

Posted on 2023-12-29 in Software

Welcome to my homelab!

If you don't know what a homelab is, it refers to running servers from your home. What is a server? It's basically a computer that is always on, connected to the network, often with no display or input. For a longer explanation take a look at this article.

The homelab movement crosses paths with self-hosting (running software services for your own use), but they are not the same thing. You can have a homelab not dedicated to self-hosting services. You can self-host using computers not in your home (e.g. on a hosting provider). But I believe there is a considerable overlap of people interested in both, as is my case.

My setup is quite involved, and some people will read this article and think “this is way too complicated, I cannot do any of this”. And they would be right! BUT the good news is that you don't have to do things the same way I do. There is a whole gradient of possibilities with varying degrees of difficulty, effort, reward, control and price, and it's normal to combine them:

  1. Use paid hosted services, i.e. you pay and get access to software that somebody else manages. You don't get as much control but it's simpler, data might be encrypted and the provider doesn't have the incentive to sell your data (because you already pay for the service). Some of these services are actually open-source software, and you can later migrate to your own instance if you so desire. As an example, I use Protonmail as my email provider, because setting up your own email is increasingly complicated to get right.
  2. Use hosting services that make it easy to deploy apps. Same as the previous point, I would prioritize open-source services that you could migrate to your own instance later on. Some examples are Sandstorm and YunoHost. I personally used OwnCube before migrating to my own self-hosted Nextcloud instance.
  3. Run your own services, but on off-premises hardware (i.e. with a hosting provider, or in “the cloud”). There is a wide range of hosting possibilities, from dedicated servers (i.e. a whole real computer) to shared hosting where you share a computer with other users (separated by some virtualization or containerization techniques), and others. You can read Wikipedia on Internet hosting for more details. Personally I use Vultr to host some services I want to be always online and publicly accessible to the Internet.
  4. Dedicate a computer at home as a server. You can start cheap and small, like a Raspberry Pi 5 for $80. Or you can use some spare computer you don't need anymore, like an old laptop. You can evolve your homelab as you understand your needs better.

There are countless resources on self-hosting and homelabs. If anything, the difficulty lies in knowing where to start. My recommendation is to ask people you trust about what to focus on to avoid getting overwhelmed.

Nevertheless, here are some resources:

No matter what you choose to do, I encourage you to read and educate yourself about digital privacy and free software. I believe it's time well spent because our lives are increasingly digital, and not knowing about these topics leaves you open to abuse.

Lastly I want to thank my friend Harmeet for his encouragement to write this article. It probably wouldn't have happened without him!

Some history

I'm old in Internet terms. Ancient.

I was browsing the World Wide Web and participating in Usenet in 1995. In 1999 I tried out GNU/Linux for the first time. By 2003 I had eradicated Windows from all my computers. The free software bug really bit me deep.

During this time I saw Google's birth as the best search engine of its time, hands down. They had the best developers and made some amazing products. I really believed in Google's “Don't be evil” slogan, and was an enthusiastic user for years. I got a Gmail account back when it required an invite to get in. I put my events in Google Calendar, my photos in Google Photos, my (shared) documents in Google Drive. I got an Android phone because it was open source. I even used Google+ and Google Wave actively.

Fast-forward to 2013 and Edward Snowden's revelations shatter my confidence to pieces. Google was not to be trusted, nor any of the big Silicon Valley companies. I resolved to trace a plan to take back control of my digital life. That plan took years to unfold, but with every day that passes I'm more glad that I put in the effort.

This article will not be a chronological story, but only the latest version of my setup.

Goals

My main goal should be clear after reading the story above: take control of my data. I don't want others to abuse it, sell it to innumerable third parties that will analyze it, profile it, use it to target me with ads or misinformation campaigns, train AI models with it... I also don't want to see my accounts closed one day for no good reason, my valuable data gone like tears in the rain. Having my data at home gives me peace of mind.

A secondary benefit of having all my data locally available is that I get to do stuff with it, like sharing directly with people, without size limits, without accounts and without anybody else knowing about it.

With the advent of powerful open-source AI models I'm excited about the possibilities of having my data readily available:

  • Train LLMs with all my text, so I can query it for knowledge or ask it to write in my style.
  • Fine-tune Stable Diffusion models using my photos so I can generate images portraying me, my family and friends.
  • Use coding LLMs to analyze and write code.
  • Train voice models with my own voice. I could make podcasts out of these blog posts, for example.
  • Who knows what the future holds! The field is moving at breakneck speed.

My guiding principles are simplicity and stability. The fewer moving pieces the better. I favor software that is light on resources, properly maintained and that respects backwards compatibility or has clear ways to upgrade. I have limited time so I don't want things to break often.

Using free software is paramount so I can understand what is going on, diagnose problems and fix them, and also that I can trust the software is not doing something I don't want it to.

Summary

My current setup is composed of 3 systems, not counting regular personal computers (laptop, desktop, mobile, tablet...):

  • A home server for data storage and personal apps.
  • An HTPC (Home Theater PC) media player.
  • A VPS for public apps/websites.

I will go over each of them in detail.

Home server

The heart of my setup is an HPE ProLiant MicroServer Gen10, bought in 2018. It has a modest dual-core AMD Opteron X3216 CPU and “only” 8 GB of RAM, but that has proven plenty for my needs, which mostly boil down to storage. It cost 375€ (about $400) back in the day.

The main selling point for me was having a sufficiently powerful GPU, because at first it doubled as an HTPC. Nowadays I have a dedicated HTPC media player, so my pick would have been different, but I'm not replacing it yet.

The server has been happily running Debian GNU/Linux stable all this time. I have been using Debian for over 2 decades and it feels right for servers. If I had more free time I would love to dive into NixOS or Guix. They make for rock-solid systems due to their reproducibility and ability to roll back to any previous system configuration.

Storage

The server has a 256 GB SSD that I use for the OS and software, and then 4x 6 TB HDDs for data hosting. Storage keeps getting better and cheaper all the time. I paid $189 for each HDD, but nowadays you can get a 10 or 12 TB drive for the same price.

Reliability is very important to me, as these hard drives will hold my personal data. I picked the top performer in Backblaze's drive reliability report and all 4 drives have been performing well since 2018.

I use SnapRAID to have a software RAID 5, by dedicating one HDD for parity. That leaves the server with 18 TB of available storage. A cron job runs a SnapRAID sync every night and emails on success or failure.

If a drive fails I might lose up to 24 hours of data, but it shouldn't be fatal as the data should be available somewhere else (e.g. my photos are also on my computer). I prefer the software RAID as I can recover files or undo unwanted changes before the next sync runs. This has proven useful more than once.
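If the parity idea sounds abstract, here is a toy sketch in Python. It is not how SnapRAID is implemented (SnapRAID works on files and disk blocks and adds checksums). It only illustrates why one parity drive is enough to rebuild one lost data drive, and where the 18 TB figure comes from. The function names and the sample "drive contents" are made up for the example.

```python
from functools import reduce


def usable_capacity_tb(drive_count: int, drive_size_tb: int, parity_drives: int = 1) -> int:
    """Capacity left for data once the parity drives are set aside."""
    return (drive_count - parity_drives) * drive_size_tb


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three pretend data drives holding one 8-byte block each (illustrative only).
data_drives = [b"photos..", b"music...", b"docs...."]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data_drives)

# Pretend the second drive died: XOR the survivors with the parity block
# to get its content back.
survivors = [data_drives[0], data_drives[2], parity]
rebuilt = xor_blocks(survivors)
```

With 4 drives of 6 TB and one of them used for parity, `usable_capacity_tb(4, 6)` gives the 18 TB mentioned above, and `rebuilt` holds the content of the lost drive.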

The 3 non-parity HDDs are combined into a single logical filesystem using MergerFS. It has worked fine all these years, set and forget.

All drives are LUKS-encrypted. Because the server is headless I set it up so it can be decrypted on boot by connecting a USB drive.

Network

The server is exposed to the Internet because it's going to run services I want to use from anywhere. I configured my home router to give the server a static IP address, and set it as the DMZ host.

I don't have a static IP address on my home internet connection, so I use a Python script that updates the necessary DNS records by using my DNS provider's APIs. A cron job executes the script every 10 minutes.
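A minimal sketch of what such an updater can look like, under stated assumptions: the public IP is looked up through api.ipify.org (any "what is my IP" service would do), and `update_record` is a hypothetical callable that wraps whatever API your DNS provider offers. Provider APIs differ a lot, so that part is deliberately left out.

```python
import ipaddress
import urllib.request
from typing import Callable, Optional

# Assumption: a plain-text "what is my IP" endpoint. Swap in any equivalent service.
IP_LOOKUP_URL = "https://api.ipify.org"


def fetch_public_ip(url: str = IP_LOOKUP_URL, timeout: float = 10.0) -> str:
    """Ask an external service which public IP this connection currently has."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("ascii").strip()


def sync_dns(
    current_ip: str,
    last_ip: Optional[str],
    update_record: Callable[[str], None],
) -> bool:
    """Push the new address only when it changed. Returns True if an update was sent.

    update_record is a hypothetical wrapper around the DNS provider's API.
    """
    ipaddress.ip_address(current_ip)  # raises ValueError if the lookup returned garbage
    if current_ip == last_ip:
        return False
    update_record(current_ip)
    return True
```

A cron run would then call something like `sync_dns(fetch_public_ip(), cached_ip, update_record)`, with `cached_ip` read from a small state file so the provider is only contacted when the address actually changes.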

I use Firewalld to manage the firewall. I was using UFW before, but it didn't play well with Docker: with the default configuration, ports published by Docker containers bypass UFW's rules and pierce the firewall.

I allow external SSH connections, but only with public key authentication, never passwords.

I use Samba for file access.

UPS

I use an Eaton 5E 850 IUSB in front of the home server, the router and the fiber modem. This gives me a few minutes of runtime if the power goes down. I paid 62€ for it in 2018. A good UPS is expensive, this one is not.

NUT monitors the UPS and gracefully shuts down the server if its battery drops below a certain threshold.

Applications

The home server runs applications that fulfill these criteria:

  • Used by me and maybe family/friends.
  • Need access to private data.
  • Heavier on CPU/RAM/storage than I'd like to put on the VPS.

All applications run with Docker compose.

Caddy

Caddy is a webserver focusing on ease of use. It improves my life by having much simpler configuration with better defaults and automated TLS certificates. Yes, you read that right: Caddy will automatically obtain and renew a TLS certificate for every domain in its configuration. No certbot, no cron, nothing but bliss.

My setup in Docker compose uses the caddy-docker-proxy image. It automates generating a Caddy configuration by looking at the labels of the containers. It's simple, flexible and effective.
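To give an idea of the label-to-configuration mapping, here is a heavily simplified sketch in Python. It is not how caddy-docker-proxy is implemented. It only handles a `caddy` label holding the site address plus flat `caddy.<directive>` labels, and it replaces the `{{upstreams PORT}}` template with a container address passed in by hand. Nested directives, multiple sites and everything else the real tool supports are ignored, and the function name and sample values are made up.

```python
import re

# Matches the "{{upstreams 80}}" template used in label values.
UPSTREAMS = re.compile(r"\{\{\s*upstreams\s+(\d+)\s*\}\}")


def render_site(labels: dict[str, str], container_ip: str) -> str:
    """Turn the caddy* labels of one container into a Caddyfile site block."""
    site = labels["caddy"]  # e.g. "cloud.example.com"
    lines = [f"{site} {{"]
    for key, value in sorted(labels.items()):
        if not key.startswith("caddy."):
            continue  # the site label itself and unrelated labels are skipped
        directive = key[len("caddy."):]
        # Replace the upstreams template with "<container ip>:<port>".
        value = UPSTREAMS.sub(lambda m: f"{container_ip}:{m.group(1)}", value)
        lines.append(f"\t{directive} {value}")
    lines.append("}")
    return "\n".join(lines)


example_labels = {
    "caddy": "cloud.example.com",
    "caddy.reverse_proxy": "{{upstreams 80}}",
    "com.example.unrelated": "ignored",
}
caddyfile = render_site(example_labels, "172.18.0.5")
```

For the example labels, `caddyfile` is a `cloud.example.com` block containing a single `reverse_proxy 172.18.0.5:80` line.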

Previously I was using the popular Nginx (and I'm old enough to have used Apache). See this article on my migration from nginx-proxy to caddy-docker-proxy.

Nextcloud

Nextcloud is “a safe home for all your data”. It's a platform that not only hosts and syncs files across multiple devices, but it offers a plethora of applications (both first and third-party) to do all kinds of things.

The apps I currently use include:

You can browse the full list of Nextcloud apps.

My setup currently uses the nextcloud image, but this is suboptimal because it runs an Apache instance internally. Given that I'm already running Caddy in front it's a complete waste of resources, but not so much to make it urgent.

I'm assessing migrating to the official all-in-one image. I will write a blog post about it when it's done.

Transmission

Transmission is a BitTorrent client that can run as a daemon, i.e. it runs on the server and I connect to it from my laptop or my phone.

I use the linuxserver/docker-transmission Docker image.

Mumble

Mumble is an “Open Source, Low Latency, High Quality Voice Chat”. I use it mostly for gaming where low-latency really makes a difference.

I use the mumble-docker image.

HTPC (Home-Theater PC)

Originally the home server doubled as media player: it was in the living room and it has a good-enough graphics card to play videos. However after moving to a new house the server is in a different room, so I had to get some kind of HTPC to play my media.

This move happened during the COVID-19 chip shortage so my preferred option was not available. I couldn't wait so I ended up getting a Raspberry Pi 4 Model B, which was in stock.

Nevertheless the Raspberry Pi performs well enough, decodes H.264 and H.265 in hardware, can do audio passthrough to my home theater, and is definitely a solid choice with a big community and great compatibility.

It currently runs Kodi in LibreELEC.

It accesses the media files in the home server through a read-only Samba share.

VPS

I have a small VPS (Virtual Private Server) at Vultr that I dedicate to public content, like this blog and my personal website. It's meant to run services that are open to the Internet, and that should be always up. This is not part of the homelab of course, just self-hosting, but I thought it'd be valuable to include here.

I use Caddy webserver for all static file serving and reverse proxying. Its benefits were already highlighted above so I will not repeat them here.

Besides web hosting the VPS also runs a Shynet instance that tracks analytics in a privacy-respecting way. I got it to work with a SQLite DB, which simplifies deployment and maintenance. I also use Podman to run it. I prefer Podman over Docker because it's lighter (no daemon running) and it's rootless without extra hacks.

If you want to try out Vultr consider using my referral link for some free credits.

Replication

To synchronize data between devices I use Syncthing. It's a beautiful piece of software, fast and end-to-end encrypted. I highly recommend it over something like Dropbox, which is sharing your files with OpenAI.

You might be wondering why I use a separate app for file syncing when I already have Nextcloud. When I started using Nextcloud in 2018 its file syncing was very buggy for me, which is why I looked at more robust alternatives. From what I hear Nextcloud file syncing has gotten much better, but now I'm happy with my Syncthing setup. If it ain't broke...

I won't go into the details of my whole setup, but these are some Syncthing tips & tricks I have been discovering and applying:

  • The home server acts as a central always-online node. For example, I synchronize the photos taken on my smartphone with my laptop, and I also share the same directory with the home server so the phone can sync to it even if the laptop is offline. Then when the laptop comes online it can download from the server and/or the phone if available.
  • Syncthing supports configuring multiple addresses per device. Most of the time my devices are in my local network so I set local network addresses first, then DNS names and then dynamic as a fallback.
  • Use QUIC protocol if possible, especially on mobile.
  • Judicious use of file versioning. I'm particularly liberal with versioning files on the server, because it's got plenty of storage. I favor staggered file versioning most of the time.
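To illustrate what "staggered" means in the last tip, here is a rough Python sketch of the thinning idea: recent versions are kept densely, older ones more and more sparsely. The intervals follow my reading of the Syncthing documentation (one version per 30 seconds for the first hour, per hour for the first day, per day for the first 30 days, per week until the maximum age), and the 365-day maximum age is an assumption. The bucketing logic is a simplification, not Syncthing's actual code.

```python
HOUR, DAY, WEEK = 3600, 86400, 604800

# (versions younger than this many seconds, keep one per this many seconds)
TIERS = [(HOUR, 30), (DAY, HOUR), (30 * DAY, DAY)]


def versions_to_keep(ages_s: list[int], max_age_s: int = 365 * DAY) -> list[int]:
    """Given version ages in seconds, return the ages that survive thinning."""
    oldest_per_bucket: dict[tuple[int, int], int] = {}
    for age in ages_s:
        if age > max_age_s:
            continue  # too old, dropped entirely
        step = WEEK  # default for anything older than 30 days
        for limit, tier_step in TIERS:
            if age < limit:
                step = tier_step
                break
        bucket = (step, age // step)
        # Keep only the oldest version within each interval.
        oldest_per_bucket[bucket] = max(age, oldest_per_bucket.get(bucket, age))
    return sorted(oldest_per_bucket.values())


kept = versions_to_keep([5, 20, 40, 4000, 5000, 90000, 400 * DAY])
```

In this example the versions aged 5 and 20 seconds share a 30-second interval, so only the older one survives. The same happens to the 4000 and 5000 second versions within their hour, and the 400-day-old version is past the maximum age.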

Backups

I use restic to make weekly backups of all important data on all servers. Important is anything that cannot be downloaded again from the Internet. It has worked well all these years: I only had to restore from it once and it performed as expected. My current backup is 2.5 TB and takes 8 hours to run, plus a few more hours if restic prune runs.

The script that cron runs weekly is published here.
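For readers who have never used restic, this is roughly the shape of a weekly job, written as a small Python wrapper. It is not the script linked above: the repository location, the paths and the retention policy (`--keep-weekly 8`) are placeholders chosen for the example. restic is assumed to read the repository password from the `RESTIC_PASSWORD_FILE` environment variable.

```python
import subprocess


def restic_commands(
    repo: str,
    paths: list[str],
    keep_weekly: int = 8,
    prune: bool = False,
) -> list[list[str]]:
    """Build the restic invocations for one backup run (nothing is executed here)."""
    base = ["restic", "--repo", repo]
    forget = base + ["forget", "--keep-weekly", str(keep_weekly)]
    if prune:
        # Pruning actually frees space in the repository and can take hours.
        forget.append("--prune")
    return [base + ["backup", *paths], forget]


def run_backup(repo: str, paths: list[str], prune: bool = False) -> None:
    """Run the commands in order and stop at the first failure."""
    for command in restic_commands(repo, paths, prune=prune):
        subprocess.run(command, check=True)
```

Keeping the command building separate from the execution makes it easy to print the commands for a dry run before letting cron loose on them.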

If I had to set up the backups again today I would look into Borg. They have a nice desktop client called Vorta which I have been using for backing up my laptop.

Uptime monitoring

I use UptimeRobot to monitor the uptime of my services. If they are down I get a nice email about it.

At the time of this writing they give 50 monitors with a 5 minute period for free, which is more than enough for my personal use.

If you want to try it out consider using my referral link, I'd appreciate it.

The future

There are many things I'd like to do, and too little time to do them:

  • Monitoring and alerting for both the home server and the VPS.
  • Pihole in the home server.
  • Update Nextcloud setup to use Nextcloud All-in-One.
  • Wireguard VPN to tunnel to the home network from anywhere.
  • Paperless NGX in the home server.
  • Jellyfin in the home server for media access for the family.
  • Set up fail2ban for as many services as possible.
  • Set up game emulators in the media player.

Rest assured there will be new articles as these items get knocked down.

Want me to guide you in your self-hosting journey?

Send me an email with your questions and I will do my best to help you. For longer questions or specific problems we can set up a video chat.

I also do consultancy on web development. Read all the details in my Consultancy page.

Contact me at fidel@openwebconsulting.com.