Self-hosted monitoring: from Netdata through Grafana to Beszel

An honest comparison of self-hosted monitoring stacks from someone who actually runs them. Why I left Netdata, ditched Grafana + Prometheus, and settled on Beszel, plus other options worth knowing about.

People come into self-hosting in different ways. For me, like for many others, it started with one app I wanted to install on my own server. At first it was mostly technical curiosity, a desire to figure out how everything works and learn something along the way. Then you get hooked and start adding another app, then a third, a fourth, and before you know it you have a whole zoo of servers, some of them running several apps each.

So sooner or later the question shows up: how do you keep an eye on this entire zoo of servers and applications? Initially I enjoyed SSHing into each server by hand, running updates manually, watching vital signs through the usual utilities (top, htop, btop), checking app logs, and so on.

But I quickly got tired of playing “cool hacker” from the movies, and the whole thing turned into a dull chore I did not always feel like doing. That said, since many of these servers face the open internet, watching over them, regularly updating apps, checking logs, and so on is not optional. So every self-hoster eventually has to solve the problem of efficiently monitoring their production servers and the apps running on them.

TL;DR: After Netdata, then Grafana + Prometheus, I settled on Beszel and I wish I had found it sooner. Below I explain why in detail, who it is for, and what Beszel does not do. This is not a Beszel review; it is about which solutions I tried and what I picked for monitoring my small infrastructure.

Over the last couple of years I have gone through three different monitoring stacks: first Netdata, then Grafana + Prometheus, and finally Beszel. Each one taught me something, and I ended up nowhere near where I started. If I had read a post like this back at the beginning, it would have helped me a lot.

👉 If you have a handful of VPSes, a couple of mini PCs, a NAS, and maybe a few Docker hosts, this is for you.

👉 If you have a team of 500 engineers, this is not for you. Go back to your Datadog dashboards.

Before I dig into the tools, let me describe the problem, because the right answer depends entirely on what you are trying to do.

Right now I have seven servers on Hetzner (a mix of CX22 and CPX21), all running Ubuntu, all provisioned with Terraform, managed with Ansible, and most of them running Docker containers. I needed:

  1. CPU, memory, disk, and network per host, with a history going back a few weeks.
  2. Resource usage per container (all the interesting stuff lives in Docker).
  3. Alerts when something is on fire or about to catch fire. The single most important part of monitoring.
  4. A clean overview dashboard I can glance at in five seconds when I need to.
  5. Minimal overhead from the collection agent, so it does not eat the server’s resources.
  6. Something I can eventually show to clients. I am building RunMyApp, a managed self-hosting service for small businesses in the EU, and “we proactively monitor your server” is part of my value proposition.

What is not on this list yet: log aggregation, distributed tracing, APM, and SRE-grade observability. Logs are the one thing I still need to solve. So if you have an interesting option, let me know.

Netdata was my first serious monitoring tool, and honestly, it is genuinely cool. The agent installs with a single command. The Cloud dashboard is dense, beautiful, and useful, showing everything: CPU, memory, disk I/O, network, every running process, every Docker container, even per-device temperatures.

For the first three or four servers, Netdata was perfect. One-liner install. Dashboards just work. The integration list is huge. PostgreSQL, Nginx, Redis, Traefik, you name it, there is a collector for it. Alerts via Discord webhook take ten minutes to set up. But in the end I dropped it for two reasons.

First, agent resource usage. The Netdata agent collects an enormous amount of data at high resolution, which is great when you need it and noticeable when you do not. On my modest VPSes, the agent could easily chew through a few percent of CPU and 200 to 500 MB of RAM, depending on what the host was running and what I wanted to monitor. On a 4 GB machine running a couple of services, that is a real tax.

Second, pricing. Netdata Cloud is free up to 5 nodes. After that the Business tier is $4.50 per node per month on annual billing, which works out to $54 per node per year. For a fleet of seven nodes that is $378 per year. For a homelab budget, that is a lot.
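If you want to plug in your own node count, the arithmetic is easy to redo in a shell. This is just a sketch using the list price quoted above, which is an assumption that may be out of date by the time you read this; working in cents avoids floating point.

```shell
# Netdata Business price as quoted in this post (assumption: may have changed).
price_cents_per_node_month=450   # $4.50
nodes=7

# Integer math in cents, converted to whole dollars at the end.
per_node_year=$(( price_cents_per_node_month * 12 / 100 ))
fleet_year=$(( price_cents_per_node_month * 12 * nodes / 100 ))

echo "per node per year: \$${per_node_year}"
echo "fleet of ${nodes} per year: \$${fleet_year}"
```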

To be fair, they now have a Homelab tier at $90 per year for unlimited nodes under a fair-use policy. When I made the switch this did not exist yet, and it is a genuinely fair offer if you are using everything strictly for yourself. But two things still hold me back. First, the Homelab tier explicitly forbids commercial use under its policy: “monitoring your company’s infrastructure”, “providing monitoring services to clients”, “use in a business context”. RunMyApp falls into that bucket literally, so for me only the Business tier works. Second, the dashboard lives in their cloud, not on my infrastructure, and I prefer to own the entire stack.

I still really like the platform, and I think Netdata is a great product. If you have fewer than five nodes, or you fit the Homelab tier and you are fine with a cloud-hosted dashboard, just install Netdata. It will genuinely cover all your needs, because it has everything. But if you want fully self-hosted and (almost) zero per-node cost, read on.

The “real” answer in the self-hosting community is always Grafana + Prometheus. It is the open-source observability stack, the industry standard, infinitely flexible, and every employer in the world wants to see it on your resume. How could you not try to figure out this rocket ship?

The initial setup, honestly, was not that scary. Prometheus in Docker, node_exporter on every host, cAdvisor for container metrics, Grafana as the frontend, all glued together with a couple of docker-compose files. With an LLM at hand, I had a working stack up in one evening.

Then I tried to actually use it.

That is where the fun started. The default Grafana dashboards you import from the community gallery look impressive in screenshots, but you get lost in them immediately when you start using them in practice. Especially when you have zero experience in this area (like me).

Half the panels showed “No data” because the metric names in the dashboard did not match what my version of node_exporter was exposing (classic example: node_cpu from old dashboards versus node_cpu_seconds_total from current versions). The other half showed data, but laid out or styled in a way that did not work for me at all.
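One check that would have saved me time is to look at what the exporter really exposes before blaming the dashboard. A small sketch, assuming node_exporter listens on its default port 9100 on the host you are checking; the helper name is my own.

```shell
# Print the distinct CPU metric names found in Prometheus exposition text on stdin.
# Comment lines (# HELP, # TYPE) are skipped because they do not start with "node_cpu".
cpu_metric_names() {
  grep -o '^node_cpu[a-z_]*' | sort -u
}

# On a monitored host (assumes node_exporter on its default port 9100):
#   curl -s http://localhost:9100/metrics | cpu_metric_names
```

If this prints node_cpu_seconds_total while an imported dashboard queries node_cpu, that panel stays empty until its query is updated.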

So I started building my own dashboard. God, that is a separate art form you could spend a lifetime on. To make a panel in Grafana that shows something useful, you have to write PromQL. PromQL is a beautiful and powerful query language, and you do not learn it in one evening. I spent hours figuring out how to express “average CPU usage per host over the last 5 minutes, excluding idle and iowait” without getting back either zero or infinity. In the end I got something that vaguely resembled the Proxmox status page, and for about three days I was proud of it.
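For the record, here is roughly the shape such a query can take. It is one possible formulation, not the exact one I ended up with: per CPU core it sums the rate of every mode except idle and iowait, then averages the cores of each host. The Prometheus address and the helper name are assumptions.

```shell
# Hypothetical Prometheus address; adjust to your setup.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Per core, sum the rate of all modes except idle and iowait (a 0..1 busy
# fraction), then average the cores of each host and scale to percent.
CPU_QUERY='100 * avg by (instance) (sum by (instance, cpu) (rate(node_cpu_seconds_total{mode!~"idle|iowait"}[5m])))'

# Send an instant query to the Prometheus HTTP API and print the JSON reply.
prom_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Usage:
#   prom_query "$CPU_QUERY"
```

Summing per core first keeps the result between 0 and 100 no matter how many cores a host has, which is what spared me the "zero or infinity" results.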

Then I added a new server.

Adding a server meant re-labelling, re-checking scrape configs, editing dashboard variables, and reworking template queries so the new host would show up in the dropdowns. Removing a server was even worse, because the historical data stuck around, and the dashboards showed gaps and ghosts. Every infrastructure change turned into dashboard work. Maybe I just do not know how to do it properly. But it is definitely not a quick solution.

On resources: Prometheus alone idled around 300 MB of RAM on the central monitoring host, and node_exporter + cAdvisor added another 50 to 100 MB on every monitored host. With a couple of weeks of history and seven nodes, the central Prometheus crept closer to 600 to 800 MB. Not a disaster, but not exactly light either. Maybe, again, the pros know how to make it use fewer system resources. But I am not a pro. And honestly, I did not even feel like grilling my digital friend (AI) about it.

I gave up after about three weeks. Not because the stack did not work, but because for me the cost of running it was wildly out of proportion with the value I was getting back. I was spending more time tuning Grafana than actually watching my infrastructure.

Verdict: Grafana + Prometheus is the right answer if (a) you have a stable fleet that rarely changes, (b) you genuinely need custom dashboards and queries, (c) you want to learn the stack for professional reasons, or (d) you also need log aggregation (Loki) and want everything in one place. For a small, constantly changing homelab where you just want to know “is the server alive”, it is brutal overkill.

I came across a mention of Beszel in some thread on a self-hosting forum, and at first even the name did not inspire confidence. But then I installed it one fine Sunday morning, and by lunchtime I had migrated all seven servers to the platform, and I have been living on it ever since. I spent more time writing this paragraph than installing Beszel.

Beszel is a lightweight self-hosted server monitoring tool built on PocketBase. MIT-licensed open source, one Go binary plus a tiny agent, and it does exactly what 80% of users actually need from monitoring: host metrics, Docker container stats, historical graphs, and configurable alerts. No PromQL, no YAML scrape configs, no poking around in dashboard JSON. The dashboard is ready out of the box and just works. There are light and dark themes. And I already mentioned the alerts, right?

The numbers speak for themselves. The Hub itself uses about 30 MB of RAM. The agent uses about 10 MB per host. Compare that to 300 MB for Prometheus or several hundred MB per agent for Netdata. On a small VPS, that is the difference between “monitoring is invisible” and “monitoring is another tenant”.

One important architectural note: the Hub connects to the agents (it is a pull model over SSH), not the other way around. That means on a host with an agent, port 45876 needs to be reachable by the Hub, and you configure your firewall accordingly. In exchange, the Hub’s public key lives on the agents, and no tokens or passwords are passed around in the clear.
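In practice, "configure your firewall accordingly" means opening port 45876 to the Hub and nobody else. A minimal sketch, assuming ufw on Ubuntu; the Hub address is a placeholder from the documentation range and the helper name is mine. It only prints the rule so you can review it before applying.

```shell
# Placeholder Hub address (documentation range), not a real server.
HUB_IP="${HUB_IP:-203.0.113.10}"
AGENT_PORT=45876

# Build a ufw rule that lets only the given source IP reach the given TCP port.
hub_only_rule() {
  echo "ufw allow from $1 to any port $2 proto tcp"
}

hub_only_rule "$HUB_IP" "$AGENT_PORT"

# After reviewing the printed rule, apply it on the agent host with:
#   sudo $(hub_only_rule "$HUB_IP" "$AGENT_PORT")
```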

Here is what I get out of it:

  • A great dashboard showing all servers at once, with CPU, memory, disk, and network info.
  • Click on any server and you see detailed time series for resources plus a list of Docker containers with their resource usage.
  • Configurable global or per-server alerts on CPU, memory, disk, traffic, temperature, and host status, sent wherever Shoutrrr supports: Discord, Telegram, email, ntfy, Slack, the usual lineup.
  • Multi-user support, so you can give clients read-only access.

What I love most: when I add a new server, I run a single command on it, paste the key into the Hub UI, and the new host shows up on the dashboard. No re-templating, no re-labeling, no PromQL. When I remove a server, I delete it from the Hub and it is gone. The dashboard adapts, because the dashboard is the same for everyone.

I want to be honest about what Beszel does not do, because this is the main reason it will not be for everyone.

Beszel does not handle logs. It monitors resources. If you need centralized log aggregation, search, and analysis (Loki, the ELK stack, SigNoz, OpenObserve), Beszel is not your tool. You either run something alongside it, or you give up the simplicity and go back to the Grafana stack with Loki bolted on.

In my case, that is fine. I rarely need to query logs across hosts. When I do, I SSH in and do it by hand (well, with carefully built up scripts). But if you are running a production app and you need to correlate errors across services, Beszel alone will not cut it.

What else Beszel does not do: APM, distributed tracing, custom metrics from your apps, synthetic checks of external endpoints, SLO tracking. It is bluntly a simple host and container resource monitor. That focus is exactly what makes it so good at what it does.

The short version: here is how all three stacks look side by side for my scenario (7 nodes on Hetzner):

| Tool | Hub RAM | Agent RAM | Cost, 7 nodes/year | Setup time | Logs |
|---|---|---|---|---|---|
| Beszel | ~30 MB | ~10 MB | $0 | 5 minutes | No |
| Netdata Cloud (Business) | in cloud | 200-500 MB | $378 | 10 minutes | Basic |
| Netdata Cloud (Homelab) | in cloud | 200-500 MB | $90 (non-commercial) | 10 minutes | Basic |
| Grafana + Prometheus | 600-800 MB | 50-100 MB | $0 | Weekends, plural | Needs Loki |

RAM numbers are approximate and depend on your config and retention. But the order of magnitude tells the story.

Since we made it this far, here is how I would install Beszel today. I am assuming you have Docker with Docker Compose on the machine that will run the Hub, and SSH access to the machines you want to monitor.

On the machine that will become your monitoring Hub, create a directory for Beszel and put this in a docker-compose.yml file:

services:
  beszel:
    image: henrygd/beszel:latest
    container_name: beszel
    restart: unless-stopped
    ports:
      - "8090:8090"
    volumes:
      - ./beszel_data:/beszel_data

Start it:

docker compose up -d

Open http://your-host:8090 in a browser. It will prompt you to create the first admin.

If you want HTTPS (and you do, especially if you are exposing this to the internet), put Caddy or Traefik in front of it. Caddy is simplest. Three lines in a Caddyfile are enough:

beszel.yourdomain.com {
    reverse_proxy beszel:8090
}

For Caddy in a container to reach Beszel by the name beszel, they need to be on the same Docker network. Either start them in one docker-compose file, or give both an external network and connect them. If Caddy is on the host rather than in a container, replace beszel:8090 with localhost:8090. Caddy will fetch a Let’s Encrypt certificate on its own. Hub is ready.

In the Hub UI, click “Add System”. Give it a name (I use mnemonic hostnames, but it can be any text that makes sense to you) and the IP or hostname the Hub can reach the agent at. Beszel shows a dialog with two tabs: Docker and Binary. On both, the Hub public key and the per-system token are already filled in; all you have to do is copy the ready-made command.

And this is where it gets really nice: Beszel made the agent install so simple it deserves a screen recording. I am planning to make one, by the way, so feel free to subscribe to my YouTube channel so you do not miss it.

Option 1 (recommended): a single bash script. On the Binary tab, hit “Copy Linux command” (or Homebrew/Windows/FreeBSD, whichever fits) and paste it on the host you want to monitor:

curl -sL https://get.beszel.dev -o /tmp/install-agent.sh && \
chmod +x /tmp/install-agent.sh && \
/tmp/install-agent.sh \
  -p 45876 \
  -k "ssh-ed25519 AAAAC3Nz... (your Hub's public key)" \
  -t "your-system-token" \
  -url "https://your-hub.example.com"

The script installs the binary, creates a systemd service, sets the token and public key, and starts the agent. A few seconds later the system shows up in the Hub as “up” with live metrics. No docker-compose files, no manual env vars.

Option 2: Docker. If you live in containers and do not want a systemd service on the host, the Docker tab has ready-made “Copy docker compose” and “Copy docker run” buttons. The contents look roughly like this:

services:
  beszel-agent:
    image: henrygd/beszel-agent:latest
    container_name: beszel-agent
    restart: unless-stopped
    network_mode: host
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      LISTEN: 45876
      KEY: "ssh-ed25519 AAAA..."
      TOKEN: "your-system-token"
      HUB_URL: "https://your-hub.example.com"

Bring it up the usual way with docker compose up -d.

Bulk rollout with Ansible. Beszel supports a universal token: one token that can register many agents at once, with the Hub automatically creating system records for them. This removes the manual “Add System” step in the UI for every host, and is a perfect fit for Ansible. I keep the token in Ansible Vault, run a simple shell task in a playbook with the curl command across a server group, and the entire fleet registers itself.
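If you do not use Ansible, the same idea works as a plain SSH loop. To be clear, this is a sketch and not my actual playbook. It assumes root SSH access, that the universal token is enabled in the Hub, and that the install script runs without prompting when given these flags. The function names and hostnames are made up, and the secrets are expected in environment variables.

```shell
# Assemble the agent install one-liner shown in the Hub's "Add System" dialog.
# Arguments: Hub public key, token, Hub URL, optional port (default 45876).
agent_install_cmd() {
  printf 'curl -sL https://get.beszel.dev -o /tmp/install-agent.sh && chmod +x /tmp/install-agent.sh && /tmp/install-agent.sh -p %s -k "%s" -t "%s" -url "%s"\n' \
    "${4:-45876}" "$1" "$2" "$3"
}

# Run the installer over SSH on every host passed as an argument.
# HUB_KEY, BESZEL_TOKEN and HUB_URL must be set in the environment,
# for example exported from a secrets store, never hard-coded here.
rollout() {
  cmd="$(agent_install_cmd "$HUB_KEY" "$BESZEL_TOKEN" "$HUB_URL")"
  for host in "$@"; do
    ssh "root@${host}" "$cmd" || echo "install failed on ${host}" >&2
  done
}

# Usage (placeholder hostnames):
#   rollout web-1.example.com web-2.example.com
```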

In the Hub, open the settings for each system and set thresholds. CPU above 80% for 5 minutes, disk above 90%, memory pressure, host down, whatever matters to you. Add notification channels in user settings, point them at a Discord webhook or Telegram bot, and alerts are live.
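One detail that tripped me up: notifications go through Shoutrrr, which expects its own service URLs rather than a raw webhook link. Here is a small sketch for Discord, assuming Shoutrrr's discord://TOKEN@WEBHOOK_ID form; the function name and the sample webhook are made up.

```shell
# Convert https://discord.com/api/webhooks/<id>/<token>
# into the Shoutrrr form discord://<token>@<id>.
discord_to_shoutrrr() {
  rest="${1#*/api/webhooks/}"   # leaves "<id>/<token>"
  id="${rest%%/*}"              # part before the first slash
  token="${rest#*/}"            # part after the first slash
  echo "discord://${token}@${id}"
}

# Sample (fake) webhook URL:
discord_to_shoutrrr "https://discord.com/api/webhooks/123456/abcDEF"
```

Paste the resulting URL into the notification settings and send a test alert before relying on it.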

Beszel stores everything in ./beszel_data (the volume you mounted). Under the hood that is PocketBase’s SQLite, so there is a catch: a tar of the live database under load can give you a corrupt snapshot. Two options:

  • Easy: stop the container for a minute, tar it, bring it back up. Fine for homelab use.
  • Proper: use sqlite3 .backup for an atomic snapshot without stopping the service.

After that it is standard stuff: a nightly cron job uploads the archive to S3, B2, or wherever you keep your backups. In my setup it goes along with the rest of my PocketBase data.
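The "proper" option can look roughly like this. It is a sketch under assumptions: sqlite3 is installed on the host, the Hub database is ./beszel_data/data.db (PocketBase's usual file name, so check your own data directory), and you are fine with backing up only the main database file. The function names and paths are mine.

```shell
# Take a consistent copy of a live SQLite database with sqlite3's .backup command.
# Arguments: source database, destination file.
backup_sqlite() {
  sqlite3 "$1" ".backup '$2'"
}

# Snapshot the Hub database and compress it under a dated name.
# Arguments: data directory (default ./beszel_data), output directory (default ./backups).
# Assumption: the database file is <data directory>/data.db.
backup_beszel() {
  data_dir="${1:-./beszel_data}"
  out_dir="${2:-./backups}"
  stamp="$(date +%Y-%m-%d)"
  mkdir -p "$out_dir" || return 1
  backup_sqlite "${data_dir}/data.db" "${out_dir}/beszel-${stamp}.db" || return 1
  gzip -f "${out_dir}/beszel-${stamp}.db"
}

# Usage from a nightly cron job, before uploading the output directory:
#   backup_beszel /opt/beszel/beszel_data /opt/beszel/backups
```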

That is the whole install. Seriously. From zero to monitoring a fleet of seven servers in under an hour if you take your time.

I have not personally tried many of the alternatives below, but they come up in every monitoring discussion, so I decided to gather them here for your convenience.

Uptime Kuma is a pretty, simple status-page-style tool for HTTP/TCP/ping uptime checks. It complements Beszel rather than competing with it. Beszel tells you the server is alive and well; Uptime Kuma tells you your services are answering public requests. Run both if you want.

SigNoz is a modern observability platform built on OpenTelemetry with OTLP support. Metrics, logs, and traces in one self-hosted stack, with a more modern feel than Grafana + Loki + Tempo. If you need logs and traces and want one tool, I would look at this first.

Zabbix is enterprise-grade open source that has been around forever. Very powerful, very heavy, with a UI that feels like it was designed in 2010. Worth knowing about, probably overkill for a homelab.

Checkmk Raw is the free version of a commercial product with Nagios genes. A solid option for sysadmin-style monitoring of mixed environments, but the UX feels dated.

Glances is a great single-host tool, CLI and web. It can export to InfluxDB and Prometheus, so it is not strictly limited to one host, but it still loses to Beszel on UX for a fleet. If you need to monitor exactly one machine and you love terminals, Glances is wonderful.

OpenObserve is a new lightweight alternative to Elasticsearch/Loki for logs and metrics. I have not seriously tried it, but it is on my “if I ever need logs” list.

Cockpit is a web-based server admin tool from Red Hat with light built-in monitoring. Good for single-host admin, not really a multi-server dashboard.

The point of this list is not to be exhaustive but to give you a starting set of fallback options. If Beszel does not click, one of these probably will. You can easily find detailed reviews of each one online.

If you strip away the brands, the choice usually boils down to four questions:

  1. How many hosts?
    • Fewer than 5: Netdata free or Beszel.
    • 5 to 50: Beszel, or Netdata Homelab if your projects are non-commercial.
    • 50+: you are probably already in Prometheus or SigNoz territory.
  2. Do you need logs? If yes, look at Grafana + Loki, SigNoz, or OpenObserve. Beszel will not help. If no, Beszel is hard to beat.
  3. How much time do you have for setup and maintenance? Beszel: minutes. Netdata: minutes. The Grafana stack: weekends, plural.
  4. Is this for you or for clients? Self-hosted gives you the trust story. Cloud-hosted gives you less to maintain.

For my needs today, Beszel is the right answer. For yours, maybe not, and that is fine. The real takeaway is this: do not default to the heaviest tool just because it gets talked about the most. Match the tool to your problem.

Is Beszel production-ready? For host and container resource monitoring, yes. For full observability with logs, traces, and APM, no. You need to understand what you are picking.

Does Beszel support Windows? Yes. The agent builds for Linux, macOS, FreeBSD, and Windows, and the Binary tab of the “Add System” dialog has a ready-made command for each. For a typical Linux-based homelab, this hardly matters.

Can I monitor Kubernetes? Beszel monitors hosts and Docker containers, not Pods as Kubernetes objects. If you run k8s, you need kube-state-metrics plus Prometheus or something similar. Beszel is not for that.

How is Beszel different from Uptime Kuma? Beszel monitors server health from the inside (CPU, RAM, disk, containers). Uptime Kuma monitors services from the outside (does the HTTP endpoint respond, is the port open). They solve different problems and live happily side by side.

What about updates? Will something break? Beszel is a young project under active development, and releases come out regularly. I update every couple of weeks with docker compose pull && docker compose up -d, and nothing has broken yet. But that is no excuse for not backing up the database before updating.
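To make the "back up first" part hard to skip, the update can be wrapped in one function. This is a sketch, not an official procedure. It assumes you run it from the directory that holds docker-compose.yml and ./beszel_data, and it uses the simple stop-and-tar backup, so the Hub is down for a moment. The DRY_RUN switch and function names are my own.

```shell
# Print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Stop the Hub, archive its data directory, then pull the new image and restart.
# Each step runs only if the previous one succeeded.
update_beszel() {
  archive="beszel_data-$(date +%Y-%m-%d).tar.gz"
  run docker compose stop &&
  run tar -czf "$archive" beszel_data &&
  run docker compose pull &&
  run docker compose up -d
}

# Preview the steps without touching anything:
#   DRY_RUN=1 update_beszel
```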

I went through Netdata, fought with Grafana + Prometheus, and settled on Beszel. The lesson is not that any one of these tools is bad. The lesson is that “monitoring” covers a wide range of daily tasks, and any popular answer is probably aimed at problems most of us do not actually have.

If you have a small server fleet and you want to know what is going on, without burning weekends on PromQL or paying for every node every month, install Beszel this afternoon. Worst case, you waste twenty minutes. Best case, you end up with monitoring you will actually use for years.

And if you do end up using it, the Beszel project is open source work by a single person that has grown into something brilliant. Star the repo, file good bug reports, and if you can, support the maintainer. The self-hosted ecosystem runs on people like this.

And if you would rather not deal with deployment, updates, and monitoring at all, that is exactly the case I am building RunMyApp for: your apps on your server, while we keep an eye on it. If you are interested, reach out.

I hope this post helped you get your bearings. If you want to hear about new posts, subscribe to notifications. I do not spam, because I do not like spam myself.

Good luck.