
Roadmap

Info

Current status: ALPHA

East4Ming/Homelab2 Roadmap

Optimized for China

  • YUM/APT Repo
  • Docker Registry
  • NTP Server
  • Add domains to /etc/hosts and coredns configmap

Change OS To Ubuntu 24.04

  • pxe - use netboot.xyz
  • cloud-init - use subiquity autoinstall
  • dnf
  • sysctl
  • automatic
  • kured: rebootSentinelCommand
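
As a rough illustration of the subiquity autoinstall item above, a minimal `user-data` file might look like the sketch below. The hostname, username, password hash, SSH key and the `runcmd` entry are all placeholders, not values from this repository, and the storage layout is an assumption.

```yaml
#cloud-config
# Minimal sketch of an Ubuntu 24.04 autoinstall file (all values are placeholders).
autoinstall:
  version: 1
  identity:
    hostname: node-0                      # hypothetical hostname
    username: admin                       # hypothetical user
    password: "<crypted-password-hash>"   # placeholder, e.g. generated with mkpasswd
  ssh:
    install-server: true
    allow-pw: false
    authorized-keys:
      - "<ssh-public-key>"                # placeholder
  storage:
    layout:
      name: lvm                           # assumption: default guided LVM layout
  # Cloud-init configuration applied on first boot of the installed system.
  user-data:
    runcmd:
      - echo "first boot tasks go here"   # placeholder command
```

The same file can be served over HTTP to a PXE-booted installer or placed on installation media, which is what makes it usable for both the PXE and non-PXE paths.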

Non-PXE Install

K3s

  • Version
  • prerequisites
  • k3s config
  • more tls-sans
  • disable cloud provider
  • tailscale
  • embedded-registry: true
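
The K3s items above could be collected in `/etc/rancher/k3s/config.yaml` along these lines. This is a sketch only: the SAN entries are hypothetical, and the Flannel and network-policy lines are an assumption based on Cilium being the CNI.

```yaml
# /etc/rancher/k3s/config.yaml (server node), a sketch with placeholder values
tls-san:
  - k3s.example.internal        # hypothetical extra DNS name for the API server
  - 192.0.2.10                  # hypothetical extra IP (documentation range)
disable-cloud-controller: true  # no cloud provider on bare metal
embedded-registry: true         # peer-to-peer image mirror between nodes
# Assumption: Cilium replaces the bundled CNI and network policy controller.
flannel-backend: "none"
disable-network-policy: true
```

Note that the embedded registry only mirrors registries that are also listed under `mirrors` in `/etc/rancher/k3s/registries.yaml`.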

Cilium Tuning

  • update cilium version
  • enable native routing mode
  • bpf masquerade
  • enable DSR
  • Bypass iptables connection tracking
  • bandwidthManager
  • pod BBR
  • enable XDPAcceleration
  • envoy DaemonSet (cilium 1.16 default)
  • hubble grafana dashboards
  • netkit
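
Most of the tuning items above map to Cilium Helm values. The fragment below is a sketch for Cilium 1.16, not the repository's actual configuration: the API server address is a placeholder, and the native routing CIDR assumes the K3s default pod CIDR.

```yaml
# Cilium Helm values, a sketch of the tuning items listed above
kubeProxyReplacement: true
k8sServiceHost: 192.0.2.10             # placeholder API server address
k8sServicePort: 6443
routingMode: native                    # native routing instead of tunneling
ipv4NativeRoutingCIDR: 10.42.0.0/16    # assumption: K3s default pod CIDR
autoDirectNodeRoutes: true             # assumption: nodes share one L2 segment
bpf:
  masquerade: true                     # eBPF-based masquerading
  datapathMode: netkit                 # needs a recent kernel (6.8 or newer)
loadBalancer:
  mode: dsr                            # Direct Server Return
  acceleration: native                 # XDP acceleration, needs driver support
installNoConntrackIptablesRules: true  # bypass iptables connection tracking
bandwidthManager:
  enabled: true
  bbr: true                            # BBR congestion control for pods
envoy:
  enabled: true                        # run Envoy as a separate DaemonSet
hubble:
  metrics:
    dashboards:
      enabled: true                    # ship Hubble Grafana dashboards
```

Several of these options depend on each other (for example, DSR and the conntrack bypass both expect native routing with kube-proxy replacement), so enabling them one at a time makes regressions easier to trace.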

System

Add Tailscale Operator

  • external terraform tailscale
  • ACL
  • OAuth
  • tailscale operator helm
  • tailscale ingress - replace nginx ingress
  • tailscale k8s api server
  • tailscale cert - replace cert-manager
  • tailscale funnel - replace cloudflared
  • tailscale dns - replace external-dns
  • tailscale proxygroup
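
To illustrate the ingress and funnel items above: once the Tailscale operator is installed, a service can be exposed with a standard Ingress that uses the `tailscale` ingress class. The application name and port below are hypothetical, and the funnel annotation is only needed for public exposure.

```yaml
# Sketch: expose a hypothetical "homepage" Service through the Tailscale operator
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: homepage                    # hypothetical application
  annotations:
    tailscale.com/funnel: "true"    # optional: publish on the internet via Funnel
spec:
  ingressClassName: tailscale
  defaultBackend:
    service:
      name: homepage
      port:
        number: 80                  # assumption: the Service listens on port 80
  tls:
    - hosts:
        - homepage                  # becomes the MagicDNS hostname in the tailnet
```

The operator provisions the HTTPS certificate for the tailnet hostname itself, which is why this single resource can stand in for the ingress controller, certificate and tunnel pieces listed above. Funnel additionally has to be permitted in the tailnet policy file.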

Observability

Logs

  • Add journald logs

Metrics

  • PV
  • More ServiceMonitor
  • Cilium
  • Volsync
  • K3s kubeControllerManager/kubeScheduler/kubeEtcd, and disable kube-proxy (not used)
  • ArgoCD
  • Kured
  • Loki/Promtail
  • Rook Ceph CSI
  • Gitea
  • Woodpecker(PodMonitor)
  • Dex
  • external-secrets
  • Grafana
  • Zot
  • More PrometheusRules/Alerts
  • ArgoCD
  • Loki/Promtail
  • Woodpecker
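
For the K3s control-plane metrics item above, the components have to listen on an address that Prometheus can reach before any ServiceMonitor is useful. A sketch of the relevant K3s server options, assuming they are acceptable to expose on the node network:

```yaml
# /etc/rancher/k3s/config.yaml additions, a sketch for scraping control-plane metrics
etcd-expose-metrics: true        # serve embedded etcd metrics on all interfaces
kube-controller-manager-arg:
  - bind-address=0.0.0.0         # default is 127.0.0.1, unreachable for Prometheus
kube-scheduler-arg:
  - bind-address=0.0.0.0
disable-kube-proxy: true         # assumption: Cilium replaces kube-proxy
```

If kube-prometheus-stack is the chart in use, setting `kubeProxy.enabled: false` in its values stops it from creating a scrape target for the component that no longer exists.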

Grafana

  • More Dashboards
  • Cilium
  • Woodpecker
  • PV

Security

  • Fail2Ban
  • Disable root login

My Apps

🐛Bug Fix

  • Increase the timeout (in seconds) of "Wait for the machines to come online"
  • Grafana errors when querying Loki
  • too many outstanding requests
  • parse error at line 1, col 71: syntax error: unexpected IDENTIFIER

Homelab Blog

NIX Packages

  • ping
  • nslookup
  • starship (but through autoinstall runcmd)
  • krew

Makefile to GoTask

Public Repo

  • Modify TODO:
  • Remove hard codes
  • Remove secrets
  • Add more docs
  • Add more examples
  • Add more templates
  • Modify code/configuration/documentation related to the git repo

Alpha requirements

Literally anything that works.

Beta requirements

Good enough for tinkering and personal usage, and reasonably secure.

  • Automated bare metal provisioning
  • Controller set up (Docker)
  • OS installation (PXE boot)
  • Automated cluster creation (k3s)
  • Automated application deployment (ArgoCD)
  • Automated DNS management
  • Initialize GitOps repository on Gitea automatically
  • Observability
  • Monitoring
  • Logging
  • Alerting
  • SSO
  • Reasonably secure
  • Automated certificate management
  • Declarative secret management
  • Replace all default passwords with randomly generated ones
  • Expose services to the internet securely with Cloudflare Tunnel
  • Only use open-source technologies (except external managed services in ./external)
  • Everything is defined as code
  • Backup solution (3 copies, 2 separate devices, 1 offsite)
  • Define SLOs:
  • 70% availability (might break on weekends due to new experiments)
  • Core applications
  • Gitea
  • Woodpecker
  • Private container registry
  • Homepage

Stable requirements

Can be used in "production" (for family or even small scale businesses).

  • A single command to deploy everything
  • Fast deployment time (from empty hard drive to running services in under 1 hour)
  • Fully automatic, not just automated
  • Bare-metal OS rolling upgrade
  • Kubernetes version rolling upgrade
  • Application version upgrade
  • Encrypted backups
  • Secrets rotation
  • Self healing
  • Secure by default
  • SELinux
  • Network policies
  • Static code analysis
  • Chaos testing
  • Minimal dependency on external services
  • Complete documentation
  • Diagram as code
  • Book (this book)
  • Walkthrough tutorial and feature demo (video)
  • Configuration script for new users
  • More dashboards and alert rules
  • SLOs:
  • 99.9% availability (less than 9 hours of downtime per year)
  • 99.99% data durability
  • Clear upgrade path
  • Additional applications
  • Matrix with bridges
  • VPN server
  • PeerTube
  • Blog
  • Development dashboard
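
The downtime figure attached to the availability SLO above follows from simple arithmetic over a 365-day year. The worked example below also includes the 70% Beta target for comparison.

```latex
\begin{align*}
\text{hours per year} &= 365 \times 24 = 8760 \\
\text{downtime budget at } 99.9\% &= (1 - 0.999) \times 8760 = 8.76 \text{ hours} \\
\text{downtime budget at } 70\% &= (1 - 0.70) \times 8760 = 2628 \text{ hours}
\end{align*}
```

So the Stable target leaves under 9 hours of downtime per year, while the Beta target tolerates roughly 110 days.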

Unplanned

Nice to have

  • Additional applications
  • Mail server
  • Air-gap install
  • Automated testing
  • Security audit
  • Serverless (Knative)
  • Cluster API (last attempt)
  • Split DNS (requires a better router)