Tactical RMM for Sysadmins: Architecture, Hardening, Reproducible Deployment, and Day-2 Operations (with Zero Per-Endpoint Fees)

Published 11/02/2025


Tactical RMM (TRMM) is a self-hosted RMM that integrates inventory, monitoring, patching, script execution, and remote access (via MeshCentral) with agents for Windows, Linux, and macOS (the latter two via the project’s sponsorship program). It runs on Django + Vue, NATS, PostgreSQL, and Redis, and is well-orchestrated with Docker Compose. For small/medium environments currently living on AnyDesk/TeamViewer + scripts + GPO, TRMM reduces tool sprawl and, critically, cuts OPEX: €0 per endpoint and full control of data and attack surface.

Below is a guide focused on operations (not marketing): architecture, requirements, IaC + hardening, update runbook, backup/DR, gotchas, and realistic sizing.

1) Reference Architecture

Components (containerized):

Panel/UI: Vue + Django API
NATS: messaging/queue for tasks/telemetry
PostgreSQL: state database
Redis: broker/cache (short queues, counters)
MeshCentral: remote desktop, shell, file transfer
TLS Proxy: Traefik/Caddy/nginx with Let’s Encrypt

Recommended FQDNs (TRMM expects three subdomains):

rmm.example.com → UI
api.example.com → API/Agents
mesh.example.com → MeshCentral

External ports: 80/443.
Internal ports (not exposed): 4222 (NATS), 5432 (Postgres), 6379 (Redis), 443xx (Mesh sub-services as needed).
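
One way to confirm that the internal ports really are unexposed is to scan the host's listening sockets. The sketch below is an illustration, not part of TRMM: the function name `exposed_internal_ports` is hypothetical, and it assumes the four-column-prefixed output of `ss -H -tln` (local address in column 4).

```shell
# Reads `ss -H -tln` output on stdin and prints each internal-service port
# (NATS 4222, Postgres 5432, Redis 6379) that listens on a wildcard address.
exposed_internal_ports() {
  awk '
    {
      addr = $4                      # local address:port column
      n = split(addr, parts, ":")
      port = parts[n]                # port is the text after the last colon
      host = substr(addr, 1, length(addr) - length(port) - 1)
      wildcard = (host == "0.0.0.0" || host == "*" || host == "[::]")
      internal = (port == "4222" || port == "5432" || port == "6379")
      if (wildcard && internal) print port
    }
  ' | sort -un
}
```

Usage on the host: `ss -H -tln | exposed_internal_ports`. Empty output means none of the three ports is bound to a public wildcard address.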

HA note: TRMM is designed to run on a single node. You can stretch it to light HA (PostgreSQL in HA, warm backups, an active/passive reverse proxy), but it is not built for Kubernetes-style horizontal scaling.

2) Requirements and Sizing

Host OS: Ubuntu 22.04 LTS (recommended) or 24.04
Minimum HW: 4 vCPU, 8 GB RAM, 80–120 GB SSD
Recommended HW (≈1,000 endpoints, moderate checks): 8 vCPU, 16–24 GB RAM, NVMe SSD
Deployment time: 60–120 minutes (with DNS/TLS propagated)

Rule-of-thumb sizing (depends on checks, intervals, patch windows):

300–500 endpoints: 4 vCPU / 8 GB → OK
500–1,500 endpoints: 8 vCPU / 16–24 GB → tune NATS/PG (work_mem, autovacuum)
>1,500 endpoints: scale vertically and audit checks/intervals; move PG to its own host or a managed instance; consider splitting by tenant.

3) Network, DNS, and PKI

DNS: A/AAAA records for rmm., api., and mesh. (MeshCentral) pointing to the host IP.
PKI: Let’s Encrypt HTTP-01 (or DNS-01 for wildcard).
Firewall: open 22/80/443; deny everything else on host (UFW/nftables).
Zero Trust (optional): Cloud proxy (Cloudflare, Zscaler) in front of api. with WAF and rate limiting.
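
The firewall line above can be sketched with UFW as follows. The helper names `run` and `apply_firewall` are hypothetical; the block defaults to a dry run that only prints the commands, so nothing changes until `DRY_RUN=0` is set explicitly.

```shell
# Prints the command in dry-run mode (the default); executes it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Default-deny inbound policy with only SSH, HTTP and HTTPS allowed.
apply_firewall() {
  run ufw default deny incoming
  run ufw default allow outgoing
  run ufw allow 22/tcp
  run ufw allow 80/tcp
  run ufw allow 443/tcp
  run ufw --force enable
}
```

Run `apply_firewall` to review the plan, then `DRY_RUN=0 apply_firewall` as root to apply it. Keep in mind that ports published by Docker are handled through Docker's own iptables rules and can bypass UFW, which is one more reason to avoid publishing the internal services at all.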

4) Reproducible Deployment (IaC)

4.1 Host bootstrap (Ansible, idempotent)

Install Docker/Compose, git, jq, unzip
Configure TZ, sysctl (fs.inotify, vm.*), journald rotation
Create a dedicated admin user with password login disabled, minimal sudoers, SSH key-only access
UFW with a default-deny inbound policy

4.2 Compose stack

Folder layout:

/opt/tacticalrmm/
  docker-compose.yml
  .env
  traefik/ (or caddy/)
  postgres/data
  redis/data
  meshcentral/data
  backup/

Key variables (.env):

RMM_FQDN=rmm.example.com
API_FQDN=api.example.com
TZ=Europe/Madrid
[email protected]

POSTGRES_PASSWORD=XXXXXXXX
REDIS_PASSWORD=XXXXXXXX
NATS_PASSWORD=XXXXXXXX
# DJANGO_SECRET_KEY: at least 50 characters
DJANGO_SECRET_KEY=XXXXXXXX
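
The placeholder secrets above need real random values. A minimal sketch, assuming `/dev/urandom`, `od`, and `cut` are available on the host; the function name `gen_secret` is hypothetical.

```shell
# Prints a random lowercase-hex string of the requested length (default 64).
gen_secret() {
  local length="${1:-64}"
  local bytes=$(( (length + 1) / 2 ))   # two hex characters per random byte
  head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n' | cut -c1-"$length"
}
```

For example, `umask 077; printf 'DJANGO_SECRET_KEY=%s\n' "$(gen_secret 64)" >> .env` appends a 64-character key while keeping the file readable only by its owner.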

Compose (high level):

traefik/caddy with 80/443 entrypoints and automatic certs
Services rmm, api, nats, postgres, redis, meshcentral on an internal network
Healthchecks and restart policies

Launch:

docker compose pull
docker compose up -d
docker compose ps

Post-install:

Browse https://rmm.example.com → create admin
Verify https://api.example.com/api/ → HTTP 200
Configure SMTP/alerts, policies, and MeshCentral (2FA and certs)
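
The post-install verification can be scripted so that it tolerates containers that are still starting. This is a sketch: `retry` and `http_ok` are hypothetical helper names, and the HTTP 200 expectation is taken from the step above.

```shell
# Runs a command until it succeeds, up to <attempts> times, sleeping <delay>
# seconds between tries. Returns 0 on the first success, 1 if every try fails.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then return 0; fi
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  return 1
}

# Succeeds only if the URL answers with HTTP 200 within 10 seconds.
http_ok() {
  [ "$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1")" = "200" ]
}
```

Usage: `retry 30 5 http_ok "https://api.example.com/api/" || echo "API not healthy"` polls for up to about two and a half minutes before giving up.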

5) Hardening (Checklist)

TLS mandatory, modern cipher suites (TLS 1.2/1.3)
UI: mandatory 2FA for admins; RBAC least privilege; audit changes
API: rate limiting (traefik/caddy), security headers (CSP, HSTS, X-Frame-Options)
MeshCentral: 2FA, short session expiry, auto-update, UAC consent configured
Internal services (PG/Redis/NATS): bind to internal networks; no public ports
Logs: forward to syslog/ELK/Vector with retention and GDPR compliance
Backups: see §7; encrypt-at-rest (rclone/Restic to S3/Backblaze)
Patching: host unattended-upgrades + monthly compose pull window
Secrets: keep .env and credentials in Vault/sops; never in git
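
Part of this checklist can be verified from the outside. The sketch below checks a response for the three security headers named above; the function name `missing_security_headers` is hypothetical, and it only tests for presence, not for sensible values.

```shell
# Reads raw HTTP response headers on stdin (e.g. from `curl -sI`) and prints
# the name of each expected security header that is absent.
missing_security_headers() {
  local headers name
  headers=$(tr -d '\r' | tr '[:upper:]' '[:lower:]')
  for name in strict-transport-security content-security-policy x-frame-options; do
    printf '%s\n' "$headers" | grep -q "^${name}:" || printf '%s\n' "$name"
  done
}
```

Usage: `curl -sI https://rmm.example.com | missing_security_headers`. Empty output means all three headers were returned.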

6) Operating Model: Policies, Checks, Patching, Scripts

Structure

Tenants (Organizations) → Sites → Groups → Devices
Inheritable policies (Baseline, AV/EDR, Patching, Alert routing)

Baseline checks

CPU > 85% (5 min), RAM > 85%, disk < 15%, S.M.A.R.T., services up, Event Log (critical IDs), Windows Update status, AV/EDR active, last reboot, RDP state (per policy)
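
As an illustration of the "disk < 15%" check for Linux endpoints, here is a sketch that flags low free space. It assumes a script check that treats a non-zero exit code as a failure; the function name `low_disk_mounts` is hypothetical, and mount points containing spaces are not handled.

```shell
# Reads `df -P` output on stdin; prints every mount point whose free space is
# below <min_free_pct> percent (default 15) and returns 1 if any was printed.
low_disk_mounts() {
  local min_free="${1:-15}"
  awk -v min="$min_free" '
    NR > 1 && $2 > 0 {
      free_pct = ($4 / $2) * 100      # available blocks / total blocks
      if (free_pct < min) { printf "%s %.0f%%\n", $6, free_pct; bad = 1 }
    }
    END { exit bad }
  '
}
```

Usage inside a check script: `df -P | low_disk_mounts 15`, then exit with that status so the check fails when any mount is low.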

Patching

Deploy at Patch Tuesday + X days, with X growing per ring; roll out critical/high-severity updates first and low-severity ones last
Staggered reboots; pre/post scripts (DISM/SFC, WU reset when needed)
Exclude drivers unless support dictates otherwise
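
The "Patch Tuesday + X days" schedule is easy to compute, since Patch Tuesday is the second Tuesday of the month. A sketch assuming GNU `date` (the `-d` option); the function names `patch_tuesday` and `ring_date` are hypothetical.

```shell
# Prints the date (YYYY-MM-DD) of the second Tuesday of the given month.
patch_tuesday() {
  local year="$1" month="$2"
  local dow offset
  dow=$(date -u -d "${year}-${month}-01" +%u)   # 1 = Monday ... 7 = Sunday
  offset=$(( (2 - dow + 7) % 7 + 7 ))           # days from the 1st to the 2nd Tuesday
  date -u -d "${year}-${month}-01 +${offset} days" +%F
}

# Prints the deployment date for a ring that lags Patch Tuesday by <lag_days>.
ring_date() {
  local year="$1" month="$2" lag_days="$3"
  date -u -d "$(patch_tuesday "$year" "$month") +${lag_days} days" +%F
}
```

For example, `ring_date 2025 11 7` gives the date for a ring that patches one week after the November 2025 release.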

Scripts

Windows: PowerShell (clean %TEMP%, DISM /RestoreHealth, sfc /scannow, WU reset, Chocolatey installs)
Linux: apt/yum updates, systemd health, journald rotation
macOS: brew, profiles (with sponsorship)
Scheduled runner (weekly/monthly)

Remote

Desktop (consent/UAC), real-time shell, files; session recording optional per compliance

7) Backups and DR

What to back up

PostgreSQL: pg_dump (daily) + weekly base backup (pg_basebackup if size grows)
Redis: RDB snapshot (hourly); include off-site
MeshCentral: meshcentral-data/ and config
Configs: docker-compose.yml, .env, secrets (in Vault), cron jobs

Where

S3-compatible (AWS/Wasabi/Backblaze) with private bucket, encryption-at-rest, lifecycle (7/30/90 days)
VM snapshots weekly (cloud provider)

Test restores (quarterly)

Fresh VM → compose → import PG/Redis → validate agent check-in
Targets: Panel RTO < 60 min; PG RPO < 15 min (this requires WAL archiving/PITR; a daily pg_dump alone gives an RPO of up to 24 h)
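
Local dumps also need a retention rule so the backup directory does not fill the disk before the off-site lifecycle kicks in. A sketch, assuming dumps use the `.dump` suffix and a `find` that supports `-maxdepth` and `-delete` (GNU and BSD both do); the function name `prune_dumps` is hypothetical.

```shell
# Deletes *.dump files directly inside <dir> that are older than <days> full
# days and prints each removed path. find rounds file age down to whole days,
# so "+7" matches files that are at least 8 days old.
prune_dumps() {
  local dir="$1" days="$2"
  find "$dir" -maxdepth 1 -type f -name '*.dump' -mtime +"$days" -print -delete
}
```

Usage: `prune_dumps /opt/tacticalrmm/backup 7` after the daily dump has been uploaded off-site.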

8) Updates (Runbook)

Test first on an identical staging instance (subset of endpoints)
Maintenance window: snapshot + backups OK
docker compose pull && docker compose up -d
Validate Django migrations, Mesh and UI health
Prune old containers, dangling images; document compose version in internal changelog
Open change ticket with version diff and 24–48 h observation period
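
The "backups OK" gate in this runbook can be enforced rather than eyeballed. The sketch below refuses to proceed unless a recent, non-empty dump exists. It assumes GNU `stat`; the function name `backup_is_fresh` is hypothetical.

```shell
# Succeeds if <file> exists, is non-empty, and was modified within the last
# <max_age_hours> hours.
backup_is_fresh() {
  local file="$1" max_age_hours="$2"
  local now mtime
  [ -s "$file" ] || return 1
  now=$(date +%s)
  mtime=$(stat -c %Y "$file")        # GNU stat; BSD/macOS would need `stat -f %m`
  [ $(( now - mtime )) -le $(( max_age_hours * 3600 )) ]
}
```

Usage: `backup_is_fresh "/opt/tacticalrmm/backup/trmm_$(date +%F).dump" 24 && docker compose pull && docker compose up -d`. The update only starts if today's dump is in place.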

9) Observability (SRE-Light)

Host metrics: node_exporter → Prometheus → Grafana (CPU, RAM, FS, net)
PG metrics: postgres_exporter (TPS, bloat, locks, vacuum)
NATS metrics: varz endpoint → exporter
App: /api/ healthchecks, response times, task queues, error rate
Alerts: high API latency, HTTP 5xx, PG connections, disk < 15%, Let’s Encrypt cert expiry
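
The certificate-expiry alert can be approximated with a small probe when no exporter covers it yet. A sketch assuming GNU `date` and the `openssl` CLI; the function names `days_between` and `cert_not_after` are hypothetical.

```shell
# Prints the whole number of days from <from> to <to> (GNU date strings).
days_between() {
  local from="$1" to="$2"
  echo $(( ( $(date -u -d "$to" +%s) - $(date -u -d "$from" +%s) ) / 86400 ))
}

# Prints the notAfter date of the certificate served on <host>:443.
cert_not_after() {
  local host="$1"
  openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate | cut -d= -f2
}
```

Usage: `days_between now "$(cert_not_after api.example.com)"` prints the remaining days. Alerting when the value drops below a chosen threshold leaves time to fix a broken renewal.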

10) Endpoint Security

Windows: MSI via GPO/Intune (/qn), MST if required; code-signing (with sponsorship)
Linux: .deb/systemd; egress whitelist to api.:443
macOS: agent (sponsorship); MDM profiles for permissions/daemon
EDR/AV coexistence: exclude agent/Mesh paths if EDR requires it

11) AnyDesk/TV vs TRMM (at a glance)

Capability	AnyDesk/TV (SaaS)	Tactical RMM (+ MeshCentral, self-hosted)
Remote control	Yes	Yes
Inventory/Monitoring	No/Limited	Yes (integrated)
Patching	No	Yes (policies)
Scripts/Automation	Limited	Yes (PS/BAT/Python/NuShell/Deno)
Alerts	No	Yes (email/SMS/Webhook)
Per-endpoint cost	Yes	No
Data/Compliance	Vendor’s SaaS	Your infrastructure

Migration strategy: 2–4 weeks coexistence, validate remote/UAC, scripted legacy removal.

12) Known Issues and Fast Fixes

Let’s Encrypt rate limits: use wildcard DNS-01 or staging CA for bulk tests
UAC breaks remote: adjust MeshCentral (elevation), enable consent prompt
Agent build flagged by AV: code-sign MSI/EXE; temporary AV exceptions
PG growth: tune autovacuum on events/log tables; date partitioning if volume is high
High agent latency: check DNS/anycast, NATS keepalive, network path (MTR)

13) Ops FAQ

Can I use managed PG (RDS/Aurora/Citus)?
Yes. Reduces host blast radius but adds latency and cost. Enforce SSL, proper parameter groups, and backups.

How do I limit api. exposure?
Rate limit, WAF, optional mTLS, IP allow-lists for panel (not for agents), origin rules in proxy.

Is multi-tenant a thing?
Yes via Organizations/Sites. For hard isolation (data/controls), split instances and/or databases.

SSO?
Available via sponsorship (SAML/OIDC). Alternative: SSO on MeshCentral and mandatory 2FA in TRMM.

14) TL;DR Deployment (key commands)

# Host prep
apt update && apt -y upgrade
apt -y install git curl jq ufw
curl -fsSL https://get.docker.com | sh
usermod -aG docker "$USER"   # takes effect at next login

# Clone stack (per official docs) and fill .env
docker compose pull
docker compose up -d

# TLS and access
# -> https://rmm.example.com (create admin)
# -> https://api.example.com/api/ (200)

# Backups (PG example; runs pg_dump inside the postgres service container)
docker compose exec -T postgres pg_dump -Fc -U trmm_user trmm_db > /opt/tacticalrmm/backup/trmm_$(date +%F).dump

# Updates
docker compose pull && docker compose up -d

Closing

Tactical RMM won’t replace an orchestrator or a full ITSM platform; it will replace the typical “tool quilt” (AnyDesk/TV + scattered scripts + ad-hoc inventories + manual patching) with a coherent, self-hosted stack. For a systems team managing hundreds to low-thousands of endpoints, the TRMM + MeshCentral + solid ops practices combo delivers control, auditability, and predictable cost. The rest—hardening, runbooks, backups, and a culture of continuous improvement—is on us.

Source: Revista Cloud
