OpenClaw Home Server: Self-Healing Infrastructure via SSH and Kubernetes

Turn OpenClaw into a DevOps agent for your home lab. Monitoring, cron jobs, auto-healing, kubectl, security audits, and daily briefings.

Jean-Elie Lecuy

Founder of ClawRapid

SaaS builder writing about OpenClaw, AI agents, and agentic coding, with one goal: make powerful tooling actually usable.

Published on Mar 3, 2026 · 6 min read

Having a home server means being on-call for your own infrastructure. Services going down at 3 AM, certificates expiring silently, disks filling up, Kubernetes pods crash-looping while you're away for the weekend. You wanted control, but what you really got was a second job.

The "self-healing home server" pattern gives OpenClaw SSH access, cron jobs, and structured knowledge of your infrastructure. The agent detects, diagnoses, and fixes common problems before you even receive an alert.

This guide explains how to set up an infrastructure agent that runs continuously, and most importantly, how to do it securely.

The problem: a home lab needs a sysadmin

The classic issues:

Monitoring without response: Grafana/Prometheus detect, but someone has to intervene.
SSH from your phone: slow, stressful, error-prone.
Undocumented knowledge: topology and dependencies live in your head.
Repetitive tasks: log rotation, backups, updates, certificates.
IaC drift: Terraform/Ansible/K8s manifests fall out of sync with what is actually running, and the next apply breaks something.

A persistent agent can fill the sysadmin role.

The solution: a persistent infrastructure agent with runbooks

OpenClaw can:

connect via SSH to machines
run kubectl on your cluster
execute checks on a schedule
apply fixes (restart pods, correct configs)
send a daily briefing
maintain an action log

But strict rules are necessary. A DevOps agent without guardrails is dangerous.

Skills and prerequisites

You'll need:

SSH (dedicated key)
kubectl if you have Kubernetes (K3s, etc.)
an optional mail/calendar tool (gog)
a runbook library (markdown)

For OpenClaw skills, see: OpenClaw Skills Guide.

Step-by-step setup

Step 1: define the scope in AGENTS.md

## Infrastructure Agent

You are my infrastructure agent.

Access:
- SSH on network machines (e.g., 192.168.1.0/24)
- kubectl on the K3s cluster
- Gmail/Calendar reading via gog (optional)
- runbooks folder: ~/infrastructure/runbooks/

Rules:
- never hardcode secrets
- never push directly to main
- mandatory logging in ~/logs/infra-changes.md
- destructive operations: ask for confirmation
- if in doubt: alert rather than act
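The logging rule is easier to follow when the agent has a single command to call. A minimal sketch, assuming a hypothetical helper named log_change and an entry format (UTC timestamp, host, action) that OpenClaw does not prescribe; INFRA_LOG is an assumed override that defaults to the path named in the rules above:

```shell
# Hypothetical helper: append one timestamped entry to the change log.
# INFRA_LOG and the entry format are assumptions for illustration.
log_change() {
    local log_file="${INFRA_LOG:-$HOME/logs/infra-changes.md}"
    local stamp
    stamp="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    mkdir -p "$(dirname "$log_file")"
    # One markdown bullet per action: UTC time, host, then what was done.
    printf '%s\n' "- $stamp [$1] $2" >> "$log_file"
}
```

Usage would look like: log_change nas01 "restarted nginx after failed health check".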

Step 2: configure SSH (dedicated key)

ssh-keygen -t ed25519 -f ~/.ssh/openclaw_infra -N ""
ssh-copy-id -i ~/.ssh/openclaw_infra.pub admin@192.168.1.10

Add host aliases in ~/.ssh/config so the agent connects by name and always uses the dedicated key.
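As an illustration, here is one way to generate such an alias. The alias name nas is an assumption; the address, user, and key path come from the commands above. The function only prints the stanza, so you can review it before appending it to ~/.ssh/config yourself:

```shell
# Print an ssh_config stanza for one host (alias, address, and user are examples).
# IdentitiesOnly stops ssh from offering any other key it happens to have loaded.
ssh_host_stanza() {
    cat <<EOF
Host $1
    HostName $2
    User $3
    IdentityFile ~/.ssh/openclaw_infra
    IdentitiesOnly yes
EOF
}

ssh_host_stanza nas 192.168.1.10 admin
```

With that stanza in place, ssh nas is enough, and runbooks can refer to hosts by name instead of by IP.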

Step 3: schedule cron jobs (the real product)

Example schedule:

every 15 min: service checks, simple auto-recovery
every hour: CPU/RAM/disk, alerts, notification triage
every 6h: openclaw gateway status, certificates, backup status
daily: 8 AM briefing
nightly: security audit
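If you prefer to drive some of these checks from system cron rather than from the agent's own scheduler, the cadence above maps to standard cron expressions as follows. This is only a sketch: the script paths are hypothetical, and 02:30 for the nightly audit is an arbitrary choice.

```shell
# Print a crontab fragment matching the example schedule.
# Script names are placeholders; nothing here is installed automatically.
print_infra_crontab() {
    cat <<'EOF'
*/15 * * * * $HOME/infrastructure/bin/check-services.sh
0 * * * *    $HOME/infrastructure/bin/check-resources.sh
0 */6 * * *  $HOME/infrastructure/bin/six-hourly-checks.sh
0 8 * * *    $HOME/infrastructure/bin/daily-briefing.sh
30 2 * * *   $HOME/infrastructure/bin/security-audit.sh
EOF
}

print_infra_crontab
```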

Prompt:

Set up a check system:
- HTTP endpoints
- DNS
- disk usage
- Kubernetes pod status

If a service goes down:
1) diagnose
2) attempt a safe fix (restart)
3) verify
4) if failure after 2 attempts, alert with logs
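The diagnose, fix, verify, escalate loop in that prompt can be sketched in a few lines of shell. The function name heal_with_retries and the HEAL_WAIT variable are assumptions made for this example; the check and fix commands are whatever your runbook says.

```shell
# heal_with_retries CHECK_CMD FIX_CMD [MAX_ATTEMPTS]
# Returns 0 once the check passes (before or after a fix), 1 if it still
# fails after MAX_ATTEMPTS fixes, so the caller can alert with logs.
heal_with_retries() {
    local check="$1" fix="$2" max="${3:-2}" attempt=0
    while ! sh -c "$check"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "still failing after $max attempts: $check" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        echo "attempt $attempt: $fix" >&2
        sh -c "$fix" || true
        # Give the service time to come back before verifying again.
        sleep "${HEAL_WAIT:-10}"
    done
    return 0
}
```

For example, with a hypothetical health endpoint and service name: heal_with_retries "curl -fsS -o /dev/null http://192.168.1.10:8080/health" "ssh nas sudo systemctl restart myapp". A non-zero return is the signal to stop and alert.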

Step 4: write runbooks (procedures)

Don't let the agent improvise. Give it checklists.

Examples:

Pod CrashLoopBackOff

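kubectl describe pod POD -n NAMESPACE
kubectl logs POD -n NAMESPACE --tail=50 --previous
if OOMKilled: increase memory limits
if config issue: check ConfigMap/Secret
kubectl rollout restart deployment/NAME -n NAMESPACE
log the action in ~/logs/infra-changes.md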
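The read-only half of this runbook fits in a small function. This is a sketch, not a complete triage tool: triage_crashloop is a hypothetical name, and it only distinguishes OOMKilled from everything else.

```shell
# triage_crashloop NAMESPACE POD
# Read-only: gathers evidence and prints which runbook branch applies.
triage_crashloop() {
    local ns="$1" pod="$2" reason
    kubectl describe pod "$pod" -n "$ns"
    # --previous shows the container that crashed, not the one restarting now.
    kubectl logs "$pod" -n "$ns" --tail=50 --previous
    reason="$(kubectl get pod "$pod" -n "$ns" \
        -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}')"
    case "$reason" in
        *OOMKilled*) echo "runbook: raise memory limits, then rollout restart" ;;
        *)           echo "runbook: check ConfigMap/Secret, then rollout restart" ;;
    esac
}
```

Keeping diagnosis separate from the fix means the agent can run this freely, while the restart itself stays a distinct, logged action.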

Disk full (>90%)

identify large directories
clean docker/journald
check logrotate
if not resolved: alert
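A rough shell version of this runbook, assuming a Linux host with Docker and systemd. The function names and the 500M journal cap are illustrative choices, not values from OpenClaw:

```shell
# usage_percent: read `df -P` output on stdin, print the use% of the first filesystem.
usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# disk_over_threshold PATH [LIMIT]: succeed when PATH's filesystem is above LIMIT percent.
disk_over_threshold() {
    [ "$(df -P "$1" | usage_percent)" -gt "${2:-90}" ]
}

# Cleanup steps from the runbook. They delete data, so nothing runs until
# this function is called explicitly (journalctl needs root here).
reclaim_disk_space() {
    du -xh -d 1 / 2>/dev/null | sort -rh | head -n 20   # largest top-level directories
    docker system prune -f        # stopped containers, unused networks, dangling images, build cache
    journalctl --vacuum-size=500M # shrink the systemd journal to about 500 MB
}
```

A check script would then read: disk_over_threshold / && reclaim_disk_space, followed by a second disk_over_threshold / to decide whether to alert.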

Expiring certificate

check cert-manager
renew / recreate
verify TLS
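For the "verify TLS" step, openssl can tell you whether a certificate is about to expire. A minimal sketch with hypothetical function names; the 14-day window in the usage line is an assumption:

```shell
# cert_expires_within CERT_PEM DAYS: succeed when the certificate in CERT_PEM
# expires within DAYS days (openssl -checkend exits non-zero in that case).
# An unreadable file also counts as expiring, which errs on the side of alerting.
cert_expires_within() {
    ! openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null
}

# fetch_cert HOST [PORT]: print the PEM certificate a TLS endpoint serves.
fetch_cert() {
    openssl s_client -connect "$1:${2:-443}" -servername "$1" < /dev/null 2>/dev/null \
        | openssl x509 -outform PEM
}
```

For example: fetch_cert grafana.home.example > /tmp/cert.pem, then cert_expires_within /tmp/cert.pem 14 to decide whether to alert (the hostname is a placeholder). On a cluster running cert-manager, kubectl get certificate -A shows the READY status of each managed certificate.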

Step 5: daily briefing

Every day at 8 AM, send a briefing:

- Weather
- Calendars
- System health (CPU/RAM/Disk)
- Services UP/DOWN
- Auto-healing actions in the last 24h
- Alerts and items needing attention
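The "auto-healing actions in the last 24h" line can be pulled straight from the change log. This sketch assumes each log entry is a markdown bullet that starts with an ISO 8601 UTC timestamp, such as "- 2026-03-03T03:15:00Z [k3s] restarted web"; that format is an assumption, so adapt the filter to whatever your log actually looks like.

```shell
# recent_changes SINCE LOG_FILE
# Print log entries whose timestamp (second field) is at or after SINCE.
# ISO 8601 UTC timestamps sort correctly as plain strings.
recent_changes() {
    awk -v since="$1" '$1 == "-" && $2 >= since' "$2"
}
```

With GNU date, the cutoff for the 8 AM briefing could be computed as: recent_changes "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)" ~/logs/infra-changes.md.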

Going further: tunnels, secrets, and scanning

Connect ClawRapid to your home network

If your agent runs on a remote server, you need to expose SSH securely. Two popular options:

Tailscale: simple, stable, mesh VPN
WireGuard: total control, slightly more technical

Rule: do not expose SSH to the Internet without a tunnel.

Secrets management

Don't put passwords in files. Use:

a dedicated vault (e.g., 1Password)
environment variables
minimum-scoped tokens
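Vault and environment variables combine well: fetch the secret at the last moment and expose it to one command only. A sketch using the 1Password CLI's op read; the wrapper name with_secret and the secret reference in the usage line are hypothetical.

```shell
# with_secret VAR REFERENCE COMMAND [ARGS...]
# Resolve REFERENCE with the 1Password CLI and expose it to COMMAND only.
with_secret() {
    local var="$1" ref="$2" value
    shift 2
    value="$(op read "$ref")" || return 1
    # Export inside a subshell so the secret never reaches the calling shell
    # and never appears on a command line.
    ( export "$var=$value"; "$@" )
}
```

For example: with_secret GRAFANA_TOKEN "op://Homelab/grafana/token" ./check-dashboards.sh. The token exists only in that script's environment and is gone when it exits.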

Secret scanning

Add automatic scanning (e.g., TruffleHog) to prevent a secret from ending up in git.

Install a pre-push hook that blocks any push whose commits contain verified secrets.
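One possible shape for that hook, as a sketch. The function name is hypothetical, and the flags assume a TruffleHog v3 release (older releases spell the filter --only-verified). Git passes one line per pushed ref on stdin, in the order: local ref, local sha, remote ref, remote sha.

```shell
# Body of a hypothetical .git/hooks/pre-push script.
scan_pushed_refs() {
    local zero=0000000000000000000000000000000000000000
    local local_ref local_sha remote_ref remote_sha
    while read -r local_ref local_sha remote_ref remote_sha; do
        # Deleting a remote branch: nothing to scan.
        [ "$local_sha" = "$zero" ] && continue
        if [ "$remote_sha" = "$zero" ]; then
            # New branch: no remote base, so scan the full history.
            trufflehog git file://. --results=verified --fail < /dev/null || return 1
        else
            trufflehog git file://. --since-commit "$remote_sha" \
                --results=verified --fail < /dev/null || return 1
        fi
    done
}
# In the hook file itself, the last line would be: scan_pushed_refs || exit 1
```

Redirecting stdin from /dev/null keeps the scanner from consuming the ref lines that the loop still has to read.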

Escalation principle

if the agent doesn't understand, it alerts
if the action is destructive, it asks for confirmation
if 2 attempts fail, it stops and gives you diagnostics

This is the difference between a useful agent and a dangerous one.
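The "destructive means confirm" rule can be backed by a mechanical check in addition to the prompt. The pattern list below is an illustrative assumption and far from complete; an allow-list of known-safe commands is stricter if you can maintain one.

```shell
# needs_confirmation "COMMAND LINE"
# Succeeds when the command matches a pattern that should never run unattended.
needs_confirmation() {
    case "$1" in
        *"rm -rf"*|*"kubectl delete"*|*"docker volume rm"*) return 0 ;;
        *mkfs*|*"dd if="*|*"terraform destroy"*|*"git push --force"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A wrapper around the agent's shell access could call needs_confirmation first, then pause and message you instead of executing.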

Security: essential guardrails

The number one risk: an agent can expose a secret or perform an irreversible action.

Best practices:

dedicated SSH key, limited privileges
network segmentation
branch protection (mandatory PRs)
secret scanning (TruffleHog as pre-push)
complete logging (SSH, changes)
approvals for sensitive actions

A useful reminder: an agent will hardcode a secret into a config file or commit if nothing blocks it, so back the written rule with a scanner or hook.

Concrete auto-healing example

Scenario: at 3:15 AM, a pod crash-loops because an environment variable is misspelled.

With the agent:

check detects the crash
the agent reads the logs
identifies a typo in the ConfigMap
corrects it, redeploys
verifies the service comes back up
logs the action and includes a summary in the morning briefing

You're asleep.
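In kubectl terms, the correct, redeploy, verify sequence could look like the sketch below. The function name and every resource name in the usage line are hypothetical.

```shell
# fix_and_redeploy NAMESPACE CONFIGMAP DEPLOYMENT MERGE_PATCH
# Patch the ConfigMap, restart the Deployment, and wait for the rollout.
fix_and_redeploy() {
    local ns="$1" cm="$2" deploy="$3" patch="$4"
    kubectl patch configmap "$cm" -n "$ns" --type merge -p "$patch" || return 1
    # Pods read env vars from a ConfigMap only at start, so a restart is required.
    kubectl rollout restart "deployment/$deploy" -n "$ns" || return 1
    # Non-zero exit if the rollout does not finish in time: this is the "verify" step.
    kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=120s
}
```

For a misspelled key, a JSON merge patch can remove the wrong name and add the right one in a single call, because setting a key to null deletes it: fix_and_redeploy default web-config web '{"data":{"LOG_LEVL":null,"LOG_LEVEL":"info"}}'.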

How ClawRapid fits in

An infrastructure agent needs to run 24/7. ClawRapid provides stable OpenClaw hosting with scheduling and heartbeat. Then connect your home lab via a secure tunnel (Tailscale, WireGuard, Cloudflare Tunnel) and give the agent limited SSH access.

FAQ

Is it safe to give SSH access to an AI agent?

Yes, if you apply guardrails: limited privileges, approvals, logging, secret scanning, segmentation.

What if the agent makes a problem worse?

Escalation rule: 2 attempts max, then alert. And destructive actions are forbidden without confirmation.

Do I need Kubernetes?

No. The pattern also works with a NAS, Pi-hole, Docker Compose.

What monitoring tools do you recommend?

Grafana/Prometheus for metrics, Uptime Kuma for endpoints, Loki for logs. The agent reads these signals and acts.

Can I use this on the cloud (AWS/GCP)?

Yes. Replace SSH/kubectl with the aws or gcloud CLI, or whatever your provider offers. To isolate credentials, combine OpenClaw with n8n.

How do I start small?

Give SSH access to one machine, add disk + DNS + endpoint checks, and a daily briefing. Expand from there.
