Skip to content
🚨
Infrastructure

Agent 911

Your AI agent keeps crashing. Agent 911 keeps it running.

Autonomous watchdog that runs independently of your AI agent. Monitors every 30 seconds. Diagnoses crashes from real error patterns. Auto-fixes before you even notice. Built from 15+ production failures at a real company — because our business depends on it.

$9.11/week
↻ New improvements included with subscription

Instant download · Double-click install (Mac) · PC coming soon · Cancel anytime

🔒 Secure Checkout ⚡ Instant Download 💰 30-Day Money-Back

Sound Familiar? Every One Is Real.

These aren't hypothetical scenarios. Every single one happened to us while running a real business with AI agents. We built Agent 911 because we were tired of being the watchdog ourselves.

😴

3 AM Silence

Your AI agent was running dispatch for your business. At 3 AM, the model provider deprecated your model. Agent died silently. You found out at 8 AM when a client texted "why isn't anyone answering?"

🔥

MacBook on Fire

Local model got stuck in an inference loop. CPU pegged at 600% for 4 hours. Laptop overheating. Agent completely unresponsive. Nobody monitoring, nobody noticed.

👻

Ghost Process

You restarted the gateway but the old process didn't die. Two bots fighting over the same Telegram token. Messages randomly disappearing. Neither bot working right.

💸

Rate Limit Spiral

Hit the rate limit on your model provider. Agent couldn't respond. Cron jobs kept retrying. Each retry hit the limit again. 24-hour lockout. Business running blind.

🧠

Amnesia Overnight

Compression settings too aggressive. Agent compressed 6 times while you slept. Woke up to an AI that forgot everything — clients, projects, active jobs. Starting from scratch.

🔄

Restart Roulette

Agent crashed. You restarted it. Crashed again. Different error. Restarted. Crashed. Spent 45 minutes reading logs and guessing. The fix was one config line.

Every Crash Has a Pattern. We Know All of Them.

Agent 911 doesn't restart blindly. It reads your error logs, matches against 15+ known crash patterns, and applies the specific fix for that failure. Every pattern was discovered in real production.

Crash Pattern What Happens Agent 911 Fix Time to Fix
Model Deprecated Provider removes your model overnight Detects 404 → switches to your fallback model → restarts gateway < 60s
Model Not Found Wrong model name, not pulled, or deleted Detects 404 → switches to fallback → restarts < 60s
Rate Limited Hit provider rate limits (429) Detects 429 → waits cooldown → switches provider → restarts < 90s
Auth Failure API key expired, revoked, or invalid Detects 401 → notifies you immediately (can't auto-fix credentials) Instant alert
Telegram Conflict Two bot instances polling same token Detects conflict → kills duplicate process → restarts survivor < 45s
Telegram Flood Sending too many messages too fast Detects flood control → waits 30s → clean restart < 90s
Interpreter Crash Python process dies mid-execution Detects dead PID → clean restart with fresh process < 40s
Local Model Hung Ollama stuck at 500%+ CPU, no output Detects CPU spike → kills stuck model runner → restarts gateway < 60s
Session Lock Duplicate gateway holding Telegram/Slack lock Detects lock conflict → kills duplicate → restarts clean < 45s
Memory Pressure System runs out of RAM Detects OOM → kills heavy processes (model runners) → restarts < 60s
Config Invalid Bad config file after edit or update Detects validation error → notifies with exact error (can't guess config) Instant alert
Unclosed Session Crash leaves dangling HTTP connections Detects orphan session → clean kill → fresh restart < 40s
Cron Job Cascade Bad cron job crashes the entire gateway Detects gateway down after cron trigger → restart with cron isolation < 60s
Provider Timeout API calls hanging indefinitely Detects unresponsive gateway → force kill → restart with fallback < 90s
Disk Full Logs or cache filling disk Detects disk error → cleans old logs → notifies for manual review Instant alert + partial fix

Why It Actually Works

Most "health check" solutions run inside the agent. When the agent crashes, the health check crashes with it. Agent 911 is architecturally independent — it can't die when your agent dies.

🏗️

Independent Daemon

Runs as a macOS launchd service — completely separate from your agent. Its own process, its own memory, its own crash protection. If your agent dies, Agent 911 is still alive and watching.

🧠

Smart Diagnosis

Doesn't just check "is the process alive?" Reads your actual error logs, matches against 15 known patterns from real production, and applies the specific fix for that failure type. Not a blind restart.

🔄

Model Failover

When your model provider goes down, Agent 911 reads your config, finds your fallback model, switches the config, and restarts. Your agent comes back on a different model within 60 seconds.

🛡️

Loop Protection

If your agent keeps crashing (5+ times in an hour), Agent 911 stops restarting and escalates to you. Prevents infinite restart loops that make everything worse.

📱

Telegram Notifications

Every crash and fix gets a clear Telegram notification: what happened, what Agent 911 did, and the current status. You wake up to "auto-healed at 3 AM" instead of "your agent is dead."

📊

Crash Analytics

Full history of every crash, every auto-fix, every config patch. See patterns: "rate limited 3x this week" tells you to switch providers. "Telegram conflict every restart" tells you to check for duplicates.

🔧

Config Auto-Patch

When Agent 911 switches your model or fixes a config issue, it logs exactly what changed and why. Full audit trail. You can review and revert any automatic change.

Zero Config Install

Auto-detects your OpenClaw or Hermes installation. One command to install, one command to start. Reads your existing config for model fallbacks and notification settings.

By the Numbers

30s
Detection time
15+
Known crash patterns
< 60s
Typical auto-recovery
24/7
Independent monitoring
Real Story — April 2026

"In one week, our AI agent crashed 8 times. Model provider deprecated our model overnight. Telegram bot token conflicts between two agents. Local model stuck at 600% CPU cooking the MacBook. Rate limits locking us out for 24 hours. Compression settings wiped operational memory while we slept.

Every single crash, I had to manually diagnose, fix, and restart. At 3 AM. On weekends. While running a real business.

That's why I built Agent 911. Every crash pattern in this product is one I lived through. It's not theoretical — it's the watchdog I needed and didn't have."

— Barry Daoust, Founder (SmartHomes.us)

How It Works

The Loop (Every 30 Seconds)

1
Check pulse
Is the gateway process alive? Is the log file being written to?
2
If alive → sleep 30s
Everything's fine. Check again in 30 seconds. Zero overhead.
3
If dead → diagnose
Read last 50 lines of error log. Match against 15 known crash patterns. Identify root cause.
4
Apply targeted fix
Model dead? Switch to fallback. Duplicate process? Kill it. Rate limited? Wait and retry. Not just restart — the right fix.
5
Restart & verify
Clean restart. Wait for health check. Confirm the gateway is actually serving, not just running.
6
Notify
Telegram message: "Hermes was down. Diagnosis: model deprecated. Fix: switched to Opus fallback. Status: running ✅"

Double-Click to Install

Auto-detects Hermes Agent and Hermes. One command to install, one to start.

# Install Agent 911
python3 agent911.py install
# Start monitoring
launchctl load ~/Library/LaunchAgents/ai.agent911.watchdog.plist
# Check status anytime
python3 agent911.py status

Works with Hermes Agent and Hermes Agent on macOS. Linux support coming soon.

Frequently Asked Questions

What if Agent 911 itself crashes?

It runs as a macOS launchd service with KeepAlive enabled. If Agent 911 dies, macOS automatically restarts it within seconds. The watchdog watches itself.

Will it blindly restart my agent in a loop?

No. Agent 911 has built-in loop protection: maximum 5 restarts per hour, 2-minute cooldown between attempts. If it can't fix the problem, it stops and notifies you instead of making things worse.

Does it work with Hermes Agent?

Yes. Agent 911 auto-detects both Hermes Agent and Hermes installations. Same crash patterns, same auto-fixes, same notifications. One watchdog for both.

What if my model provider goes down permanently?

Agent 911 reads your fallback model configuration and automatically switches. If your primary is on OpenRouter and it dies, Agent 911 switches to your Anthropic direct API (or whatever fallback you've configured).

Can I add custom crash patterns?

Yes. The crash patterns are in a simple JSON array in the config. Add your own regex pattern, diagnosis text, and auto-fix action. Future updates will include community-contributed patterns.

How much resources does it use?

Almost zero. When your agent is running normally, Agent 911 sleeps for 30 seconds, checks one process, and goes back to sleep. No API calls, no model inference, no network traffic. It only wakes up when something is wrong.

Is it a subscription?

$9.11/week subscription. Cancel anytime, no contracts. As we discover new crash patterns in production, your Agent 911 gets them automatically — always up to date.

Stop Babysitting Your AI Agent.

Agent 911 watches so you don't have to. Auto-diagnose, auto-fix, auto-notify. Sleep through the night knowing your agent is covered.

Subscribe — $9.11/week

Instant download · Double-click install (Mac) · PC coming soon · 30-day money-back guarantee