IPMI Monitor

Web-based server hardware monitoring via IPMI and Redfish

View the Project on GitHub cryptolabsza/ipmi-monitor

Free, self-hosted IPMI/BMC monitoring for your server fleet.


Collect System Event Logs (SEL), monitor sensors, track ECC errors, gather SSH system logs, and get alerts, all from a beautiful web dashboard.


📖 Documentation

| Guide | Description |
|-------|-------------|
| User Guide | Complete documentation for using IPMI Monitor |
| IPMI SEL Reference | Decode BMC event logs and troubleshoot hardware issues |
| Developer Guide | Git workflow, releases, CI/CD |

🚀 Quick Start (v1.1.1)

Deploy everything with a single command using a config file:

# Install from dev branch (latest features)
pipx install git+https://github.com/cryptolabsza/ipmi-monitor.git@dev

# Deploy with config file (no prompts)
sudo ipmi-monitor quickstart -c /path/to/config.yaml -y

See examples/ipmi-config.yaml for a complete config template.

Interactive Setup

# Install pipx (prerequisite)
apt install pipx -y && pipx ensurepath
source ~/.bashrc

# Install the CLI tool
pipx install ipmi-monitor

# Run the quickstart wizard (use full path since pipx bin isn't in sudo PATH)
sudo ~/.local/bin/ipmi-monitor quickstart

That's it! The wizard performs a one-command Docker deployment with CryptoLabs Proxy, SSL, and Watchtower.

Docker Run (Alternative)

docker run -d \
  --name ipmi-monitor \
  -p 5000:5000 \
  -v ipmi_data:/app/data \
  -e IPMI_USER=admin \
  -e IPMI_PASS=YOUR_BMC_PASSWORD \
  -e ADMIN_PASS=YOUR_ADMIN_PASSWORD \
  -e SECRET_KEY=YOUR_RANDOM_SECRET_KEY \
  ghcr.io/cryptolabsza/ipmi-monitor:latest

Then open http://localhost:5000 and add your servers!

See User Guide for Docker Compose setup.
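The docker run example above leaves SECRET_KEY as a placeholder. One possible way to fill it is sketched below with standard Unix tools. The helper names `gen_secret` and `start_ipmi_monitor`, the `BMC_PASS` and `ADMIN_PASSWORD` variables, and the 32-byte key length are assumptions made for illustration; none of them are part of IPMI Monitor itself.

```shell
#!/bin/sh
# Hypothetical helper: print 32 random bytes as 64 lowercase hex characters.
gen_secret() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Hypothetical wrapper around the docker run command shown above.
# BMC_PASS and ADMIN_PASSWORD are assumed to be set by the caller.
start_ipmi_monitor() {
  docker run -d \
    --name ipmi-monitor \
    -p 5000:5000 \
    -v ipmi_data:/app/data \
    -e IPMI_USER=admin \
    -e IPMI_PASS="$BMC_PASS" \
    -e ADMIN_PASS="$ADMIN_PASSWORD" \
    -e SECRET_KEY="$(gen_secret)" \
    ghcr.io/cryptolabsza/ipmi-monitor:latest
}
```

Generating a fresh key on every start would likely invalidate existing login sessions, so in practice the output of `gen_secret` would be stored once and reused.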


💻 CLI Commands

After installation, use the ipmi-monitor CLI:

| Command | Description |
|---------|-------------|
| sudo ipmi-monitor quickstart | ⚡ One-command Docker deployment (recommended) |
| ipmi-monitor status | Show container status |
| ipmi-monitor logs [-f] | View container logs |
| ipmi-monitor start | Start containers |
| ipmi-monitor stop | Stop containers |
| ipmi-monitor restart | Restart containers |
| ipmi-monitor upgrade | Pull latest image & restart |
| ipmi-monitor add-server | Add a server interactively |
| ipmi-monitor list-servers | List configured servers |
| ipmi-monitor setup-ssl | Set up HTTPS reverse proxy |
| ipmi-monitor uninstall | Uninstall IPMI Monitor (with options) |
| ipmi-monitor version | Show detailed version info |
| ipmi-monitor setup-ssl | Retry Let's Encrypt SSL setup |
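The interactive setup notes that pipx's bin directory is not on sudo's PATH. A minimal sketch of one way to avoid hard-coding the path is to resolve the binary as the current user first. `resolve_cli` is a hypothetical helper, and the fallback assumes pipx's default per-user location `~/.local/bin`.

```shell
#!/bin/sh
# Hypothetical helper: print the path of a command as seen by the current
# user, falling back to ~/.local/bin when it is not on the current PATH.
resolve_cli() {
  command -v "$1" 2>/dev/null || printf '%s\n' "$HOME/.local/bin/$1"
}

# Usage (not executed here):
#   sudo "$(resolve_cli ipmi-monitor)" quickstart
```

The command substitution runs before sudo, so the path is looked up with the invoking user's PATH rather than root's.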

📸 Screenshots

Dashboard: Main dashboard showing 39 servers with real-time status
Events: Event Log (SEL events)
Sensors: Live Sensors
Inventory: Hardware Inventory
System Logs: SSH System Logs

✨ Features

🆓 Free Self-Hosted

| Feature | Description |
|---------|-------------|
| 🔍 SEL Collection | Parallel IPMI event collection (32 workers) |
| 📊 Real-time Dashboard | Auto-refreshing server status cards |
| 🌡️ Sensor Monitoring | Temperature, fan, voltage, power readings |
| 💾 ECC Tracking | Identify which DIMM has memory errors |
| 🎮 GPU Health | Detect NVIDIA Xid errors via SSH |
| 📜 SSH System Logs | Collect dmesg, journalctl, syslog, mcelog |
| 🖥️ Platform Logs | Collect Vast.ai daemon and RunPod agent logs |
| 🔧 Hardware Errors | AER, PCIe, ECC errors parsed automatically |
| 🚨 Alerts | Email, Telegram, webhook notifications |
| ✅ Alert Resolution | Notify when issues clear |
| 📈 Prometheus | Native /metrics endpoint for Grafana |
| 🔐 User Management | Admin and read-only access levels |
| 📥 Backup/Restore | Export everything for disaster recovery |
| 🔃 BMC Reset | Cold/warm reset without affecting host OS |
| 🐳 Docker Ready | Multi-arch images (amd64/arm64) |
| 🔄 Auto-Updates | Watchtower keeps containers updated |
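The Prometheus feature refers to a /metrics endpoint. Assuming it serves the standard Prometheus text exposition format, a quick way to see which metric names are exported is sketched below. `metric_names` is a hypothetical helper, not something IPMI Monitor ships.

```shell
#!/bin/sh
# Hypothetical helper: read Prometheus text-format metrics on stdin and print
# the unique metric names, skipping "# HELP" / "# TYPE" comment lines.
metric_names() {
  # Drop everything from the first "{" or space onward, leaving only the name.
  awk '!/^#/ && NF { sub(/[{ ].*/, ""); print }' | sort -u
}

# Usage (not executed here; port 5000 is taken from the docker run example):
#   curl -fsS http://localhost:5000/metrics | metric_names
```

The resulting list is a convenient starting point when picking series for a Grafana panel.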

🆕 What's New in v1.1.1

| Feature | Description |
|---------|-------------|
| 📦 Quickstart Wizard | One-command Docker deployment with CryptoLabs Proxy, SSL, Watchtower |
| 🌐 CryptoLabs Proxy | Unified reverse proxy with Fleet Management landing page at / |
| 🔗 DC Overview Import | Auto-detect DC Overview installation and import servers/SSH keys |
| 🔐 SSH Key Management | Auto-detect keys, paste content, or generate new ED25519 keys |
| 📜 SSH Log Collection | Optional SSH log collection (dmesg, syslog, GPU errors) during setup |
| 🚀 Initial Data Collection | Fresh installs auto-collect sensors/events with progress modal |
| 🔒 Auto SSL Renewal | Certbot container automatically obtains/renews Let's Encrypt certs |
| 🌐 Subpath Routing | Deploy at /ipmi/ alongside other CryptoLabs services |
| 🏷️ Site Name Branding | Configure site name via DC Overview for consistent branding |
| 🖥️ Vast.ai/RunPod Logs | Auto-collects daemon logs when deployed via DC Overview with exporters |
| 🔄 Watchtower Integration | Automatic container updates every 5 minutes |
| 👤 Read-Write Role | New role with settings access but no user management |
| 📥 Fixed Export/Import | Alert rules now export/import correctly |
| 📋 SEL Management | Enable/disable event logging, view SEL info, get SEL time |
| 💚 Sensor Highlighting | Changed sensor values pulse green after refresh |
| ⏳ Diagnostics Loading States | Download buttons show progress to prevent double-clicks |
| 📊 Grafana Config | prometheus.yml example and endpoint documentation |
| 🛡️ Uninstall Options | Choose to remove containers, config, or both |

🤖 AI Features (Optional)

Upgrade with AI-powered insights from CryptoLabs:

| Feature | Description |
|---------|-------------|
| 📊 Daily Summaries | AI-generated fleet health with GPU focus |
| 🔧 Maintenance Tasks | Auto-generated from events |
| 📈 Predictions | Failure warnings before they happen |
| 🔍 Root Cause Analysis | AI explains what went wrong |
| 💬 AI Chat | Ask questions about your servers |
| 🤖 Recovery Agent | Autonomous GPU recovery with escalation |
| 🏢 Multi-Site | One account, multiple datacenters |
| 📋 Task Queue | AI sends recovery tasks for execution |

Start your free trial: Settings → AI Features → Start Free Trial


⚙️ Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| APP_NAME | IPMI Monitor | Displayed in header |
| IPMI_USER | admin | Default BMC username |
| IPMI_PASS | (required) | Default BMC password |
| ADMIN_PASS | changeme | Dashboard admin password |
| SECRET_KEY | (auto) | Flask session secret (set this!) |
| POLL_INTERVAL | 300 | Seconds between collections |
| SSH_LOG_INTERVAL | (disabled) | Minutes between SSH log collection |
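These variables can also be kept in a file and passed to Docker with its `--env-file` flag instead of repeating `-e` options. The sketch below writes such a file using the defaults from the table. `write_env_file` and the file name `ipmi.env` are hypothetical, and the three secrets are assumed to be supplied through the caller's environment.

```shell
#!/bin/sh
# Hypothetical helper: write the configuration variables to an env file in
# KEY=VALUE form. $1 is the output path. The body runs in a subshell so the
# umask change and any failure stay local to this function.
write_env_file() (
  umask 077   # the file holds credentials, so keep it private
  cat > "$1" <<EOF
APP_NAME=${APP_NAME:-IPMI Monitor}
IPMI_USER=${IPMI_USER:-admin}
IPMI_PASS=${IPMI_PASS:?set IPMI_PASS}
ADMIN_PASS=${ADMIN_PASS:?set ADMIN_PASS}
SECRET_KEY=${SECRET_KEY:?set SECRET_KEY}
POLL_INTERVAL=${POLL_INTERVAL:-300}
EOF
  # SSH log collection is disabled by default; only write it when set.
  if [ -n "${SSH_LOG_INTERVAL:-}" ]; then
    printf 'SSH_LOG_INTERVAL=%s\n' "$SSH_LOG_INTERVAL" >> "$1"
  fi
)

# Usage (not executed here):
#   IPMI_PASS=... ADMIN_PASS=... SECRET_KEY=... write_env_file ./ipmi.env
#   docker run -d --name ipmi-monitor -p 5000:5000 -v ipmi_data:/app/data \
#     --env-file ./ipmi.env ghcr.io/cryptolabsza/ipmi-monitor:latest
```

Requiring ADMIN_PASS explicitly, rather than falling back to `changeme`, is a deliberate choice in this sketch so that the default password never reaches a deployed container.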

🔒 Security

IPMI Monitor is designed for production datacenter environments.


๐Ÿ—๏ธ Architecture

IPMI Monitor runs as Docker containers, with CryptoLabs Proxy acting as a unified reverse proxy:

┌─────────────────────────────────────────────────────────────────────┐
│                         Your Server                                 │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │ cryptolabs-proxy           Port 80/443 (HTTP/HTTPS)           │  │
│  │  ├── /          → Fleet Management Landing Page               │  │
│  │  ├── /ipmi/     → IPMI Monitor                                │  │
│  │  └── /dc/       → DC Overview (if installed)                  │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                           │                                         │
│  ┌────────────────────────▼──────────────────────────────────────┐  │
│  │ ipmi-monitor              Port 5000 (internal)                │  │
│  │  • Flask web application with SQLite                          │  │
│  │  • Background workers (IPMI polling, SSH log collection)      │  │
│  │  • Initial data collection on first start                     │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                           │                                         │
│  ┌────────────────────────▼──────────────────────────────────────┐  │
│  │ certbot                  Auto SSL renewal (every 12h)         │  │
│  │ watchtower               Auto container updates (every 5m)    │  │
│  └───────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘
              │                              │
              ▼                              ▼
    ┌─────────────────┐          ┌─────────────────┐
    │  BMC/IPMI       │          │  Server OS      │
    │  (port 623)     │          │  (SSH port 22)  │
    └─────────────────┘          └─────────────────┘

Live Example: dc.cryptolabs.co.za - Fleet Management at /, IPMI Monitor at /ipmi/
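Because the proxy serves IPMI Monitor under /ipmi/ while a standalone container answers at the root, scripts that talk to it need to build URLs for both layouts. The sketch below shows one way to do that. `ipmi_url` is a hypothetical helper, and it is an assumption that the API paths are simply appended to the /ipmi/ prefix when running behind CryptoLabs Proxy.

```shell
#!/bin/sh
# Hypothetical helper: join a base URL, an optional proxy subpath, and an
# endpoint path, collapsing the duplicate slashes at the joins.
# $1 = base URL, $2 = subpath ("" for a standalone container), $3 = endpoint
ipmi_url() {
  base=${1%/}           # strip one trailing slash from the base URL
  prefix=${2#/}         # strip the leading slash from the subpath
  prefix=${prefix%/}    # and the trailing one
  endpoint=${3#/}       # strip the leading slash from the endpoint
  if [ -n "$prefix" ]; then
    printf '%s/%s/%s\n' "$base" "$prefix" "$endpoint"
  else
    printf '%s/%s\n' "$base" "$endpoint"
  fi
}

# Usage (not executed here):
#   curl -fsS "$(ipmi_url https://dc.example.com /ipmi/ /health)"
#   curl -fsS "$(ipmi_url http://localhost:5000 '' /health)"
```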


📋 API Reference

IPMI Monitor exposes 150+ REST API endpoints. Here are the most commonly used:

Dashboard & Events

| Endpoint | Description |
|----------|-------------|
| GET / | Web dashboard |
| GET /api/servers | List all servers with status |
| GET /api/events | Get events (supports filters) |
| GET /api/stats | Dashboard statistics |
| GET /api/maintenance | Maintenance tasks |
| GET /api/recovery-logs | Recovery action history |
| GET /api/uptime | Server uptime data |

Server Management

| Endpoint | Description |
|----------|-------------|
| GET /api/servers/managed | All configured servers |
| POST /api/servers/add | Add new server |
| PUT /api/servers/{bmc_ip} | Update server config |
| DELETE /api/servers/{bmc_ip} | Remove server |
| POST /api/servers/import | Bulk import servers |
| GET /api/servers/export | Export server list |

Per-Server Operations

| Endpoint | Description |
|----------|-------------|
| GET /server/{bmc_ip} | Server detail page |
| GET /api/server/{bmc_ip}/events | Server's events |
| GET /api/sensors/{bmc_ip} | Live sensor readings |
| GET /api/server/{bmc_ip}/ssh-logs | SSH system logs |
| POST /api/servers/{bmc_ip}/inventory | Collect inventory |
| POST /api/server/{bmc_ip}/power/{action} | Power control (on/off/reset) |
| POST /api/server/{bmc_ip}/bmc/{action} | BMC reset (cold/warm) |
| POST /api/server/{bmc_ip}/investigate | Post-recovery investigation |
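Power control is the most disruptive call in this group, so a script may want to reject anything other than the three listed actions before sending a request. The following is a minimal sketch with curl. Both function names are hypothetical, and authentication is left out because the source does not describe how the API authenticates requests.

```shell
#!/bin/sh
# Hypothetical helper: accept only the power actions named in the table above.
valid_power_action() {
  case "$1" in
    on|off|reset) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: POST a power action for one BMC.
# $1 = base URL, $2 = BMC IP, $3 = action.
# Returns 2 without contacting the server when the action is not recognised.
power_control() {
  valid_power_action "$3" || { echo "unsupported action: $3" >&2; return 2; }
  curl -fsS -X POST "$1/api/server/$2/power/$3"
}

# Usage (not executed here):
#   power_control http://localhost:5000 10.0.0.15 reset
```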

SSH & Credentials

| Endpoint | Description |
|----------|-------------|
| GET /api/ssh-keys | List stored SSH keys |
| POST /api/ssh-keys | Add SSH key |
| POST /api/test/bmc | Test BMC connection |
| POST /api/test/ssh | Test SSH connection |
| POST /api/ssh-logs/collect-now | Trigger SSH log collection |

Alerts & Notifications

| Endpoint | Description |
|----------|-------------|
| GET /api/alerts/rules | Alert rules |
| POST /api/alerts/rules | Create alert rule |
| GET /api/alerts/history | Fired alerts |
| GET /api/alerts/notifications | Notification channels |
| POST /api/alerts/notifications/{type}/test | Test notification |

System & Monitoring

| Endpoint | Description |
|----------|-------------|
| GET /metrics | Prometheus metrics |
| GET /health | Health check |
| GET /api/version | Version info |
| GET /api/version/check | Check for updates |
| POST /api/collect | Trigger IPMI collection |
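After starting or upgrading the container, a deployment script could poll the /health endpoint until it answers before making further API calls. A minimal sketch follows. `probe` and `wait_healthy` are hypothetical helpers, and the retry count and delay are arbitrary defaults rather than values recommended by the project.

```shell
#!/bin/sh
# Probe one URL. curl -f makes HTTP status codes of 400 and above count as
# failures; connection errors and timeouts fail as well.
probe() {
  curl -fsS -o /dev/null --max-time 5 "$1"
}

# Hypothetical helper: retry probe up to $2 times (default 10), sleeping $3
# seconds (default 3) between attempts. Returns 0 as soon as one probe
# succeeds and 1 if every attempt fails.
wait_healthy() {
  url=$1 tries=${2:-10} delay=${3:-3}
  i=1
  while [ "$i" -le "$tries" ]; do
    if probe "$url" 2>/dev/null; then
      return 0
    fi
    # Skip the sleep after the final attempt.
    if [ "$i" -lt "$tries" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (not executed here):
#   wait_healthy http://localhost:5000/health 20 3 && echo "IPMI Monitor is up"
```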

AI Features

| Endpoint | Description |
|----------|-------------|
| GET /api/ai/status | AI sync status |
| GET /api/ai/config | AI configuration |
| POST /api/ai/sync | Trigger AI sync |
| GET /api/ai/results | Cached AI results |

See User Guide for complete endpoint documentation.



🆘 Support


MIT License · Made with ❤️ by CryptoLabs