Building a Real-Time Kubernetes Operations Dashboard
How I built FlexDeck, a full-stack operations dashboard with real-time K8s monitoring, GitLab CI/CD visualization, and AI model management using Go and SolidJS.
TL;DR
- FlexDeck is a real-time K8s + CI/CD + model-serving dashboard I built because kubectl, Grafana, and the GitLab UI never converge into a single operator view when something is actually wrong.
- Sub-100ms update latency comes from a Go backend that fans K8s informer events to a SolidJS frontend over WebSocket. No polling.
- The real design lesson is that operator dashboards live or die by how quickly you can answer “what changed, and is it my fault?” Everything in FlexDeck is structured around that question.
- Three clusters monitored simultaneously, six months at 99.9% uptime, still a single Go service and a small Solid app.
Overview
I run a small “production-like” homelab: GitOps-managed K3s, multiple services, and GPU workloads that are sensitive to small regressions. The boring truth is that most of the time I’m not writing Kubernetes YAML. I’m debugging:
- why a rollout is stuck,
- why a pod is thrashing,
- why a build artifact didn’t land where I expected,
- why the GPU node is “healthy” but not actually usable.
FlexDeck is the dashboard I built because I got tired of context-switching between kubectl, Grafana, and the GitLab UI. The goal was not “a pretty dashboard.” It was a fast, real-time, single place to see cluster + CI/CD + model state with enough detail to troubleshoot.
The Challenge
Running a production-like homelab environment means dealing with:
- Multiple clusters: K3s for applications, Harvester for infrastructure
- GitLab CI/CD: Dozens of pipelines across 40+ repositories
- AI workloads: GPU scheduling, model health, inference metrics
- Ops ergonomics: too many tabs, too much scrolling, not enough “what changed?”
I needed a dashboard that would:
- Show real-time cluster state without manual refresh
- Visualize CI/CD pipelines with enough detail to debug failures
- Monitor GPU utilization and AI model health
- Work on both desktop and mobile for on-the-go checks
The Approach
Technology Choices
Backend: Go
- Excellent Kubernetes client libraries
- Low memory footprint for always-on service
- Strong concurrency primitives for real-time streaming
Frontend: SolidJS
- Fine-grained reactivity without Virtual DOM overhead
- Excellent performance for frequent updates
- Smaller bundle than React for faster mobile loads
Data streaming: WebSocket + Server-Sent Events
- WebSocket for bidirectional communication (kubectl exec, logs)
- SSE for unidirectional updates (cluster state, metrics)
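The SSE half of that split needs nothing beyond `net/http`. Below is a minimal sketch, not FlexDeck's actual handler: the `ResourceUpdate` fields, the `formatSSE` helper, and the `streamUpdates` name are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// ResourceUpdate is a hypothetical payload shape, not FlexDeck's wire format.
type ResourceUpdate struct {
	Type   string `json:"type"`
	Action string `json:"action"`
	Name   string `json:"name"`
}

// formatSSE encodes one update as a Server-Sent Events frame: an "event:"
// line, a "data:" line, and a blank line that terminates the frame.
func formatSSE(u ResourceUpdate) (string, error) {
	payload, err := json.Marshal(u)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("event: %s\ndata: %s\n\n", u.Type, payload), nil
}

// streamUpdates writes every update from the channel to one client until the
// client disconnects or the channel is closed.
func streamUpdates(updates <-chan ResourceUpdate) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")

		for {
			select {
			case <-r.Context().Done():
				return // client went away
			case u, open := <-updates:
				if !open {
					return
				}
				frame, err := formatSSE(u)
				if err != nil {
					continue // skip updates that cannot be encoded
				}
				fmt.Fprint(w, frame)
				flusher.Flush() // push the frame now instead of buffering
			}
		}
	}
}

func main() {
	// Print one frame to show the wire format.
	frame, err := formatSSE(ResourceUpdate{Type: "pod", Action: "updated", Name: "api-0"})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Print(frame)
}
```

In a real server each connected client would need its own channel, since a single shared channel hands each update to only one reader.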
Architecture
Implementation Details
Kubernetes Watch Streams
The Go backend uses informers to watch cluster resources efficiently:
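import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// ClusterWatcher turns informer callbacks into ResourceUpdate messages.
type ClusterWatcher struct {
	clientset *kubernetes.Clientset
	informers informers.SharedInformerFactory
	updates   chan ResourceUpdate
}

// WatchPods registers the pod event handlers. The factory still has to be
// started (w.informers.Start) once all handlers are registered.
func (w *ClusterWatcher) WatchPods(ctx context.Context) {
	informer := w.informers.Core().V1().Pods().Informer()

	// emit forwards one pod event and gives up when ctx is cancelled, so a
	// stalled consumer cannot block the handler forever.
	emit := func(action string, pod *corev1.Pod) {
		select {
		case w.updates <- ResourceUpdate{Type: "pod", Action: action, Resource: podToDTO(pod)}:
		case <-ctx.Done():
		}
	}

	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			emit("added", obj.(*corev1.Pod))
		},
		UpdateFunc: func(oldObj, newObj interface{}) {
			emit("updated", newObj.(*corev1.Pod))
		},
		DeleteFunc: func(obj interface{}) {
			// A delete missed during a watch gap arrives wrapped in a
			// tombstone; a bare type assertion would panic on it.
			if tombstone, ok := obj.(cache.DeletedFinalStateUnknown); ok {
				obj = tombstone.Obj
			}
			if pod, ok := obj.(*corev1.Pod); ok {
				emit("deleted", pod)
			}
		},
	})
}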
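The handlers above all send on a single `updates` channel, so something downstream has to copy each event to every connected browser. Here is a minimal sketch of that fan-out step, assuming a hypothetical `Hub` type (the name, its methods, and the drop-on-full policy are illustrative choices, not FlexDeck's code):

```go
package main

import (
	"fmt"
	"sync"
)

// ResourceUpdate is a simplified stand-in for the update message.
type ResourceUpdate struct {
	Type   string
	Action string
	Name   string
}

// Hub copies each published update to every subscriber channel.
type Hub struct {
	mu   sync.Mutex
	subs map[chan ResourceUpdate]struct{}
}

func NewHub() *Hub {
	return &Hub{subs: make(map[chan ResourceUpdate]struct{})}
}

// Subscribe registers a new subscriber with the given buffer size.
func (h *Hub) Subscribe(buffer int) chan ResourceUpdate {
	ch := make(chan ResourceUpdate, buffer)
	h.mu.Lock()
	h.subs[ch] = struct{}{}
	h.mu.Unlock()
	return ch
}

// Unsubscribe removes the subscriber and closes its channel.
func (h *Hub) Unsubscribe(ch chan ResourceUpdate) {
	h.mu.Lock()
	if _, ok := h.subs[ch]; ok {
		delete(h.subs, ch)
		close(ch)
	}
	h.mu.Unlock()
}

// Publish delivers u to every subscriber without blocking. It returns how
// many subscribers had a full buffer and therefore missed the update.
func (h *Hub) Publish(u ResourceUpdate) (dropped int) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for ch := range h.subs {
		select {
		case ch <- u:
		default:
			dropped++ // slow subscriber: skip rather than stall everyone
		}
	}
	return dropped
}

func main() {
	hub := NewHub()
	fast := hub.Subscribe(8)
	slow := hub.Subscribe(1)
	for i := 0; i < 3; i++ {
		u := ResourceUpdate{Type: "pod", Action: "updated", Name: fmt.Sprintf("api-%d", i)}
		fmt.Println("dropped:", hub.Publish(u))
	}
	fmt.Println("queued:", len(fast), len(slow))
}
```

Dropping on a full buffer keeps one slow client from stalling the informer handlers, at the cost of that client missing events; a production version would likely trigger a full resync for the affected subscriber.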
Fine-Grained Reactivity with SolidJS
SolidJS signals update only the specific DOM elements that need to change:
import { Show } from 'solid-js';

function PodCard(props: { pod: Pod }) {
  // The component body runs once; only the expressions that read
  // props.pod.status re-evaluate when this pod's status changes.
  const statusColor = () => {
    switch (props.pod.status) {
      case 'Running':
        return 'text-green-400';
      case 'Pending':
        return 'text-yellow-400';
      case 'Failed':
        return 'text-red-400';
      default:
        return 'text-gray-400';
    }
  };

  return (
    <div class="pod-card">
      <span class="pod-name">{props.pod.name}</span>
      <span class={`pod-status ${statusColor()}`}>{props.pod.status}</span>
      <Show when={props.pod.restarts > 0}>
        <span class="restart-count">Restarts: {props.pod.restarts}</span>
      </Show>
    </div>
  );
}
GitLab CI/CD Visualization
For CI/CD pipelines, I built a custom visualization using D3.js with particle effects showing job flow:
- Pipelines displayed as directed graphs
- Jobs animate between stages as they progress
- Failed jobs pulse red for visibility
- Click-to-expand for job logs
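Before anything can be drawn as a directed graph, the job list has to become a set of edges. The sketch below shows one way the backend could derive them; the `Job` and `Edge` types and the `buildEdges` function are hypothetical, and the rule used (explicit `needs` win, otherwise a job depends on every job in the previous stage) is an assumption about how the graph is shaped, not a description of FlexDeck's code.

```go
package main

import "fmt"

// Job is a hypothetical, trimmed-down view of a CI job.
type Job struct {
	Name  string
	Stage string
	Needs []string // explicit dependencies; empty means "wait for the previous stage"
}

// Edge is a directed edge From -> To in the pipeline graph.
type Edge struct{ From, To string }

// buildEdges derives graph edges from the stage order and explicit needs.
func buildEdges(stages []string, jobs []Job) []Edge {
	// Group job names by stage so stage-level dependencies can be expanded.
	byStage := make(map[string][]string)
	for _, j := range jobs {
		byStage[j.Stage] = append(byStage[j.Stage], j.Name)
	}
	// Map each stage to the one immediately before it.
	prevStage := make(map[string]string)
	for i := 1; i < len(stages); i++ {
		prevStage[stages[i]] = stages[i-1]
	}

	var edges []Edge
	for _, j := range jobs {
		if len(j.Needs) > 0 {
			for _, n := range j.Needs {
				edges = append(edges, Edge{From: n, To: j.Name})
			}
			continue
		}
		if prev, ok := prevStage[j.Stage]; ok {
			for _, p := range byStage[prev] {
				edges = append(edges, Edge{From: p, To: j.Name})
			}
		}
	}
	return edges
}

func main() {
	stages := []string{"build", "test", "deploy"}
	jobs := []Job{
		{Name: "compile", Stage: "build"},
		{Name: "unit", Stage: "test"},
		{Name: "lint", Stage: "test"},
		{Name: "release", Stage: "deploy", Needs: []string{"unit"}},
	}
	for _, e := range buildEdges(stages, jobs) {
		fmt.Printf("%s -> %s\n", e.From, e.To)
	}
}
```

Sending edges rather than raw job lists keeps the layout logic in the frontend simple: D3 only has to position nodes and animate along edges it is given.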
GPU Monitoring
The dashboard integrates with DCGM (NVIDIA) and ROCm metrics for GPU health:
type GPUMetrics struct {
	DeviceID    string  `json:"device_id"`
	Utilization float64 `json:"utilization"`
	MemoryUsed  int64   `json:"memory_used"`
	MemoryTotal int64   `json:"memory_total"`
	Temperature int     `json:"temperature"`
	PowerDraw   float64 `json:"power_draw"`
	ActiveModel string  `json:"active_model,omitempty"`
}
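Raw readings become more useful once they are turned into something an operator can scan. The following sketch adds two helpers to the struct; `MemoryPercent`, `Warnings`, and the thresholds are hypothetical, and it assumes `MemoryUsed` and `MemoryTotal` share a unit and that `Temperature` is in degrees Celsius, which the original struct does not state.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type GPUMetrics struct {
	DeviceID    string  `json:"device_id"`
	Utilization float64 `json:"utilization"`
	MemoryUsed  int64   `json:"memory_used"`
	MemoryTotal int64   `json:"memory_total"`
	Temperature int     `json:"temperature"`
	PowerDraw   float64 `json:"power_draw"`
	ActiveModel string  `json:"active_model,omitempty"`
}

// MemoryPercent returns used memory as a percentage of total,
// or 0 when the total is unknown.
func (m GPUMetrics) MemoryPercent() float64 {
	if m.MemoryTotal <= 0 {
		return 0
	}
	return float64(m.MemoryUsed) / float64(m.MemoryTotal) * 100
}

// Warnings flags readings worth a second look. The thresholds are supplied
// by the caller; the values used in main are illustrative only.
func (m GPUMetrics) Warnings(maxTempC int, maxMemPercent float64) []string {
	var out []string
	pct := m.MemoryPercent()
	if m.Temperature > maxTempC {
		out = append(out, "temperature above threshold")
	}
	if pct > maxMemPercent {
		out = append(out, "memory nearly full")
		if m.ActiveModel == "" {
			// Memory is held but nothing is reported as loaded.
			out = append(out, "memory held with no active model")
		}
	}
	return out
}

func main() {
	// Made-up sample payload in the struct's JSON shape.
	raw := `{"device_id":"gpu-0","utilization":3.5,"memory_used":23622320128,` +
		`"memory_total":25769803776,"temperature":61,"power_draw":48.2}`
	var m GPUMetrics
	if err := json.Unmarshal([]byte(raw), &m); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("%s: %.1f%% memory, warnings: %v\n", m.DeviceID, m.MemoryPercent(), m.Warnings(85, 90))
}
```

The "memory held with no active model" check is one possible way to surface a GPU that reports healthy but is not actually usable.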
Results
Performance Metrics
These numbers are from my reference machine and homelab cluster and should be treated as directional.
| Metric | Target | Achieved |
|---|---|---|
| Initial load time | <2s | 1.2s |
| Update latency | <200ms | <100ms |
| Memory usage (backend) | <100MB | 65MB |
| Bundle size (frontend) | <500KB | 380KB |
Operational Impact
Before FlexDeck:
- Switching between kubectl, Grafana, and GitLab UI constantly
- Missing pipeline failures until builds broke
- No mobile access to cluster state
After FlexDeck:
- Single pane of glass for all operations
- Real-time notifications for failures
- Check cluster health from phone during incidents
Visualizations Built
- Cluster topology: Interactive node and pod layout
- Pipeline DAG: Directed graph with job status animations
- Resource utilization: Real-time charts for CPU/memory/GPU
- Namespace overview: Grid view with health indicators
- Model registry: AI model versions and deployment status
Lessons Learned
SolidJS for Real-Time UIs
The choice of SolidJS over React paid off significantly:
- Direct updates: A signal write updates the DOM nodes that depend on it, with no component re-render to schedule
- No re-render cascades: Parent updates don't re-render children
- Smaller runtime: 7KB vs React's 40KB+
The mental model is different (signals vs state), but for dashboards with many independent updating components, it's worth learning.
Kubernetes Informers vs Polling
Initially I polled the API server every 5 seconds. Switching to informers:
- Reduced API server load by 95%
- Eliminated "stale data" UX issues
- Enabled instant status updates
WebSocket Reconnection
Real-time connections fail. Robust reconnection is essential:
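func (c *Client) maintainConnection(ctx context.Context) {
	backoff := time.Second
	maxBackoff := time.Minute

	for {
		// Stop reconnecting once the caller cancels.
		if ctx.Err() != nil {
			return
		}

		if err := c.connect(); err != nil {
			log.Printf("Connection failed: %v, retrying in %v", err, backoff)
			// Wait out the backoff, but wake immediately on cancellation
			// (a plain time.Sleep would ignore ctx).
			select {
			case <-ctx.Done():
				return
			case <-time.After(backoff):
			}
			backoff = min(backoff*2, maxBackoff) // built-in min needs Go 1.21+
			continue
		}

		backoff = time.Second // Reset on success
		c.handleMessages(ctx)
	}
}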
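One refinement worth considering is jitter: when the backend restarts, every client that retries on the same schedule reconnects at the same instant. The sketch below separates the delay calculation into two pure functions; `nextBackoff`, `withJitter`, and the `base` and `ceiling` constants are hypothetical names, and the "half fixed, half random" split is just one common choice.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	base    = time.Second // first retry delay
	ceiling = time.Minute // upper bound on the delay
)

// nextBackoff doubles the current delay and caps it at max.
func nextBackoff(current, max time.Duration) time.Duration {
	next := current * 2
	if next > max || next <= 0 { // next <= 0 guards against overflow
		return max
	}
	return next
}

// withJitter keeps half of d fixed and scales the other half by frac,
// which is expected to be in [0, 1). The result lies in [d/2, d).
func withJitter(d time.Duration, frac float64) time.Duration {
	if d <= 0 {
		return 0
	}
	half := d / 2
	return half + time.Duration(frac*float64(half))
}

func main() {
	delay := base
	for attempt := 1; attempt <= 8; attempt++ {
		wait := withJitter(delay, rand.Float64())
		fmt.Printf("attempt %d: wait %v (cap %v)\n", attempt, wait, delay)
		delay = nextBackoff(delay, ceiling)
	}
}
```

Passing the random fraction in as an argument keeps both functions deterministic, so the schedule can be unit-tested without mocking a clock or a random source.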
Future Improvements
- Multi-tenant support: Share dashboard with team members
- Alert integration: PagerDuty/Slack notifications from dashboard
- Cost tracking: Integrate with cloud billing APIs
- Capacity planning: Predictive scaling recommendations
Conclusion
Building a custom operations dashboard was more work than wiring up existing tools, but the result is exactly what I needed: fast, focused, and built around the workflows I actually use. The combination of Go’s streaming model (informers + channels) and SolidJS’s fine-grained reactivity makes “real-time” feel real, not simulated with polling spinners.