DevOps · 4 months · Completed

Building a Real-Time Kubernetes Operations Dashboard

How I built FlexDeck, a full-stack operations dashboard with real-time K8s monitoring, GitLab CI/CD visualization, and AI model management using Go and SolidJS.

November 1, 2025

Update latency: <100ms (real-time data refresh)
Clusters: 3 monitored simultaneously
Uptime: 99.9% over 6 months

Tech Stack

Backend: Go, WebSocket
Frontend: SolidJS, D3.js
Infrastructure: Kubernetes
Monitoring: Prometheus

TL;DR

FlexDeck is a real-time K8s + CI/CD + model-serving dashboard I built because kubectl, Grafana, and the GitLab UI never converge into a single operator view when something is actually wrong.
Sub-100ms update latency comes from a Go backend that fans K8s informer events out to a SolidJS frontend over a persistent connection (WebSocket or SSE), with no polling of the Kubernetes API.
The real design lesson is that operator dashboards live or die by how quickly you can answer “what changed, and is it my fault?” Everything in FlexDeck is structured around that question.
Three clusters monitored simultaneously, six months at 99.9% uptime, still a single Go service and a small Solid app.

Overview

I run a small “production-like” homelab: GitOps-managed K3s, multiple services, and GPU workloads that are sensitive to small regressions. The boring truth is that most of the time I’m not writing Kubernetes YAML. I’m debugging:

why a rollout is stuck,
why a pod is thrashing,
why a build artifact didn’t land where I expected,
why the GPU node is “healthy” but not actually usable.

FlexDeck is the dashboard I built because I got tired of context-switching between kubectl, Grafana, and the GitLab UI. The goal was not “a pretty dashboard.” It was a fast, real-time, single place to see cluster + CI/CD + model state with enough detail to troubleshoot.

Figure 1. The baseline environment: GitOps-managed K3s with dedicated GPU workers for inference and media workloads. (Diagram: a GitOps-managed K3s cluster on Harvester with an ingress layer, a VM worker pool, and a dedicated GPU worker pool for inference workloads.)

The Challenge

Running a production-like homelab environment means dealing with:

Multiple clusters: K3s for applications, Harvester for infrastructure
GitLab CI/CD: Dozens of pipelines across 40+ repositories
AI workloads: GPU scheduling, model health, inference metrics
Ops ergonomics: too many tabs, too much scrolling, not enough “what changed?”

I needed a dashboard that would:

Show real-time cluster state without manual refresh
Visualize CI/CD pipelines with enough detail to debug failures
Monitor GPU utilization and AI model health
Work on both desktop and mobile for on-the-go checks

The Approach

Technology Choices

Backend: Go

Excellent Kubernetes client libraries
Low memory footprint for always-on service
Strong concurrency primitives for real-time streaming

Frontend: SolidJS

Fine-grained reactivity without Virtual DOM overhead
Excellent performance for frequent updates
Smaller bundle than React for faster mobile loads

Data streaming: WebSocket + Server-Sent Events

WebSocket for bidirectional communication (kubectl exec, logs)
SSE for unidirectional updates (cluster state, metrics)
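
To make the SSE half of that split concrete, here is a minimal sketch of a streaming handler using only the Go standard library. It is not FlexDeck's actual handler: the Event type, its fields, and the names formatSSE and sseHandler are all hypothetical, and src is assumed to be a channel dedicated to one client.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// Event is a hypothetical payload; the real FlexDeck schema is not shown.
type Event struct {
	Type   string `json:"type"`
	Action string `json:"action"`
	Name   string `json:"name"`
}

// formatSSE renders one event in the text/event-stream wire format:
// an "event:" line, a "data:" line, and a terminating blank line.
func formatSSE(ev Event) (string, error) {
	payload, err := json.Marshal(ev)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("event: %s\ndata: %s\n\n", ev.Type, payload), nil
}

// sseHandler streams events from src until the client disconnects
// or src is closed. src is assumed to belong to a single client.
func sseHandler(src <-chan Event) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")

		for {
			select {
			case <-r.Context().Done():
				return // client went away
			case ev, open := <-src:
				if !open {
					return
				}
				frame, err := formatSSE(ev)
				if err != nil {
					continue // skip events that cannot be encoded
				}
				fmt.Fprint(w, frame)
				flusher.Flush() // push the frame now instead of buffering
			}
		}
	}
}
```

On the browser side, an EventSource pointed at this endpoint reconnects on its own, which is one reason SSE suits one-way state updates, while WebSocket stays reserved for the interactive paths.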

Architecture

Figure 2. FlexDeck treats infrastructure state as a stream: clients subscribe to a Go backend that watches clusters and CI/CD. (Diagram: SolidJS clients connect via WebSocket/SSE to a Go backend, which watches Kubernetes clusters and polls the GitLab API for CI/CD state.)

Implementation Details

Kubernetes Watch Streams

The Go backend uses informers to watch cluster resources efficiently:

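import (
    "context"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
)

type ClusterWatcher struct {
    clientset *kubernetes.Clientset
    informers informers.SharedInformerFactory
    updates   chan ResourceUpdate
}

// emit forwards one pod event. It gives up when ctx is cancelled so a
// stalled consumer cannot block the informer's handler goroutine forever.
func (w *ClusterWatcher) emit(ctx context.Context, action string, pod *corev1.Pod) {
    select {
    case w.updates <- ResourceUpdate{Type: "pod", Action: action, Resource: podToDTO(pod)}:
    case <-ctx.Done():
    }
}

// WatchPods registers the pod handlers. They fire only once the factory
// has been started elsewhere, e.g. w.informers.Start(ctx.Done()).
func (w *ClusterWatcher) WatchPods(ctx context.Context) {
    informer := w.informers.Core().V1().Pods().Informer()

    informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj interface{}) {
            if pod, ok := obj.(*corev1.Pod); ok {
                w.emit(ctx, "added", pod)
            }
        },
        UpdateFunc: func(_, newObj interface{}) {
            if pod, ok := newObj.(*corev1.Pod); ok {
                w.emit(ctx, "updated", pod)
            }
        },
        DeleteFunc: func(obj interface{}) {
            // A delete missed during a watch gap arrives wrapped in a
            // tombstone, so unwrap it before the type assertion.
            if tomb, ok := obj.(cache.DeletedFinalStateUnknown); ok {
                obj = tomb.Obj
            }
            if pod, ok := obj.(*corev1.Pod); ok {
                w.emit(ctx, "deleted", pod)
            }
        },
    })
}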
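
The updates channel has a single reader, so something has to copy each event to every connected client. The following is a hedged sketch of one way to do that fan-out, not FlexDeck's actual code: Hub, Update, and the method names are hypothetical, and dropping events for slow clients is an assumed policy.

```go
package main

import "sync"

// Update is a stand-in for the ResourceUpdate type used above.
type Update struct {
	Type, Action, Name string
}

// Hub fans each update out to every subscribed client.
type Hub struct {
	mu      sync.Mutex
	clients map[chan Update]struct{}
}

func NewHub() *Hub {
	return &Hub{clients: make(map[chan Update]struct{})}
}

// Subscribe registers a client with a buffered channel of the given size.
func (h *Hub) Subscribe(buffer int) chan Update {
	ch := make(chan Update, buffer)
	h.mu.Lock()
	h.clients[ch] = struct{}{}
	h.mu.Unlock()
	return ch
}

// Unsubscribe removes the client and closes its channel.
// Calling it twice for the same channel is a no-op.
func (h *Hub) Unsubscribe(ch chan Update) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if _, ok := h.clients[ch]; ok {
		delete(h.clients, ch)
		close(ch)
	}
}

// Broadcast delivers u to every client without blocking. It returns the
// number of clients whose buffer was full and therefore missed the update.
func (h *Hub) Broadcast(u Update) (dropped int) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for ch := range h.clients {
		select {
		case ch <- u:
		default:
			dropped++ // slow client: drop rather than stall everyone
		}
	}
	return dropped
}
```

The non-blocking send is the design choice that matters here: one phone on a bad connection should not delay updates for every other client. A client that misses events can resynchronize by requesting a fresh snapshot.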

Fine-Grained Reactivity with SolidJS

SolidJS signals update only the specific DOM elements that need to change:

import { Show } from 'solid-js';

function PodCard(props: { pod: Pod }) {
  // The component body runs once; only the expressions that read
  // props.pod are re-evaluated when that pod changes.
  const statusColor = () => {
    switch (props.pod.status) {
      case 'Running':
        return 'text-green-400';
      case 'Pending':
        return 'text-yellow-400';
      case 'Failed':
        return 'text-red-400';
      default:
        return 'text-gray-400';
    }
  };

  return (
    <div class="pod-card">
      <span class="pod-name">{props.pod.name}</span>
      <span class={`pod-status ${statusColor()}`}>{props.pod.status}</span>
      <Show when={props.pod.restarts > 0}>
        <span class="restart-count">Restarts: {props.pod.restarts}</span>
      </Show>
    </div>
  );
}

GitLab CI/CD Visualization

For CI/CD pipelines, I built a custom visualization using D3.js with particle effects showing job flow:

Pipelines displayed as directed graphs
Jobs animate between stages as they progress
Failed jobs pulse red for visibility
Click-to-expand for job logs
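
Before D3.js can draw a pipeline as a directed graph, the backend has to turn a flat job list into edges. The sketch below shows one plausible way to derive them from stage order. The Job and Edge types and the stageEdges function are hypothetical, the field names are illustrative rather than GitLab's exact API schema, and explicit dependencies between jobs are ignored.

```go
package main

// Job holds the few job fields the graph needs. Field names are
// illustrative, not GitLab's exact API schema.
type Job struct {
	Name   string
	Stage  string
	Status string
}

// Edge is a directed link between two job names.
type Edge struct {
	From, To string
}

// stageEdges links every job in a stage to every job in the next
// non-empty stage, which matches stage-ordered execution when no
// explicit job dependencies are declared. stages gives the stage order.
func stageEdges(stages []string, jobs []Job) []Edge {
	byStage := make(map[string][]string)
	for _, j := range jobs {
		byStage[j.Stage] = append(byStage[j.Stage], j.Name)
	}

	var edges []Edge
	var prev []string
	for _, s := range stages {
		cur := byStage[s]
		if len(cur) == 0 {
			continue // skip stages with no jobs
		}
		for _, from := range prev {
			for _, to := range cur {
				edges = append(edges, Edge{From: from, To: to})
			}
		}
		prev = cur
	}
	return edges
}
```

For stages build, test, and deploy with jobs compile, unit, lint, and release, this yields four edges: compile to unit, compile to lint, unit to release, and lint to release. The frontend can then lay out nodes by stage column and animate along those edges.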

GPU Monitoring

The dashboard integrates with DCGM (NVIDIA) and ROCm metrics for GPU health:

type GPUMetrics struct {
    DeviceID       string  `json:"device_id"`
    Utilization    float64 `json:"utilization"`
    MemoryUsed     int64   `json:"memory_used"`
    MemoryTotal    int64   `json:"memory_total"`
    Temperature    int     `json:"temperature"`
    PowerDraw      float64 `json:"power_draw"`
    ActiveModel    string  `json:"active_model,omitempty"`
}
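
As a small illustration of how this struct behaves on the wire, the sketch below repeats the type so it compiles on its own and adds two hypothetical helpers, MemoryPercent and encode. The original does not state units, so the sample values in the usage note are arbitrary.

```go
package main

import "encoding/json"

// GPUMetrics repeats the struct above so this sketch is self-contained.
type GPUMetrics struct {
	DeviceID    string  `json:"device_id"`
	Utilization float64 `json:"utilization"`
	MemoryUsed  int64   `json:"memory_used"`
	MemoryTotal int64   `json:"memory_total"`
	Temperature int     `json:"temperature"`
	PowerDraw   float64 `json:"power_draw"`
	ActiveModel string  `json:"active_model,omitempty"`
}

// MemoryPercent returns used memory as a percentage of total,
// or 0 when the total is missing or invalid.
func (m GPUMetrics) MemoryPercent() float64 {
	if m.MemoryTotal <= 0 {
		return 0
	}
	return float64(m.MemoryUsed) / float64(m.MemoryTotal) * 100
}

// encode serializes one sample the way it would be sent to the frontend.
func encode(m GPUMetrics) (string, error) {
	b, err := json.Marshal(m)
	return string(b), err
}
```

Because of omitempty, a GPU with no model loaded serializes without an active_model key at all, so the frontend can distinguish "idle" from "model name is an empty string" by checking for the key's presence.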

Results

Performance Metrics

These numbers are from my reference machine and homelab cluster and should be treated as directional.

Metric	Target	Achieved
Initial load time	<2s	1.2s
Update latency	<200ms	<100ms
Memory usage (backend)	<100MB	65MB
Bundle size (frontend)	<500KB	380KB

Operational Impact

Before FlexDeck:

Switching between kubectl, Grafana, and GitLab UI constantly
Missing pipeline failures until builds broke
No mobile access to cluster state

After FlexDeck:

Single pane of glass for all operations
Real-time notifications for failures
Check cluster health from phone during incidents

Visualizations Built

Cluster topology: Interactive node and pod layout
Pipeline DAG: Directed graph with job status animations
Resource utilization: Real-time charts for CPU/memory/GPU
Namespace overview: Grid view with health indicators
Model registry: AI model versions and deployment status

Lessons Learned

SolidJS for Real-Time UIs

The choice of SolidJS over React paid off significantly:

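Synchronous updates by default: a signal write propagates to the DOM immediately instead of waiting for a scheduled re-render
No re-render cascades: components run once, so a parent update does not re-run its children
Smaller runtime: roughly 7KB vs 40KB+ for React plus ReactDOM (minified and gzipped)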

The mental model is different (signals vs state), but for dashboards with many independent updating components, it's worth learning.

Kubernetes Informers vs Polling

Initially I polled the API server every 5 seconds. Switching to informers:

Reduced API server load by 95%
Eliminated "stale data" UX issues
Enabled instant status updates

WebSocket Reconnection

Real-time connections fail. Robust reconnection is essential:

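import (
    "context"
    "log"
    "time"
)

func (c *Client) maintainConnection(ctx context.Context) {
    backoff := time.Second
    const maxBackoff = time.Minute

    for {
        select {
        case <-ctx.Done():
            return
        default:
        }

        if err := c.connect(); err != nil {
            log.Printf("Connection failed: %v, retrying in %v", err, backoff)
            // Wait out the backoff, but return promptly on cancellation
            // (a bare time.Sleep would ignore ctx).
            select {
            case <-ctx.Done():
                return
            case <-time.After(backoff):
            }
            backoff = min(backoff*2, maxBackoff) // min is a builtin since Go 1.21
            continue
        }

        backoff = time.Second // Reset on success
        c.handleMessages(ctx) // Expected to block until the connection drops
    }
}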

Future Improvements

Multi-tenant support: Share dashboard with team members
Alert integration: PagerDuty/Slack notifications from dashboard
Cost tracking: Integrate with cloud billing APIs
Capacity planning: Predictive scaling recommendations

Conclusion

Building a custom operations dashboard was more work than wiring up existing tools, but the result is exactly what I needed: fast, focused, and built around the workflows I actually use. The combination of Go’s streaming model (informers + channels) and SolidJS’s fine-grained reactivity makes “real-time” feel real, not simulated with polling spinners.
