DevOps · 4 months · Completed

Building a Real-Time Kubernetes Operations Dashboard

How I built FlexDeck, a full-stack operations dashboard with real-time K8s monitoring, GitLab CI/CD visualization, and AI model management using Go and SolidJS.

November 1, 2025 · 4 min read

  • Update latency: <100ms (real-time data refresh)
  • Clusters: 3 (monitored simultaneously)
  • Uptime: 99.9% (over 6 months)

Tech Stack

  • Backend: Go, WebSocket
  • Frontend: SolidJS, D3.js
  • Infrastructure: Kubernetes
  • Monitoring: Prometheus

TL;DR

  • FlexDeck is a real-time K8s + CI/CD + model-serving dashboard I built because kubectl, Grafana, and the GitLab UI never converge into a single operator view when something is actually wrong.
  • Sub-100ms update latency comes from a Go backend that fans K8s informer events out to a SolidJS frontend over WebSocket and SSE. Cluster state is pushed, not polled.
  • The real design lesson is that operator dashboards live or die by how quickly you can answer “what changed, and is it my fault?” Everything in FlexDeck is structured around that question.
  • Three clusters monitored simultaneously, six months at 99.9% uptime, still a single Go service and a small Solid app.

Overview

I run a small “production-like” homelab: GitOps-managed K3s, multiple services, and GPU workloads that are sensitive to small regressions. The boring truth is that most of the time I’m not writing Kubernetes YAML. I’m debugging:

  • why a rollout is stuck,
  • why a pod is thrashing,
  • why a build artifact didn’t land where I expected,
  • why the GPU node is “healthy” but not actually usable.

FlexDeck is the dashboard I built because I got tired of context-switching between kubectl, Grafana, and the GitLab UI. The goal was not “a pretty dashboard.” It was a fast, real-time, single place to see cluster + CI/CD + model state with enough detail to troubleshoot.

Diagram showing a GitOps-managed K3s cluster on Harvester with an ingress layer, a VM worker pool, and a dedicated GPU worker pool for inference workloads.
Figure 1. The baseline environment: GitOps-managed K3s with dedicated GPU workers for inference and media workloads.

The Challenge

Running a production-like homelab environment means dealing with:

  • Multiple clusters: K3s for applications, Harvester for infrastructure
  • GitLab CI/CD: Dozens of pipelines across 40+ repositories
  • AI workloads: GPU scheduling, model health, inference metrics
  • Ops ergonomics: too many tabs, too much scrolling, not enough “what changed?”

I needed a dashboard that would:

  1. Show real-time cluster state without manual refresh
  2. Visualize CI/CD pipelines with enough detail to debug failures
  3. Monitor GPU utilization and AI model health
  4. Work on both desktop and mobile for on-the-go checks

The Approach

Technology Choices

Backend: Go

  • Excellent Kubernetes client libraries
  • Low memory footprint for always-on service
  • Strong concurrency primitives for real-time streaming

Frontend: SolidJS

  • Fine-grained reactivity without Virtual DOM overhead
  • Excellent performance for frequent updates
  • Smaller bundle than React for faster mobile loads

Data streaming: WebSocket + Server-Sent Events

  • WebSocket for bidirectional communication (kubectl exec, logs)
  • SSE for unidirectional updates (cluster state, metrics)
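
To make the SSE half concrete, here is a minimal sketch of a Go handler that streams events in the `text/event-stream` format using only the standard library. The `StateEvent` type, the `/events` path, and the single shared channel are assumptions for illustration; the actual FlexDeck message shape is not shown in this post.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// StateEvent is a hypothetical payload used only for this sketch.
type StateEvent struct {
	Type   string `json:"type"`
	Action string `json:"action"`
	Name   string `json:"name"`
}

// formatSSE encodes one event in the text/event-stream wire format:
// an "event:" line, a "data:" line, and a blank line that ends the message.
func formatSSE(ev StateEvent) (string, error) {
	payload, err := json.Marshal(ev)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("event: %s\ndata: %s\n\n", ev.Type, payload), nil
}

// sseHandler streams events from the channel until the client disconnects.
// With one shared channel each event reaches only one client, so a real
// server would need a per-client channel behind some fan-out.
func sseHandler(events <-chan StateEvent) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")

		for {
			select {
			case <-r.Context().Done():
				return // client went away
			case ev, open := <-events:
				if !open {
					return
				}
				msg, err := formatSSE(ev)
				if err != nil {
					continue
				}
				fmt.Fprint(w, msg)
				flusher.Flush() // push the frame now instead of waiting for the buffer
			}
		}
	}
}

func main() {
	// To serve: http.Handle("/events", sseHandler(ch)) and http.ListenAndServe.
	msg, _ := formatSSE(StateEvent{Type: "pod", Action: "updated", Name: "demo"})
	fmt.Print(msg)
}
```

On the browser side, the built-in `EventSource` API reconnects automatically, which is one reason SSE is a reasonable fit for one-way state updates.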

Architecture

Diagram showing SolidJS clients connecting via WebSocket/SSE to a Go backend, which watches Kubernetes clusters and polls the GitLab API for CI/CD state.
Figure 2. FlexDeck treats infrastructure state as a stream: clients subscribe to a Go backend that watches clusters and CI/CD.

Implementation Details

Kubernetes Watch Streams

The Go backend uses informers to watch cluster resources efficiently:

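import (
    "context"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
)

type ClusterWatcher struct {
    clientset *kubernetes.Clientset
    informers informers.SharedInformerFactory
    updates   chan ResourceUpdate
}

// WatchPods registers handlers that forward pod events to w.updates.
// The informer factory still has to be started (w.informers.Start) for events to flow.
func (w *ClusterWatcher) WatchPods(ctx context.Context) {
    informer := w.informers.Core().V1().Pods().Informer()

    // send forwards one update but gives up if ctx is cancelled, so a stalled
    // consumer cannot block the handler forever during shutdown.
    send := func(action string, pod *corev1.Pod) {
        select {
        case w.updates <- ResourceUpdate{Type: "pod", Action: action, Resource: podToDTO(pod)}:
        case <-ctx.Done():
        }
    }

    informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj interface{}) {
            if pod, ok := obj.(*corev1.Pod); ok {
                send("added", pod)
            }
        },
        UpdateFunc: func(oldObj, newObj interface{}) {
            if pod, ok := newObj.(*corev1.Pod); ok {
                send("updated", pod)
            }
        },
        DeleteFunc: func(obj interface{}) {
            // A missed delete arrives wrapped in a tombstone rather than as a *corev1.Pod.
            if tombstone, ok := obj.(cache.DeletedFinalStateUnknown); ok {
                obj = tombstone.Obj
            }
            if pod, ok := obj.(*corev1.Pod); ok {
                send("deleted", pod)
            }
        },
    })
}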
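
The handlers above write into a single `updates` channel, so with several browser clients connected something has to copy each update to every connection. The following is a minimal sketch of such a fan-out, not FlexDeck's actual code: the `Hub` type and its methods are hypothetical, and `Resource` is simplified to a string. It uses non-blocking sends so that one slow client cannot hold up delivery to the others.

```go
package main

import (
	"fmt"
	"sync"
)

// ResourceUpdate mirrors the shape used above; Resource is simplified to a string here.
type ResourceUpdate struct {
	Type     string
	Action   string
	Resource string
}

// Hub is a hypothetical fan-out component that copies each update to all subscribers.
type Hub struct {
	mu   sync.Mutex
	subs map[chan ResourceUpdate]struct{}
}

func NewHub() *Hub {
	return &Hub{subs: make(map[chan ResourceUpdate]struct{})}
}

// Subscribe registers a client with a bounded buffer and returns its channel.
func (h *Hub) Subscribe(buffer int) chan ResourceUpdate {
	ch := make(chan ResourceUpdate, buffer)
	h.mu.Lock()
	h.subs[ch] = struct{}{}
	h.mu.Unlock()
	return ch
}

// Unsubscribe removes a client and closes its channel.
func (h *Hub) Unsubscribe(ch chan ResourceUpdate) {
	h.mu.Lock()
	if _, ok := h.subs[ch]; ok {
		delete(h.subs, ch)
		close(ch)
	}
	h.mu.Unlock()
}

// Publish delivers u to every subscriber without blocking and returns how many
// subscribers were skipped because their buffer was full.
func (h *Hub) Publish(u ResourceUpdate) (dropped int) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for ch := range h.subs {
		select {
		case ch <- u:
		default:
			dropped++ // slow client: skip rather than stall everyone else
		}
	}
	return dropped
}

func main() {
	hub := NewHub()
	fast := hub.Subscribe(8)
	slow := hub.Subscribe(1)

	for i := 0; i < 3; i++ {
		u := ResourceUpdate{Type: "pod", Action: "updated", Resource: fmt.Sprintf("web-%d", i)}
		fmt.Println("dropped:", hub.Publish(u))
	}
	fmt.Println("buffered:", len(fast), len(slow))
}
```

Dropping updates for a lagging client is a design choice. A client that misses events can be resynchronized with a fresh snapshot, which tends to be simpler than unbounded buffering.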

Fine-Grained Reactivity with SolidJS

SolidJS signals update only the specific DOM elements that need to change:

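import { Show } from 'solid-js';

function PodCard(props: { pod: Pod }) {
  // The component body runs once. This derived function is re-evaluated only
  // by the JSX expressions that read it, when props.pod.status changes.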
  const statusColor = () => {
    switch (props.pod.status) {
      case 'Running':
        return 'text-green-400';
      case 'Pending':
        return 'text-yellow-400';
      case 'Failed':
        return 'text-red-400';
      default:
        return 'text-gray-400';
    }
  };

  return (
    <div class="pod-card">
      <span class="pod-name">{props.pod.name}</span>
      <span class={`pod-status ${statusColor()}`}>{props.pod.status}</span>
      <Show when={props.pod.restarts > 0}>
        <span class="restart-count">Restarts: {props.pod.restarts}</span>
      </Show>
    </div>
  );
}

GitLab CI/CD Visualization

For CI/CD pipelines, I built a custom visualization using D3.js with particle effects showing job flow:

  • Pipelines displayed as directed graphs
  • Jobs animate between stages as they progress
  • Failed jobs pulse red for visibility
  • Click-to-expand for job logs
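
As a rough sketch of the data side of that graph, the backend can turn a flat job list into nodes and edges before handing it to D3. The types and the `BuildGraph` function below are hypothetical, and the sketch assumes plain stage-ordered pipelines in which every job in one stage precedes every job in the next. Pipelines that use explicit job dependencies would need their edges taken from those dependencies instead.

```go
package main

import "fmt"

// Job holds the fields this sketch needs; the names are chosen for illustration.
type Job struct {
	Name   string `json:"name"`
	Stage  string `json:"stage"`
	Status string `json:"status"`
}

// Edge is a directed link between two jobs, keyed by job name.
type Edge struct {
	From string `json:"source"`
	To   string `json:"target"`
}

// Graph is a node/link structure of the kind graph layouts usually consume.
type Graph struct {
	Nodes []Job  `json:"nodes"`
	Edges []Edge `json:"links"`
}

// BuildGraph links every job in a stage to every job in the next non-empty stage.
// stages gives the pipeline's stage order; stages with no jobs are skipped.
func BuildGraph(stages []string, jobs []Job) Graph {
	byStage := make(map[string][]Job)
	for _, j := range jobs {
		byStage[j.Stage] = append(byStage[j.Stage], j)
	}

	var g Graph
	var prev []Job
	for _, s := range stages {
		cur := byStage[s]
		if len(cur) == 0 {
			continue
		}
		g.Nodes = append(g.Nodes, cur...)
		for _, p := range prev {
			for _, c := range cur {
				g.Edges = append(g.Edges, Edge{From: p.Name, To: c.Name})
			}
		}
		prev = cur
	}
	return g
}

func main() {
	g := BuildGraph(
		[]string{"build", "test", "deploy"},
		[]Job{
			{Name: "compile", Stage: "build", Status: "success"},
			{Name: "unit", Stage: "test", Status: "success"},
			{Name: "lint", Stage: "test", Status: "failed"},
			{Name: "rollout", Stage: "deploy", Status: "skipped"},
		},
	)
	fmt.Println(len(g.Nodes), "nodes,", len(g.Edges), "edges") // prints: 4 nodes, 4 edges
}
```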

GPU Monitoring

The dashboard integrates with DCGM (NVIDIA) and ROCm metrics for GPU health:

type GPUMetrics struct {
    DeviceID       string  `json:"device_id"`
    Utilization    float64 `json:"utilization"`
    MemoryUsed     int64   `json:"memory_used"`
    MemoryTotal    int64   `json:"memory_total"`
    Temperature    int     `json:"temperature"`
    PowerDraw      float64 `json:"power_draw"`
    ActiveModel    string  `json:"active_model,omitempty"`
}
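
A short sketch of how this struct might be used when building the API response. The `MemoryPercent` helper is hypothetical, and the units (bytes for memory, degrees Celsius for temperature, watts for power) are assumptions, since the post does not state them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// GPUMetrics repeats the struct above so the example is self-contained.
// Units are assumed: bytes, degrees Celsius, watts.
type GPUMetrics struct {
	DeviceID    string  `json:"device_id"`
	Utilization float64 `json:"utilization"`
	MemoryUsed  int64   `json:"memory_used"`
	MemoryTotal int64   `json:"memory_total"`
	Temperature int     `json:"temperature"`
	PowerDraw   float64 `json:"power_draw"`
	ActiveModel string  `json:"active_model,omitempty"`
}

// MemoryPercent returns memory usage as a percentage,
// or 0 when the total is unknown or invalid.
func (m GPUMetrics) MemoryPercent() float64 {
	if m.MemoryTotal <= 0 {
		return 0
	}
	return float64(m.MemoryUsed) / float64(m.MemoryTotal) * 100
}

func main() {
	m := GPUMetrics{
		DeviceID:    "gpu-0",
		Utilization: 87.5,
		MemoryUsed:  6 << 30, // 6 GiB
		MemoryTotal: 8 << 30, // 8 GiB
		Temperature: 71,
		PowerDraw:   210,
	}

	out, err := json.Marshal(m)
	if err != nil {
		panic(err)
	}
	// ActiveModel is empty, so omitempty leaves "active_model" out of the JSON.
	fmt.Println(string(out))
	fmt.Printf("memory: %.1f%%\n", m.MemoryPercent()) // prints: memory: 75.0%
}
```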

Results

Performance Metrics

These numbers are from my reference machine and homelab cluster and should be treated as directional.

Metric                   Target    Achieved
Initial load time        <2s       1.2s
Update latency           <200ms    <100ms
Memory usage (backend)   <100MB    65MB
Bundle size (frontend)   <500KB    380KB

Operational Impact

Before FlexDeck:

  • Switching between kubectl, Grafana, and GitLab UI constantly
  • Missing pipeline failures until builds broke
  • No mobile access to cluster state

After FlexDeck:

  • Single pane of glass for all operations
  • Real-time notifications for failures
  • Check cluster health from phone during incidents

Visualizations Built

  1. Cluster topology: Interactive node and pod layout
  2. Pipeline DAG: Directed graph with job status animations
  3. Resource utilization: Real-time charts for CPU/memory/GPU
  4. Namespace overview: Grid view with health indicators
  5. Model registry: AI model versions and deployment status

Lessons Learned

SolidJS for Real-Time UIs

The choice of SolidJS over React paid off significantly:

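  • Synchronous updates: a signal write propagates immediately to the DOM nodes that read it, with no scheduled component re-render in between
  • No re-render cascades: components run once, so a parent's state change doesn't re-run its children
  • Smaller runtime: roughly 7KB vs React's 40KB+ (minified and gzipped)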

The mental model is different (signals vs state), but for dashboards with many independent updating components, it's worth learning.

Kubernetes Informers vs Polling

Initially I polled the API server every 5 seconds. Switching to informers:

  • Reduced API server load by 95%
  • Eliminated "stale data" UX issues
  • Enabled instant status updates

WebSocket Reconnection

Real-time connections fail. Robust reconnection is essential:

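func (c *Client) maintainConnection(ctx context.Context) {
    backoff := time.Second
    maxBackoff := time.Minute

    for {
        // Stop promptly if the caller has cancelled.
        if ctx.Err() != nil {
            return
        }

        if err := c.connect(); err != nil {
            log.Printf("Connection failed: %v, retrying in %v", err, backoff)
            // Wait out the backoff, but wake up early on cancellation
            // (a plain time.Sleep would ignore ctx).
            select {
            case <-ctx.Done():
                return
            case <-time.After(backoff):
            }
            backoff = min(backoff*2, maxBackoff) // built-in min requires Go 1.21+
            continue
        }

        backoff = time.Second // Reset on success
        c.handleMessages(ctx)
    }
}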
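
To see what that loop does to retry timing, here is a small worked example of the same doubling-with-cap rule. The `nextBackoff` and `schedule` helpers are hypothetical names introduced only for this illustration.

```go
package main

import (
	"fmt"
	"time"
)

// nextBackoff doubles the current delay and caps it at max,
// the same rule as min(backoff*2, maxBackoff) in the loop above.
func nextBackoff(cur, max time.Duration) time.Duration {
	next := cur * 2
	if next > max {
		return max
	}
	return next
}

// schedule returns the delays waited before each of the first n retries.
func schedule(initial, max time.Duration, n int) []time.Duration {
	delays := make([]time.Duration, 0, n)
	d := initial
	for i := 0; i < n; i++ {
		delays = append(delays, d)
		d = nextBackoff(d, max)
	}
	return delays
}

func main() {
	fmt.Println(schedule(time.Second, time.Minute, 8))
	// prints: [1s 2s 4s 8s 16s 32s 1m0s 1m0s]
}
```

With a one-second start and a one-minute cap, the delay reaches the cap on the seventh failed attempt, and every retry after that waits a full minute until a connection succeeds and the backoff resets.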

Future Improvements

  • Multi-tenant support: Share dashboard with team members
  • Alert integration: PagerDuty/Slack notifications from dashboard
  • Cost tracking: Integrate with cloud billing APIs
  • Capacity planning: Predictive scaling recommendations

Conclusion

Building a custom operations dashboard was more work than wiring up existing tools, but the result is exactly what I needed: fast, focused, and built around the workflows I actually use. The combination of Go’s streaming model (informers + channels) and SolidJS’s fine-grained reactivity makes “real-time” feel real, not simulated with polling spinners.
