Pavan Madduri pmady

Senior Cloud Platform Engineer building GPU/AI infrastructure at scale.
CNCF Golden Kubestronaut. Oracle ACE Associate. Dragonfly Community Member.
31+ PRs across 17 open-source projects in CNCF, ASWF, and beyond.
If GPUs need scheduling, scaling, or observability on Kubernetes — that's what I build.

⚡ What I'm Building


🎮 GPU Autoscaling	KEDA External Scaler with native NVML metrics, DaemonSet architecture, scaling profiles for vLLM, Triton, and training workloads. Referenced in KEDA #7538 and published on CNCF Blog.
🔬 GPU NUMA Topology	Volcano scheduler plugin for NUMA-aware GPU placement — topology discovery via sysfs, CRD extensions, and cross-socket affinity optimization.
📡 GPU Observability	OpenTelemetry Collector receiver for GPU metrics (NVML-native) and Docker Desktop Extension for real-time GPU monitoring dashboards.
🧠 Topology-Aware AIOps	Knowledge graph of Kubernetes resources with graph-based root-cause traversal, AlertManager webhook integration, and blast-radius analysis.
☁️ Platform Engineering	Kubernetes, ArgoCD, Crossplane, Docker, KEDA — production platforms serving enterprise workloads at scale.
📝 Technical Writing	21 published articles across CNCF Blog, IEEE ComSoc, Platform Engineering, VKTR, Cloud Native Now, and Medium.

🏆 Certifications & Recognition

Golden Kubestronaut — All five Kubernetes certifications: KCNA, CKA, CKAD, CKS, KCSA

🚀 Featured Projects

🎮 KEDA GPU Scaler

KEDA External gRPC Scaler for GPU/AI workloads

🎮 Native NVML — Direct GPU metrics via go-nvml
🚀 Scaling Profiles — vLLM, Triton, training presets
📦 DaemonSet — Per-node GPU metric collection
🔄 Scale-to-Zero — GPU-aware idle detection
📈 Prometheus — Optional /metrics endpoint

Tech: Go · gRPC · NVIDIA NVML · Kubernetes · Helm

Referenced in KEDA #7538 | CNCF Blog

📡 OpenTelemetry GPU Receiver

OpenTelemetry Collector receiver for GPU metrics

🔋 NVIDIA NVML — GPU utilization, memory, temperature
📊 OTel Native — Standard OTLP export pipeline
🖥️ Multi-GPU — All devices on the node
📈 Prometheus — Built-in Prometheus exporter

Tech: Go · OpenTelemetry Collector SDK · NVML

🐳 Docker GPU Dashboard Extension

Real-time NVIDIA GPU metrics in Docker Desktop

📊 Live Dashboard — Utilization, memory, temperature, power
📈 History Charts — 2-minute rolling Recharts graphs
🚦 Alert Thresholds — Color-coded green/yellow/red
🎭 Mock Mode — Develop without GPU hardware

Tech: Go · React · Recharts · Docker Extension SDK · NVML

🧠 Kube Topology Agent

K8s knowledge graph & automated root-cause analysis

🗺️ Knowledge Graph — Real-time resource topology
🔍 Root-Cause Traversal — Graph-based incident investigation
🎮 GPU Aware — Training/inference/batch classification
🔔 AlertManager — Webhook integration for auto-investigation

Tech: Go · Kubernetes API · Gorilla Mux · Helm
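
The root-cause traversal listed above can be sketched as a breadth-first walk over dependency edges. The code below is a simplified illustration, not the agent's implementation: the `Graph` type, the `RootCauses` method, and the resource names are hypothetical, and "root cause" here simply means an unhealthy resource none of whose direct dependencies are unhealthy.

```go
package main

import "fmt"

// Graph is a hypothetical, simplified resource topology.
// DependsOn maps a resource to the resources it depends on;
// Unhealthy marks resources currently failing.
type Graph struct {
	DependsOn map[string][]string
	Unhealthy map[string]bool
}

// RootCauses walks dependency edges breadth-first from start and returns
// every reachable unhealthy resource that has no unhealthy direct
// dependency. Those are the candidates where the failure likely originates.
func (g Graph) RootCauses(start string) []string {
	visited := map[string]bool{start: true}
	queue := []string{start}
	var roots []string
	for len(queue) > 0 {
		node := queue[0]
		queue = queue[1:]
		unhealthyDep := false
		for _, dep := range g.DependsOn[node] {
			if g.Unhealthy[dep] {
				unhealthyDep = true
			}
			if !visited[dep] {
				visited[dep] = true
				queue = append(queue, dep)
			}
		}
		if g.Unhealthy[node] && !unhealthyDep {
			roots = append(roots, node)
		}
	}
	return roots
}

func main() {
	g := Graph{
		DependsOn: map[string][]string{
			"pod/vllm-0": {"node/gpu-1", "pvc/models"},
		},
		Unhealthy: map[string]bool{"pod/vllm-0": true, "node/gpu-1": true},
	}
	// An alert on the pod leads back to the node it runs on.
	fmt.Println(g.RootCauses("pod/vllm-0"))
}
```

The visited set keeps the walk finite even if the topology contains cycles.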

More projects: KubeAI Autoscaler · Ingress2Gateway · Golden Kubestronaut Learning · LLMOps

🌱 Open Source Contributions

31+ PRs across 17 projects in CNCF, ASWF, and open-source foundations.

CNCF (Cloud Native Computing Foundation)

Project	Description	Contributions
Dragonfly	P2P-based file distribution and image acceleration	client#1861 - Fix error chain propagation in backend stream failures, client#1665 - Add Hugging Face backend support with hf:// protocol, client#1673 - Add ModelScope backend support with modelscope:// protocol, d7y.io#386 - Add hf:// protocol documentation, d7y.io#398 - Add P2P-accelerated AI model downloads blog post, helm-charts#455 - Add injector support to helm chart, helm-charts#480 - Replace deprecated bitnamilegacy/mysql with bitnami/mysql
Kubernetes	Production-Grade Container Orchestration	#53891 - Document deployment.kubernetes.io/* annotations, #53892 - Add kubectl apply view-last-applied documentation
TiKV	Distributed transactional key-value database	#19225 - Add AGENTS.md for AI agent guidance
Volcano	Cloud-native batch scheduling for AI/HPC	#5328 - Fix typos in scheduler comments, #5095 - GPU NUMA topology awareness in scheduler, apis#229 - Add GPUInfo type to NumatopoSpec CRD, resource-exporter#12 - GPU NUMA topology discovery via sysfs
HAMi	Heterogeneous AI Computing Virtualization Middleware	#1893 - Add unit tests for nvinternal info, mig, and watch packages
KEDA	Kubernetes Event-driven Autoscaling	keda-docs#1658 - Removing metricName from the kedadocs, keda-docs#1769 - Fix datadog scaler typos across all versions, #7538 - GPU/AI inference scaler architectural analysis
Metal³	Bare metal host provisioning for Kubernetes	#624 - Fix redirect links in tryit.md
OpenTelemetry	Observability framework	#8632 - Add .NET troubleshooting page
kpt	Kubernetes-native packaging and resource management	#4278 - Fix kpt fn doc command for KRM functions expecting input
traceAI	Open-source LLM observability SDK	#165 - Fix exporter shutdown and thread safety in Python SDK, #166 - Add Go SDK with OpenAI instrumentor

ASWF (Academy Software Foundation)

Project	Description	Contributions
OpenColorIO	Color management library	#2229 - Add release signing workflow, #2230 - Add Dependabot configuration, #2243 - Add Vulkan unit test framework
OpenCue	Cloud rendering management system	#2134 - Add scheduled subscription recalculation task
OpenImageIO	Image processing library	#4976 - Fix IBA::compare_Yee() channel access
RAWtoACES	RAW to ACES image conversion	#222 - Add build developer documentation
xSTUDIO	Playback and review application	#186 - Fix broken build guide links

GitHub Stats

Stats updated on 2026-06-11 15:25 UTC

🤝 Let's Connect

Building GPU infrastructure for Kubernetes? Working on CNCF projects? Let's collaborate.
