Skip to content
View pmady's full-sized avatar
🎯
Focusing
🎯
Focusing

Block or report pmady

Block user

Prevent this user from interacting with your repositories and sending you notifications. Learn more about blocking users.

You must be logged in to block users.

Maximum 250 characters. Please don’t include any personal information such as legal names or email addresses. Markdown is supported. This note will only be visible to you.
Report abuse

Contact GitHub support about this user’s behavior. Learn more about reporting abuse.

Report abuse
pmady/README.md

Typing SVG

profile views  


Senior Cloud Platform Engineer building GPU/AI infrastructure at scale.
CNCF Golden Kubestronaut. Oracle ACE Associate. Dragonfly Community Member.
31+ PRs across 17 open-source projects in CNCF, ASWF, and beyond.
If GPUs need scheduling, scaling, or observability on Kubernetes — that's what I build.


⚡ What I'm Building

🎮 GPU Autoscaling KEDA External Scaler with native NVML metrics, DaemonSet architecture, scaling profiles for vLLM, Triton, and training workloads. Referenced in KEDA #7538 and published on CNCF Blog.
🔬 GPU NUMA Topology Volcano scheduler plugin for NUMA-aware GPU placement — topology discovery via sysfs, CRD extensions, and cross-socket affinity optimization.
📡 GPU Observability OpenTelemetry Collector receiver for GPU metrics (NVML-native) and Docker Desktop Extension for real-time GPU monitoring dashboards.
🧠 Topology-Aware AIOps Knowledge graph of Kubernetes resources with graph-based root-cause traversal, AlertManager webhook integration, and blast-radius analysis.
☁️ Platform Engineering Kubernetes, ArgoCD, Crossplane, Docker, KEDA — production platforms serving enterprise workloads at scale.
📝 Technical Writing 21 published articles across CNCF Blog, IEEE ComSoc, Platform Engineering, VKTR, Cloud Native Now, and Medium.

🏆 Certifications & Recognition

Golden Kubestronaut — All five Kubernetes certifications: KCNA, CKA, CKAD, CKS, KCSA


🚀 Featured Projects

Stars CI License

KEDA External gRPC Scaler for GPU/AI workloads

  • 🎮 Native NVML — Direct GPU metrics via go-nvml
  • 🚀 Scaling Profiles — vLLM, Triton, training presets
  • 📦 DaemonSet — Per-node GPU metric collection
  • 🔄 Scale-to-Zero — GPU-aware idle detection
  • 📈 Prometheus — Optional /metrics endpoint

Tech: Go · gRPC · NVIDIA NVML · Kubernetes · Helm

Referenced in KEDA #7538 | CNCF Blog

Stars License

OpenTelemetry Collector receiver for GPU metrics

  • 🔋 NVIDIA NVML — GPU utilization, memory, temperature
  • 📊 OTel Native — Standard OTLP export pipeline
  • 🖥️ Multi-GPU — All devices on the node
  • 📈 Prometheus — Built-in Prometheus exporter

Tech: Go · OpenTelemetry Collector SDK · NVML

Stars License

Real-time NVIDIA GPU metrics in Docker Desktop

  • 📊 Live Dashboard — Utilization, memory, temperature, power
  • 📈 History Charts — 2-minute rolling Recharts graphs
  • 🚦 Alert Thresholds — Color-coded green/yellow/red
  • 🎭 Mock Mode — Develop without GPU hardware

Tech: Go · React · Recharts · Docker Extension SDK · NVML

Stars License

K8s knowledge graph & automated root-cause analysis

  • 🗺️ Knowledge Graph — Real-time resource topology
  • 🔍 Root-Cause Traversal — Graph-based incident investigation
  • 🎮 GPU Aware — Training/inference/batch classification
  • 🔔 AlertManager — Webhook integration for auto-investigation

Tech: Go · Kubernetes API · Gorilla Mux · Helm

More projects: KubeAI Autoscaler · Ingress2Gateway · Golden Kubestronaut Learning · LLMOps


🌱 Open Source Contributions

31+ PRs across 17 projects in CNCF, ASWF, and open-source foundations.

CNCF (Cloud Native Computing Foundation)

Project Description Contributions
Dragonfly P2P-based file distribution and image acceleration client#1861 - Fix error chain propagation in backend stream failures, client#1665 - Add Hugging Face backend support with hf:// protocol, client#1673 - Add ModelScope backend support with modelscope:// protocol, d7y.io#386 - Add hf:// protocol documentation, d7y.io#398 - Add P2P-accelerated AI model downloads blog post, helm-charts#455 - Add injector support to helm chart, helm-charts#480 - Replace deprecated bitnamilegacy/mysql with bitnami/mysql
Kubernetes Production-Grade Container Orchestration #53891 - Document deployment.kubernetes.io/* annotations, #53892 - Add kubectl apply view-last-applied documentation
TiKV Distributed transactional key-value database #19225 - Add AGENTS.md for AI agent guidance
Volcano Cloud-native batch scheduling for AI/HPC #5328 - Fix typos in scheduler comments, #5095 - GPU NUMA topology awareness in scheduler, apis#229 - Add GPUInfo type to NumatopoSpec CRD, resource-exporter#12 - GPU NUMA topology discovery via sysfs
HAMi Heterogeneous AI Computing Virtualization Middleware #1893 - Add unit tests for nvinternal info, mig, and watch packages
KEDA Kubernetes Event-driven Autoscaling keda-docs#1658 - Removing metricName from the kedadocs, keda-docs#1769 - Fix datadog scaler typos across all versions, #7538 - GPU/AI inference scaler architectural analysis
Metal³ Bare metal host provisioning for Kubernetes #624 - Fix redirect links in tryit.md
OpenTelemetry Observability framework #8632 - Add .NET troubleshooting page
kpt Kubernetes-native packaging and resource management #4278 - Fix kpt fn doc command for KRM functions expecting input
traceAI Open-source LLM observability SDK #165 - Fix exporter shutdown and thread safety in Python SDK, #166 - Add Go SDK with OpenAI instrumentor

ASWF (Academy Software Foundation)

Project Description Contributions
OpenColorIO Color management library #2229 - Add release signing workflow, #2230 - Add Dependabot configuration, #2243 - Add Vulkan unit test framework
OpenCue Cloud rendering management system #2134 - Add scheduled subscription recalculation task
OpenImageIO Image processing library #4976 - Fix IBA::compare_Yee() channel access
RAWtoACES RAW to ACES image conversion #222 - Add build developer documentation
xSTUDIO Playback and review application #186 - Fix broken build guide links

🧰 Tech Stack


�� GitHub Stats

GitHub Stats

Stats updated on 2026-06-11 15:25 UTC

🐍 Contribution Activity


🤝 Let's Connect

Building GPU infrastructure for Kubernetes? Working on CNCF projects? Let's collaborate.

Pinned Loading

  1. llmops llmops Public

    🚀 The Ultimate Curated List of LLMOps Tools, Frameworks, and Resources - A comprehensive collection of the best tools for Large Language Model Operations

    Shell 11 5

  2. pmady pmady Public

    12 2

  3. golden-kubestronaut-learning golden-kubestronaut-learning Public

    A comprehensive learning resource for achieving Kubestronaut and Golden Kubestronaut status through CNCF certifications

    Markdown 18 7

  4. kubeai-autoscaler kubeai-autoscaler Public

    Go 12 7

  5. ingress2gateway ingress2gateway Public

    Convert Kubernetes Ingress objects to Gateway API resources - Web GUI and REST API

    Python 8 3

  6. keda-gpu-scaler keda-gpu-scaler Public

    KEDA External gRPC Scaler for GPU workloads — native NVML metrics via DaemonSet, no Prometheus required

    Go 60 25