AI/ML Workloads on Kubernetes: Simplifying GPU Cluster Management

Online Webinar | February 26, 2026 | 1:00 PM - 2:00 PM CT

Overview

AI and ML workloads are pushing Kubernetes to its limits. GPU node pools, resource quotas, job scheduling, and cost management add layers of complexity that most teams aren't prepared for.

Join Ben Ghazi for a practical session on managing AI/ML infrastructure on Kubernetes with Codiac. We'll cover how to provision GPU clusters, deploy training jobs, manage resource quotas, and keep costs under control—all without drowning in YAML or custom operators.

Whether you're running your first ML training pipeline or scaling a production inference fleet, this webinar will give you actionable patterns for simplifying GPU cluster management.

What You'll Learn

  • How to provision and manage GPU node pools across AWS and Azure with Codiac
  • Deploying ML training jobs with proper resource quotas and scheduling
  • Cost optimization strategies: Zombie Mode for dev/test GPU environments
  • Real-world patterns for scaling inference workloads
  • Live demo: Deploy an ML training pipeline in minutes

Location

Online (Zoom)

What to Expect

Live demo with Q&A. Bring your questions about running AI/ML on Kubernetes!

Speakers

  • Ben Ghazi, Co-Founder, Codiac

Who Should Attend

  • ML Engineers & Data Scientists deploying to Kubernetes
  • DevOps & Platform Engineers supporting ML teams
  • Engineering Managers overseeing AI/ML infrastructure
  • Anyone managing GPU workloads on Kubernetes