AI/ML Workloads on Kubernetes: Simplifying GPU Cluster Management

Online Webinar | February 26, 2026 | 1:00 PM - 2:00 PM CT

Overview

AI and ML workloads are pushing Kubernetes to its limits. GPU node pools, resource quotas, job scheduling, and cost management add layers of complexity that most teams aren't prepared for.

Join Ben Ghazi for a practical session on managing AI/ML infrastructure on Kubernetes with Codiac. We'll cover how to provision GPU clusters, deploy training jobs, manage resource quotas, and keep costs under control—all without drowning in YAML or custom operators.

Whether you're running your first ML training pipeline or scaling a production inference fleet, this webinar will give you actionable patterns for simplifying GPU cluster management.

What You'll Learn

  • How to provision and manage GPU node pools across AWS and Azure with Codiac
  • Deploying ML training jobs with proper resource quotas and scheduling
  • Cost optimization strategies: Zombie Mode for dev/test GPU environments
  • Real-world patterns for scaling inference workloads
  • Live demo: Deploy an ML training pipeline in minutes

Location

Online (Zoom)

What to Expect

Live demo with Q&A. Bring your questions about running AI/ML on Kubernetes!

Speakers

  • Ben Ghazi, Co-Founder, Codiac

Who Should Attend

  • ML Engineers & Data Scientists deploying to Kubernetes
  • DevOps & Platform Engineers supporting ML teams
  • Engineering Managers overseeing AI/ML infrastructure
  • Anyone managing GPU workloads on Kubernetes