
Course Outline

Foundations of Agentic Systems in Production

  • Agentic architectures: loops, tools, memory, and orchestration layers.
  • The lifecycle of agents: development, deployment, and continuous operation.
  • Challenges associated with production-scale agent management.

Infrastructure and Deployment Models

  • Deploying agents in containerized and cloud environments.
  • Scaling patterns: horizontal versus vertical scaling, concurrency, and throttling.
  • Multi-agent orchestration and workload balancing.

Monitoring and Observability

  • Key metrics: latency, success rate, memory usage, and agent call depth.
  • Tracing agent activity and call graphs.
  • Instrumenting observability using Prometheus, OpenTelemetry, and Grafana.

Logging, Auditing, and Compliance

  • Centralized logging and structured event collection.
  • Compliance and auditability within agentic workflows.
  • Designing audit trails and replay mechanisms for debugging purposes.

Performance Tuning and Resource Optimization

  • Reducing inference overhead and optimizing agent orchestration cycles.
  • Model caching and lightweight embeddings for faster retrieval.
  • Load testing and stress scenarios for AI pipelines.

Cost Control and Governance

  • Understanding agent cost drivers: API calls, memory, compute, and external integrations.
  • Tracking agent-level costs and implementing chargeback models.
  • Automation policies to prevent agent sprawl and idle resource consumption.
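
As an illustration of the agent-level cost tracking and chargeback topics above, the following is a minimal sketch rather than a reference implementation. The token prices, the `CostLedger` class, and the agent and team names are all hypothetical, chosen only for this example; real cost drivers would also include memory, compute, and external integrations.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical prices in EUR per 1,000 tokens (assumed for illustration only).
PRICE_PER_1K_INPUT_TOKENS = 0.003
PRICE_PER_1K_OUTPUT_TOKENS = 0.015


@dataclass
class CostLedger:
    """Accumulates model-call spend per agent and rolls it up per team."""

    budget_per_agent: float
    spend: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, agent_id, input_tokens, output_tokens):
        """Add the cost of one model call to the agent's running total."""
        cost = (
            input_tokens / 1000 * PRICE_PER_1K_INPUT_TOKENS
            + output_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS
        )
        self.spend[agent_id] += cost
        return cost

    def over_budget(self):
        """Return the agents whose spend exceeds the per-agent budget."""
        return sorted(
            agent_id
            for agent_id, total in self.spend.items()
            if total > self.budget_per_agent
        )

    def chargeback(self, agent_to_team):
        """Roll agent spend up to the owning team; unknown agents go to 'unassigned'."""
        totals = defaultdict(float)
        for agent_id, total in self.spend.items():
            totals[agent_to_team.get(agent_id, "unassigned")] += total
        return dict(totals)
```

A scheduler could call `over_budget()` periodically and pause or scale down the agents it returns, which is one simple form of the automation policy mentioned above.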

CI/CD and Rollout Strategies for Agents

  • Integrating agent pipelines into CI/CD systems.
  • Testing, versioning, and rollback strategies for iterative agent updates.
  • Progressive rollouts and safe deployment mechanisms.

Failure Recovery and Reliability Engineering

  • Designing for fault tolerance and graceful degradation.
  • Retry, timeout, and circuit breaker patterns for agent reliability.
  • Incident response and post-mortem frameworks for AI operations.
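
To make the retry and circuit breaker patterns above concrete, here is a minimal sketch under stated assumptions. The class and function names (`CircuitBreaker`, `call_with_retry`) and the default thresholds are hypothetical, and timeouts are assumed to be enforced inside the wrapped call (for example by the HTTP client), so they are not shown.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and a call is rejected without being attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # cool-down in seconds before a trial call
        self.clock = clock              # injectable so tests need not wait
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call rejected")
            # Cool-down elapsed: fall through and allow one trial call (half-open).
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def call_with_retry(breaker, fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Retry fn through the breaker with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return breaker.call(fn)
        except CircuitOpenError:
            raise  # retrying against an open circuit would defeat its purpose
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Keeping the breaker separate from the retry loop lets several agents share one breaker per downstream dependency while each chooses its own retry budget.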

Capstone Project

  • Building and deploying an agentic AI system with full monitoring and cost tracking.
  • Simulating load, measuring performance, and optimizing resource usage.
  • Presenting the final architecture and monitoring dashboard to peers.

Summary and Next Steps

Requirements

  • A solid understanding of MLOps and production machine learning systems.
  • Experience with containerized deployments (Docker/Kubernetes).
  • Familiarity with cloud cost optimization and observability tools.

Audience

  • MLOps engineers.
  • Site Reliability Engineers (SREs).
  • Engineering managers responsible for AI infrastructure.

Duration: 21 Hours

Custom Corporate Training

Training solutions designed exclusively for businesses.

  • Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
  • Flexible Schedule: Dates and times adapted to your team's agenda.
  • Format: Online (live), In-company (at your offices), or Hybrid.

Investment

Price per private group for live online training: from 3900 € + VAT*

Contact us for an exact quote and to hear about our latest promotions.
