Course Outline
Foundations of Agentic Systems in Production
- Agentic architectures: loops, tools, memory, and orchestration layers.
- The lifecycle of agents: development, deployment, and continuous operation.
- Challenges associated with production-scale agent management.
Infrastructure and Deployment Models
- Deploying agents in containerized and cloud environments.
- Scaling patterns: horizontal versus vertical scaling, concurrency, and throttling.
- Multi-agent orchestration and workload balancing.
Monitoring and Observability
- Key metrics: latency, success rate, memory usage, and agent call depth.
- Tracing agent activity and call graphs.
- Instrumenting observability using Prometheus, OpenTelemetry, and Grafana.
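As an illustration of the metrics listed above, the following is a minimal sketch that records latency, success rate, and call depth in memory using only the Python standard library. The names `AgentMetrics` and `timed_call` are hypothetical and are not part of Prometheus, OpenTelemetry, or Grafana; in a real deployment these values would more likely be exported through one of those tools.

```python
import time
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class AgentMetrics:
    """In-memory counters for one agent (hypothetical helper, not a library API)."""
    latencies: list = field(default_factory=list)
    successes: int = 0
    failures: int = 0
    max_call_depth: int = 0

    def record(self, latency_s, ok, call_depth):
        # Store one observation for a single agent call.
        self.latencies.append(latency_s)
        if ok:
            self.successes += 1
        else:
            self.failures += 1
        self.max_call_depth = max(self.max_call_depth, call_depth)

    @property
    def success_rate(self):
        total = self.successes + self.failures
        return self.successes / total if total else 0.0

    @property
    def mean_latency(self):
        return mean(self.latencies) if self.latencies else 0.0


def timed_call(metrics, fn, call_depth=1):
    """Run fn(), record its latency and outcome, and re-raise any error."""
    start = time.perf_counter()
    try:
        result = fn()
    except Exception:
        metrics.record(time.perf_counter() - start, ok=False, call_depth=call_depth)
        raise
    metrics.record(time.perf_counter() - start, ok=True, call_depth=call_depth)
    return result
```

Wrapping each tool or model call in `timed_call` keeps the measurement logic in one place, so the storage behind `record` could later be swapped for a real metrics exporter.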
Logging, Auditing, and Compliance
- Centralized logging and structured event collection.
- Compliance and auditability within agentic workflows.
- Designing audit trails and replay mechanisms for debugging purposes.
Performance Tuning and Resource Optimization
- Reducing inference overhead and optimizing agent orchestration cycles.
- Model caching and lightweight embeddings for faster retrieval.
- Load testing and stress scenarios for AI pipelines.
Cost Control and Governance
- Understanding agent cost drivers: API calls, memory, compute, and external integrations.
- Tracking agent-level costs and implementing chargeback models.
- Automation policies to prevent agent sprawl and idle resource consumption.
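The cost drivers and chargeback idea above can be sketched roughly as follows. This is an assumption-laden example: the `UsageEvent` fields, the `chargeback` function, and all prices are invented for illustration and do not reflect any provider's actual rates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """One billable unit of agent activity (hypothetical schema)."""
    agent_id: str
    team: str
    input_tokens: int
    output_tokens: int
    compute_seconds: float


# Illustrative prices only; real rates depend on the provider and contract.
PRICE_PER_1K_INPUT = 0.002
PRICE_PER_1K_OUTPUT = 0.006
PRICE_PER_COMPUTE_SECOND = 0.0001


def event_cost(event):
    """Cost of a single event: token charges plus compute time."""
    return (
        event.input_tokens / 1000 * PRICE_PER_1K_INPUT
        + event.output_tokens / 1000 * PRICE_PER_1K_OUTPUT
        + event.compute_seconds * PRICE_PER_COMPUTE_SECOND
    )


def chargeback(events):
    """Sum costs per team so each team can be billed for its own agents."""
    totals = defaultdict(float)
    for event in events:
        totals[event.team] += event_cost(event)
    return dict(totals)
```

Grouping by `agent_id` instead of `team` would give agent-level cost tracking with the same structure.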
CI/CD and Rollout Strategies for Agents
- Integrating agent pipelines into CI/CD systems.
- Testing, versioning, and rollback strategies for iterative agent updates.
- Progressive rollouts and safe deployment mechanisms.
Failure Recovery and Reliability Engineering
- Designing for fault tolerance and graceful degradation.
- Retry, timeout, and circuit breaker patterns for agent reliability.
- Incident response and post-mortem frameworks for AI operations.
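The retry and circuit breaker patterns named above could look something like the sketch below. It is a simplified, single-threaded illustration with hypothetical names (`CircuitBreaker`, `CircuitOpenError`, `retry`) and arbitrary default thresholds; per-call timeouts are left out because they depend on the client library in use.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock  # injectable so the cool-down can be tested
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                raise CircuitOpenError("circuit open; call rejected")
            # Cool-down elapsed: half-open state, one more failure reopens.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result


def retry(fn, attempts=3, base_delay_s=0.1, sleep=time.sleep):
    """Retry fn with exponential backoff; never retry an open circuit."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt)
```

A typical combination would be `retry(lambda: breaker.call(call_model))`, where `call_model` stands for whatever function invokes the agent's downstream dependency: transient errors are retried, while a persistently failing dependency trips the breaker and fails fast.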
Capstone Project
- Building and deploying an agentic AI system with full monitoring and cost tracking.
- Simulating load, measuring performance, and optimizing resource usage.
- Presenting the final architecture and monitoring dashboard to peers.
Summary and Next Steps
Requirements
- A solid understanding of MLOps and production machine learning systems.
- Experience with containerized deployments (Docker/Kubernetes).
- Familiarity with cloud cost optimization and observability tools.
Audience
- MLOps engineers.
- Site Reliability Engineers (SREs).
- Engineering managers responsible for AI infrastructure.
Custom Corporate Training
Training solutions designed exclusively for businesses.
- Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
- Flexible Schedule: Dates and times adapted to your team's agenda.
- Format: Online (live), In-company (at your offices), or Hybrid.
Price per private group, online live training, starting from 3900 € + VAT*
Contact us for an exact quote and to hear about our latest promotions.
Testimonials (3)
The trainer is patient and very helpful. He knows the topic well.
CLIFFORD TABARES - Universal Leaf Philippines, Inc.
Course - Agentic AI for Business Automation: Use Cases & Integration
Good mix of knowledge and practice
Ion Mironescu - Facultatea S.A.I.A.P.M.
Course - Agentic AI for Enterprise Applications
The mix of theory and practice and of high level and low level perspectives