Thank you for sending your enquiry! One of our team members will contact you shortly.
Thank you for sending your booking! One of our team members will contact you shortly.
Course Outline
AI Sovereignty and LLM Local Deployment
- Risks associated with cloud LLMs: data retention, training on inputs, and foreign jurisdiction.
- Ollama architecture: model server, registry, and OpenAI-compatible API.
- Comparison with vLLM, llama.cpp, and Text Generation Inference.
- Model licensing: terms for Llama, Mistral, Qwen, and Gemma.
Installation and Hardware Setup
- Installing Ollama on Linux with CUDA and ROCm support.
- CPU-only fallback and AVX/AVX2 optimization.
- Docker deployment and persistent volume mapping.
- Multi-GPU setup and VRAM allocation strategies.
Model Management
- Downloading models from the Ollama registry: ollama pull llama3.
- Importing GGUF models from HuggingFace and TheBloke.
- Quantization levels: trade-offs between Q4_K_M, Q5_K_M, and Q8_0.
- Model switching and limits on concurrent model loading.
Custom Modelfiles
- Writing Modelfile syntax: FROM, PARAMETER, SYSTEM, TEMPLATE.
- Tuning temperature, top_p, and repeat_penalty.
- System prompt engineering for role-specific behaviour.
- Creating and publishing custom models to the local registry.
API Integration
- OpenAI-compatible /v1/chat/completions endpoint.
- Streaming responses and JSON mode.
- Integrating with LangChain, LlamaIndex, and custom applications.
- Authentication and rate limiting using a reverse proxy.
Performance Optimization
- Context window sizing and KV cache management.
- Batch inference and parallel request handling.
- CPU thread allocation and NUMA awareness.
- Monitoring GPU utilization and memory pressure.
Security and Compliance
- Network isolation for model serving endpoints.
- Input filtering and output moderation pipelines.
- Audit logging of prompts and completions.
- Model provenance and hash verification.
Requirements
- Intermediate knowledge of Linux and container administration.
- High-level understanding of machine learning and transformer models.
- Familiarity with REST APIs and JSON.
Audience
- AI engineers and developers looking to replace cloud LLM APIs.
- Organisations with data sensitivity constraints that prohibit cloud model usage.
- Government and defence teams requiring air-gapped language models.
14 Hours
Custom Corporate Training
Training solutions designed exclusively for businesses.
- Customized Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
- Flexible Schedule: Dates and times adapted to your team's agenda.
- Format: Online (live), In-company (at your offices), or Hybrid.
Price per private group, online live training, starting from 2600 € + VAT*
Contact us for an exact quote and to hear our latest promotions