Course Outline
Introduction and Diagnostic Foundations
- Overview of failure modes in LLM systems and common Ollama-specific issues
- Establishing reproducible experiments and controlled environments
- Debugging toolset: local logs, request/response captures, and sandboxing (see the capture sketch after this list)
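To make the toolset concrete, here is a minimal sketch of request/response capture around Ollama's local HTTP API. It assumes the default endpoint (http://localhost:11434) and the /api/generate route; the model tag and capture file name are illustrative.

```python
"""Minimal request/response capture for Ollama's local HTTP API."""
import json
import time
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
CAPTURE_FILE = "captures.jsonl"                     # append-only audit log (illustrative name)

def generate_with_capture(model: str, prompt: str, **options) -> str:
    payload = {"model": model, "prompt": prompt, "stream": False,
               "options": options}
    start = time.time()
    resp = requests.post(OLLAMA_URL, json=payload, timeout=120)
    resp.raise_for_status()
    body = resp.json()
    # Persist the full exchange so failures can be replayed later.
    record = {"ts": start, "latency_s": time.time() - start,
              "request": payload, "response": body}
    with open(CAPTURE_FILE, "a") as f:
        f.write(json.dumps(record) + "\n")
    return body.get("response", "")

if __name__ == "__main__":
    print(generate_with_capture("llama3", "Say hello in one word.",
                                temperature=0))
```

Appending every exchange to a JSONL file gives a replayable record, which feeds directly into the isolation techniques in the next module.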
Reproducing and Isolating Failures
- Techniques for creating minimal failing examples and seeds
- Stateful vs stateless interactions: isolating context-related bugs
- Determinism, randomness, and controlling nondeterministic behavior (repeatability sketch below)
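A minimal repeatability probe, assuming Ollama's request options accept `seed` and `temperature` (true in recent versions; verify for yours). Even with greedy decoding and a fixed seed, backends and hardware can still introduce variance, which is exactly what this check surfaces.

```python
"""Checking repeatability under pinned decoding settings."""
import requests

URL = "http://localhost:11434/api/generate"  # assumed default endpoint

def run_once(model: str, prompt: str, seed: int) -> str:
    # temperature 0 + fixed seed is the usual starting point for
    # a minimal failing example that others can reproduce.
    payload = {"model": model, "prompt": prompt, "stream": False,
               "options": {"seed": seed, "temperature": 0}}
    r = requests.post(URL, json=payload, timeout=120)
    r.raise_for_status()
    return r.json()["response"]

def is_repeatable(model: str, prompt: str, seed: int = 42, runs: int = 3) -> bool:
    outputs = {run_once(model, prompt, seed) for _ in range(runs)}
    return len(outputs) == 1  # one distinct output across runs => repeatable here

if __name__ == "__main__":
    print(is_repeatable("llama3", "List three primes."))
```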
Behavioral Evaluation and Metrics
- Quantitative metrics: accuracy, ROUGE/BLEU variants, calibration, and perplexity proxies
- Qualitative evaluations: human-in-the-loop scoring and rubric design
- Task-specific fidelity checks and acceptance criteria (see the sketch after this list)
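As a concrete acceptance check, here is a sketch that scores exact-match accuracy over a tiny golden set and gates on a threshold. The GOLDEN pairs and the 0.9 threshold are hypothetical; real suites mix exact match with softer metrics (ROUGE/BLEU) and human rubric scores.

```python
"""Exact-match fidelity check against a small golden set."""
import requests

URL = "http://localhost:11434/api/generate"  # assumed default endpoint

GOLDEN = [  # hypothetical task: country -> capital, one word
    ("Capital of France? Answer with one word.", "Paris"),
    ("Capital of Japan? Answer with one word.", "Tokyo"),
]

def ask(model: str, prompt: str) -> str:
    payload = {"model": model, "prompt": prompt, "stream": False,
               "options": {"temperature": 0}}
    r = requests.post(URL, json=payload, timeout=120)
    r.raise_for_status()
    return r.json()["response"].strip()

def accuracy(model: str) -> float:
    hits = sum(ask(model, p).rstrip(".").lower() == want.lower()
               for p, want in GOLDEN)
    return hits / len(GOLDEN)

if __name__ == "__main__":
    score = accuracy("llama3")
    print(f"accuracy={score:.2f}")
    assert score >= 0.9, "below acceptance threshold"  # acceptance gate
```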
Automated Testing and Regression
- Unit tests for prompts and components, scenario and end-to-end tests
- Creating regression suites and golden example baselines (test sketch below)
- CI/CD integration for Ollama model updates and automated validation gates
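One way the golden-baseline idea translates into automated tests, sketched in pytest. The GOLDEN cases and model tag are illustrative; substring assertions are used because exact string equality is brittle for LLM output.

```python
"""Golden-baseline regression tests (pytest style)."""
import pytest
import requests

URL = "http://localhost:11434/api/generate"  # assumed default endpoint
MODEL = "llama3"                             # tag under test (illustrative)

GOLDEN = [  # (prompt, substring the answer must contain)
    ("What is 2 + 2? Reply with the number only.", "4"),
    ("Name the chemical symbol for water.", "H2O"),
]

def generate(prompt: str) -> str:
    payload = {"model": MODEL, "prompt": prompt, "stream": False,
               "options": {"temperature": 0, "seed": 42}}
    r = requests.post(URL, json=payload, timeout=120)
    r.raise_for_status()
    return r.json()["response"]

@pytest.mark.parametrize("prompt,expected", GOLDEN)
def test_golden_baseline(prompt, expected):
    # Regression gate: the pinned model must still satisfy each golden case.
    assert expected in generate(prompt)
```

Run under CI (e.g. `pytest -q`) after each model pull, a failure here becomes the automated validation gate for the update.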
Observability and Monitoring
- Structured logging, distributed traces, and correlation IDs (logging sketch after this list)
- Key operational metrics: latency, token usage, error rates, and quality signals
- Alerting, dashboards, and SLIs/SLOs for model-backed services
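A sketch of structured logging with a correlation ID around one model call. The token and timing fields (`prompt_eval_count`, `eval_count`, `total_duration` in nanoseconds) match Ollama's non-streaming generate response in recent versions; treat them as assumptions and verify against your deployment.

```python
"""One structured JSON log line per model call, keyed by correlation ID."""
import json
import logging
import uuid
import requests

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("llm")

URL = "http://localhost:11434/api/generate"  # assumed default endpoint

def generate_logged(model: str, prompt: str, correlation_id: str | None = None) -> str:
    cid = correlation_id or str(uuid.uuid4())  # propagate across services
    payload = {"model": model, "prompt": prompt, "stream": False}
    try:
        r = requests.post(URL, json=payload, timeout=120)
        r.raise_for_status()
        body = r.json()
        log.info(json.dumps({
            "event": "llm_call", "correlation_id": cid, "model": model,
            "latency_ms": body.get("total_duration", 0) / 1e6,  # ns -> ms
            "prompt_tokens": body.get("prompt_eval_count"),
            "output_tokens": body.get("eval_count"),
            "error": None,
        }))
        return body["response"]
    except requests.RequestException as exc:
        # Error-rate signal: same event shape, error field populated.
        log.info(json.dumps({"event": "llm_call", "correlation_id": cid,
                             "model": model, "error": str(exc)}))
        raise
```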
Advanced Root Cause Analysis
- Tracing through prompt graphs, tool calls, and multi-turn flows
- Comparative A/B diagnosis and ablation studies (see the harness sketch below)
- Data provenance, dataset debugging, and addressing dataset-induced failures
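A minimal A/B harness: the same probe set through two configurations, reporting disagreements. Model tags and probes are placeholders; the same loop does prompt-variant ablations if you vary the prompt template instead of the model.

```python
"""Comparative A/B diagnosis over a shared probe set."""
import requests

URL = "http://localhost:11434/api/generate"  # assumed default endpoint

PROBES = [  # minimal failing examples collected during triage (illustrative)
    "Summarize: The cat sat on the mat.",
    "Extract the year: 'Released in 1999 to acclaim.'",
]

def run(model: str, prompt: str) -> str:
    payload = {"model": model, "prompt": prompt, "stream": False,
               "options": {"temperature": 0, "seed": 7}}
    r = requests.post(URL, json=payload, timeout=120)
    r.raise_for_status()
    return r.json()["response"].strip()

def diff_models(model_a: str, model_b: str) -> list[dict]:
    disagreements = []
    for prompt in PROBES:
        a, b = run(model_a, prompt), run(model_b, prompt)
        if a != b:  # only divergent cases matter for diagnosis
            disagreements.append({"prompt": prompt, "a": a, "b": b})
    return disagreements

if __name__ == "__main__":
    for d in diff_models("llama3", "llama3.1"):
        print(d)
```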
Safety, Robustness, and Remediation Strategies
- Mitigations: filtering, grounding, retrieval augmentation, and prompt scaffolding
- Rollback, canary, and phased rollout patterns for model updates (routing sketch below)
- Post-mortems, lessons learned, and continuous improvement loops
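A routing sketch for the canary pattern: a small share of traffic goes to the candidate model, with fallback to the stable tag on failure. The tags, the 5% share, and the fallback policy are illustrative; a production router would also compare quality signals before promoting the canary.

```python
"""Canary routing sketch with fallback to the stable model."""
import random
import requests

URL = "http://localhost:11434/api/generate"  # assumed default endpoint
STABLE, CANARY = "llama3", "llama3.1"        # hypothetical model tags
CANARY_SHARE = 0.05                          # 5% of traffic to start

def call(model: str, prompt: str) -> str:
    payload = {"model": model, "prompt": prompt, "stream": False}
    r = requests.post(URL, json=payload, timeout=120)
    r.raise_for_status()
    return r.json()["response"]

def route(prompt: str) -> str:
    if random.random() < CANARY_SHARE:
        try:
            return call(CANARY, prompt)
        except requests.RequestException:
            pass  # canary failure: fall through to stable (and count it)
    return call(STABLE, prompt)
```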
Summary and Next Steps
Minimum Requirements
- Strong experience building and deploying LLM applications
- Familiarity with Ollama workflows and model hosting
- Comfort with Python, Docker, and basic observability tooling
Audience
- AI engineers
- ML Ops professionals
- QA teams responsible for production LLM systems
35 hours