Course Outline
Introduction to AIOps
- What is AIOps and why it matters
- Traditional monitoring vs. AIOps-driven observability
- AIOps architecture and key components
Collecting and Normalizing Operational Data
- Types of observability data: metrics, logs, and traces
- Ingesting data from multiple sources (servers, containers, cloud)
- Using agents and exporters (Prometheus, Beats, Fluentd)
Data Correlation and Anomaly Detection
- Time series correlation and statistical methods
- Using ML models for anomaly detection
- Detecting incidents across distributed systems
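One simple statistical method of the kind this module refers to is a rolling z-score over a metric time series. The sketch below is illustrative only and is not taken from the course material; the function name `zscore_anomalies`, the window size, and the threshold are assumptions chosen for the example.

```python
from statistics import mean, stdev


def zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices of points that deviate from the trailing window.

    A point is flagged when it lies more than `threshold` sample standard
    deviations away from the mean of the previous `window` points.
    (Hypothetical helper; parameters are illustrative defaults.)
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: a z-score is undefined, so flag any change at all.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Example: a CPU-like series with one spike at index 6.
cpu = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
spikes = zscore_anomalies(cpu)
```

A trailing window keeps the baseline local, so slow drifts are tolerated while sudden jumps stand out. Note that a spike inflates the window's variance for the next few points, which is one reason production systems often prefer more robust estimators.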
Alerting and Noise Reduction
- Designing intelligent alert rules and thresholds
- Suppression, deduplication, and alert grouping
- Integrating with Alertmanager, Slack, PagerDuty, or Opsgenie
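Deduplication and grouping can be illustrated with a small sketch. It is loosely inspired by the idea of grouping alerts on a chosen set of labels (Alertmanager exposes a similar concept as `group_by`), but it is not Alertmanager's implementation; the alert shape, the `fingerprint` helper, and `dedupe_and_group` are assumptions made for this example.

```python
from collections import defaultdict


def fingerprint(labels):
    """Build a hashable identity for an alert from its full label set."""
    return tuple(sorted(labels.items()))


def dedupe_and_group(alerts, group_by=("service",)):
    """Drop exact duplicate alerts, then bucket the rest by selected labels.

    `alerts` is assumed to be a list of dicts with a "labels" dict.
    Returns a dict mapping a tuple of label values to a list of alerts.
    """
    seen = set()
    groups = defaultdict(list)
    for alert in alerts:
        fp = fingerprint(alert["labels"])
        if fp in seen:
            continue  # identical label set already recorded: suppress it
        seen.add(fp)
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


alerts = [
    {"labels": {"alertname": "HighCPU", "service": "api", "instance": "a"}},
    {"labels": {"alertname": "HighCPU", "service": "api", "instance": "a"}},
    {"labels": {"alertname": "HighLatency", "service": "api", "instance": "b"}},
    {"labels": {"alertname": "DiskFull", "service": "db", "instance": "c"}},
]
grouped = dedupe_and_group(alerts)
```

Here four raw alerts collapse into two notifications, one per service, which is the basic effect noise reduction aims for.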
Root Cause Analysis and Visualization
- Using dashboards to visualize metrics and detect trends
- Exploring events and timelines for RCA
- Tracing issues across layers with distributed tracing tools
Automation and Remediation
- Triggering automated scripts or workflows from incidents
- Integrating with ITSM systems (ServiceNow, Jira)
- Use cases: self-healing, scaling, traffic rerouting
Open Source and Commercial AIOps Platforms
- Overview of tools: Prometheus, Grafana, ELK, Moogsoft, Dynatrace
- Evaluation criteria for selecting an AIOps platform
- Demo and hands-on with a selected stack
Summary and Next Steps
Requirements
- An understanding of IT operations and system monitoring concepts
- Experience with monitoring tools or dashboards
- Familiarity with basic log and metric formats
Audience
- Operations teams responsible for infrastructure and applications
- Site Reliability Engineers (SREs)
- IT monitoring and observability teams
Duration: 14 hours