Alerting with Time Series

Fabian Reinartz

In a Cloud Native infrastructure, failure is normal and expected. The systems running a datacenter gracefully handle the loss of a single node or a dozen hard drives, so there is no reason to page someone at 4 a.m. This calls for an alerting system that understands service availability at a global scope, yet can still give detailed reports when a service-impacting incident occurs. This talk explores how time-series-based alerting solves this problem, the Prometheus architecture behind it, and how practical anomaly detection can be implemented.

Language: English

Level: Intermediate
