Alerting with Time Series
In a Cloud Native infrastructure, failure is normal and expected. The loss of a single node or a dozen hard drives is handled gracefully by the systems running a datacenter, and there is no reason to page someone at 4 a.m. This calls for an alerting system that understands service availability at a global scope yet can still give detailed reports when a service-impacting incident occurs. This talk explores how time-series-based alerting solves this problem, the Prometheus architecture behind it, and how practical anomaly detection can be implemented.
Fabian Reinartz is a software engineer at CoreOS and one of the core developers of Prometheus, a monitoring system and time series database. Previously, he was a production engineer at SoundCloud and worked on information retrieval during his time at Saarland University.