Introduction to Thanos

Prometheus’s simple and reliable operational model is one of its major selling points. However, past a certain scale, we’ve identified a few shortcomings. To resolve those, we’re today officially announcing Thanos, an open source project by Improbable to seamlessly transform existing Prometheus deployments in clusters around the world into a unified monitoring system with unbounded historical data storage. It’s available on Github here.

Global query view

Prometheus encourages a functional sharding approach. Even single Prometheus server provides enough scalability to free users from the complexity of horizontal sharding in virtually all use cases.

While this is a great deployment model, you often want to access all the data through the same API or UI – that is, a global view. For example, you can render multiple queries in a Grafana graph, but each query can be done only against a single Prometheus server. With Thanos, on the other hand, you can query and aggregate data from multiple Prometheus servers, because all of them are available from a single endpoint.

Previously, to enable global view at Improbable, we arranged our Prometheus instances in a multiple-level Hierarchical Federation. That meant setting up a single meta-Prometheus server that scraped a portion of the metrics from each “leaf” server.

More details on original blog post