Dynatrace introduces support for OpenTelemetry histograms, which make the distribution of data easier to visualize and understand. These histograms enable, for example, response time analysis for services, and they help define and monitor service-level objectives that can be alerted on.
Imagine you’re using a lot of OpenTelemetry and Prometheus metrics on a crucial platform. You’re gathering a lot of data, but you can’t make sense of it. You need to visualize the distribution of your measurements to identify patterns, outliers, and trends. But there’s a problem: Your current tools don’t support histograms.
Incorporating histograms is not just a technical upgrade; it’s a necessity for any observability professional. By starting with histograms, you can unlock deeper insights and drive more informed decisions in your projects.
Dynatrace has introduced support for OpenTelemetry histograms together with the new visualization options in Dashboards and Notebooks. Histograms are supported starting with Dynatrace version 1.301.
In this blog post, we focus on histograms and why to use them, covering their main value and the possibilities they open up in OpenTelemetry.
What are histograms, and why use them?
A histogram is a specific type of metric that allows users to understand the distribution of data points over a period of time. This is particularly useful for metrics such as response times or payload sizes, where understanding variability and outliers is important. By analyzing how data points are spread out, teams can detect patterns or trends that might not be visible through simple averages or totals.
Histograms are commonly used to define and monitor service-level objectives (SLOs). They can help determine the percentage of requests that meet a specific response-time threshold, which is essential for maintaining service quality.
In practice, histograms are useful when the measurement distribution is relevant and the data sets are large. Teams can also change queries to get answers on already-collected data without needing to redefine metrics or wait for new data to accumulate.
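As a rough illustration of the SLO use case, the sketch below sorts response times into explicit buckets and then asks what share of requests met a threshold. The bucket boundaries, the sample values, and the helper names `build_histogram` and `fraction_within` are all assumptions made for this example; they are not part of any Dynatrace or OpenTelemetry API.

```python
from bisect import bisect_left

# Hypothetical bucket boundaries in milliseconds (inclusive upper bounds).
BOUNDS_MS = [50, 100, 250, 500, 1000]


def build_histogram(values, bounds):
    """Count values per bucket; the last bucket holds everything above bounds[-1]."""
    counts = [0] * (len(bounds) + 1)
    for value in values:
        # bisect_left places a value equal to a boundary into that boundary's bucket.
        counts[bisect_left(bounds, value)] += 1
    return counts


def fraction_within(counts, bounds, threshold):
    """Share of measurements at or below a threshold that matches a bucket boundary."""
    last_bucket = bounds.index(threshold)
    total = sum(counts)
    return sum(counts[: last_bucket + 1]) / total if total else 0.0


# Made-up response times, for illustration only.
response_times_ms = [12, 48, 75, 90, 120, 180, 240, 310, 620, 1500]
counts = build_histogram(response_times_ms, BOUNDS_MS)

slo_250 = fraction_within(counts, BOUNDS_MS, 250)  # share of requests within 250 ms
slo_500 = fraction_within(counts, BOUNDS_MS, 500)  # same counts, different question
print(counts, slo_250, slo_500)
```

Note that the second question (500 ms) is answered from the same bucket counts, without collecting anything new. The threshold has to coincide with a bucket boundary in this simplified version.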
Breaking down the benefits of OpenTelemetry histograms
OpenTelemetry instrumentation automatically generates histograms for HTTP client and server request durations. This feature, available by default for OTel-instrumented services, gives users a standard way to measure and compare response times across different services.
Moreover, the OpenTelemetry Collector can measure service span durations, categorized by span names, span kinds, and status codes. The span metrics connector creates these measurements and presents them as histograms, which can be analyzed in Dynatrace for deeper insights.
Histograms also enhance the self-monitoring capabilities of the Collector. It reports batch sizes and HTTP/RPC measurements of its own pipelines as histograms, providing valuable metrics for performance monitoring. This self-monitoring aspect is crucial for maintaining the health and efficiency of the Collector itself, ensuring that it can handle the demands of large-scale data collection and processing without degradation.
Additionally, the Collector supports converting Prometheus and StatsD histograms into the OpenTelemetry protocol (OTLP), making them compatible with Dynatrace. By exporting metrics from different sources into a single platform, teams can achieve a holistic view of their system’s performance, facilitating proactive issue resolution and faster decision-making.
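One detail of such a conversion is worth a small example: Prometheus exposes classic histogram buckets as cumulative counts (each `le` bucket includes all smaller ones), while OTLP explicit-bucket histograms carry one count per bucket. The sketch below shows only that arithmetic. It is not the Collector's code, and the function name and sample numbers are made up for illustration.

```python
def cumulative_to_bucket_counts(cumulative):
    """Turn cumulative counts (one per `le` bound, ending with +Inf) into per-bucket counts."""
    counts = []
    previous = 0
    for current in cumulative:
        if current < previous:
            raise ValueError("cumulative counts must be non-decreasing")
        counts.append(current - previous)
        previous = current
    return counts


# Hypothetical Prometheus-style sample with le = 0.1, 0.5, 1.0, +Inf
le_bounds = [0.1, 0.5, 1.0]
cumulative = [4, 9, 10, 12]

bucket_counts = cumulative_to_bucket_counts(cumulative)
print(bucket_counts)
```

The per-bucket counts add up to the last cumulative value, which is the total number of observations.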
Percentiles to simplify analysis
Percentiles are statistical measures that divide a data set into 100 equal parts, providing a way to interpret specific points within your histograms. For instance, the 90th percentile (p90) is the value below which 90% of the data falls.
In practical applications, percentiles are particularly useful for web performance analysis. By examining p90, you can identify the response time that 90% of users stay at or below. This insight is crucial for optimizing performance for the majority of users. However, it also highlights that the remaining 10% of users experience longer wait times, which could lead to dissatisfaction.
With the Dynatrace Grail data lakehouse, extracting percentiles from histograms is straightforward, especially when using Notebooks. The percentile graphs can be seamlessly integrated into dashboards, providing clear and actionable insights.
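To make the idea of extracting a percentile from a histogram more concrete, here is a minimal sketch that estimates percentiles from explicit-bucket counts by linear interpolation inside the bucket that contains the target rank. This is a common textbook approach and an assumption on our part; it is not a description of how Grail computes percentiles. The bucket boundaries, the counts, and the choice of 0 as the lower edge of the first bucket are also assumptions.

```python
def estimate_percentile(counts, bounds, q):
    """Estimate the q-th percentile (0 < q <= 100) from per-bucket counts.

    counts has len(bounds) + 1 entries; the last one is the overflow bucket.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    rank = q * total / 100  # number of observations at or below the percentile
    seen = 0
    for i, count in enumerate(counts):
        if count and seen + count >= rank:
            if i == len(bounds):
                # Overflow bucket has no upper bound; fall back to the last boundary.
                return float(bounds[-1])
            lower = bounds[i - 1] if i > 0 else 0.0  # assumed lower edge of first bucket
            upper = bounds[i]
            # Assume observations are spread evenly inside the bucket.
            return lower + (upper - lower) * (rank - seen) / count
        seen += count
    return float(bounds[-1])


# Hypothetical response-time histogram (milliseconds).
bounds_ms = [50, 100, 250, 500, 1000]
counts = [2, 2, 3, 1, 1, 1]

p50 = estimate_percentile(counts, bounds_ms, 50)
p90 = estimate_percentile(counts, bounds_ms, 90)
print(p50, p90)
```

Because the raw values are no longer available, the result is an estimate whose accuracy depends on how finely the buckets cover the range you care about.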
What about managed Dynatrace?
Managed Dynatrace customers who don’t have Grail can still access histogram summaries (min, max, sum, count) and buckets, and they can use Data Explorer for histogram visualization. Note, however, that percentile calculation requires Grail (Dynatrace SaaS).
Support for explicit and exponential histograms
The first metrics API/SDK release in the OpenTelemetry project introduced histograms with explicit bucket boundaries. These histograms are very popular and are also widely used by Prometheus. Dynatrace now fully supports them.
Later, OpenTelemetry introduced exponential histograms, in which each consecutive bucket is exponentially larger than the previous one. These histograms represent values spanning a high dynamic range more efficiently and keep the relative error of every bucket stable. Dynatrace now supports exponential histograms by calculating histogram summaries (min, max, sum, count). For now, however, percentile calculation and buckets are available only for explicit bucket histograms.
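The following sketch illustrates why the relative error stays stable. In the OpenTelemetry data model, an exponential histogram has a `scale`, and its bucket boundaries are powers of `base = 2 ** (2 ** -scale)`, so every bucket's upper bound is the same factor above its lower bound. The helper names below are hypothetical, and the logarithm-based index calculation is a simplified version that can be off by one for values extremely close to a boundary because of floating-point rounding.

```python
import math


def bucket_index(value, scale):
    """Index i such that base**i < value <= base**(i + 1), with base = 2 ** (2 ** -scale)."""
    if value <= 0:
        raise ValueError("only positive values map to exponential buckets")
    return math.ceil(math.log2(value) * 2 ** scale) - 1


def bucket_bounds(index, scale):
    """Lower (exclusive) and upper (inclusive) boundary of a bucket."""
    base = 2 ** (2 ** -scale)
    return base ** index, base ** (index + 1)


# With scale 0 the base is 2, so buckets are (1, 2], (2, 4], (4, 8], ...
for value in (3, 4, 5, 300):
    index = bucket_index(value, 0)
    print(value, index, bucket_bounds(index, 0))

# A higher scale gives narrower buckets, but the upper/lower ratio stays constant.
low, high = bucket_bounds(bucket_index(3, 1), 1)
print(low, high, high / low)
```

Raising the scale by one roughly halves the width of each bucket on a logarithmic axis, which is how exponential histograms trade bucket count against precision.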
Try OpenTelemetry histograms
To experiment with OpenTelemetry histograms, you can deploy the OpenTelemetry Demo Application (Astronomy Shop) with the span metrics connector. See this blog post about exporting the data from the demo app to Dynatrace.
To learn more about histograms in Dynatrace, see Histogram Visualization in the Dynatrace documentation.
As a leading contributor to the OpenTelemetry project, Dynatrace is committed to advancing its features and maximizing its value. By collaborating with the community and other vendors, Dynatrace ensures that OpenTelemetry remains cutting-edge, accessible, and user-friendly for everyone.