
Suyash Sambhare


AMQ Streams Monitoring

Monitoring data allows you to monitor the performance and health of AMQ Streams. You can configure your deployment to capture metrics data for analysis and notifications.

Metrics data is useful when investigating issues with connectivity and data delivery. Metrics data can identify under-replicated partitions or the rate at which messages are consumed. Alerting rules can provide time-critical notifications on such metrics through a specified communications channel. Monitoring visualizations present real-time metrics data to help determine when and how to update the configuration of your deployment. Example metrics configuration files are provided with AMQ Streams.

Distributed tracing complements the gathering of metrics data by providing a facility for end-to-end tracking of messages through AMQ Streams.

Metrics and monitoring tools

AMQ Streams can employ the following tools for metrics and monitoring:

  • Prometheus
  • Kafka Exporter
  • Grafana
  • OpenTelemetry
  • Cruise Control


Prometheus

Prometheus can extract metrics data from Kafka components and the AMQ Streams Operators.

To use Prometheus to obtain metrics data and provide alerts, Prometheus and the Prometheus Alertmanager plugin must be deployed. Kafka resources must also be deployed or redeployed with metrics configuration to expose the metrics data.
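As a rough illustration of what "metrics configuration" on a Kafka resource can look like, the sketch below builds a partial Kafka custom resource as a Python dictionary and prints it as JSON. It assumes the Strimzi-style metricsConfig section that references a ConfigMap holding JMX Prometheus Exporter rules. The names my-cluster, kafka-metrics, and kafka-metrics-config.yml are placeholders, and the resource is deliberately incomplete (no replicas, listeners, or storage).

```python
import json

# Partial Kafka custom resource, shown as a dict for illustration only.
# Placeholder names (assumptions): "my-cluster", "kafka-metrics",
# "kafka-metrics-config.yml".
kafka_resource = {
    "apiVersion": "kafka.strimzi.io/v1beta2",
    "kind": "Kafka",
    "metadata": {"name": "my-cluster"},
    "spec": {
        "kafka": {
            # Points the broker at a ConfigMap key that holds the
            # JMX Prometheus Exporter rules.
            "metricsConfig": {
                "type": "jmxPrometheusExporter",
                "valueFrom": {
                    "configMapKeyRef": {
                        "name": "kafka-metrics",
                        "key": "kafka-metrics-config.yml",
                    }
                },
            },
        },
    },
}


def metrics_enabled(resource: dict) -> bool:
    """Return True if the Kafka resource carries a metricsConfig section."""
    return "metricsConfig" in resource.get("spec", {}).get("kafka", {})


print(json.dumps(kafka_resource, indent=2))
print("metrics enabled:", metrics_enabled(kafka_resource))
```

In practice you would start from the example metrics configuration files shipped with AMQ Streams rather than write this section by hand.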

Prometheus scrapes the exposed metrics data for monitoring. Alertmanager issues alerts when conditions indicate potential problems, based on pre-defined alerting rules.
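To make the scrape-then-alert flow concrete, here is a minimal sketch that parses a hand-written sample of Prometheus text-format output and flags series above a threshold. It only mimics a simple "value > 0" rule. Prometheus itself evaluates alerting rules with PromQL, and Alertmanager handles the routing of notifications. The metric name and pod labels in the sample are assumptions chosen for illustration.

```python
# Hand-written sample in the Prometheus text exposition format.
# The metric name and pod labels are illustrative assumptions.
SAMPLE_SCRAPE = """\
# HELP kafka_server_replicamanager_underreplicatedpartitions Under-replicated partitions
# TYPE kafka_server_replicamanager_underreplicatedpartitions gauge
kafka_server_replicamanager_underreplicatedpartitions{pod="my-cluster-kafka-0"} 0.0
kafka_server_replicamanager_underreplicatedpartitions{pod="my-cluster-kafka-1"} 3.0
"""


def parse_samples(text):
    """Parse simple 'series value' lines, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated token on the line.
        series, value = line.rsplit(" ", 1)
        samples.append((series, float(value)))
    return samples


def firing(samples, metric, threshold=0.0):
    """Return the series of `metric` whose value exceeds `threshold`."""
    return [
        series
        for series, value in samples
        if series.split("{", 1)[0] == metric and value > threshold
    ]


alerts = firing(
    parse_samples(SAMPLE_SCRAPE),
    "kafka_server_replicamanager_underreplicatedpartitions",
)
print(alerts)
```

With the sample above, only the series for the second broker pod is reported, because it is the only one with a value greater than zero.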

Sample metrics and alerting rules configuration files are provided with AMQ Streams. The sample alerting mechanism provided with AMQ Streams is configured to send notifications to a Slack channel.

Grafana

Grafana uses the metrics data exposed by Prometheus to present dashboard visualizations for monitoring.

A deployment of Grafana is required, with Prometheus added as a data source. Example dashboards, supplied with AMQ Streams as JSON files, are imported through the Grafana interface to present monitoring data.

Kafka Exporter

Kafka Exporter is an open source project to enhance monitoring of Apache Kafka brokers and clients. Kafka Exporter is deployed with a Kafka cluster to extract additional Prometheus metrics data from Kafka brokers related to offsets, consumer groups, consumer lag, and topics. You can use the Grafana dashboard provided to visualize the data collected by Prometheus from Kafka Exporter.

A sample configuration file, alerting rules and Grafana dashboard for Kafka Exporter are provided with AMQ Streams.
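Consumer lag, one of the values Kafka Exporter reports, is the gap between the latest offset in a partition and the offset a consumer group has committed. The sketch below works through that arithmetic with made-up offsets. The function name and the choice to treat a missing commit as offset 0 are assumptions for illustration, not Kafka Exporter's implementation.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: latest log offset minus the group's committed offset.

    A partition with no committed offset is treated as committed at 0
    (an assumption made for this sketch). Lag is never reported below 0.
    """
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


# Made-up offsets for a three-partition topic.
end_offsets = {0: 1500, 1: 980, 2: 2040}
committed = {0: 1500, 1: 900, 2: 1990}

lag = consumer_lag(end_offsets, committed)
total_lag = sum(lag.values())
print(lag, total_lag)
```

Here partition 0 is fully caught up, while partitions 1 and 2 are 80 and 50 messages behind, for a total lag of 130.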

Distributed tracing

Distributed tracing tracks the progress of transactions between applications in a distributed system. In a microservices architecture, tracing tracks the progress of transactions between services. Trace data is useful for monitoring application performance and investigating issues with target systems and end-user applications.

In AMQ Streams, tracing facilitates the end-to-end tracking of messages: from source systems to Kafka, and then from Kafka to target systems and applications. Distributed tracing complements the monitoring of metrics in Grafana dashboards, as well as the component loggers.

Support for tracing is built in to the following Kafka components:

  1. MirrorMaker to trace messages from a source cluster to a target cluster
  2. Kafka Connect to trace messages consumed and produced by Kafka Connect
  3. Kafka Bridge to trace messages between Kafka and HTTP client applications

Tracing is not supported for Kafka brokers.

You enable and configure tracing for these components through their custom resources. You add tracing configuration using spec.template properties.

You enable tracing by specifying a tracing type using the spec.tracing.type property:

opentelemetry

Specify type: opentelemetry to use OpenTelemetry. By default, OpenTelemetry uses the OTLP (OpenTelemetry Protocol) exporter and endpoint to get trace data. You can also use other tracing systems supported by OpenTelemetry, including Jaeger tracing, by changing the OpenTelemetry exporter and endpoint in the tracing configuration.
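The following sketch shows how the two pieces described above might fit together on a KafkaConnect custom resource: spec.tracing.type selects OpenTelemetry, and spec.template carries the exporter settings as container environment variables. The resource is written as a Python dictionary and printed as JSON for illustration. The resource name, service name, and endpoint URL are placeholders, and the use of the standard OTEL_SERVICE_NAME and OTEL_EXPORTER_OTLP_ENDPOINT variables is an assumption about how your deployment is configured.

```python
import json

# Partial KafkaConnect custom resource, for illustration only.
# Placeholders (assumptions): "my-connect-cluster", "my-otel-service",
# and the endpoint "http://otlp-host:4317".
kafka_connect = {
    "apiVersion": "kafka.strimzi.io/v1beta2",
    "kind": "KafkaConnect",
    "metadata": {"name": "my-connect-cluster"},
    "spec": {
        "template": {
            "connectContainer": {
                # Standard OpenTelemetry SDK environment variables.
                "env": [
                    {"name": "OTEL_SERVICE_NAME", "value": "my-otel-service"},
                    {
                        "name": "OTEL_EXPORTER_OTLP_ENDPOINT",
                        "value": "http://otlp-host:4317",
                    },
                ]
            }
        },
        # Enables tracing and selects the tracing type.
        "tracing": {"type": "opentelemetry"},
    },
}

print(json.dumps(kafka_connect["spec"], indent=2))
```

Pointing the endpoint at a different collector is how you would switch to another backend supported by OpenTelemetry.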

Cruise Control

Cruise Control is an open source system that supports the following Kafka operations:

  1. Monitoring cluster workload
  2. Rebalancing a cluster based on predefined constraints

These operations help with running a more balanced Kafka cluster that uses broker pods more efficiently.

A typical cluster can become unevenly loaded over time. Partitions that handle large amounts of message traffic might not be evenly distributed across the available brokers. To rebalance the cluster, administrators must monitor the load on brokers and manually reassign busy partitions to brokers with spare capacity.

Cruise Control automates the cluster rebalancing process. It constructs a workload model of resource utilization for the cluster, based on CPU, disk, and network load, and generates optimization proposals (that you can approve or reject) for more balanced partition assignments. A set of configurable optimization goals is used to calculate these proposals.
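To give a feel for what a rebalancing proposal is, here is a deliberately simplified toy model. It is not Cruise Control's algorithm: it uses a single made-up load number per partition instead of CPU, disk, and network, and it proposes just one move, from the busiest broker to the least busy one. All names and numbers are hypothetical.

```python
def broker_loads(assignment, partition_load):
    """Sum the load of the partitions assigned to each broker."""
    return {
        broker: sum(partition_load[p] for p in partitions)
        for broker, partitions in assignment.items()
    }


def propose_move(assignment, partition_load):
    """Propose moving one partition from the busiest to the idlest broker.

    Returns (partition, source, target), or None if no single move
    narrows the load gap between those two brokers.
    """
    loads = broker_loads(assignment, partition_load)
    source = max(loads, key=loads.get)
    target = min(loads, key=loads.get)
    gap = loads[source] - loads[target]
    # Moving a partition of load x changes the gap to |gap - 2x|,
    # which is an improvement only when 0 < x < gap.
    candidates = [p for p in assignment[source] if 0 < partition_load[p] < gap]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: abs(gap - 2 * partition_load[p]))
    return best, source, target


# Hypothetical cluster: broker 0 carries most of the traffic.
assignment = {0: ["a", "b", "c"], 1: ["d"], 2: ["e"]}
partition_load = {"a": 60, "b": 30, "c": 10, "d": 20, "e": 40}

proposal = propose_move(assignment, partition_load)
print(broker_loads(assignment, partition_load), proposal)
```

In this example the broker loads start at 100, 20, and 40, and the proposed move of partition "b" from broker 0 to broker 1 brings them to 70, 50, and 40. Cruise Control weighs many such moves against its configured optimization goals before it produces a proposal.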

You can generate optimization proposals in specific modes. The default full mode rebalances partitions across all brokers. You can also use the add-brokers and remove-brokers modes to accommodate changes when scaling a cluster up or down.

When you approve an optimization proposal, Cruise Control applies it to your Kafka cluster. You configure and generate optimization proposals using a KafkaRebalance resource. You can configure the resource using an annotation so that optimization proposals are approved automatically or manually.
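A KafkaRebalance resource for the modes described above could be assembled along these lines. This is a sketch that builds the resource as a Python dictionary. The helper function, resource name, cluster name, and broker IDs are hypothetical, and the strimzi.io/rebalance-auto-approval annotation is assumed to be the auto-approval switch in your version.

```python
import json

VALID_MODES = {"full", "add-brokers", "remove-brokers"}


def kafka_rebalance(name, cluster, mode="full", brokers=None, auto_approve=False):
    """Build a KafkaRebalance resource as a dict (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    if mode != "full" and not brokers:
        raise ValueError(f"mode {mode} needs a list of broker IDs")
    resource = {
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "KafkaRebalance",
        "metadata": {
            "name": name,
            # Ties the rebalance to a specific Kafka cluster.
            "labels": {"strimzi.io/cluster": cluster},
        },
        "spec": {"mode": mode},
    }
    if brokers:
        resource["spec"]["brokers"] = list(brokers)
    if auto_approve:
        # Assumed annotation for approving proposals automatically.
        resource["metadata"]["annotations"] = {
            "strimzi.io/rebalance-auto-approval": "true"
        }
    return resource


# Placeholder names and broker IDs for a scale-up scenario.
rebalance = kafka_rebalance(
    "my-rebalance", "my-cluster", mode="add-brokers", brokers=[3, 4], auto_approve=True
)
print(json.dumps(rebalance, indent=2))
```

Without the auto-approval annotation, the proposal waits for you to review and approve it manually.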

Ref: https://access.redhat.com/documentation/en-us/red_hat_amq_streams/2.6/html/amq_streams_on_openshift_overview/metrics-overview_str
