Suyash Sambhare

Observability on OpenShift

Red Hat OpenShift Observability helps users promptly detect and fix issues before they affect systems or applications by offering real-time visibility, monitoring, and analysis of system metrics, logs, traces, and events. OpenShift Container Platform provides the following observability components to help guarantee the dependability, efficiency, and security of your applications and infrastructure:

  • Monitoring
  • Logging
  • Distributed tracing
  • OpenTelemetry
  • Network Observability
  • Power monitoring

To provide a cohesive observability solution, Red Hat OpenShift Observability integrates open-source observability tools and technologies. Its components work together to help you gather, store, deliver, analyze, and visualize data.

Monitoring

Monitor the in-cluster health and performance of your OpenShift Container Platform applications with metrics and customized alerts for CPU and memory utilization, network connectivity, and other resource usage. The Cluster Monitoring Operator deploys and manages the components of the monitoring stack.

Every OpenShift Container Platform installation includes monitoring stack components by default, managed by the Cluster Monitoring Operator (CMO). These components include Prometheus, Alertmanager, Thanos Querier, and others. To enable Remote Health Monitoring for clusters, the CMO also installs the Telemeter Client, which transmits a subset of data from platform Prometheus instances to Red Hat.

Creating a cluster monitoring config map

You can configure the core OpenShift Container Platform monitoring components by creating the cluster-monitoring-config ConfigMap object in the openshift-monitoring project. The Cluster Monitoring Operator (CMO) then configures the core components of the monitoring stack.

  • Check whether the cluster-monitoring-config ConfigMap object exists: $ oc -n openshift-monitoring get configmap cluster-monitoring-config
  • If the ConfigMap object does not exist, create the following YAML manifest. In this example the file is called cluster-monitoring-config.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
  • Apply the configuration to create the ConfigMap object: $ oc apply -f cluster-monitoring-config.yaml

Configuring the monitoring stack

  1. Start by editing the cluster-monitoring-config ConfigMap object in the openshift-monitoring project: $ oc -n openshift-monitoring edit configmap cluster-monitoring-config
  2. Add your desired configuration under data/config.yaml as a key-value pair: <component_name>: <component_configuration> (see the example after this list).
  3. Save the file to apply the changes.
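
For example, the following sketch follows that pattern to configure a 24-hour retention period and persistent storage for the platform Prometheus instance; the storage class name fast and the 40Gi size are illustrative values that must match resources available in your cluster:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      retention: 24h
      volumeClaimTemplate:
        spec:
          storageClassName: fast
          resources:
            requests:
              storage: 40Gi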

The monitoring stack imposes additional resource requirements, so ensure you have sufficient resources. Also, avoid unsupported modifications to maintain compatibility and stability.

The monitoring stack in OpenShift typically includes the following components:

  1. Prometheus: A time-series database and monitoring system that collects metrics from various services and applications.
  2. Grafana: A visualization tool that allows you to create dashboards and charts based on Prometheus data.
  3. Alertmanager: Handles alerts generated by Prometheus and sends notifications via various channels (email, Slack, etc.).
  4. Node Exporter: Collects system-level metrics from each node in the cluster.
  5. kube-state-metrics: Provides metrics about Kubernetes objects (pods, deployments, etc.).
  6. Cluster Monitoring Operator: Manages the deployment and configuration of the monitoring components.
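
You can confirm which of these components are running by listing the pods in the openshift-monitoring project:

$ oc -n openshift-monitoring get pods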

Logging

Collect, visualize, forward, and store log data to diagnose problems, find performance bottlenecks, and uncover security risks. In logging, users can configure the LokiStack deployment to produce customized alerts and recorded metrics.
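
As a minimal sketch, a LokiStack deployment for logging might look like the following; the object storage secret name logging-loki-s3, the schema effective date, and the storage class placeholder are assumptions that must match resources in your cluster:

apiVersion: loki.grafana.com/v1
kind: LokiStack
metadata:
  name: logging-loki
  namespace: openshift-logging
spec:
  size: 1x.small
  storage:
    schemas:
      - version: v13
        effectiveDate: "2024-01-01"
    secret:
      name: logging-loki-s3
      type: s3
  storageClassName: <storage_class_name>
  tenants:
    mode: openshift-logging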

On an OpenShift Container Platform cluster, you can use logging as a cluster administrator to gather and aggregate infrastructure logs, application container logs, and node system audit logs. You can forward logs to the log outputs of your choice, such as the on-cluster, Red Hat managed log storage. Depending on your deployed log storage option, you can also view your log data in the Kibana console or the OpenShift Container Platform web console.
Logging is deployed, updated, and maintained by Operators. After the Operators are deployed, you can create a ClusterLogging custom resource (CR) to schedule the logging pods and other resources necessary to support logging. You can also create a ClusterLogForwarder CR to specify which logs are collected, how they are transformed, and where they are forwarded.
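
For example, a minimal ClusterLogForwarder sketch that sends application and infrastructure logs to the default on-cluster log store might look like this (the pipeline name is arbitrary):

apiVersion: logging.openshift.io/v1
kind: ClusterLogForwarder
metadata:
  name: instance
  namespace: openshift-logging
spec:
  pipelines:
    - name: default-pipeline
      inputRefs:
        - application
        - infrastructure
      outputRefs:
        - default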

Logging architecture

The major components of logging are:

  • Collector: The collector is a daemon set that deploys pods to every OpenShift Container Platform node, where they gather, process, and send each node's log data to the configured outputs. You can use either the legacy Fluentd collector or the Vector collector. Fluentd is deprecated and planned for removal in a future release; Red Hat provides bug fixes and support for it during the current release lifecycle, but it no longer receives enhancements. You can use Vector as an alternative to Fluentd.
  • Log store: The log store stores log data for further analysis and serves as the log forwarder's default output. You can utilize the default LokiStack log store, the classic Elasticsearch log store, or route logs to other external log stores.
  • Visualization: You can use a UI component to view a visual representation of your log data. The UI provides a graphical interface to search, query, and view stored logs. The OpenShift Container Platform web console UI is provided by enabling the OpenShift Container Platform console plugin.

Logging collects container logs and node logs. These are categorized into types:

  • Application logs: Container logs generated by user applications running in the cluster, except infrastructure container applications.
  • Infrastructure logs: Container logs generated by infrastructure namespaces: openshift*, kube*, or default, as well as journald messages from nodes.
  • Audit logs: Logs generated by auditd, the node audit system, which are stored in the /var/log/audit/audit.log file, and logs from the auditd, kube-apiserver, openshift-apiserver services, as well as the ovn project if enabled.

Installing the Logging Operators

To install the logging subsystem with Elasticsearch in OpenShift using the web console, follow these steps:

  1. Install the OpenShift Elasticsearch Operator:
    • In the OpenShift Container Platform web console, go to Operators → OperatorHub.
    • Search for "OpenShift Elasticsearch Operator", click Install, and accept the recommended namespace.
  2. Install the Red Hat OpenShift Logging Operator:
    • In OperatorHub, search for "OpenShift Logging" and choose Red Hat OpenShift Logging.
    • Click Install and select a specific namespace (openshift-logging) for the installation.
  3. Wait for both Operators to show "Succeeded" status on the Installed Operators screen.
  4. On the terminal, log in as kubeadmin.
  5. Apply the logging/cluster-logging-instance.yaml file to your cluster using the oc tool to create a ClusterLogging instance, which deploys the log store, collector, and Kibana:
apiVersion: logging.openshift.io/v1
kind: ClusterLogging
metadata:
  name: instance
  namespace: openshift-logging
spec:
  managementState: Managed
  logStore:
    type: elasticsearch
    retentionPolicy:
      application:
        maxAge: 1d
      infra:
        maxAge: 7d
      audit:
        maxAge: 7d
    elasticsearch:
      nodeCount: 3
      storage:
        storageClassName: <storage_class_name>
        size: 200G
      resources:
        limits:
          memory: 16Gi
        requests:
          memory: 16Gi
      proxy:
        resources:
          limits:
            memory: 256Mi
          requests:
            memory: 256Mi
      redundancyPolicy: SingleRedundancy
  visualization:
    type: kibana
    kibana:
      replicas: 1
  collection:
    type: fluentd
    fluentd: {}
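
Apply the manifest with the oc tool and verify that the logging pods start:

$ oc apply -f logging/cluster-logging-instance.yaml
$ oc -n openshift-logging get pods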

Distributed tracing

Store and visualize large volumes of requests passing through distributed systems, across the whole stack of microservices, and under heavy loads. Use it for monitoring distributed transactions, gathering insights into your instrumented services, network profiling, performance and latency optimization, root cause analysis, and troubleshooting the interaction between components in modern cloud-native microservices-based applications.
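
With the Tempo Operator installed, a minimal TempoStack sketch might look like the following; the secret name tempo-s3 is an assumption and must reference an existing secret for S3-compatible object storage:

apiVersion: tempo.grafana.com/v1alpha1
kind: TempoStack
metadata:
  name: simplest
spec:
  storage:
    secret:
      name: tempo-s3
      type: s3
  storageSize: 1Gi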

Red Hat build of OpenTelemetry

Instrument, generate, collect, and export telemetry traces, metrics, and logs to analyze and understand your software’s performance and behavior. Use open-source back ends like Tempo or Prometheus, or use commercial offerings. Learn a single set of APIs and conventions, and own the data that you generate.
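
With the Red Hat build of OpenTelemetry Operator installed, a minimal OpenTelemetryCollector sketch that receives OTLP traces and writes them to the collector's stdout might look like this; the name and namespace are illustrative:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel
  namespace: observability
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc: {}
          http: {}
    processors:
      batch: {}
    exporters:
      debug: {}
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch]
          exporters: [debug]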

Network Observability

Observe the network traffic for OpenShift Container Platform clusters and create network flows with the Network Observability Operator. View and analyze the stored network flows information in the OpenShift Container Platform console for further insight and troubleshooting.
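
After installing the Network Observability Operator, you create a FlowCollector resource to start collecting flows; a minimal sketch (the resource must be named cluster) might look like this:

apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  namespace: netobserv
  deploymentModel: Direct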

Power monitoring

Monitor the power usage of workloads and identify the most power-consuming namespaces running in a cluster with key power consumption metrics, such as CPU or DRAM measured at the container level. Visualize energy-related system statistics with the Power monitoring Operator.
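
The Power monitoring Operator manages a Kepler custom resource; as a minimal sketch (the instance is conventionally named kepler, and the exporter port shown is the commonly documented default), it might look like this:

apiVersion: kepler.system.sustainable.computing.io/v1alpha1
kind: Kepler
metadata:
  name: kepler
spec:
  exporter:
    deployment:
      port: 9103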

Ref: https://docs.openshift.com/container-platform/4.16/observability/index.html
