Debug School

Suyash Sambhare

Monitoring Solutions

Site Reliability Engineering

An SRE needs to complete the following tasks:

  • Maintain a healthy pipeline and simplify deployment.
  • Monitor availability and status for platform and managed services. Maintain application availability, reliability, and performance from the product user's perspective.
  • Detect problems before they affect customers, solve problems, and prevent them from happening again by creating manual or automated runbooks.
  • Set common goals with developers, and provide developers access to relevant logs, events, and performance metrics that are necessary to troubleshoot and quickly resolve problems.

These tasks can be achieved with the following goals:

  • Monitoring infrastructure: Improve platform observability and monitor all infrastructure in one user interface
  • Managing built-in issues and incidents: Respond to and recover from various outages or slow-downs
  • Configuring and managing alerts: Monitor alerts and incidents, analyze the business impact, and respond in time
  • Synthetic monitoring: Prevent high-impact incidents from happening again
  • Managing actions: Automate and streamline incident response
  • Application perspectives: Capture necessary information to optimize the performance of applications and services
  • Monitoring infrastructure: Analyze resource usage and optimize capacity utilization
  • Service level objectives: Establish service level objectives (SLOs) and set thresholds to measure service performance
  • RCA: Identify the root cause of issues with Root Cause Analysis
  • Smart alerts: Automate anomaly detection and notification
  • Integrations, SDKs, and APIs: Leverage existing tools and technologies to enhance application performance
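
As a rough illustration of the service level objectives item above, the sketch below computes how much of an error budget is left for a request-based availability SLO. The function name, its parameters, and the sample numbers are hypothetical and are not part of any monitoring product's API; the only assumption is that the SLO is expressed as a target success ratio over a window of requests.

```python
def error_budget_remaining(slo_target: float,
                           total_requests: int,
                           failed_requests: int) -> float:
    """Return the unused fraction of the error budget.

    1.0 means no budget has been spent, 0.0 means it is exhausted,
    and a negative value means the SLO has been violated.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # The error budget is the number of failures the SLO tolerates.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


if __name__ == "__main__":
    # Hypothetical window: 1,000,000 requests, 250 failures, 99.9% target.
    remaining = error_budget_remaining(0.999, 1_000_000, 250)
    print(f"Error budget remaining: {remaining:.1%}")
```

A threshold on this value (for example, alerting when less than a quarter of the budget is left) is one way to turn an SLO into an actionable signal.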

DevOps

A DevOps engineer discovers the traditional and modern services of an application and deploys applications through the DevOps pipeline. The pipeline drives a consistent process for delivering changes and ensures that applications meet their stability and security goals. See the following tasks:

  • Keep every system running at maximum reliability.
  • Predict and mitigate further incidents.
  • Mitigate incidents without help from developers.
  • Fulfill customer service level agreements (SLAs) with solid SLOs based on the four golden signals (latency, traffic, errors, and saturation).

This can be achieved with the following goals:

  • Synthetic monitoring: Create synthetic tests to actively monitor applications
  • Root cause analysis: Identify the root cause of issues
  • Built-in events reference: Monitor applications and infrastructure proactively
  • Building custom dashboards: Build custom dashboards, and provide insights into the application and infrastructure performance
  • Pipeline feedback: Examine the release impact
  • Backend correlation: Manage the backend infrastructure
  • Automatic discovery and monitoring: Locate and monitor all the components of an application or service
  • Setting up alert channels: Set up alert channels to send notifications about issues that impact SLAs
  • Infrastructure correlation: Integrate infrastructure and application monitoring
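
To make the synthetic monitoring item above more concrete, here is a minimal sketch of a synthetic check: it probes an endpoint, measures latency, and reports whether the result is healthy. The names `CheckResult`, `http_probe`, and `run_synthetic_check`, as well as the default thresholds, are assumptions made for illustration. This is not how any particular monitoring product implements synthetic tests.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.request import urlopen


@dataclass
class CheckResult:
    ok: bool
    status: Optional[int]
    latency_s: float
    error: Optional[str] = None


def http_probe(url: str, timeout_s: float) -> int:
    """Fetch the URL and return the HTTP status code."""
    with urlopen(url, timeout=timeout_s) as response:
        return response.status


def run_synthetic_check(url: str,
                        probe: Callable[[str, float], int] = http_probe,
                        timeout_s: float = 5.0,
                        max_latency_s: float = 1.0) -> CheckResult:
    """Run one probe and classify the outcome as healthy or unhealthy."""
    start = time.monotonic()
    try:
        status = probe(url, timeout_s)
    except Exception as exc:  # timeouts, DNS failures, HTTP errors, ...
        return CheckResult(False, None, time.monotonic() - start, str(exc))

    latency = time.monotonic() - start
    # Healthy means a 2xx response that arrived within the latency limit.
    ok = 200 <= status < 300 and latency <= max_latency_s
    return CheckResult(ok, status, latency)
```

The probe is passed in as a parameter so that the check logic can be exercised without network access. In practice, a scheduler would run such a check on an interval and forward failing results to an alert channel.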

IT Infrastructure Administrator

An IT infrastructure administrator needs to keep on-premises IT infrastructure performing at its best. See the following tasks:

  • Set up and configure infrastructure and software.
  • Perform capacity planning and optimization.
  • Monitor availability and status for on-premises IT infrastructure.
  • Monitor alerts and incidents, analyze the business impact, and respond in time.
  • Perform root cause analysis and relay information to developers or SREs.
  • Respond to and recover from various outages or slow-downs. Prevent high-impact incidents from happening again.

This can be achieved with the following goals:

  • Leveraging the dynamic graph: Understand all the physical and logical dependencies of components to set up and configure them
  • Service level objective: Create and manage Service Level Objectives (SLOs) to analyze the quality of service delivered
  • Monitoring infrastructure: Track resource usage to optimize capacity
  • Mobile app monitoring: Track your usage on the go
  • Monitoring websites
  • Analyze infrastructure: Compare infrastructure entities to quickly isolate the scope of infrastructure issues
  • Root cause analysis: Detect the root cause of issues to facilitate quick resolution
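
The idea of understanding physical and logical dependencies can be sketched with a small dependency map. The example below walks a hypothetical graph to find every component that is affected when one component fails. The component names and the `impacted_components` helper are invented for illustration; this is a plain breadth-first search, not the actual dynamic graph of any product.

```python
from collections import deque


def impacted_components(depends_on: dict[str, set[str]], failed: str) -> set[str]:
    """Return all components that directly or indirectly depend on `failed`."""
    # Invert the edges: for each component, who depends on it?
    dependents: dict[str, set[str]] = {}
    for component, dependencies in depends_on.items():
        for dependency in dependencies:
            dependents.setdefault(dependency, set()).add(component)

    impacted: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)

    impacted.discard(failed)  # only relevant if the graph contains a cycle
    return impacted


# Hypothetical topology: web -> api -> (db, cache), each on its own host.
topology = {
    "web": {"api"},
    "api": {"db", "cache"},
    "db": {"host-1"},
    "cache": {"host-2"},
}
print(sorted(impacted_components(topology, "host-1")))
```

Walking the graph in the opposite direction, from a failing service down to its dependencies, gives a list of candidates to inspect during root cause analysis.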

Developer

A developer needs to code new microservices and cloud-native applications instead of worrying about infrastructure or resource provisioning. See the following tasks:

  • Get all the meaningful information to understand how the application (and some infrastructure) components interact with each other, and how bugs or issues spread and impact the global product.
  • Check whether any actions are needed for optimization.
  • Analyze problems, explore options, find the optimal way to solve them, and then code the solution.
  • Pay more attention to application development, and drive more innovation to keep up with competitors.

This can be achieved with the following goals:

  • AutoProfile: Locate performance hot spots and bottlenecks at the code level
  • Analyze profiles: Visualize application performance to identify and optimize the hot path
  • Application perspectives: View all components of an application or service to create, build, and support it
  • Analyze infrastructure: Relate infrastructure issues to application impact
  • Pipeline feedback: Assess release impact to anticipate and address potential issues
  • Synthetic monitoring: Create synthetic tests to predict issues before they occur
  • Root cause analysis: Correlate performance data from different sources to identify the root cause of a performance issue
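
Locating hot spots at the code level can be tried locally with a general-purpose profiler. The sketch below uses Python's standard `cProfile` and `pstats` modules as a stand-in. It is not the AutoProfile feature itself, and `slow_sum`, `handler`, and `profile_top` are hypothetical names chosen for the example.

```python
import cProfile
import io
import pstats


def slow_sum(n: int) -> int:
    """Deliberately inefficient loop that acts as the hot spot."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def handler() -> int:
    """Stand-in for a request handler that calls into the hot spot."""
    return slow_sum(200_000) + sum(range(1_000))


def profile_top(func, limit: int = 5) -> str:
    """Profile one call to `func` and return a report of the top entries."""
    profiler = cProfile.Profile()
    profiler.runcall(func)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the hot path appears first.
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_top(handler))
```

In the resulting report, `slow_sum` appears near the top by cumulative time, which is the kind of signal a profiler view uses to point at the hot path.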

Ref: https://www.ibm.com/docs/en/instana-observability/current?topic=overview-personas-use-cases
