Monitoring Solutions

#devops #monitoring #instana #sre

Site Reliability Engineering

A site reliability engineer (SRE) needs to complete the following tasks:

Maintain a healthy pipeline and simplify deployment.
Monitor availability and status for platform and managed services. Maintain application availability, reliability, and performance from the product user's perspective.
Detect problems before they affect customers, solve problems, and prevent them from happening again by creating manual or automated runbooks.
Set common goals with developers, and provide developers access to relevant logs, events, and performance metrics that are necessary to troubleshoot and quickly resolve problems.

These tasks can be achieved with the following goals:

Monitoring infrastructure: Improve platform observability and monitor all infrastructure in one user interface
Managing built-in issues and incidents: Respond to and recover from various outages or slow-downs
Configuring and managing alerts: Monitor alerts and incidents, analyze the business impact, and respond in time
Synthetic monitoring: Prevent high-impact incidents from happening again
Managing actions: Automate and streamline incident response
Application perspectives: Capture necessary information to optimize the performance of applications and services
Monitoring infrastructure: Analyze resource usage and optimize capacity utilization
Service level objectives: Establish service level objectives (SLOs) and set thresholds to measure service performance
Root cause analysis (RCA): Identify the root cause of issues
Smart alerts: Automate anomaly detection and notification
Integrations, SDKs, and APIs: Leverage existing tools and technologies to enhance application performance
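
To make the service level objectives goal in the list above more concrete, here is a minimal Python sketch of how an availability SLO translates into an error budget. The function name, the 99.9% target, and the request counts are illustrative assumptions; none of them are Instana settings or APIs.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget that is still unspent.

    slo_target is the availability objective as a fraction, for example 0.999.
    The result drops below zero once the budget is overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # Number of failures the SLO tolerates over this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% objective.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"{remaining:.0%} of the error budget is left")
```

With these made-up numbers the SLO tolerates about 1,000 failed requests, so 250 failures consume roughly a quarter of the budget. An alert threshold could then be set on the remaining budget instead of on raw error counts.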

DevOps

A DevOps engineer discovers the traditional and modern services that make up an application and deploys applications through the DevOps pipeline, which drives a consistent process for delivering changes and ensures that applications meet their stability and security goals. See the following tasks:

Keep every system running at maximum reliability.
Predict and mitigate further incidents.
Mitigate incidents without help from developers.
Fulfill customer service level agreements (SLAs) with solid SLOs built on the four golden signals (latency, traffic, errors, and saturation).

These tasks can be achieved with the following goals:

Synthetic monitoring: Create Synthetic tests to actively monitor applications
Root cause analysis: Identify the root cause of issues
Built-in events reference: Monitor applications and infrastructure proactively
Building custom dashboards: Build custom dashboards, and provide insights into the application and infrastructure performance
Pipeline feedback: Examine the release impact
Backend correlation: Manage the backend infrastructure
Automatic discovery and monitoring: Locate and monitor all the components of an application or service
Setting up alert channels: Set up alert channels to send notifications about issues that impact SLAs
Infrastructure correlation: Integrate infrastructure and application monitoring
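
The synthetic monitoring goal in the list above boils down to probing an application on a schedule and judging each result against expectations. The following Python sketch shows that idea under stated assumptions: `run_synthetic_check`, `CheckResult`, the 500 ms latency limit, and the probe callable are all hypothetical names chosen for illustration, and the code does not use or imitate the Instana Synthetic monitoring API.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    ok: bool
    latency_ms: float
    detail: str


def run_synthetic_check(probe: Callable[[], int], max_latency_ms: float = 500.0) -> CheckResult:
    """Run one probe and classify it.

    probe is any callable that performs the request and returns an
    HTTP-style status code. It is injected so the check can be tested
    without network access.
    """
    start = time.perf_counter()
    try:
        status = probe()
    except Exception as exc:  # any probe failure counts as a failed check
        latency_ms = (time.perf_counter() - start) * 1000.0
        return CheckResult(False, latency_ms, f"probe raised {type(exc).__name__}")
    latency_ms = (time.perf_counter() - start) * 1000.0
    if status != 200:
        return CheckResult(False, latency_ms, f"unexpected status {status}")
    if latency_ms > max_latency_ms:
        return CheckResult(False, latency_ms, "response too slow")
    return CheckResult(True, latency_ms, "ok")


# Stand-in probes; a real one would issue an HTTP request to the application.
print(run_synthetic_check(lambda: 200).detail)
print(run_synthetic_check(lambda: 503).detail)
```

Running a check like this at a fixed interval, and routing failed results to an alert channel, is one way to notice an outage before users report it.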

IT Infrastructure Administrator

An IT admin needs to keep on-premises IT infrastructure performing at its best. See the following tasks:

Set up and configure infrastructure and software.
Perform capacity planning and optimization.
Monitor availability and status for on-premises IT infrastructure.
Monitor alerts and incidents, analyze the business impact, and respond in time.
Perform root cause analysis and relay information to developers or SREs.
Respond to and recover from various outages or slow-downs. Prevent high-impact incidents from happening again.

These tasks can be achieved with the following goals:

Leveraging the dynamic graph: Understand all the physical and logical dependencies of components to set up and configure them
Service level objectives: Create and manage service level objectives (SLOs) to analyze the quality of service delivered
Monitoring infrastructure: Track resource usage to optimize capacity
Mobile app monitoring: Track your usage on the go
Monitoring websites
Analyze infrastructure: Compare infrastructure entities to quickly isolate the scope of infrastructure issues
Root cause analysis: Detect the root cause of issues to facilitate quick resolution
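
For the capacity planning task and the "track resource usage to optimize capacity" goal above, a rough projection is often a useful first step. The Python sketch below estimates how many days remain before a resource fills up, assuming linear growth between the first and last sample. The function name and the sample values are made up for illustration, and real usage data is rarely this regular.

```python
from typing import Optional, Sequence


def days_until_full(daily_usage_pct: Sequence[float], capacity_pct: float = 100.0) -> Optional[float]:
    """Estimate the days left until usage reaches capacity.

    daily_usage_pct holds one usage sample per day, oldest first.
    Returns None when usage is flat or shrinking, because no fill
    date can be projected in that case.
    """
    if len(daily_usage_pct) < 2:
        raise ValueError("need at least two daily samples")
    # Average growth per day between the first and the last sample.
    growth_per_day = (daily_usage_pct[-1] - daily_usage_pct[0]) / (len(daily_usage_pct) - 1)
    if growth_per_day <= 0:
        return None
    return max(0.0, (capacity_pct - daily_usage_pct[-1]) / growth_per_day)


# Hypothetical disk usage over four days, in percent.
print(days_until_full([60.0, 62.0, 64.0, 66.0]))  # prints 17.0
```

Here usage grows by 2 percentage points per day, and 34 points of headroom remain, which gives 17 days.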

Developer

A developer needs to code new microservices and cloud-native applications instead of worrying about infrastructure or resource provisioning. See the following tasks:

Get all the meaningful information to understand how the application (and some infrastructure) components interact with each other, and how bugs or issues spread and impact the global product.
Check whether any actions are needed for optimization.
Analyze problems, explore options, find the optimal way to solve them, and then code the solution.
Pay more attention to application development, and drive more innovation to keep up with competitors.

These tasks can be achieved with the following goals:

AutoProfile: Locate performance hot spots and bottlenecks at the code level
Analyze profiles: Visualize application performance to identify and optimize the hot path
Application perspectives: View all components of an application or service in order to create, build, and support it
Analyze infrastructure: Relate infrastructure issues to application impact
Pipeline feedback: Assess release impact to anticipate and address potential issues
Synthetic monitoring: Create Synthetic tests to predict issues before they occur
Root cause analysis: Correlate performance data from different sources to identify the root cause of a performance issue

Ref: https://www.ibm.com/docs/en/instana-observability/current?topic=overview-personas-use-cases

Debug School
