The Site Reliability Guardian provides an automated change impact analysis to validate service availability, performance, and capacity objectives across various systems. This enables DevOps platform engineers to make the right release decisions for new versions and empowers SREs to apply Service-Level Objectives (SLOs) for their critical services.

While the Dynatrace Site Reliability Guardian simplifies the adoption of DevOps and SRE best practices to ensure reliable, secure, and high-quality releases in general, the provisioned workflow is key to automating those best practices in particular. In this sample, you will learn how the workflow acts on changes in your environment and how it performs a validation to support the right decision in a release or progressive delivery process.

In this lab exercise, you will learn more about the workflow leveraged by the Site Reliability Guardian.


Workflows allows you to:

Tasks to complete this step

  1. Create a guardian
    • In Dynatrace, from the menu on the left, select Apps -> Site Reliability Guardian.
    • On the All Guardians overview page, select + Guardian.
    • Select Create without template. A new guardian is displayed in the editor.
    • Provide a name for the guardian: my-first-guardian
    • Add the following example objective by defining a name, a DQL query, a target, and a warning threshold.
      • Objective name: Error rate
      • Description: The error rate objective derives the number of error logs in ratio to the total number of log lines. The target is set to be less than 2% with a warning indicator at 1.5% of error logs.
      • DQL to calculate error rate:
            fetch logs
            | fieldsAdd errors = toLong(loglevel == "ERROR")
            | summarize errorRate = sum(errors)/count() * 100
        
        
      • Select Run query and choose the last 1 hour to preview the results for your current error rate.
    • Set thresholds for this objective:
      • Select Lower than these numbers is good
      • Failure: 1
      • Warning: 0.4
      For other examples, please see: Site Reliability Guardian objective examples.
    • Click on Create to create the guardian.
    • Click on Validate to perform a manual validation of the objective.
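
To make the objective above more concrete, the following is a minimal Python sketch of the same idea: it derives an error rate as a percentage of log lines and classifies it against the warning and failure thresholds with "lower is good" semantics. This is an illustration only, not how the Site Reliability Guardian is implemented. The function names, the status labels ("pass", "warning", "fail"), and the treatment of values that land exactly on a threshold are assumptions made for this example.

```python
from typing import Iterable


def error_rate(log_levels: Iterable[str]) -> float:
    """Return the percentage of log lines whose level is ERROR.

    Mirrors the DQL query: sum(errors) / count() * 100.
    """
    levels = list(log_levels)
    if not levels:
        raise ValueError("no log lines in the selected timeframe")
    errors = sum(1 for level in levels if level == "ERROR")
    return errors / len(levels) * 100


def evaluate(value: float, warning: float, failure: float) -> str:
    """Classify a 'lower is good' objective.

    Assumption: a value equal to a threshold counts as breaching it.
    """
    if value >= failure:
        return "fail"
    if value >= warning:
        return "warning"
    return "pass"


# Hypothetical sample: 5 ERROR lines out of 1000 log lines (0.5 %).
sample_levels = ["INFO"] * 995 + ["ERROR"] * 5
rate = error_rate(sample_levels)
status = evaluate(rate, warning=0.4, failure=1.0)
print(f"error rate: {rate:.2f} % -> {status}")
```

With the thresholds used in this lab (Warning: 0.4, Failure: 1), an error rate of 0.5 % falls between the two values, so this sketch reports a warning rather than a failure.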

In today's lab for Workflows, we'll leverage the guardian we created in the previous step and run it on a scheduled basis to ensure we always meet our SLO.

Tasks to complete this step

  1. Sign in to Dynatrace.
  2. In the Dynatrace Launcher, select Workflows.
  3. Select Add New workflow to create your first workflow. The Workflow editor opens.
  4. Select the workflow title ("Untitled Workflow") and customize the name, for example, "Guardian Check".
  5. In the Choose trigger section, select the Time interval trigger.
    • Set the Run every (min) parameter to 10 minutes and the Rule parameter to every day.
  6. To add the first task, select + on the trigger node of the workflow graph.
  7. In the Choose action section, which lists all available actions for tasks, select "Site Reliability Guardian". The workflow now has its first task and shows the input configuration for that task on the right.
    • For the Guardian, select "my-first-guardian"
    • For the timeframe, select "Last 1 Hour"
  8. From the left menu, select Save to save the workflow.
  9. Select Run to execute the workflow.
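
With a 10-minute interval trigger and a "Last 1 Hour" timeframe, each workflow run validates a sliding one-hour window, so consecutive runs overlap by 50 minutes. The short Python sketch below illustrates which time ranges such a schedule would cover. It is not part of Dynatrace; the function name `validation_windows` and the start time are hypothetical and chosen only for this example.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def validation_windows(
    first_run: datetime,
    runs: int,
    interval: timedelta = timedelta(minutes=10),
    lookback: timedelta = timedelta(hours=1),
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield the (start, end) timeframe evaluated by each scheduled run."""
    for i in range(runs):
        end = first_run + i * interval
        yield end - lookback, end


# Hypothetical first run at 12:00; show the first three scheduled runs.
windows = list(validation_windows(datetime(2024, 1, 1, 12, 0), runs=3))
for start, end in windows:
    print(f"validate {start:%H:%M} to {end:%H:%M}")
```

Because the windows overlap, a short spike in error logs can influence up to six consecutive validations before it leaves the one-hour timeframe.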

In this section, you should have completed the following:

✅ Understand what a Site Reliability Guardian is and how it can strengthen SRE practices

✅ Understand use cases for Workflows

✅ Created a Site Reliability Guardian

✅ Created a simple workflow using the SRG