Monitoring Automation

Last Updated 12/13/2021, 8:34:21 AM UTC

Polaris monitoring automation allows you to establish formalized, consistent and repeatable monitoring processes. It is a simple monitoring configuration management system that lets you deploy the same tasks on multiple agents across many environments while retaining control over monitoring parameters that may be specific to each target. This formalized process allows you to test and evolve your monitoring processes from a single point of control.

Examples

You can find examples and reference automation configurations in https://github.com/arisant/ms-polaris

# Terminology

The following terms are used throughout this guide to describe the Polaris monitoring automation system.

monitoring profile, or profile
Defines what monitoring processes are deployed, where they are deployed and how they are configured. Profiles are the single source of truth for the entire set of monitoring processes in an environment.
parameterized configuration, or configuration, or template
A monitoring task defined as a reusable collector task or action with its corresponding configuration. The configuration is a text template, allowing for parameterization where it is required. Examples of monitoring tasks are "collect cpu utilization stats every 5m", "collect file system utilization stats every 12h", "check for blocking locks in some database every 1m" and "auto allocate more space to some database when depleted".
configuration package, or package
Defines a monitoring process as a set of monitoring tasks (configurations). For example, an "Oracle database instance" process can include tasks such as "Instance up/down", "Listener up/down", "Start instance when down", "Start listener when down", etc. Inside a profile, you define a list of packages to deploy to a set of agents, along with the parameter values for the package configurations. The system can then generate the corresponding task configuration files and the configuration index.
configuration index, or index
Represents the desired configuration state of the monitored environment, which the system derives from the configured profiles. Polaris automatically calculates and applies the deltas between the current monitoring state and the desired state in the index. The deltas are the collector tasks or actions that must be deployed, undeployed or modified on each agent.

# Workflow Overview

# Define Monitoring Processes

This is the entry point to the monitoring automation system. SMEs create and maintain reusable monitoring processes as configuration and package files, which are kept in one or more git repositories. Operators instantiate monitoring processes through profile files, which reference the required packages and provide values for configuration parameters. Profile files are also maintained in a git repository. When profiles are created or modified, or when the referenced packages and configurations change, operators run a single Polaris CLI command to create or update all the relevant configuration files and calculate the new desired monitoring state (the index file).

Monitoring automation - define monitoring processes

As shown in the figure above, the steps in this workflow are:

Push package and configuration changes to git
Push profile changes to git
Run the configuration generator and review resulting configurations
Push configurations and index file to git

# Monitoring State Synchronization

This is an automated workflow in which Polaris continuously watches a monitoring site's git repository for changes to the monitoring index file. When changes are detected, the system calculates the delta between the desired and current monitoring states and applies all the required changes to the appropriate agents by creating new tasks and updating or removing existing tasks. Operators can use the Polaris CLI to query the monitoring state before, during and after the synchronization process. Additionally, an operator can request Polaris to synchronize the monitoring state at any time if it diverges from the desired state for any reason.

Monitoring automation - monitoring state synchronization

As shown in the figure above, the steps in this workflow are:

Operator pushes configurations and index to git
Polaris picks up changes and calculates the monitoring state delta
Polaris applies delta to agents
Operator queries monitoring state
Operator requests monitoring state synchronization
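The delta calculation in step 2 can be pictured as a comparison of two maps. The sketch below assumes each task is identified by a name and its configuration by a digest string; Polaris's actual index format and comparison logic are not described in this guide, and the names Delta and computeDelta are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Delta lists the task changes needed to move an agent from its current
// monitoring state to the desired state recorded in the index.
type Delta struct {
	Deploy   []string // in the index but not on the agent
	Modify   []string // on both, but with a different configuration
	Undeploy []string // on the agent but no longer in the index
}

// computeDelta compares two maps of task name -> configuration digest.
func computeDelta(desired, current map[string]string) Delta {
	var d Delta
	for name, want := range desired {
		have, ok := current[name]
		switch {
		case !ok:
			d.Deploy = append(d.Deploy, name)
		case have != want:
			d.Modify = append(d.Modify, name)
		}
	}
	for name := range current {
		if _, ok := desired[name]; !ok {
			d.Undeploy = append(d.Undeploy, name)
		}
	}
	// Sort for deterministic output; Go map iteration order is random.
	sort.Strings(d.Deploy)
	sort.Strings(d.Modify)
	sort.Strings(d.Undeploy)
	return d
}

func main() {
	// Hypothetical task names and digests.
	desired := map[string]string{"db-status (billing)": "v2", "listener status (LSRN)": "v1"}
	current := map[string]string{"db-status (billing)": "v1", "rman stats (billing)": "v1"}
	fmt.Printf("%+v\n", computeDelta(desired, current))
}
```

Here the listener task would be deployed, the db-status task modified and the rman stats task undeployed.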

# Monitoring Process Asset Organization

As described above, monitoring processes are composed of files maintained in git repositories. Each monitoring process component (profile, package, template) is defined in YAML in its own file, and the files are organized in hierarchies under predefined directory paths:

autoconf/profiles/, root directory for profile files
autoconf/packages/, root directory for package files
autoconf/templates/, root directory for parameterized configuration files

An example configuration hierarchy might look like this:

.
├── packages
│   └── oracle
│       ├── db
│       │   ├── backups.yaml              # templates/oracle/db/rman.yaml
│       │   └── base.yaml                 # templates/oracle/db/* excluding rman.yaml
│       └── wls
│           └── base.yaml                 # templates/oracle/wls/*
├── profiles
│   ├── oracle-dbs.yaml                   # all oracle database monitoring processes 
│   └── weblogic-servers.yaml             # all weblogic monitoring processes
└── templates
    └── oracle
        ├── db
        │   ├── alloc-tablespace.yaml     # tablespace allocation actions
        │   ├── db-status.yaml            # database up/down
        │   ├── listener-status.yaml      # listener up/down
        │   ├── logmon-alertlog.yaml      # alert logs
        │   ├── rman.yaml                 # backup status
        │   ├── stats.yaml                # basic stats from v$ views
        │   └── ts-status.yaml            # tablespace utilization stats
        └── wls
            ├── diag-log.yaml             # server diagnostic logs
            ├── domain-health.yaml        # domain health probe
            └── server-log.yaml           # server logs

Profiles are specific to each monitoring site, so they are maintained in the Polaris git repository for each site. Packages and templates can be shared by many sites, or they can be specific to one site. As such, Polaris can use package and template assets from multiple git repositories. All you need to do is clone the required git repositories:

git clone https://github.com/arisant/ms-polaris.git /home/me/projects/ms-polaris
git clone http://some-site.com/repos/polaris.git /home/me/projects/some-site

Then point the Polaris generator to the asset paths in each repository using an autoconf.yaml configuration:

# one or more package and template repositories
pkg_repos:
  - /home/me/projects/ms-polaris/autoconf
  - /home/me/projects/some-site/autoconf

# one profile repository
profile_repo: /home/me/projects/some-site/autoconf/autoconf

# one repository for the generated assets
dest_repo: /home/me/projects/some-site/autoconf/assets

# Templates

Templates define reusable collector tasks or actions, parameterized via Go text templates enhanced with Sprig functions.


Each template is uniquely identified by a template_name field. Polaris can validate the parameters provided to text templates during instantiation through a schema defined in the parameters field for the template:

Property	Type	Required	Description
`required`	`[]string`	No	the names of the required template parameters
`properties`	`map[string]object`	No	describes each template parameter. Keys are the parameter names and values are the parameter descriptors

Parameters are described according to their type:

Property	Type	Required	Description
`type`	`string`	Yes	one of `string`, `object`, `array`
`description`	`string`	No	Human readable description
`items`	`object`		describes the contents of an `array` property
`properties`	`object`		describes the contents of an `object` property
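To make the schema rules concrete, here is a minimal validator for the required, properties, type and items fields described above. It is a sketch, not Polaris's validator: it assumes parameters have been decoded from YAML into map[string]any and []any values, treats a schema without a type (such as a template's top-level parameters block) as an object, and treats a key with a null value as absent. Schema and validate are hypothetical names.

```go
package main

import "fmt"

// Schema models a parameter descriptor. The top-level parameters block
// (required + properties, no type) and nested descriptors share this shape.
type Schema struct {
	Type       string            // "string", "object", "array"; empty is treated as object
	Required   []string          // required keys of an object
	Properties map[string]Schema // descriptors for object keys
	Items      *Schema           // descriptor for array elements
}

// validate checks a decoded parameter value against its schema.
func validate(path string, s Schema, v any) error {
	switch s.Type {
	case "string":
		if _, ok := v.(string); !ok {
			return fmt.Errorf("%s: expected string, got %T", path, v)
		}
	case "array":
		arr, ok := v.([]any)
		if !ok {
			return fmt.Errorf("%s: expected array, got %T", path, v)
		}
		for i, elem := range arr {
			if s.Items == nil {
				break
			}
			if err := validate(fmt.Sprintf("%s[%d]", path, i), *s.Items, elem); err != nil {
				return err
			}
		}
	case "object", "":
		obj, ok := v.(map[string]any)
		if !ok {
			return fmt.Errorf("%s: expected object, got %T", path, v)
		}
		for _, name := range s.Required {
			if _, present := obj[name]; !present {
				return fmt.Errorf("%s: missing required parameter %q", path, name)
			}
		}
		for name, prop := range s.Properties {
			// A key with no value (YAML null) is treated like an absent key.
			if val, present := obj[name]; present && val != nil {
				if err := validate(path+"."+name, prop, val); err != nil {
					return err
				}
			}
		}
	default:
		return fmt.Errorf("%s: unknown type %q", path, s.Type)
	}
	return nil
}

func main() {
	// Required list taken from the WLS domain health template example.
	wls := Schema{
		Required: []string{"host", "domain", "admin_url", "admin_user", "admin_password"},
		Properties: map[string]Schema{
			"host":   {Type: "string"},
			"domain": {Type: "string"},
		},
	}
	params := map[string]any{"host": "idp.prd.site.com", "domain": "identity_provider"}
	fmt.Println(validate("params", wls, params)) // params: missing required parameter "admin_url"
}
```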

Quoting template placeholder values

You should quote template placeholders {{}} in YAML values only when they evaluate to a string that contains YAML indicator characters:

{, }, [, ], &, *, #, ?, |, -, <, >, =, !, %, @, :, `, ,

For example

# quote the {{.msg}} value because .msg may contain YAML special characters
msg: "{{.msg}}"
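The quoting rule above can be expressed as a simple check. The hypothetical helper needsQuotes flags any value that contains one of the listed characters. This is deliberately conservative: YAML treats many of these characters as special only in certain positions, such as a colon followed by a space or a leading dash.

```go
package main

import (
	"fmt"
	"strings"
)

// yamlIndicators holds the indicator characters listed in the rule above.
const yamlIndicators = "{}[]&*#?|-<>=!%@:`,"

// needsQuotes reports whether a rendered placeholder value should be quoted.
func needsQuotes(value string) bool {
	return strings.ContainsAny(value, yamlIndicators)
}

func main() {
	for _, v := range []string{"identity_provider", "disk full: /u01"} {
		fmt.Printf("%q quote=%v\n", v, needsQuotes(v))
	}
}
```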

# Collector Task Templates

A collector task template defines the runtime parameters for a Polaris collector task using the following properties:

Property	Type	Required	Description
`task_name`	`string`	Yes	task name which can be templated
`task_plugin`	`string`	Yes	plugin for this task
`task_sched`	`string`	Yes	schedule as a duration or cron expression
`task_conf_path`	`string`	Yes	path where to store the generated configuration file for this task. Can be templated
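task_sched accepts either a duration or a cron expression. The sketch below shows one way to tell the two apart, assuming durations follow Go's time.ParseDuration syntax (which the 5m and 12h values in this guide happen to satisfy) and cron expressions use the common five-field form. Polaris's actual schedule parser may accept other formats; scheduleKind is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// scheduleKind classifies a task_sched value. Assumption: cron expressions use
// five fields (minute hour day-of-month month day-of-week).
func scheduleKind(sched string) (string, error) {
	if d, err := time.ParseDuration(sched); err == nil {
		if d <= 0 {
			return "", fmt.Errorf("duration %q must be positive", sched)
		}
		return "duration", nil
	}
	if len(strings.Fields(sched)) == 5 {
		return "cron", nil
	}
	return "", fmt.Errorf("%q is neither a duration nor a 5-field cron expression", sched)
}

func main() {
	for _, s := range []string{"5m", "12h", "0 2 * * *", "daily"} {
		kind, err := scheduleKind(s)
		fmt.Println(s, "->", kind, err)
	}
}
```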

The plugin configuration for the collector task is a named, nested text template defined inside the collector task template. It is declared with define and end template actions using the name task. Because these actions are placed in YAML comments, you can write any valid YAML plugin configuration inside this section, and both you and your editor can read and validate it.

template_name: <unique template identifier>
parameters: <parameter schema>
task_name: <templated task name>
task_plugin: <plugin name>
task_sched: <schedule>
task_conf_path: <templated config path>

# nested collector plugin configuration template declaration
# {{define "task"}}
# any valid yaml plugin templated configuration goes in here 
# {{end}}

Below is an example template for a collector task that monitors a WebLogic domain's health. It uses the restful-metrics plugin to query the management/tenant-monitoring/servers endpoint under the WLS admin server URL. This template can be applied to any WebLogic domain by setting the appropriate value for each parameter.

template_name: WLS domain health monitor
parameters:
  required: [host, domain, admin_url, admin_user, admin_password]
  properties:
    domain:
      type: string
      description: Weblogic domain to monitor
    admin_url:
      type: string
      description: Weblogic domain admin URL
    admin_password:
      type: string
      description: Weblogic domain admin auth password (a secret)
    admin_user:
      type: string 
      description: Weblogic domain auth username
    host:
      type: string
      description: hostname of agent running this task

task_sched: 5m
task_plugin: restful-metrics
task_name: "WLS domain health status ({{.domain}})"
task_conf_path: "{{.host}}/wls/{{.domain}}/health.yaml" 

# {{define "task"}}
outputs:
  - id: "health check/{{.domain}}"
    url: "{{.admin_url}}/management/tenant-monitoring/servers"
    method: GET
    username: "{{.admin_user}}"
    password: "{{.admin_password}}"
    data_path: body.items
    headers:
      "Accept": "application/json"
    timeout: 10s
    source: "{{.host}}"
    response_type: json
    metric:
      name: "wls/health"
      value:
        path: health
      dimensions:
        server:
          path: name
# {{end}}

When applied with parameter values

host: idp.prd.site.com
domain: identity_provider
admin_url: https://idp.prd.site.com:7001/wls/admin
admin_user: weblogic
admin_password: ${secret_password}

...it will create collector task "WLS domain health status (identity_provider)" running every 5m with its configuration in file idp.prd.site.com/wls/identity_provider/health.yaml, with content:

outputs:
  - id: "health check/identity_provider"
    url: "https://idp.prd.site.com:7001/wls/admin/management/tenant-monitoring/servers"
    method: GET
    username: "weblogic"
    password: "${secret_password}"
    data_path: body.items
    headers:
      "Accept": "application/json"
    timeout: 10s
    source: "idp.prd.site.com"
    response_type: json
    metric:
      name: "wls/health"
      value:
        path: health
      dimensions:
        server:
          path: name

# Action Templates

An action template defines the runtime parameters for a Polaris action using the following properties:

Property	Type	Required	Description
`plugin`	`string`	Yes	plugin for this action
`conf_path`	`string`	Yes	path where to store the generated configuration file for this action. Can be templated

The plugin configuration for the action is a named, nested text template defined inside the action template. As with collector tasks, it is declared with define and end template actions using the name task.

template_name: <unique template identifier>
parameters: <parameter schema>
plugin: <plugin name>
conf_path: <templated config path>

# nested action plugin configuration template declaration
# {{define "task"}}
# any valid yaml plugin templated configuration goes in here
# {{end}}

Below is an example action template that allocates storage to depleted Oracle tablespaces:

template_name: Oracle Tablespace Allocator
parameters:
  required: [instances, host]
  properties:
    host: 
      type: string
      description: hostname for agent running this action
    instances:
      type: array
      description: the oracle instances to allocate storage for
      items:
        type: object
        required: [sid, oracle_home]
        properties:
          sid:
            type: string
            description: oracle instance name
          oracle_home:
            type: string
            description: oracle home directory
          ts_auto_alloc:
            type: object
            description: auto allocation parameters
            properties:
              allowed_tbs:
                type: string
                description: "regexp for tablespace names that auto allocation is allowed (default: all)"
      
plugin: ora-ts-alloc
conf_path: "{{.host}}/alloc_tablespace.yaml"

# {{define "task"}}          
timeout: 2m
allocators:
# {{- range .instances}}
# {{- if .ts_auto_alloc}}
  - instance_name: {{.sid}}
    oracle_home: {{.oracle_home}}
    oracle_sid: {{.sid}}
    oracle_logon: {{default "/ as sysdba" .oracle_logon}}
    suppress_alerts: false
    # UNDO tablespaces ignored by default
    allowed_tablespaces: {{.ts_auto_alloc.allowed_tbs}}

    default_big_file_policy:
      autoextend_max_size: 5120g

    default_small_file_policy:
      omf: true
      initial_size: 4g
      tablespace_free_limit: 16g
# {{- end}}
# {{- end}}
# {{end}}

When applied with parameter values

host: ora.db.site.com
instances:
  - sid: ORCL_BILLING
    oracle_home: /opt/oracle/product/19c/dbhome_1
    ts_auto_alloc:
      allowed_tbs: ts =~ (DATA_.+)|(INDEX_.+)
  - sid: ORCL_DW
    oracle_home: /opt/oracle/product/19c/dbhome_2
    oracle_logon: / as sysoper
    ts_auto_alloc:

...it will create the action configuration in file ora.db.site.com/alloc_tablespace.yaml, with content:

timeout: 2m
allocators:
  - instance_name: ORCL_BILLING
    oracle_home: /opt/oracle/product/19c/dbhome_1
    oracle_sid: ORCL_BILLING
    oracle_logon: "/ as sysdba"
    suppress_alerts: false
    # UNDO tablespaces ignored by default
    allowed_tablespaces: ts =~ (DATA_.+)|(INDEX_.+)

    default_big_file_policy:
      autoextend_max_size: 5120g

    default_small_file_policy:
      omf: true
      initial_size: 4g
      tablespace_free_limit: 16g
  - instance_name: ORCL_DW
    oracle_home: /opt/oracle/product/19c/dbhome_2
    oracle_sid: ORCL_DW
    oracle_logon: "/ as sysoper"
    suppress_alerts: false
    # UNDO tablespaces ignored by default
    allowed_tablespaces:

    default_big_file_policy:
      autoextend_max_size: 5120g

    default_small_file_policy:
      omf: true
      initial_size: 4g
      tablespace_free_limit: 16g

# Packages

A package file composes a monitoring process from individual collector and action template files. Profiles can then reference packages instead of individual templates, which makes the configuration simpler to understand.

Property	Type	Required	Description
`name`	`string`	Yes	name of the package
`collectors`	`[]object`	No	collectors participating in this package
`actions`	`[]object`	No	actions participating in this package

The parameters for each template are exposed by the package to profiles under a namespace - multiple templates in a package can share the same namespace. This template parameter namespacing can resolve naming conflicts between parameters from different packages or templates, and more importantly controls how collector tasks and actions are generated by Polaris. For collectors, the parameter namespace is expected to be an array and the generator will create one task for each array element. For actions, the parameter namespace is expected to be an object and the generator will always create a single action from the contents of the namespace.

The collectors and actions list members are defined as follows:

Property	Type	Required	Description
`name`	`string`	Yes	collector or action name
`template`	`string`	Yes	path to the collector or action template file
`template_params`	`string`	Yes	parameter namespace in the profile configuration from which to read template parameter values
`when`	`string`	No	boolean expression over template parameters that determines whether the collector or action is applied

Collector Task and Action generation

Collectors: template_params is expected to be an array and Polaris will create one task for each array element
Actions: template_params is expected to be an object and Polaris will always create a single action per agent

Below is an example package for basic Oracle database monitoring:

name: basic oracle database monitoring
collectors:
  - name: alert log collector
    template: oracle/db/logmon-alertlog.yaml
    template_params: instances
  - name: db-status
    template: oracle/db/db-status.yaml
    template_params: instances
  - name: tablespace status
    template: oracle/db/ts-status.yaml
    template_params: instances
  - name: listener status
    template: oracle/db/listener-status.yaml
    template_params: listeners
  - name: db stats
    template: oracle/db/stats.yaml
    template_params: dsns
actions:
  - name: auto allocate storage to depleted tablespaces
    template: oracle/db/alloc-tablespace.yaml
    template_params: instances
    when: ts_auto_alloc

# Profiles

Profiles define which packages are applied to which agents, and with what parameters:

Property	Type	Required	Description
`name`	`string`	Yes	profile name
`packages`	`[]string`	Yes	the configuration packages to apply to this profile
`task_opts`	`map[string]object`	No	optional overrides for specific tasks, by name, for all agents in the profile
`inject`	`map[string]object`	No	parameters injected into all agents
`defaults`	`map[string]object`	No	default values to apply per configuration parameter
`agents`	`map[string]object`	Yes	agents to apply the packages to, with per-agent parameters

# Task Options

Overrides the default schedule of a collector task. Tasks are identified by the collector name from the package definition:

Property	Type	Required	Description
`schedule`	`string`	Yes	the schedule override

# Agents

Defines the agents that configuration packages are applied to, as a map keyed by agent name whose values hold task options and parameters. The root parameter names must match the template_params namespaces in the corresponding packages.

Property	Type	Required	Description
`task_opts`	`object`	No	optional overrides for specific tasks, by name, for this agent
`parameters`	`map[string]object`	Yes	package parameter values, keyed by parameter namespace
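The comments in the profile example below suggest that an agent's task_opts override the profile's task_opts, which in turn override the template's task_sched. Under that assumption, resolving one collector task's schedule looks like this. effectiveSchedule and TaskOpts are hypothetical; the 1h rman stats default comes from the example's comment, and the db-status schedule is made up.

```go
package main

import "fmt"

// TaskOpts maps a collector name (from the package definition) to its overrides.
type TaskOpts map[string]struct{ Schedule string }

// effectiveSchedule picks the schedule for one collector task: agent task_opts
// win over profile task_opts, which win over the template's task_sched.
func effectiveSchedule(task, templateSched string, profileOpts, agentOpts TaskOpts) string {
	if o, ok := agentOpts[task]; ok && o.Schedule != "" {
		return o.Schedule
	}
	if o, ok := profileOpts[task]; ok && o.Schedule != "" {
		return o.Schedule
	}
	return templateSched
}

func main() {
	profile := TaskOpts{"rman stats": {Schedule: "12h"}}
	billing := TaskOpts{"rman stats": {Schedule: "2h"}}
	fmt.Println(effectiveSchedule("rman stats", "1h", profile, billing)) // 2h
	fmt.Println(effectiveSchedule("rman stats", "1h", profile, nil))     // 12h
	fmt.Println(effectiveSchedule("db-status", "5m", profile, billing))  // 5m
}
```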

The example configuration below applies the Oracle database backup and basic monitoring packages to 3 agents:

# monitors oracle instances and rman backup statuses
name: oracle databases
packages:
  - oracle/db/base.yaml
  - oracle/db/backups.yaml

# default task overrides
task_opts:
  # set the sampling rate for rman stats to 12h instead of the default 1h
  "rman stats": 
    schedule: 12h

# these defaults are applied when absent from agent parameters
defaults:
  # alert log collector, db-status, tablespace status collector defaults 
  instances:
    tns_admin: ""
    oracle_home: /opt/oracle/product/19c/dbhome_1
    oracle_logon: "/ as sysdba"
    ts_limits:
      default_free: 15%
  # listener status defaults
  listeners:
    oracle_home: /opt/oracle/product/19c/dbhome_1
   
agents:
  # billing agent monitoring processes
  billing:
    # we want rman stats to be collected at 2h intervals for all databases monitored by billing agent
    task_opts:
      "rman stats": 
        schedule: 2h
    parameters:
      # one db stats collector for each DSN 
      dsns:
        - dsn: billing.prd.example.com
      # one log collector, db-status, tablespace status collector for each sid
      # one auto allocate storage to depleted tablespaces action for all sids
      instances:
        - sid: billing
          oracle_home: /opt/oracle/product/19c/dbhome_2
          alert_log: /opt/oracle/diag/rdbms/billing/billing/trace/alert_billing.log
          ts_auto_alloc:
            allowed_tbs: ts !~ "USERS"
        - sid: billing-dr
          alert_log: /opt/oracle/diag/rdbms/billing/billing-dr/trace/alert_billing-dr.log          
      # one listener status collector for each listener
      listeners:
        - name: LSRN
          services:
            - svc1
            - svc2
  # dev data warehouse agent monitoring processes
  dwhd:
    parameters:
      instances:
        - sid: dwdev
          alert_log: /opt/oracle/diag/rdbms/dwdev/dwdev/trace/alert_dwdev.log
          ts_limits: 
            default_free: 20%
      listeners:
        - name: LSRN
          services:
            - svc1
  # production data warehouse agent monitoring processes
  dwhp:
    parameters:
      dsns:
        - dsn: dwprd.prd.example.com
        - dsn: dwdev.dev.example.com
      instances:
        - sid: dwprd
          ts_auto_alloc:
            allowed_tbs: ts !~ "USERS"
          alert_log: /opt/oracle/diag/rdbms/dwprd/dwprd/trace/alert_dwprd.log
      listeners:
        - name: LSRN
          services:
            - svc1
            - svc2
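The defaults block fills in parameters that an agent omits. A minimal sketch, assuming defaults are applied to each element of a parameter array and merged shallowly (the guide does not say whether nested objects such as ts_limits are merged key by key); applyDefaults is a hypothetical helper. With the billing agent's values, billing keeps its own oracle_home while billing-dr picks up the default one.

```go
package main

import "fmt"

// applyDefaults fills keys missing from each element of a parameter array
// with the profile's defaults for that namespace. Shallow merge: a nested
// object such as ts_limits is replaced as a whole, not merged.
func applyDefaults(items []any, defaults map[string]any) []any {
	out := make([]any, 0, len(items))
	for _, item := range items {
		m, ok := item.(map[string]any)
		if !ok {
			out = append(out, item)
			continue
		}
		merged := make(map[string]any, len(m)+len(defaults))
		for k, v := range defaults {
			merged[k] = v
		}
		for k, v := range m {
			merged[k] = v // agent values take precedence
		}
		out = append(out, merged)
	}
	return out
}

func main() {
	defaults := map[string]any{
		"oracle_home":  "/opt/oracle/product/19c/dbhome_1",
		"oracle_logon": "/ as sysdba",
	}
	instances := []any{
		map[string]any{"sid": "billing", "oracle_home": "/opt/oracle/product/19c/dbhome_2"},
		map[string]any{"sid": "billing-dr"},
	}
	for _, inst := range applyDefaults(instances, defaults) {
		m := inst.(map[string]any)
		fmt.Println(m["sid"], m["oracle_home"], m["oracle_logon"])
	}
}
```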

# Working with the Polaris CLI

Clone the relevant git repos and use an autoconf.yaml configuration to point the generator to the package repositories, profile repository and destination repository paths:

git clone https://github.com/arisant/ms-polaris.git /home/me/projects/ms-polaris
git clone http://some-site.com/repos/polaris.git /home/me/projects/some-site

# one or more package and template repositories
pkg_repos:
  - /home/me/projects/ms-polaris/autoconf
  - /home/me/projects/some-site/autoconf

# one profile repository
profile_repo: /home/me/projects/some-site/autoconf/autoconf

# one repository for the generated assets
dest_repo: /home/me/projects/some-site/autoconf/assets

# Generate Monitoring Processes

Run the generator CLI, passing your autoconf.yaml configuration, to generate assets from the profiles you created:

myrmex autoconf generate /home/me/projects/autoconf.yaml

# Synchronization

Once you have verified the generated configurations, push them to git. Polaris will pick up the new configurations and apply them to the corresponding agents. To check the synchronization state:

myrmex autoconf state

If at any point you need to manually synchronize the monitoring state with the desired state:

myrmex autoconf sync
