Monitoring Automation
- Last Updated 12/13/2021, 8:34:21 AM UTC
Polaris monitoring automation allows you to establish formalized, consistent and repeatable monitoring processes. It is a simple monitoring configuration management system that lets you deploy the same tasks on multiple agents across many environments while retaining control over monitoring parameters that might be specific to each target. This formalized process allows you to prove and evolve your monitoring processes from a single point of control.
Examples
You can find examples and reference automation configurations in https://github.com/arisant/ms-polaris
# Terminology
The following terms are used throughout this guide to describe the Polaris monitoring automation system.
monitoring profile, or profile
Defines what monitoring processes are deployed, where they are deployed and how they should be configured. Profiles are the single source of truth for your entire set of monitoring processes for an environment.
parameterized configuration, or configuration, or template
A monitoring task defined as a reusable collector task or action with its corresponding configuration. The configuration is a text template, allowing for parameterization where it is required. Examples of monitoring tasks are "collect cpu utilization stats every 5m", "collect file system utilization stats every 12h", "check for blocking locks in some database every 1m", "auto allocate more space to some database when depleted".
configuration package, or package
Defines a monitoring process as a set of monitoring tasks (configurations). Examples of monitoring processes might be "Oracle database instance" which can include tasks such as "Instance up/down", "Listener up/down", "Start instance when down", "Start listener when down", etc. Inside a profile, you define a list of packages that you want to deploy for a set of agents and the parameter values to use for the package configurations. The system can then generate the corresponding task configuration files and configuration index.
configuration index, or index
Represents the desired configuration state of the monitored environment. This is derived by the system from the configured profiles. Polaris automatically calculates and applies the deltas between the current monitoring state and the desired state in the index. The deltas are the collector tasks or actions that must be deployed, un-deployed or modified on each agent.
# Workflow Overview
# Define Monitoring Processes
This is the entry point to the monitoring automation system. SMEs create and maintain reusable monitoring processes in terms of configuration and package files, which are maintained in one or more git repositories. Operators instantiate monitoring processes through profile files, which reference the required packages and provide values for configuration parameters. Profile files are also maintained in a git repository. When profiles are created or modified, or when the referenced packages and configurations are modified, operators simply have to run a single command through the Polaris CLI to create or update all the relevant configuration files and calculate the new desired monitoring state (index file).
As shown in the figure above, the steps in this workflow are:
- Push packages and configurations changes to git
- Push profile changes to git
- Run the configuration generator and review resulting configurations
- Push configurations and index file to git
# Monitoring State Synchronization
This is an automated workflow where Polaris is continuously watching a monitoring site's git repo for changes to the monitoring index file. When changes are detected, the system calculates the delta between the desired and current monitoring states and applies all the required changes to the appropriate agents by creating new tasks and updating or removing existing tasks. Operators can use the Polaris CLI to query the monitoring state before, during and after the synchronization process. Additionally, an operator can request Polaris to synchronize the monitoring state at any time if it diverges from the desired state for any reason.
As shown in the figure above, the steps in this workflow are:
- Operator pushes configurations and index to git
- Polaris picks up changes and calculates the monitoring state delta
- Polaris applies delta to agents
- Operator queries monitoring state
- Operator requests monitoring state synchronization
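The delta calculation in this workflow can be pictured as a comparison of two maps. The following is a minimal sketch in Python, not the way Polaris itself is implemented. The `compute_delta` function and the representation of a monitoring state as a mapping from an `(agent, task name)` key to a task configuration are assumptions made only for this illustration.

```python
def compute_delta(desired, current):
    """Compare the desired and current monitoring states.

    Both arguments map a hypothetical (agent, task_name) key to that
    task's configuration. Returns the keys to deploy, un-deploy and modify.
    """
    # Present in the desired state only: must be deployed.
    deploy = sorted(k for k in desired if k not in current)
    # Present in the current state only: must be un-deployed.
    undeploy = sorted(k for k in current if k not in desired)
    # Present in both but with a different configuration: must be modified.
    modify = sorted(
        k for k in desired if k in current and desired[k] != current[k]
    )
    return {"deploy": deploy, "undeploy": undeploy, "modify": modify}


desired = {
    ("billing", "db-status"): {"schedule": "1m"},
    ("billing", "rman stats"): {"schedule": "2h"},
}
current = {
    ("billing", "rman stats"): {"schedule": "12h"},
    ("billing", "old task"): {"schedule": "5m"},
}
delta = compute_delta(desired, current)
```

When the two states are identical all three lists are empty, which corresponds to a synchronized monitoring state.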
# Monitoring Process Asset Organization
As described above, monitoring processes are composed from files which are maintained in git repositories. Each monitoring process component (profile, package, template) is defined using yaml in its own file, organized in file hierarchies under predefined directory paths:
- autoconf/profiles/, root directory for profile files
- autoconf/packages/, root directory for package files
- autoconf/templates/, root directory for parameterized configuration files
An example configuration hierarchy might look like this:
.
├── packages
│   └── oracle
│       ├── db
│       │   ├── backups.yaml          # templates/oracle/db/rman.yaml
│       │   └── base.yaml             # templates/oracle/db/* excluding rman.yaml
│       └── wls
│           └── base.yaml             # templates/oracle/wls/*
├── profiles
│   ├── oracle-dbs.yaml               # all oracle database monitoring processes
│   └── weblogic-servers.yaml         # all weblogic monitoring processes
└── templates
    └── oracle
        ├── db
        │   ├── alloc-tablespace.yaml # tablespace allocation actions
        │   ├── db-status.yaml        # database up/down
        │   ├── listener-status.yaml  # listener up/down
        │   ├── logmon-alertlog.yaml  # alert logs
        │   ├── rman.yaml             # backup status
        │   ├── stats.yaml            # basic stats from v$ views
        │   └── ts-status.yaml        # tablespace utilization stats
        └── wls
            ├── diag-log.yaml         # server diagnostic logs
            ├── domain-health.yaml    # domain health probe
            └── server-log.yaml       # server logs
Profiles are specific to each monitoring site, so they are maintained in the Polaris git repository for each site. Packages and templates can be shared by many sites, or they can be specific to one site. As such, Polaris can use package and template assets from multiple git repositories. All you need to do is clone the required git repositories and point the Polaris generator to the asset paths in each repository using an autoconf.yaml configuration:
git clone https://github.com/arisant/ms-polaris.git /home/me/projects/ms-polaris
git clone http://some-site.com/repos/polaris.git /home/me/projects/some-site
# one or more package and template repositories
pkg_repos:
- /home/me/projects/ms-polaris/autoconf
- /home/me/projects/some-site/autoconf
# one profile repository
profile_repo: /home/me/projects/some-site/autoconf/autoconf
# one repository for the generated assets
dest_repo: /home/me/projects/some-site/autoconf/assets
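To make the idea of several package and template repositories concrete, here is a small Python sketch of how an asset path such as `oracle/db/rman.yaml` could be looked up across the `pkg_repos` entries. The `find_asset` helper is hypothetical, and the first-match-wins search order is an assumption: this guide does not state how Polaris resolves an asset that exists in more than one repository.

```python
from pathlib import Path


def find_asset(pkg_repos, kind, rel_path):
    """Return the first existing <repo>/<kind>/<rel_path>, or None.

    kind is "packages" or "templates". Searching the repositories in
    listed order and stopping at the first hit is an assumption.
    """
    for repo in pkg_repos:
        candidate = Path(repo) / kind / rel_path
        if candidate.is_file():
            return candidate
    return None
```

With the configuration above, `find_asset(["/home/me/projects/ms-polaris/autoconf", "/home/me/projects/some-site/autoconf"], "templates", "oracle/db/rman.yaml")` would check the shared repository first and fall back to the site repository.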
# Templates
Templates define reusable collector tasks or actions which are parameterized via Go text templates enhanced with sprig functions.
Each template is uniquely identified by a template_name field. Polaris can validate the parameters provided to text templates during instantiation through a schema defined in the parameters field for the template:
Property | Type | Required | Description |
---|---|---|---|
required | []string | No | the names of the required template parameters |
properties | map[string]object | No | describes each template parameter. Keys are the parameter names and values are the parameter descriptors |
Parameters are described according to their type:
Property | Type | Required | Description |
---|---|---|---|
type | string | Yes | one of string, object, array |
description | string | No | Human readable description |
items | object | | describes the contents of an array property |
properties | object | | describes the contents of an object property |
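The schema fields in the two tables above (`required`, `properties`, `type`, `items`) are enough to check a set of parameter values before a template is rendered. The Python sketch below illustrates that kind of check. It is not the Polaris validator: the `validate` function, its error messages and the dictionary representation of the schema are assumptions made for this example.

```python
TYPE_CHECKS = {"string": str, "object": dict, "array": list}


def validate(params, schema, path=""):
    """Return a list of error messages; an empty list means the values pass."""
    errors = []
    for name in schema.get("required", []):
        if name not in params:
            errors.append(f"{path}{name}: missing required parameter")
    for name, desc in schema.get("properties", {}).items():
        if name not in params:
            continue  # optional parameter that was not supplied
        value = params[name]
        if not isinstance(value, TYPE_CHECKS[desc["type"]]):
            errors.append(f"{path}{name}: expected {desc['type']}")
        elif desc["type"] == "object":
            # Object descriptors carry their own required/properties fields.
            errors += validate(value, desc, f"{path}{name}.")
        elif desc["type"] == "array" and "items" in desc:
            item_desc = desc["items"]
            for i, item in enumerate(value):
                if not isinstance(item, TYPE_CHECKS[item_desc["type"]]):
                    errors.append(f"{path}{name}[{i}]: expected {item_desc['type']}")
                elif item_desc["type"] == "object":
                    errors += validate(item, item_desc, f"{path}{name}[{i}].")
    return errors


schema = {
    "required": ["instances", "host"],
    "properties": {
        "host": {"type": "string"},
        "instances": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["sid", "oracle_home"],
                "properties": {
                    "sid": {"type": "string"},
                    "oracle_home": {"type": "string"},
                },
            },
        },
    },
}
errors = validate({"instances": [{"sid": "ORCL_DW"}]}, schema)
```

Here `errors` reports the missing `host` parameter and the missing `oracle_home` of the first array element, each prefixed with the path to the offending value.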
Quoting template placeholder values
You should quote template placeholders {{}} in yaml values only when they evaluate to a string that contains yaml indicator characters: { } [ ] & * # ? | - < > = ! % @ : ` ,
For example:
# quote {{.msg}} value because .msg may contain yaml special characters
msg: "{{.msg}}"
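If you want to check a candidate parameter value against this rule, a helper along the following lines can be used. It is a hypothetical Python sketch that applies the indicator character list above literally. It is not part of Polaris and it is stricter than a YAML parser, which accepts some of these characters unquoted depending on their position.

```python
# The yaml indicator characters listed in this guide.
YAML_INDICATORS = set("{}[]&*#?|-<>=!%@:`,")


def needs_quotes(value):
    """True when the rendered value contains a yaml indicator character."""
    return any(ch in YAML_INDICATORS for ch in value)
```

For instance, `needs_quotes("disk usage: 95%")` is true, while `needs_quotes("identity_provider")` is false.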
# Collector Task Templates
A collector task template defines the runtime parameters for a Polaris collector task using the following properties:
Property | Type | Required | Description |
---|---|---|---|
task_name | string | Yes | task name which can be templated |
task_plugin | string | Yes | plugin for this task |
task_sched | string | Yes | schedule as a duration or cron expression |
task_conf_path | string | Yes | path where to store the generated configuration file for this task. Can be templated |
The plugin configuration for the collector task is a named nested text template defined inside the collector task text template. It is declared with a define and end template action with name task. This allows you to define any valid yaml for the plugin configuration within this section, so that it can be understood and validated by both you and your editor.
template_name: <unique template identifier>
parameters:
task_name: <templated task name>
task_plugin: <plugin name>
task_sched: <schedule>
task_conf_path: <templated config path>
# nested collector plugin configuration template declaration
# {{define "task"}}
# any valid yaml plugin templated configuration goes in here
# {{end}}
Below is an example configuration for a collector task to monitor a WebLogic domain's health. It uses the restful-metrics plugin to hit the WLS management/tenant-monitoring/servers admin server URL. This template can be applied to any WebLogic domain by setting the appropriate values for each of the parameters.
template_name: WLS domain health monitor
parameters:
  required: [host, domain, admin_url, admin_user, admin_password]
  properties:
    domain:
      type: string
      description: Weblogic domain to monitor
    admin_url:
      type: string
      description: Weblogic domain admin URL
    admin_password:
      type: string
      description: Weblogic domain admin auth password (a secret)
    admin_user:
      type: string
      description: Weblogic domain auth username
    host:
      type: string
      description: hostname of agent running this task
task_sched: 5m
task_plugin: restful-metrics
task_name: "WLS domain health status ({{.domain}})"
task_conf_path: "{{.host}}/wls/{{.domain}}/health.yaml"
# {{define "task"}}
outputs:
  - id: "health check/{{.domain}}"
    url: "{{.admin_url}}/management/tenant-monitoring/servers"
    method: GET
    username: "{{.admin_user}}"
    password: "{{.admin_password}}"
    data_path: body.items
    headers:
      "Accept": "application/json"
    timeout: 10s
    source: "{{.host}}"
    response_type: json
    metric:
      name: "wls/health"
      value:
        path: health
      dimensions:
        server:
          path: name
# {{end}}
When applied with parameter values
host: idp.prd.site.com
domain: identity_provider
admin_url: https://idp.prd.site.com:7001/wls/admin
admin_user: weblogic
admin_password: ${secret_password}
...it will create collector task "WLS domain health status (identity_provider)" running every 5m with its configuration in file idp.prd.site.com/wls/identity_provider/health.yaml, with content:
outputs:
  - id: "health check/identity_provider"
    url: "https://idp.prd.site.com:7001/wls/admin/management/tenant-monitoring/servers"
    method: GET
    username: "weblogic"
    password: "${secret_password}"
    data_path: body.items
    headers:
      "Accept": "application/json"
    timeout: 10s
    source: "idp.prd.site.com"
    response_type: json
    metric:
      name: "wls/health"
      value:
        path: health
      dimensions:
        server:
          path: name
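The substitution that produces the task name and configuration path in this example can be imitated in a few lines of Python. This is only a rough stand-in for illustration: Polaris uses Go text templates with sprig functions, whereas the hypothetical `render` function below handles nothing but the simplest `{{.name}}` placeholder and supports no actions such as range, if or define.

```python
import re

# Matches the simplest placeholder form only, e.g. {{.domain}}.
PLACEHOLDER = re.compile(r"\{\{\s*\.(\w+)\s*\}\}")


def render(text, params):
    """Substitute {{.name}} placeholders; raises KeyError for unknown names."""
    return PLACEHOLDER.sub(lambda m: str(params[m.group(1)]), text)


params = {"host": "idp.prd.site.com", "domain": "identity_provider"}
task_name = render("WLS domain health status ({{.domain}})", params)
conf_path = render("{{.host}}/wls/{{.domain}}/health.yaml", params)
```

With these values, `task_name` and `conf_path` come out as the task name and file path quoted in the example above.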
# Action Templates
An action template defines the runtime parameters for a Polaris action using the following properties:
Property | Type | Required | Description |
---|---|---|---|
plugin | string | Yes | plugin for this action |
conf_path | string | Yes | path where to store the generated configuration file for this action. Can be templated |
The plugin configuration for the action is a named nested text template defined inside the action text template. It is declared with a define and end template action with name task, just like for collector tasks.
template_name: <unique template identifier>
parameters:
plugin: <plugin name>
conf_path: <templated config path>
# nested action plugin configuration template declaration
# {{define "task"}}
# any valid yaml plugin templated configuration goes in here
# {{end}}
Below is an example configuration for an action template which allocates storage to depleted Oracle tablespaces:
template_name: Oracle Tablespace Allocator
parameters:
  required: [instances, host]
  properties:
    host:
      type: string
      description: hostname for agent running this action
    instances:
      type: array
      description: the oracle instances to allocate storage for
      items:
        type: object
        required: [sid, oracle_home]
        properties:
          sid:
            type: string
            description: oracle instance name
          oracle_home:
            type: string
            description: oracle home directory
          ts_auto_alloc:
            type: object
            description: auto allocation parameters
            properties:
              allowed_tbs:
                type: string
                description: "regexp for tablespace names that auto allocation is allowed (default: all)"
plugin: ora-ts-alloc
conf_path: "{{.host}}/alloc_tablespace.yaml"
# {{define "task"}}
timeout: 2m
allocators:
# {{- range .instances}}
# {{- if .ts_auto_alloc}}
  - instance_name: {{.sid}}
    oracle_home: {{.oracle_home}}
    oracle_sid: {{.sid}}
    oracle_logon: {{default "/ as sysdba" .oracle_logon}}
    suppress_alerts: false
    # UNDO tablespaces ignored by default
    allowed_tablespaces: {{.ts_auto_alloc.allowed_tbs}}
    default_big_file_policy:
      autoextend_max_size: 5120g
    default_small_file_policy:
      omf: true
      initial_size: 4g
    tablespace_free_limit: 16g
# {{- end}}
# {{- end}}
# {{end}}
When applied with parameter values
host: ora.db.site.com
instances:
  - sid: ORCL_BILLING
    oracle_home: /opt/oracle/product/19c/dbhome_1
    ts_auto_alloc:
      allowed_tbs: ts =~ (DATA_.+)|(INDEX_.+)
  - sid: ORCL_DW
    oracle_home: /opt/oracle/product/19c/dbhome_2
    oracle_logon: / as sysoper
    ts_auto_alloc:
...it will create the action configuration in file ora.db.site.com/alloc_tablespace.yaml, with content:
timeout: 2m
allocators:
  - instance_name: ORCL_BILLING
    oracle_home: /opt/oracle/product/19c/dbhome_1
    oracle_sid: ORCL_BILLING
    oracle_logon: "/ as sysdba"
    suppress_alerts: false
    # UNDO tablespaces ignored by default
    allowed_tablespaces: ts =~ (DATA_.+)|(INDEX_.+)
    default_big_file_policy:
      autoextend_max_size: 5120g
    default_small_file_policy:
      omf: true
      initial_size: 4g
    tablespace_free_limit: 16g
  - instance_name: ORCL_DW
    oracle_home: /opt/oracle/product/19c/dbhome_2
    oracle_sid: ORCL_DW
    oracle_logon: "/ as sysoper"
    suppress_alerts: false
    # UNDO tablespaces ignored by default
    allowed_tablespaces:
    default_big_file_policy:
      autoextend_max_size: 5120g
    default_small_file_policy:
      omf: true
      initial_size: 4g
    tablespace_free_limit: 16g
# Packages
A package file composes a monitoring process by using individual collector and action template files. Profiles can then reference packages instead of individual templates, making the configuration simpler to understand.
Property | Type | Required | Description |
---|---|---|---|
name | string | Yes | name of the package |
collectors | []object | No | collectors participating in this package |
actions | []object | No | actions participating in this package |
The parameters for each template are exposed by the package to profiles under a namespace; multiple templates in a package can share the same namespace. This template parameter namespacing can resolve naming conflicts between parameters from different packages or templates, and more importantly controls how collector tasks and actions are generated by Polaris. For collectors, the parameter namespace is expected to be an array and the generator will create one task for each array element. For actions, the parameter namespace is expected to be an object and the generator will always create a single action from the contents of the namespace.
The collectors and actions list members are defined as follows:
Property | Type | Required | Description |
---|---|---|---|
name | string | Yes | collector or action name |
template | string | Yes | path to collector or action template file |
template_params | string | Yes | parameter namespace in profile configuration to read template parameter values from |
when | string | No | boolean expression, from template parameters, which determines if the collector or action will be applied |
Collector Task and Action generation
- Collectors: template_params is expected to be an array and Polaris will create one task for each array element
- Actions: template_params is expected to be an object and Polaris will always create a single action per agent
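The generation rule above can be sketched as follows. This Python example is an illustration only and makes several assumptions: the `generate` function and the shape of the items it returns are hypothetical, the `when` expression is left out entirely, and an action is simply handed the whole value of its namespace.

```python
def generate(package, agent, params):
    """Expand one package for one agent into a flat list of work items."""
    items = []
    for c in package.get("collectors", []):
        # Collectors: one task per element of the namespace array.
        for element in params.get(c["template_params"], []):
            items.append({"kind": "collector", "agent": agent,
                          "name": c["name"], "params": element})
    for a in package.get("actions", []):
        namespace = params.get(a["template_params"])
        if namespace:
            # Actions: a single action per agent, built from the whole namespace.
            items.append({"kind": "action", "agent": agent,
                          "name": a["name"], "params": namespace})
    return items


package = {
    "collectors": [
        {"name": "db-status", "template": "oracle/db/db-status.yaml",
         "template_params": "instances"},
        {"name": "listener status", "template": "oracle/db/listener-status.yaml",
         "template_params": "listeners"},
    ],
    "actions": [
        {"name": "auto allocate storage to depleted tablespaces",
         "template": "oracle/db/alloc-tablespace.yaml",
         "template_params": "instances"},
    ],
}
params = {
    "instances": [{"sid": "billing"}, {"sid": "billing-dr"}],
    "listeners": [{"name": "LSRN"}],
}
items = generate(package, "billing", params)
```

Two instances and one listener yield three collector tasks, while the action appears once for the agent regardless of how many instances are listed.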
Below is an example configuration for an Oracle database basic monitoring process:
name: basic oracle database monitoring
collectors:
  - name: alert log collector
    template: oracle/db/logmon-alertlog.yaml
    template_params: instances
  - name: db-status
    template: oracle/db/db-status.yaml
    template_params: instances
  - name: tablespace status
    template: oracle/db/ts-status.yaml
    template_params: instances
  - name: listener status
    template: oracle/db/listener-status.yaml
    template_params: listeners
  - name: db stats
    template: oracle/db/stats.yaml
    template_params: dsns
actions:
  - name: auto allocate storage to depleted tablespaces
    template: oracle/db/alloc-tablespace.yaml
    template_params: instances
    when: ts_auto_alloc
# Profiles
Profiles define which packages are applied to which agents and with what parameters:
Property | Type | Required | Description |
---|---|---|---|
name | string | Yes | profile name |
packages | []string | Yes | the configuration packages to apply to this profile |
task_opts | map[string]object | No | optional overrides for specific tasks by name for all agents in profile |
inject | map[string]object | No | injects parameters to all agents |
defaults | map[string]object | No | default values to apply per configuration parameter |
agents | map[string]object | Yes | agents to apply package with per agent parameters |
# Task Options
Overrides the default schedule of a collector task. Tasks are identified by the name given to them in the package definition:
Property | Type | Required | Description |
---|---|---|---|
schedule | string | Yes | the schedule override |
# Agents
Define the agents to apply configuration packages to as keys to a map of task options and parameters. The root parameter names must match the template_params configuration in the corresponding packages.
Property | Type | Required | Description |
---|---|---|---|
task_opts | object | No | optional overrides for specific tasks by name for agent |
parameters | map[string]object | Yes | package parameters to template parameters |
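Because task_opts can appear both at profile level and at agent level, it helps to think of schedule resolution as a lookup from most specific to least specific. The Python sketch below assumes that an agent-level override wins over a profile-level override, which in turn wins over the schedule set in the template. This precedence is inferred from the comments in the sample profile in this guide, and the `effective_schedule` function is hypothetical.

```python
def effective_schedule(task_name, template_sched, profile_opts, agent_opts):
    """Most specific override wins: agent, then profile, then template."""
    for opts in (agent_opts, profile_opts):
        override = (opts or {}).get(task_name, {}).get("schedule")
        if override:
            return override
    return template_sched


profile_opts = {"rman stats": {"schedule": "12h"}}
billing_opts = {"rman stats": {"schedule": "2h"}}

billing = effective_schedule("rman stats", "1h", profile_opts, billing_opts)
other = effective_schedule("rman stats", "1h", profile_opts, None)
untouched = effective_schedule("db-status", "1m", profile_opts, billing_opts)
```

Under these assumptions the billing agent collects "rman stats" every 2h, agents without their own override use the profile-wide 12h, and tasks with no override keep the template schedule.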
The example configuration below applies the Oracle database backup and basic monitoring packages to 3 agents:
# monitors oracle instances and rman backup statuses
name: oracle databases
packages:
  - oracle/db/base.yaml
  - oracle/db/backups.yaml
# default task overrides
task_opts:
  # sampling rate for rman stats to 12h instead of default 1h
  "rman stats":
    schedule: 12h
# these defaults will be applied when absent from agent parameters
defaults:
  # alert log collector, db-status, tablespace status collector defaults
  instances:
    tns_admin: ""
    oracle_home: /opt/oracle/product/19c/dbhome_1
    oracle_logon: "/ as sysdba"
    ts_limits:
      default_free: 15%
  # listener status defaults
  listeners:
    oracle_home: /opt/oracle/product/19c/dbhome_1
agents:
  # billing agent monitoring processes
  billing:
    # we want rman stats to be collected at 2h intervals for all databases monitored by billing agent
    task_opts:
      "rman stats":
        schedule: 2h
    parameters:
      # one db stats collector for each DSN
      dsns:
        - dsn: billing.prd.example.com
      # one log collector, db-status, tablespace status collector for each sid
      # one auto allocate storage to depleted tablespaces action for all sids
      instances:
        - sid: billing
          oracle_home: /opt/oracle/product/19c/dbhome_2
          alert_log: /opt/oracle/diag/rdbms/billing/billing/trace/alert_billing.log
          ts_auto_alloc:
            allowed_tbs: ts !~ "USERS"
        - sid: billing-dr
          alert_log: /opt/oracle/diag/rdbms/billing/billing-dr/trace/alert_billing-dr.log
      # one listener status collector for each listener
      listeners:
        - name: LSRN
          services:
            - svc1
            - svc2
  # dev data warehouse agent monitoring processes
  dwhd:
    parameters:
      instances:
        - sid: dwdev
          alert_log: /opt/oracle/diag/rdbms/dwdev/dwdev/trace/alert_dwdev.log
          ts_limits:
            default_free: 20%
      listeners:
        - name: LSRN
          services:
            - svc1
  # production data warehouse agent monitoring processes
  dwhp:
    parameters:
      dsns:
        - dsn: dwprd.prd.example.com
        - dsn: dwdev.dev.example.com
      instances:
        - sid: dwprd
          ts_auto_alloc:
            allowed_tbs: ts !~ "USERS"
          alert_log: /opt/oracle/diag/rdbms/dwprd/dwprd/trace/alert_dwprd.log
      listeners:
        - name: LSRN
          services:
            - svc1
            - svc2
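The way defaults fill in values that an agent leaves out can be sketched in Python as below. The `apply_defaults` function is hypothetical, and the merge is deliberately shallow: a nested object such as ts_limits supplied by an agent replaces the default object as a whole. Whether Polaris merges nested objects more deeply is not stated in this guide.

```python
def apply_defaults(defaults, parameters):
    """Fill in keys missing from each element of every parameter namespace."""
    merged = {}
    for namespace, elements in parameters.items():
        base = defaults.get(namespace, {})
        # Agent values win; defaults only supply keys that are absent.
        merged[namespace] = [{**base, **element} for element in elements]
    return merged


defaults = {
    "instances": {
        "oracle_home": "/opt/oracle/product/19c/dbhome_1",
        "oracle_logon": "/ as sysdba",
    },
}
billing_params = {
    "instances": [
        {"sid": "billing", "oracle_home": "/opt/oracle/product/19c/dbhome_2"},
        {"sid": "billing-dr"},
    ],
    "dsns": [{"dsn": "billing.prd.example.com"}],
}
resolved = apply_defaults(defaults, billing_params)
```

In this sketch the billing instance keeps its own oracle_home, billing-dr inherits the default one, and the dsns namespace passes through unchanged because no defaults are defined for it.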
# Working with the Polaris CLI
Clone the relevant git repos and use an autoconf.yaml configuration to point the generator to the package repositories, profile repository and destination repository paths:
git clone https://github.com/arisant/ms-polaris.git /home/me/projects/ms-polaris
git clone http://some-site.com/repos/polaris.git /home/me/projects/some-site
# one or more package and template repositories
pkg_repos:
- /home/me/projects/ms-polaris/autoconf
- /home/me/projects/some-site/autoconf
# one profile repository
profile_repo: /home/me/projects/some-site/autoconf/autoconf
# one repository for the generated assets
dest_repo: /home/me/projects/some-site/autoconf/assets
# Generate Monitoring Processes
Simply run the generator CLI against the profiles you created:
myrmex autoconf generate /home/me/projects/autoconf.yaml
# Synchronization
Once you have verified the generated configurations, push them to git. Polaris will pick up the new configurations and apply them to the corresponding agents. To check on the synchronization state:
myrmex autoconf state
If at any point you need to manually synchronize the current configuration state:
myrmex autoconf sync