Commit Graph

6 Commits

Author SHA1 Message Date
Matt Jordan 0bb38796b7 res_prometheus: Add metrics for PJSIP outbound registrations
When monitoring Asterisk instances, it's often useful to know when an
outbound registration fails, as this often maps to the notion of a trunk
and having a trunk fail is usually a "bad thing". As such, this patch
adds monitoring metrics that track the state of PJSIP outbound registrations.
It does this by looking for the Registry events coming across the Stasis
system topic, and publishing those as metrics to Prometheus. Note that
while this may support other outbound registration types (IAX2, SIP, etc.)
those haven't been tested. Your mileage may vary.

(And why are you still using IAX2 and SIP? It's 2019 folks. Get with the
program.)

This patch also adds Sorcery observers to handle modifications to the
underlying PJSIP outbound registration objects. This is useful when a
reload is triggered that modifies the properties of an outbound registration,
or when ARI push configuration is used and an object is updated or
deleted. Because we rely on properties of the registration object to
define the metric (label key/value pairs), we delete the relevant metric when
we notice that something has changed and wait for a new Stasis message to
arrive to re-create the metric.

ASTERISK-28403

Change-Id: If01420e38530fc20b6dd4aa15cd281d94cd2b87e
2019-05-22 08:25:19 -05:00
Matt Jordan a2648b22eb res_prometheus: Add CLI commands
This patch adds a few CLI commands to the res_prometheus module to aid
system administrators setting up and configuring the module. This includes:

* prometheus show status: Display basic statistics about the Prometheus
  module, including its essential configuration, when it was last scraped,
  and how long the scrape took. The last two bits of information are useful
  when Prometheus isn't generating metrics appropriately, as it will at
  least tell you if Asterisk has had its HTTP route hit by the remote
  server.

* prometheus show metrics: Dump the current metrics to the CLI. Useful for
  system administrators to see what metrics are currently available without
  having to cURL or go to Prometheus itself.

ASTERISK-28403

Change-Id: Ic09813e5e14b901571c5c96ebeae2a02566c5172
2019-05-22 08:24:39 -05:00
Matt Jordan 066280f0cc res_prometheus: Add Asterisk bridge metrics
This patch adds basic Asterisk bridge statistics to the res_prometheus
module. This includes:

* asterisk_bridges_count: The current number of bridges active on the
  system.

* asterisk_bridges_channels_count: The number of channels active in a
  bridge.

In all cases, enough information is provided with each bridge metric
to determine a unique instance of Asterisk that provided the data, along
with the technology, subclass, and creator of the bridge.

ASTERISK-28403

Change-Id: Ie27417dd72c5bc7624eb2a7a6a8829d7551788dc
2019-05-21 21:43:02 -05:00
Matt Jordan ed6cd13b5b res_prometheus: Add Asterisk endpoint metrics
This patch adds basic Asterisk endpoint statistics to the res_prometheus
module. This includes:

* asterisk_endpoints_state: The current state (unknown, online, offline)
  for each defined endpoint.

* asterisk_endpoints_channels_count: The current number of channels
  associated with a given endpoint.

* asterisk_endpoints_count: The current number of defined endpoints.

In all cases, enough information is provided with each endpoint metric
to determine a unique instance of Asterisk that provided the data, as well
as the underlying technology and resource definition.

ASTERISK-28403

Change-Id: I46443963330c206a7d12722d08dcaabef672310e
2019-05-21 20:47:50 -05:00
Matt Jordan 0760af71ad res_prometheus: Add Asterisk channel metrics
This patch adds basic Asterisk channel statistics to the res_prometheus
module. This includes:

* asterisk_calls_sum: A running sum of the total number of
  processed calls

* asterisk_calls_count: The current number of calls

* asterisk_channels_count: The current number of channels

* asterisk_channels_state: The state of any particular channel

* asterisk_channels_duration_seconds: How long a channel has existed,
  in seconds

In all cases, enough information is provided with each channel metric
to determine a unique instance of Asterisk that provided the data, as
well as the name, type, unique ID, and - if present - linked ID of each
channel.

ASTERISK-28403

Change-Id: I0db306ec94205d4f58d1e7fbabfe04b185869f59
2019-05-21 11:03:13 -05:00
Matt Jordan c50f29dfad Add core Prometheus support to Asterisk
Prometheus is the defacto monitoring tool for containerized applications.
This patch adds native support to Asterisk for serving up Prometheus
compatible metrics, such that a Prometheus server can scrape an Asterisk
instance in the same fashion as it does other HTTP services.

The core module in this patch provides an API that future work can build
on top of. The API manages metrics in one of two ways:
(1) Registered metrics. In this particular case, the API assumes that
    the metric (either allocated on the stack or on the heap) will have
    its value updated by the module registering it at will, and not
    just when Prometheus scrapes Asterisk. When a scrape does occur,
    the metrics are locked so that the current value can be retrieved.
(2) Scrape callbacks. In this case, the API allows consumers to be
    called via a callback function when a Prometheus initiated scrape
    occurs. The consumers of the API are responsible for populating
    the response to Prometheus themselves, typically using stack
    allocated metrics that are then formatted properly into strings
    via this module's convenience functions.

These two mechanisms balance the different ways in which information is
generated within Asterisk: some information is generated in a fashion
that makes it appropriate to update the relevant metrics immediately;
some information is better to defer until a Prometheus server asks for
it.

Note that some care has been taken in how metrics are defined to
minimize the impact on performance. Prometheus's metric definition
and its support for nesting metrics based on labels - which are
effectively key/value pairs - can make storage and managing of metrics
somewhat tricky. While a naive approach, where we allow for any number
of labels and perform a lot of heap allocations to manage the information,
would absolutely have worked, this patch instead opts to try to place
as much information in length limited arrays, stack allocations, and
vectors to minimize the performance impacts of scrapes. The author of
this patch has worked on enough systems that were driven to their knees
by poor monitoring implementations to be a bit cautious.

Additionally, this patch only adds support for gauges and counters.
Additional work to add summaries, histograms, and other Prometheus
metric types may add value in the future. This would be of particular
interest if someone wanted to track SIP response types.

Finally, this patch includes unit tests for the core APIs.

ASTERISK-28403

Change-Id: I891433a272c92fd11c705a2c36d65479a415ec42
2019-05-20 20:33:58 -05:00