IBM Z HMC Prometheus Exporter

Introduction

What this package provides

The IBM Z HMC Prometheus Exporter is a Prometheus exporter written in Python that retrieves metrics from the IBM Z Hardware Management Console (HMC) and exports them to the Prometheus monitoring system.

Supported environments

  • Operating systems: Linux, macOS, Windows
  • Python versions: 3.4 and higher
  • HMC versions: 2.11.1 and higher

Quickstart

  • Install the exporter and all of its Python dependencies as follows:

    $ pip install zhmc-prometheus-exporter
    
  • Provide an HMC credentials file for use by the exporter.

    The HMC credentials file tells the exporter which HMC to talk to for obtaining metrics, and which userid and password to use for logging on to the HMC.

    Download the Sample HMC credentials file as hmccreds.yaml and edit that copy accordingly.

    For details, see HMC credentials file.

  • Provide a metric definition file for use by the exporter.

    The metric definition file maps the metrics returned by the HMC to metrics exported to Prometheus.

    Furthermore, the metric definition file allows optimizing the access time to the HMC by disabling the fetching of metrics that are not needed.

    Download the Sample metric definition file as metrics.yaml. It can be used as is, with all metrics enabled and mapped properly. You only need to edit the file if you want to adjust the metric names, labels, or metric descriptions, or if you want to optimize access time by disabling metrics you do not need.

    For details, see Metric definition file.

  • Run the exporter as follows:

    $ zhmc_prometheus_exporter -c hmccreds.yaml -m metrics.yaml
    
  • Direct your web browser at http://localhost:9291 to see the exported Prometheus metrics. Depending on the number of CPCs managed by your HMC and on how many metrics are enabled, this may take a moment.

Reporting issues

If you encounter a problem, please report it as an issue on GitHub.

License

This package is licensed under the Apache 2.0 License.

Usage

This section describes how to use the exporter beyond the quick introduction in Quickstart.

Running on a system

If you want to run the exporter on some system (e.g. on your workstation for trying it out), it is recommended to use a virtual Python environment.

With the virtual Python environment active, follow the steps in Quickstart to install, establish the required files, and to run the exporter.

Running in a Docker container

If you want to run the exporter in a Docker container, you can create the container as follows, using the Dockerfile provided in the Git repository.

  • Clone the Git repository of the exporter and switch to the clone’s root directory:

    $ git clone https://github.com/zhmcclient/zhmc-prometheus-exporter
    $ cd zhmc-prometheus-exporter
    
  • Provide an HMC credentials file named hmccreds.yaml in the clone’s root directory, as described in Quickstart. You can copy it from the examples directory.

  • Provide a metric definition file named metrics.yaml in the clone’s root directory, as described in Quickstart. You can copy it from the examples directory.

  • Build the container as follows:

    $ docker build . -t zhmcexporter
    
  • Run the container as follows:

    $ docker run -p 9291:9291 zhmcexporter
    

zhmc_prometheus_exporter command

The zhmc_prometheus_exporter command supports the following arguments:

zhmc_prometheus_exporter [-p PORT] [-c CREDS_FILE] [-m METRICS_FILE] [-h]
                         [--help-creds] [--help-metrics]

IBM Z HMC Exporter - a Prometheus exporter for metrics from the IBM Z HMC

optional arguments:

  -p PORT          port for exporting.
                   Default: 9291

  -c CREDS_FILE    path name of HMC credentials file.
                   Use --help-creds for details.
                   Default: /etc/zhmc-prometheus-exporter/hmccreds.yaml

  -m METRICS_FILE  path name of metric definition file.
                   Use --help-metrics for details.
                   Default: /etc/zhmc-prometheus-exporter/metrics.yaml

  -h, --help       show this help message and exit

  --help-creds     show help for HMC credentials file and exit

  --help-metrics   show help for metric definition file and exit
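
When the exporter is started from a Python script (for example from a test harness), the command line can be assembled from the options listed above. The following is a minimal sketch; the helper name build_command is hypothetical and not part of the exporter, and only the -p, -c and -m options documented above are used:

```python
def build_command(creds_file=None, metrics_file=None, port=None):
    """Return the argv list for starting the exporter.

    Options left as None are omitted, so that the exporter applies its
    documented defaults (port 9291 and the files under
    /etc/zhmc-prometheus-exporter/).
    """
    argv = ["zhmc_prometheus_exporter"]
    if port is not None:
        argv += ["-p", str(port)]
    if creds_file is not None:
        argv += ["-c", creds_file]
    if metrics_file is not None:
        argv += ["-m", metrics_file]
    return argv


# The resulting list can be passed to subprocess.Popen() or subprocess.run().
print(build_command("hmccreds.yaml", "metrics.yaml", port=9292))
```

Passing the arguments as a list avoids shell quoting issues with path names that contain spaces.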

Exported metric concepts

The exporter provides its metrics in the Prometheus text-based format.

All metrics are of the metric type gauge and follow the Prometheus metric naming conventions. The names of the metrics are defined in the Metric definition file. Users can change the metric names, but this is not recommended unless there is a strong reason for it; it is recommended to use the Sample metric definition file unchanged. The metrics mapping in the Sample metric definition file is referred to as the standard metric definition in this documentation.

In the standard metric definition, the metric names are structured as follows:

zhmc_{resource-type}_{metric}_{unit}

Where:

  • {resource-type} is a short lower case term for the type of resource the metric applies to, for example cpc or partition.
  • {metric} is a unique name of the metric within the resource type, for example processor.
  • {unit} is the (simple or complex) unit of measurement of the metric value. For example, a usage percentage will usually have a unit of usage_ratio, while a temperature would have a unit of celsius.
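
The naming scheme above can be sketched in Python as follows. This is an illustration only, not the exporter's code; the function names and the KNOWN_PREFIXES tuple are assumptions made for this example, with the prefixes taken from the standard metric definition. Because resource types such as crypto_adapter contain underscores themselves, a full name can only be split reliably when the set of resource types is known:

```python
# Resource-type prefixes used by the standard metric definition.
KNOWN_PREFIXES = (
    "cpc", "partition", "channel", "crypto_adapter", "flash_memory_adapter",
    "roce_adapter", "adapter", "port", "nic", "processor",
)


def full_metric_name(resource_type, metric_and_unit):
    """Compose zhmc_{resource-type}_{metric}_{unit}."""
    return "zhmc_{}_{}".format(resource_type, metric_and_unit)


def split_metric_name(name, prefixes=KNOWN_PREFIXES):
    """Split a full name into (resource-type, metric-and-unit).

    The longest matching known prefix wins, so that
    zhmc_crypto_adapter_* is not mistaken for some other resource type.
    """
    if not name.startswith("zhmc_"):
        raise ValueError("not an exporter metric name: " + name)
    rest = name[len("zhmc_"):]
    for prefix in sorted(prefixes, key=len, reverse=True):
        if rest.startswith(prefix + "_"):
            return prefix, rest[len(prefix) + 1:]
    raise ValueError("unknown resource type in: " + name)


print(full_metric_name("partition", "ifl_processor_usage_ratio"))
print(split_metric_name("zhmc_cpc_crypto_adapter_usage_ratio"))
```

The sketch does not try to separate {metric} from {unit}, because units such as usage_ratio or bytes_per_second also contain underscores.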

Each metric value applies to a particular instance of a resource. In a particular set of exported metrics, there are usually metrics for multiple resource instances. For example, the HMC can manage multiple CPCs, a CPC can have multiple partitions, and so on. In the exported metrics, the resource instance is identified using one or more Prometheus labels. Where possible, the labels identify the resource instances in a hierarchical way from the CPC on down to the resource to which the metric value applies. For example, a metric for a partition will have labels cpc and partition whose values are the names of CPC and partition, respectively.

Example for the representation of metric values that are the IFL processor usage percentages of two partitions in a single CPC:

# HELP zhmc_partition_ifl_processor_usage_ratio Usage ratio across all IFL processors of the partition
# TYPE zhmc_partition_ifl_processor_usage_ratio gauge
zhmc_partition_ifl_processor_usage_ratio{cpc="CPCA",partition="PART1"} 0.42
zhmc_partition_ifl_processor_usage_ratio{cpc="CPCA",partition="PART2"} 0.07
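
To make the structure of such output explicit, the following hand-written Python sketch renders a gauge with its # HELP and # TYPE lines in the Prometheus text-based format. It is not how the exporter produces its output; the function names are hypothetical. Note that the text-based format puts label values in double quotes and escapes backslash, double quote and newline inside them:

```python
def escape_label_value(value):
    """Escape a label value for the Prometheus text-based format."""
    return (str(value)
            .replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n"))


def render_gauge(name, help_text, samples):
    """Render one gauge metric.

    samples is a list of (labels, value) pairs, where labels is a list
    of (label-name, label-value) pairs in the order they should appear.
    """
    lines = [
        "# HELP {} {}".format(name, help_text),
        "# TYPE {} gauge".format(name),
    ]
    for labels, value in samples:
        label_str = ",".join(
            '{}="{}"'.format(key, escape_label_value(val))
            for key, val in labels)
        lines.append("{}{{{}}} {}".format(name, label_str, value))
    return "\n".join(lines) + "\n"


text = render_gauge(
    "zhmc_partition_ifl_processor_usage_ratio",
    "Usage ratio across all IFL processors of the partition",
    [
        ([("cpc", "CPCA"), ("partition", "PART1")], 0.42),
        ([("cpc", "CPCA"), ("partition", "PART2")], 0.07),
    ],
)
print(text, end="")
```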

Available metrics

The exporter code is agnostic to the actual set of metrics supported by the HMC. A new metric can immediately be supported by just adding it to the Metric definition file.

The Sample metric definition file in the Git repository states in its header up to which HMC version or Z machine generation the metrics are defined.

The following table shows the mapping between HMC metric groups and exported Prometheus metrics in the standard metric definition. Note that ensemble and zBX related metrics are not covered in the standard metric definition (support for them has been removed in z15). For more details on the HMC metrics, see section “Metric Groups” in the HMC API book.

HMC Metric Group                      Mode  Prometheus Metrics           Prometheus Labels
cpc-usage-overview                    C     zhmc_cpc_*                   cpc
logical-partition-usage               C     zhmc_partition_*             cpc, partition
channel-usage                         C     zhmc_channel_*               cpc, channel_css_chpid
crypto-usage                          C     zhmc_crypto_adapter_*        cpc, adapter_pchid
flash-memory-usage                    C     zhmc_flash_memory_adapter_*  cpc, adapter_pchid
roce-usage                            C     zhmc_roce_adapter_*          cpc, adapter_pchid
dpm-system-usage-overview             D     zhmc_cpc_*                   cpc
partition-usage                       D     zhmc_partition_*             cpc, partition
adapter-usage                         D     zhmc_adapter_*               cpc, adapter
network-physical-adapter-port         D     zhmc_port_*                  cpc, adapter, port
partition-attached-network-interface  D     zhmc_nic_*                   cpc, partition, nic
zcpc-environmentals-and-power         C+D   zhmc_cpc_*                   cpc
environmental-power-status            C+D   zhmc_cpc_*                   cpc
zcpc-processor-usage                  C+D   zhmc_processor_*             cpc, processor

Legend:

  • Mode: The operational mode of the CPC: C=Classic, D=DPM

As you can see, the zhmc_cpc_* and zhmc_partition_* metrics are used for both DPM mode and classic mode. The names of the metrics are equal if and only if they have the same meaning in both modes.

The following table shows the Prometheus metrics in the standard metric definition:

Prometheus Metric Mode Description
zhmc_cpc_processor_usage_ratio C+D Usage ratio across all processors of the CPC
zhmc_cpc_shared_processor_usage_ratio C+D Usage ratio across all shared processors of the CPC
zhmc_cpc_dedicated_processor_usage_ratio C Usage ratio across all dedicated processors of the CPC
zhmc_cpc_cp_processor_usage_ratio C+D Usage ratio across all CP processors of the CPC
zhmc_cpc_cp_shared_processor_usage_ratio C+D Usage ratio across all shared CP processors of the CPC
zhmc_cpc_cp_dedicated_processor_usage_ratio C Usage ratio across all dedicated CP processors of the CPC
zhmc_cpc_ifl_processor_usage_ratio C+D Usage ratio across all IFL processors of the CPC
zhmc_cpc_ifl_shared_processor_usage_ratio C+D Usage ratio across all shared IFL processors of the CPC
zhmc_cpc_ifl_dedicated_processor_usage_ratio C Usage ratio across all dedicated IFL processors of the CPC
zhmc_cpc_aap_shared_processor_usage_ratio C Usage ratio across all shared zAAP processors of the CPC
zhmc_cpc_aap_dedicated_processor_usage_ratio C Usage ratio across all dedicated zAAP processors of the CPC
zhmc_cpc_cbp_processor_usage_ratio C Usage ratio across all CBP processors of the CPC
zhmc_cpc_cbp_shared_processor_usage_ratio C Usage ratio across all shared CBP processors of the CPC
zhmc_cpc_cbp_dedicated_processor_usage_ratio C Usage ratio across all dedicated CBP processors of the CPC
zhmc_cpc_icf_processor_usage_ratio C Usage ratio across all ICF processors of the CPC
zhmc_cpc_icf_shared_processor_usage_ratio C Usage ratio across all shared ICF processors of the CPC
zhmc_cpc_icf_dedicated_processor_usage_ratio C Usage ratio across all dedicated ICF processors of the CPC
zhmc_cpc_iip_processor_usage_ratio C Usage ratio across all zIIP processors of the CPC
zhmc_cpc_iip_shared_processor_usage_ratio C Usage ratio across all shared zIIP processors of the CPC
zhmc_cpc_iip_dedicated_processor_usage_ratio C Usage ratio across all dedicated zIIP processors of the CPC
zhmc_cpc_channel_usage_ratio C Usage ratio across all channels of the CPC
zhmc_cpc_accelerator_adapter_usage_ratio D Usage ratio across all accelerator adapters of the CPC
zhmc_cpc_crypto_adapter_usage_ratio D Usage ratio across all crypto adapters of the CPC
zhmc_cpc_network_adapter_usage_ratio D Usage ratio across all network adapters of the CPC
zhmc_cpc_storage_adapter_usage_ratio D Usage ratio across all storage adapters of the CPC
zhmc_cpc_power_watts C+D Power consumption of the CPC
zhmc_cpc_ambient_temperature_celsius C+D Ambient temperature of the CPC
zhmc_crypto_adapter_usage_ratio C Usage ratio of the crypto adapter
zhmc_flash_memory_adapter_usage_ratio C Usage ratio of the flash memory adapter
zhmc_adapter_usage_ratio D Usage ratio of the adapter
zhmc_channel_usage_ratio C Usage ratio of the channel
zhmc_roce_adapter_usage_ratio C Usage ratio of the RoCE adapter
zhmc_partition_processor_usage_ratio C+D Usage ratio across all processors of the partition
zhmc_partition_cp_processor_usage_ratio C Usage ratio across all CP processors of the partition
zhmc_partition_ifl_processor_usage_ratio C Usage ratio across all IFL processors of the partition
zhmc_partition_icf_processor_usage_ratio C Usage ratio across all ICF processors of the partition
zhmc_partition_cbp_processor_usage_ratio C Usage ratio across all CBP processors of the partition
zhmc_partition_iip_processor_usage_ratio C Usage ratio across all IIP processors of the partition
zhmc_partition_accelerator_adapter_usage_ratio D Usage ratio of all accelerator adapters of the partition
zhmc_partition_crypto_adapter_usage_ratio D Usage ratio of all crypto adapters of the partition
zhmc_partition_network_adapter_usage_ratio D Usage ratio of all network adapters of the partition
zhmc_partition_storage_adapter_usage_ratio D Usage ratio of all storage adapters of the partition
zhmc_partition_zvm_paging_rate_pages_per_second C z/VM paging rate in pages/sec
zhmc_port_bytes_sent_count D Number of Bytes in unicast packets that were sent
zhmc_port_bytes_received_count D Number of Bytes in unicast packets that were received
zhmc_port_packets_sent_count D Number of unicast packets that were sent
zhmc_port_packets_received_count D Number of unicast packets that were received
zhmc_port_packets_sent_dropped_count D Number of sent packets that were dropped (resource shortage)
zhmc_port_packets_received_dropped_count D Number of received packets that were dropped (resource shortage)
zhmc_port_packets_sent_discarded_count D Number of sent packets that were discarded (malformed)
zhmc_port_packets_received_discarded_count D Number of received packets that were discarded (malformed)
zhmc_port_multicast_packets_sent_count D Number of multicast packets sent
zhmc_port_multicast_packets_received_count D Number of multicast packets received
zhmc_port_broadcast_packets_sent_count D Number of broadcast packets sent
zhmc_port_broadcast_packets_received_count D Number of broadcast packets received
zhmc_port_data_sent_bytes D Amount of data sent over the collection interval
zhmc_port_data_received_bytes D Amount of data received over the collection interval
zhmc_port_data_rate_sent_bytes_per_second D Data rate sent over the collection interval
zhmc_port_data_rate_received_bytes_per_second D Data rate received over the collection interval
zhmc_port_bandwidth_usage_ratio D Bandwidth usage ratio of the port
zhmc_nic_bytes_sent_count D Number of Bytes in unicast packets that were sent
zhmc_nic_bytes_received_count D Number of Bytes in unicast packets that were received
zhmc_nic_packets_sent_count D Number of unicast packets that were sent
zhmc_nic_packets_received_count D Number of unicast packets that were received
zhmc_nic_packets_sent_dropped_count D Number of sent packets that were dropped (resource shortage)
zhmc_nic_packets_received_dropped_count D Number of received packets that were dropped (resource shortage)
zhmc_nic_packets_sent_discarded_count D Number of sent packets that were discarded (malformed)
zhmc_nic_packets_received_discarded_count D Number of received packets that were discarded (malformed)
zhmc_nic_multicast_packets_sent_count D Number of multicast packets sent
zhmc_nic_multicast_packets_received_count D Number of multicast packets received
zhmc_nic_broadcast_packets_sent_count D Number of broadcast packets sent
zhmc_nic_broadcast_packets_received_count D Number of broadcast packets received
zhmc_nic_data_sent_bytes D Amount of data sent over the collection interval
zhmc_nic_data_received_bytes D Amount of data received over the collection interval
zhmc_nic_data_rate_sent_bytes_per_second D Data rate sent over the collection interval
zhmc_nic_data_rate_received_bytes_per_second D Data rate received over the collection interval
zhmc_cpc_humidity_percent C+D Relative humidity
zhmc_cpc_dew_point_celsius C+D Dew point
zhmc_cpc_heat_load_total_btu_per_hour C+D Total heat load of the CPC
zhmc_cpc_heat_load_forced_air_btu_per_hour C+D Heat load of the CPC covered by forced-air
zhmc_cpc_heat_load_water_btu_per_hour C+D Heat load of the CPC covered by water
zhmc_cpc_exhaust_temperature_celsius C+D Exhaust temperature of the CPC
zhmc_cpc_power_cord1_phase_a_watts C+D Power in Phase A of line cord 1 - 0 if not available
zhmc_cpc_power_cord1_phase_b_watts C+D Power in Phase B of line cord 1 - 0 if not available
zhmc_cpc_power_cord1_phase_c_watts C+D Power in Phase C of line cord 1 - 0 if not available
zhmc_cpc_power_cord2_phase_a_watts C+D Power in Phase A of line cord 2 - 0 if not available
zhmc_cpc_power_cord2_phase_b_watts C+D Power in Phase B of line cord 2 - 0 if not available
zhmc_cpc_power_cord2_phase_c_watts C+D Power in Phase C of line cord 2 - 0 if not available
zhmc_cpc_power_cord3_phase_a_watts C+D Power in Phase A of line cord 3 - 0 if not available
zhmc_cpc_power_cord3_phase_b_watts C+D Power in Phase B of line cord 3 - 0 if not available
zhmc_cpc_power_cord3_phase_c_watts C+D Power in Phase C of line cord 3 - 0 if not available
zhmc_cpc_power_cord4_phase_a_watts C+D Power in Phase A of line cord 4 - 0 if not available
zhmc_cpc_power_cord4_phase_b_watts C+D Power in Phase B of line cord 4 - 0 if not available
zhmc_cpc_power_cord4_phase_c_watts C+D Power in Phase C of line cord 4 - 0 if not available
zhmc_cpc_power_cord5_phase_a_watts C+D Power in Phase A of line cord 5 - 0 if not available
zhmc_cpc_power_cord5_phase_b_watts C+D Power in Phase B of line cord 5 - 0 if not available
zhmc_cpc_power_cord5_phase_c_watts C+D Power in Phase C of line cord 5 - 0 if not available
zhmc_cpc_power_cord6_phase_a_watts C+D Power in Phase A of line cord 6 - 0 if not available
zhmc_cpc_power_cord6_phase_b_watts C+D Power in Phase B of line cord 6 - 0 if not available
zhmc_cpc_power_cord6_phase_c_watts C+D Power in Phase C of line cord 6 - 0 if not available
zhmc_cpc_power_cord7_phase_a_watts C+D Power in Phase A of line cord 7 - 0 if not available
zhmc_cpc_power_cord7_phase_b_watts C+D Power in Phase B of line cord 7 - 0 if not available
zhmc_cpc_power_cord7_phase_c_watts C+D Power in Phase C of line cord 7 - 0 if not available
zhmc_cpc_power_cord8_phase_a_watts C+D Power in Phase A of line cord 8 - 0 if not available
zhmc_cpc_power_cord8_phase_b_watts C+D Power in Phase B of line cord 8 - 0 if not available
zhmc_cpc_power_cord8_phase_c_watts C+D Power in Phase C of line cord 8 - 0 if not available
zhmc_processor_usage_ratio C+D Usage ratio of the processor
zhmc_processor_smt_mode_percent C+D Percentage of time the processor was in SMT mode
zhmc_processor_smt_thread0_usage_ratio C+D Usage ratio of thread 0 of the processor when in SMT mode
zhmc_processor_smt_thread1_usage_ratio C+D Usage ratio of thread 1 of the processor when in SMT mode

HMC credentials file

The HMC credentials file tells the exporter which HMC to talk to for obtaining metrics, and which userid and password to use for logging on to the HMC.

In addition, it allows specifying additional labels to be used in all metrics exported to Prometheus. This can be used for defining labels that identify the environment managed by the HMC, in cases where metrics from multiple instances of exporters and HMCs come together.

The HMC credentials file is in YAML format and has the following structure:

metrics:
  hmc: {hmc-ip-address}
  userid: {hmc-userid}
  password: {hmc-password}

extra_labels:  # optional
  # list of labels:
  - name: {label-name}
    value: {label-value}

Where:

  • {hmc-ip-address} is the IP address of the HMC.
  • {hmc-userid} is the userid on the HMC to be used for logging on.
  • {hmc-password} is the password of that userid.
  • {label-name} is the label name.
  • {label-value} is the label value. The string value is used directly without any further interpretation.

Sample HMC credentials file

The following is a sample HMC credentials file (hmccreds.yaml).

The file can be downloaded from the Git repo as examples/hmccreds.yaml.

# Sample HMC credentials file for the Z HMC Prometheus Exporter.

metrics:
  hmc: 9.10.11.12
  userid: user
  password: password

extra_labels:
  - name: pod
    value: mypod
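
Before starting the exporter, it can be useful to check that a credentials file has the structure described above. The following Python sketch performs such a check on already parsed data. It is an illustration under assumptions, not the exporter's own validation: the function name check_credentials is hypothetical, and the dictionary stands in for what a YAML parser (for example yaml.safe_load() from PyYAML) would return for the sample file:

```python
def check_credentials(data):
    """Check parsed credentials data; return (connection, extra_labels).

    connection is the mapping with hmc, userid and password;
    extra_labels is a dict of label name to label value (as string).
    """
    if not isinstance(data, dict) or "metrics" not in data:
        raise ValueError("top-level 'metrics' item is missing")
    connection = data["metrics"]
    missing = [key for key in ("hmc", "userid", "password")
               if key not in connection]
    if missing:
        raise ValueError("missing in 'metrics': " + ", ".join(missing))
    extra_labels = {}
    # extra_labels is optional; its values are used as plain strings.
    for item in data.get("extra_labels") or []:
        extra_labels[item["name"]] = str(item["value"])
    return connection, extra_labels


# Parsed form of the sample file shown above.
sample = {
    "metrics": {"hmc": "9.10.11.12", "userid": "user", "password": "password"},
    "extra_labels": [{"name": "pod", "value": "mypod"}],
}
connection, extra_labels = check_credentials(sample)
print(connection["hmc"], extra_labels)
```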

Metric definition file

The metric definition file maps the metrics returned by the HMC to metrics exported to Prometheus.

Furthermore, the metric definition file allows optimizing the access time to the HMC by disabling the fetching of metrics that are not needed.

The metric definition file is in YAML format and has the following structure:

metric_groups:
  # dictionary of metric groups:
  {hmc-metric-group}:
    prefix: {resource-type}
    fetch: {fetch-bool}
    if: {fetch-condition}  # optional
    labels:
      # list of labels:
      - name: {label-name}
        value: {label-value}

metrics:
  # dictionary of metric groups and metrics
  {hmc-metric-group}:
    {hmc-metric}:
      percent: {percent-bool}
      exporter_name: {metric}_{unit}
      exporter_desc: {help}

Where:

  • {hmc-metric-group} is the name of the metric group on the HMC.

  • {hmc-metric} is the name of the metric (within the metric group) on the HMC.

  • {resource-type} is a short lower case term for the type of resource the metric applies to, for example cpc or partition. It is used in the Prometheus metric name directly after the initial zhmc_.

  • {fetch-bool} is a boolean indicating whether the user wants this metric group to be fetched from the HMC. For the metric group to actually be fetched, the if property, if specified, also needs to evaluate to True.

  • {fetch-condition} is a string that is evaluated as a Python expression and that indicates whether the metric group can be fetched. For the metric group to actually be fetched, the fetch property also needs to be True. The expression may contain the hmc_version variable which evaluates to the HMC version. The HMC versions are evaluated as tuples of integers, padding them to 3 version parts by appending 0 if needed.

  • {label-name} is the label name.

  • {label-value} identifies where the label value is taken from, as follows:

    • resource the name of the resource reported by the HMC for the metric. This is the normal case and also the default.

    • resource.parent the name of the parent resource of the resource reported by the HMC for the metric. This is useful for resources that are inside of the CPC, such as adapters or partitions, to get back to the CPC containing them.

    • resource.parent.parent the name of the grand parent resource of the resource reported by the HMC for the metric. This is useful for resources that are inside of the CPC at the second level, such as NICs or adapter ports, to get back to the CPC containing them.

    • {hmc-metric-name} the name of the HMC metric within the same metric group whose metric value should be used as a label value. This allows accompanying HMC metrics that are actually identifiers for resources to be used as labels for the actual metric. Example: The HMC returns metric group channel-usage with metric channel-usage that has the actual value and metric channel-name that identifies the channel to which the metric value belongs. The following fragment utilizes the channel-name metric as a label for the channel-usage metric:

      metric_groups:
        channel-usage:
          prefix: channel
          fetch: True
          labels:
            - name: cpc
              value: resource
            - name: channel_css_chpid
              value: channel-name
      metrics:
        channel-usage:
          channel-usage:
            percent: True
            exporter_name: usage_ratio
            exporter_desc: Usage ratio of the channel
      
  • {percent-bool} is a boolean indicating whether the metric value should be divided by 100. This is needed because the HMC represents percentages such that a value of 100 means 100%, while Prometheus represents them as ratios such that a value of 1.0 means 100%.

  • {metric}_{unit} is the Prometheus local metric name and unit in the full metric name zhmc_{resource-type}_{metric}_{unit}.

  • {help} is the description text that is exported as # HELP.
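
The interplay of the fetch, if, labels and percent properties described above can be sketched in Python as follows. This is one possible reading of the description, not the exporter's implementation; all function and class names are hypothetical. The sketch assumes that quoted version strings in the if expression are turned into integer tuples before evaluation, and it uses eval(), which is only acceptable because the metric definition file is trusted input:

```python
import re


def version_tuple(version, parts=3):
    """Convert '2.13' to (2, 13, 0), padding with 0 to three parts."""
    numbers = [int(part) for part in version.split(".")]
    numbers += [0] * (parts - len(numbers))
    return tuple(numbers)


# Matches quoted version strings such as '2.12.0' inside a condition.
_VERSION_LITERAL = re.compile(r"""(['"])(\d+(?:\.\d+)*)\1""")


def should_fetch(fetch, condition, hmc_version):
    """A group is fetched if fetch is true and the condition (if any) holds."""
    if not fetch:
        return False
    if condition is None:
        return True
    expr = _VERSION_LITERAL.sub(
        lambda m: repr(version_tuple(m.group(2))), condition)
    variables = {"hmc_version": version_tuple(hmc_version)}
    return bool(eval(expr, {"__builtins__": {}}, variables))


class Resource:
    """Stand-in for an HMC resource with a name and an optional parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def label_value(spec, resource, metric_values):
    """Resolve a {label-value} specification to the actual label value."""
    if spec == "resource":
        return resource.name
    if spec == "resource.parent":
        return resource.parent.name
    if spec == "resource.parent.parent":
        return resource.parent.parent.name
    # Otherwise the value of another HMC metric in the same group is used.
    return str(metric_values[spec])


def exported_value(raw_value, percent):
    """Scale HMC percentages (100 means 100%) to ratios (1.0 means 100%)."""
    return raw_value / 100.0 if percent else raw_value


cpc = Resource("CPCA")
nic = Resource("NIC1", Resource("PART1", cpc))
print(should_fetch(True, "hmc_version>='2.13.1'", "2.15"))
print(label_value("resource.parent.parent", nic, {}))
print(exported_value(42, True))
```

Comparing versions as integer tuples rather than as strings matters as soon as a version part has more than one digit: as strings, '2.10' would sort before '2.9'.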

Sample metric definition file

The following is a sample metric definition file (metrics.yaml) that defines all metrics as of HMC 2.15 (z15).

The file can be downloaded from the Git repo as examples/metrics.yaml.

# Sample metric definition file for the Z HMC Prometheus Exporter.
# Defines all metrics up to HMC version 2.15.0 (z15), except for ensemble/zBX
# related metrics which are not supported by the Z HMC Prometheus Exporter.

metric_groups:

  # Available for CPCs in classic mode

  cpc-usage-overview:
    prefix: cpc
    fetch: true
    labels:
      - name: cpc
        value: resource

  logical-partition-usage:
    prefix: partition
    fetch: true
    labels:
      - name: cpc
        value: resource.parent
      - name: partition
        value: resource

  channel-usage:
    prefix: channel
    fetch: true
    labels:
      - name: cpc
        value: resource
      - name: channel_css_chpid
        value: channel-name  # format: 'CSS.CHPID'

  crypto-usage:
    prefix: crypto_adapter
    fetch: true
    if: "hmc_version>='2.12.0'"
    labels:
      - name: cpc
        value: resource
      - name: adapter_pchid
        value: channel-id

  flash-memory-usage:
    prefix: flash_memory_adapter
    fetch: true
    if: "hmc_version>='2.12.0'"
    labels:
      - name: cpc
        value: resource
      - name: adapter_pchid
        value: channel-id

  roce-usage:
    prefix: roce_adapter
    fetch: true
    if: "hmc_version>='2.12.1'"
    labels:
      - name: cpc
        value: resource
      - name: adapter_pchid
        value: channel-id

  # Available for CPCs in DPM mode

  dpm-system-usage-overview:
    prefix: cpc
    fetch: true
    if: "hmc_version>='2.13.1'"
    labels:
      - name: cpc
        value: resource

  partition-usage:
    prefix: partition
    fetch: true
    if: "hmc_version>='2.13.1'"
    labels:
      - name: cpc
        value: resource.parent
      - name: partition
        value: resource

  adapter-usage:
    prefix: adapter
    fetch: true
    if: "hmc_version>='2.13.1'"
    labels:
      - name: cpc
        value: resource.parent
      - name: adapter
        value: resource

  network-physical-adapter-port:
    prefix: port
    fetch: true
    if: "hmc_version>='2.13.1'"
    labels:
      - name: cpc
        value: resource.parent
      - name: adapter
        value: resource
      - name: port
        value: network-port-id

  partition-attached-network-interface:
    prefix: nic
    fetch: false  # Takes about 1 minute for the initial processing
    if: "hmc_version>='2.13.1'"
    labels:
      - name: cpc
        value: resource.parent.parent
      - name: partition
        value: resource.parent
      - name: nic
        value: resource

  # Available for CPCs in any mode

  zcpc-environmentals-and-power:
    prefix: cpc
    fetch: true
    labels:
      - name: cpc
        value: resource

  zcpc-processor-usage:
    prefix: processor
    fetch: true
    labels:
      - name: cpc
        value: resource
      - name: processor
        value: processor-name

  environmental-power-status:
    prefix: cpc
    fetch: true
    if: "hmc_version>='2.15.0'"
    labels:
      - name: cpc
        value: resource

metrics:

  # Available for CPCs in classic mode

  cpc-usage-overview:
    cpc-processor-usage:
      percent: true
      exporter_name: processor_usage_ratio
      exporter_desc: Usage ratio across all processors of the CPC
    all-shared-processor-usage:
      percent: true
      exporter_name: shared_processor_usage_ratio
      exporter_desc: Usage ratio across all shared processors of the CPC
    all-dedicated-processor-usage:
      percent: true
      exporter_name: dedicated_processor_usage_ratio
      exporter_desc: Usage ratio across all dedicated processors of the CPC
    cp-all-processor-usage:
      percent: true
      exporter_name: cp_processor_usage_ratio
      exporter_desc: Usage ratio across all CP processors of the CPC
    cp-shared-processor-usage:
      percent: true
      exporter_name: cp_shared_processor_usage_ratio
      exporter_desc: Usage ratio across all shared CP processors of the CPC
    cp-dedicated-processor-usage:
      percent: true
      exporter_name: cp_dedicated_processor_usage_ratio
      exporter_desc: Usage ratio across all dedicated CP processors of the CPC
    ifl-all-processor-usage:
      percent: true
      exporter_name: ifl_processor_usage_ratio
      exporter_desc: Usage ratio across all IFL processors of the CPC
    ifl-shared-processor-usage:
      percent: true
      exporter_name: ifl_shared_processor_usage_ratio
      exporter_desc: Usage ratio across all shared IFL processors of the CPC
    ifl-dedicated-processor-usage:
      percent: true
      exporter_name: ifl_dedicated_processor_usage_ratio
      exporter_desc: Usage ratio across all dedicated IFL processors of the CPC
    icf-all-processor-usage:
      percent: true
      exporter_name: icf_processor_usage_ratio
      exporter_desc: Usage ratio across all ICF processors of the CPC
    icf-shared-processor-usage:
      percent: true
      exporter_name: icf_shared_processor_usage_ratio
      exporter_desc: Usage ratio across all shared ICF processors of the CPC
    icf-dedicated-processor-usage:
      percent: true
      exporter_name: icf_dedicated_processor_usage_ratio
      exporter_desc: Usage ratio across all dedicated ICF processors of the CPC
    iip-all-processor-usage:
      percent: true
      exporter_name: iip_processor_usage_ratio
      exporter_desc: Usage ratio across all zIIP processors of the CPC
    iip-shared-processor-usage:
      percent: true
      exporter_name: iip_shared_processor_usage_ratio
      exporter_desc: Usage ratio across all shared zIIP processors of the CPC
    iip-dedicated-processor-usage:
      percent: true
      exporter_name: iip_dedicated_processor_usage_ratio
      exporter_desc: Usage ratio across all dedicated zIIP processors of the CPC
    aap-shared-processor-usage:
      percent: true
      exporter_name: aap_shared_processor_usage_ratio
      exporter_desc: Usage ratio across all shared zAAP processors of the CPC
    aap-dedicated-processor-usage:
      percent: true
      exporter_name: aap_dedicated_processor_usage_ratio
      exporter_desc: Usage ratio across all dedicated zAAP processors of the CPC
    # aap-all-processor-usage does not seem to exist
    cbp-all-processor-usage:
      percent: true
      exporter_name: cbp_processor_usage_ratio
      exporter_desc: Usage ratio across all CBP processors of the CPC
    cbp-shared-processor-usage:
      percent: true
      exporter_name: cbp_shared_processor_usage_ratio
      exporter_desc: Usage ratio across all shared CBP processors of the CPC
    cbp-dedicated-processor-usage:
      percent: true
      exporter_name: cbp_dedicated_processor_usage_ratio
      exporter_desc: Usage ratio across all dedicated CBP processors of the CPC
    channel-usage:
      percent: true
      exporter_name: channel_usage_ratio
      exporter_desc: Usage ratio across all channels of the CPC
    power-consumption-watts:
      percent: false
      exporter_name: power_watts
      exporter_desc: Power consumption of the CPC
    temperature-celsius:
      percent: false
      exporter_name: ambient_temperature_celsius
      exporter_desc: Ambient temperature of the CPC

  logical-partition-usage:
    processor-usage:
      percent: true
      exporter_name: processor_usage_ratio
      exporter_desc: Usage ratio across all processors of the partition
    cp-processor-usage:
      percent: true
      exporter_name: cp_processor_usage_ratio
      exporter_desc: Usage ratio across all CP processors of the partition
    ifl-processor-usage:
      percent: true
      exporter_name: ifl_processor_usage_ratio
      exporter_desc: Usage ratio across all IFL processors of the partition
    icf-processor-usage:
      percent: true
      exporter_name: icf_processor_usage_ratio
      exporter_desc: Usage ratio across all ICF processors of the partition
    iip-processor-usage:
      percent: true
      exporter_name: iip_processor_usage_ratio
      exporter_desc: Usage ratio across all IIP processors of the partition
    cbp-processor-usage:
      percent: true
      exporter_name: cbp_processor_usage_ratio
      exporter_desc: Usage ratio across all CBP processors of the partition
    zvm-paging-rate:
      percent: false
      exporter_name: zvm_paging_rate_pages_per_second
      exporter_desc: z/VM paging rate in pages/sec

  channel-usage:
    channel-usage:
      percent: true
      exporter_name: usage_ratio
      exporter_desc: Usage ratio of the channel
    channel-name:
      percent: false
      exporter_name: null  # Ignored (used for identification in channel-usage)
      exporter_desc: null
    shared-channel:
      percent: false
      exporter_name: null  # Ignored (used for identification in channel-usage)
      exporter_desc: null
    logical-partition-name:
      percent: false
      exporter_name: null  # Ignored (used for identification in channel-usage)
      exporter_desc: null

  crypto-usage:
    adapter-usage:
      percent: true
      exporter_name: usage_ratio
      exporter_desc: Usage ratio of the crypto adapter
    channel-id:
      percent: false
      exporter_name: null  # Ignored (used for identification in adapter-usage)
      exporter_desc: null
    crypto-id:
      percent: false
      exporter_name: null  # Ignored (used for identification in adapter-usage)
      exporter_desc: null

  flash-memory-usage:
    adapter-usage:
      percent: true
      exporter_name: usage_ratio
      exporter_desc: Usage ratio of the flash memory adapter
    channel-id:
      percent: false
      exporter_name: null  # Ignored (used for identification in adapter-usage)
      exporter_desc: null

  roce-usage:
    adapter-usage:
      percent: true
      exporter_name: usage_ratio
      exporter_desc: Usage ratio of the RoCE adapter

  # Available for CPCs in DPM mode

  dpm-system-usage-overview:
    processor-usage:
      percent: true
      exporter_name: processor_usage_ratio
      exporter_desc: Usage ratio across all processors of the CPC
    all-shared-processor-usage:
      percent: true
      exporter_name: shared_processor_usage_ratio
      exporter_desc: Usage ratio across all shared processors of the CPC
    cp-all-processor-usage:
      percent: true
      exporter_name: cp_processor_usage_ratio
      exporter_desc: Usage ratio across all CP processors of the CPC
    cp-shared-processor-usage:
      percent: true
      exporter_name: cp_shared_processor_usage_ratio
      exporter_desc: Usage ratio across all shared CP processors of the CPC
    ifl-all-processor-usage:
      percent: true
      exporter_name: ifl_processor_usage_ratio
      exporter_desc: Usage ratio across all IFL processors of the CPC
    ifl-shared-processor-usage:
      percent: true
      exporter_name: ifl_shared_processor_usage_ratio
      exporter_desc: Usage ratio across all shared IFL processors of the CPC
    network-usage:
      percent: true
      exporter_name: network_adapter_usage_ratio
      exporter_desc: Usage ratio across all network adapters of the CPC
    storage-usage:
      percent: true
      exporter_name: storage_adapter_usage_ratio
      exporter_desc: Usage ratio across all storage adapters of the CPC
    accelerator-usage:
      percent: true
      exporter_name: accelerator_adapter_usage_ratio
      exporter_desc: Usage ratio across all accelerator adapters of the CPC
    crypto-usage:
      percent: true
      exporter_name: crypto_adapter_usage_ratio
      exporter_desc: Usage ratio across all crypto adapters of the CPC
    power-consumption-watts:
      percent: false
      exporter_name: power_watts
      exporter_desc: Power consumption of the CPC
    temperature-celsius:
      percent: false
      exporter_name: ambient_temperature_celsius
      exporter_desc: Ambient temperature of the CPC

  partition-usage:
    processor-usage:
      percent: true
      exporter_name: processor_usage_ratio
      exporter_desc: Usage ratio across all processors of the partition
    network-usage:
      percent: true
      exporter_name: network_adapter_usage_ratio
      exporter_desc: Usage ratio of all network adapters of the partition
    storage-usage:
      percent: true
      exporter_name: storage_adapter_usage_ratio
      exporter_desc: Usage ratio of all storage adapters of the partition
    accelerator-usage:
      percent: true
      exporter_name: accelerator_adapter_usage_ratio
      exporter_desc: Usage ratio of all accelerator adapters of the partition
    crypto-usage:
      percent: true
      exporter_name: crypto_adapter_usage_ratio
      exporter_desc: Usage ratio of all crypto adapters of the partition

  adapter-usage:
    adapter-usage:
      percent: true
      exporter_name: usage_ratio
      exporter_desc: Usage ratio of the adapter

  network-physical-adapter-port:
    network-port-id:
      # type: info
      percent: false
      exporter_name: null  # Ignored (identifies the port, used in label)
      exporter_desc: null
    bytes-sent:
      # type: counter
      percent: false
      exporter_name: bytes_sent_count
      exporter_desc: Number of Bytes in unicast packets that were sent
    bytes-received:
      # type: counter
      percent: false
      exporter_name: bytes_received_count
      exporter_desc: Number of Bytes in unicast packets that were received
    packets-sent:
      # type: counter
      percent: false
      exporter_name: packets_sent_count
      exporter_desc: Number of unicast packets that were sent
    packets-received:
      # type: counter
      percent: false
      exporter_name: packets_received_count
      exporter_desc: Number of unicast packets that were received
    packets-sent-dropped:
      # type: counter
      percent: false
      exporter_name: packets_sent_dropped_count
      exporter_desc: Number of sent packets that were dropped (resource shortage)
    packets-received-dropped:
      # type: counter
      percent: false
      exporter_name: packets_received_dropped_count
      exporter_desc: Number of received packets that were dropped (resource shortage)
    packets-sent-discarded:
      # type: counter
      percent: false
      exporter_name: packets_sent_discarded_count
      exporter_desc: Number of sent packets that were discarded (malformed)
    packets-received-discarded:
      # type: counter
      percent: false
      exporter_name: packets_received_discarded_count
      exporter_desc: Number of received packets that were discarded (malformed)
    multicast-packets-sent:
      # type: counter
      percent: false
      exporter_name: multicast_packets_sent_count
      exporter_desc: Number of multicast packets sent
    multicast-packets-received:
      # type: counter
      percent: false
      exporter_name: multicast_packets_received_count
      exporter_desc: Number of multicast packets received
    broadcast-packets-sent:
      # type: counter
      percent: false
      exporter_name: broadcast_packets_sent_count
      exporter_desc: Number of broadcast packets sent
    broadcast-packets-received:
      # type: counter
      percent: false
      exporter_name: broadcast_packets_received_count
      exporter_desc: Number of broadcast packets received
    interval-bytes-sent:
      percent: false
      exporter_name: data_sent_bytes
      exporter_desc: Amount of data sent over the collection interval
    interval-bytes-received:
      percent: false
      exporter_name: data_received_bytes
      exporter_desc: Amount of data received over the collection interval
    bytes-per-second-sent:
      percent: false
      exporter_name: data_rate_sent_bytes_per_second
      exporter_desc: Data rate sent over the collection interval
    bytes-per-second-received:
      percent: false
      exporter_name: data_rate_received_bytes_per_second
      exporter_desc: Data rate received over the collection interval
    utilization:
      percent: true
      exporter_name: bandwidth_usage_ratio
      exporter_desc: Bandwidth usage ratio of the port
    mac-address:
      # type: info
      percent: false
      exporter_name: null # mac_address
      exporter_desc: null # MAC address of the port, or 'N/A'
    flags:
      # type: info
      percent: false
      exporter_name: null  # Ignored (can be detected from metric values)
      exporter_desc: null

  partition-attached-network-interface:
    partition-id:  # the OID, i.e. /api/partitions/{partition-id}
      # type: info
      percent: false
      exporter_name: null  # Ignored (identifies the partition, used in label)
      exporter_desc: null
    bytes-sent:
      # type: counter
      percent: false
      exporter_name: bytes_sent_count
      exporter_desc: Number of Bytes in unicast packets that were sent
    bytes-received:
      # type: counter
      percent: false
      exporter_name: bytes_received_count
      exporter_desc: Number of Bytes in unicast packets that were received
    packets-sent:
      # type: counter
      percent: false
      exporter_name: packets_sent_count
      exporter_desc: Number of unicast packets that were sent
    packets-received:
      # type: counter
      percent: false
      exporter_name: packets_received_count
      exporter_desc: Number of unicast packets that were received
    packets-sent-dropped:
      # type: counter
      percent: false
      exporter_name: packets_sent_dropped_count
      exporter_desc: Number of sent packets that were dropped (resource shortage)
    packets-received-dropped:
      # type: counter
      percent: false
      exporter_name: packets_received_dropped_count
      exporter_desc: Number of received packets that were dropped (resource shortage)
    packets-sent-discarded:
      # type: counter
      percent: false
      exporter_name: packets_sent_discarded_count
      exporter_desc: Number of sent packets that were discarded (malformed)
    packets-received-discarded:
      # type: counter
      percent: false
      exporter_name: packets_received_discarded_count
      exporter_desc: Number of received packets that were discarded (malformed)
    multicast-packets-sent:
      # type: counter
      percent: false
      exporter_name: multicast_packets_sent_count
      exporter_desc: Number of multicast packets sent
    multicast-packets-received:
      # type: counter
      percent: false
      exporter_name: multicast_packets_received_count
      exporter_desc: Number of multicast packets received
    broadcast-packets-sent:
      # type: counter
      percent: false
      exporter_name: broadcast_packets_sent_count
      exporter_desc: Number of broadcast packets sent
    broadcast-packets-received:
      # type: counter
      percent: false
      exporter_name: broadcast_packets_received_count
      exporter_desc: Number of broadcast packets received
    interval-bytes-sent:
      percent: false
      exporter_name: data_sent_bytes
      exporter_desc: Amount of data sent over the collection interval
    interval-bytes-received:
      percent: false
      exporter_name: data_received_bytes
      exporter_desc: Amount of data received over the collection interval
    bytes-per-second-sent:
      percent: false
      exporter_name: data_rate_sent_bytes_per_second
      exporter_desc: Data rate sent over the collection interval
    bytes-per-second-received:
      percent: false
      exporter_name: data_rate_received_bytes_per_second
      exporter_desc: Data rate received over the collection interval
    flags:
      # type: info
      percent: false
      exporter_name: null  # Ignored (can be detected from metric values)
      exporter_desc: null

  # Available for CPCs in any mode

  zcpc-environmentals-and-power:
    temperature-celsius:
      percent: false
      exporter_name: null  # Ignored (duplicate of ambient_temperature_celsius)
      exporter_desc: null
    humidity:
      percent: false
      exporter_name: humidity_percent
      exporter_desc: Relative humidity
    dew-point-celsius:
      percent: false
      exporter_name: dew_point_celsius
      exporter_desc: Dew point
    power-consumption-watts:
      percent: false
      exporter_name: null  # Ignored (duplicate of power_watts)
      exporter_desc: null
    heat-load:
      percent: false
      exporter_name: heat_load_total_btu_per_hour
      exporter_desc: Total heat load of the CPC
    heat-load-forced-air:
      percent: false
      exporter_name: heat_load_forced_air_btu_per_hour
      exporter_desc: Heat load of the CPC covered by forced-air
    heat-load-water:
      percent: false
      exporter_name: heat_load_water_btu_per_hour
      exporter_desc: Heat load of the CPC covered by water
    exhaust-temperature-celsius:
      percent: false
      exporter_name: exhaust_temperature_celsius
      exporter_desc: Exhaust temperature of the CPC

  environmental-power-status:
    # linecord-one-name:
    #   # type: info
    #   percent: false
    #   exporter_name: power_cord1_name
    #   exporter_desc: Line cord 1 identifier - "not-connected" if not available
    linecord-one-power-phase-A:
      percent: false
      exporter_name: power_cord1_phase_a_watts
      exporter_desc: Power in Phase A of line cord 1 - 0 if not available
    linecord-one-power-phase-B:
      percent: false
      exporter_name: power_cord1_phase_b_watts
      exporter_desc: Power in Phase B of line cord 1 - 0 if not available
    linecord-one-power-phase-C:
      percent: false
      exporter_name: power_cord1_phase_c_watts
      exporter_desc: Power in Phase C of line cord 1 - 0 if not available
    # linecord-two-name:
    #   # type: info
    #   percent: false
    #   exporter_name: power_cord2_name
    #   exporter_desc: Line cord 2 identifier - "not-connected" if not available
    linecord-two-power-phase-A:
      percent: false
      exporter_name: power_cord2_phase_a_watts
      exporter_desc: Power in Phase A of line cord 2 - 0 if not available
    linecord-two-power-phase-B:
      percent: false
      exporter_name: power_cord2_phase_b_watts
      exporter_desc: Power in Phase B of line cord 2 - 0 if not available
    linecord-two-power-phase-C:
      percent: false
      exporter_name: power_cord2_phase_c_watts
      exporter_desc: Power in Phase C of line cord 2 - 0 if not available
    # linecord-three-name:
    #   # type: info
    #   percent: false
    #   exporter_name: power_cord3_name
    #   exporter_desc: Line cord 3 identifier - "not-connected" if not available
    linecord-three-power-phase-A:
      percent: false
      exporter_name: power_cord3_phase_a_watts
      exporter_desc: Power in Phase A of line cord 3 - 0 if not available
    linecord-three-power-phase-B:
      percent: false
      exporter_name: power_cord3_phase_b_watts
      exporter_desc: Power in Phase B of line cord 3 - 0 if not available
    linecord-three-power-phase-C:
      percent: false
      exporter_name: power_cord3_phase_c_watts
      exporter_desc: Power in Phase C of line cord 3 - 0 if not available
    # linecord-four-name:
    #   # type: info
    #   percent: false
    #   exporter_name: power_cord4_name
    #   exporter_desc: Line cord 4 identifier - "not-connected" if not available
    linecord-four-power-phase-A:
      percent: false
      exporter_name: power_cord4_phase_a_watts
      exporter_desc: Power in Phase A of line cord 4 - 0 if not available
    linecord-four-power-phase-B:
      percent: false
      exporter_name: power_cord4_phase_b_watts
      exporter_desc: Power in Phase B of line cord 4 - 0 if not available
    linecord-four-power-phase-C:
      percent: false
      exporter_name: power_cord4_phase_c_watts
      exporter_desc: Power in Phase C of line cord 4 - 0 if not available
    # linecord-five-name:
    #   # type: info
    #   percent: false
    #   exporter_name: power_cord5_name
    #   exporter_desc: Line cord 5 identifier - "not-connected" if not available
    linecord-five-power-phase-A:
      percent: false
      exporter_name: power_cord5_phase_a_watts
      exporter_desc: Power in Phase A of line cord 5 - 0 if not available
    linecord-five-power-phase-B:
      percent: false
      exporter_name: power_cord5_phase_b_watts
      exporter_desc: Power in Phase B of line cord 5 - 0 if not available
    linecord-five-power-phase-C:
      percent: false
      exporter_name: power_cord5_phase_c_watts
      exporter_desc: Power in Phase C of line cord 5 - 0 if not available
    # linecord-six-name:
    #   # type: info
    #   percent: false
    #   exporter_name: power_cord6_name
    #   exporter_desc: Line cord 6 identifier - "not-connected" if not available
    linecord-six-power-phase-A:
      percent: false
      exporter_name: power_cord6_phase_a_watts
      exporter_desc: Power in Phase A of line cord 6 - 0 if not available
    linecord-six-power-phase-B:
      percent: false
      exporter_name: power_cord6_phase_b_watts
      exporter_desc: Power in Phase B of line cord 6 - 0 if not available
    linecord-six-power-phase-C:
      percent: false
      exporter_name: power_cord6_phase_c_watts
      exporter_desc: Power in Phase C of line cord 6 - 0 if not available
    # linecord-seven-name:
    #   # type: info
    #   percent: false
    #   exporter_name: power_cord7_name
    #   exporter_desc: Line cord 7 identifier - "not-connected" if not available
    linecord-seven-power-phase-A:
      percent: false
      exporter_name: power_cord7_phase_a_watts
      exporter_desc: Power in Phase A of line cord 7 - 0 if not available
    linecord-seven-power-phase-B:
      percent: false
      exporter_name: power_cord7_phase_b_watts
      exporter_desc: Power in Phase B of line cord 7 - 0 if not available
    linecord-seven-power-phase-C:
      percent: false
      exporter_name: power_cord7_phase_c_watts
      exporter_desc: Power in Phase C of line cord 7 - 0 if not available
    # linecord-eight-name:
    #   # type: info
    #   percent: false
    #   exporter_name: power_cord8_name
    #   exporter_desc: Line cord 8 identifier - "not-connected" if not available
    linecord-eight-power-phase-A:
      percent: false
      exporter_name: power_cord8_phase_a_watts
      exporter_desc: Power in Phase A of line cord 8 - 0 if not available
    linecord-eight-power-phase-B:
      percent: false
      exporter_name: power_cord8_phase_b_watts
      exporter_desc: Power in Phase B of line cord 8 - 0 if not available
    linecord-eight-power-phase-C:
      percent: false
      exporter_name: power_cord8_phase_c_watts
      exporter_desc: Power in Phase C of line cord 8 - 0 if not available

  zcpc-processor-usage:
    processor-name:
      # type: info
      percent: false
      exporter_name: null  # Ignored (used as label)
      exporter_desc: null
    processor-type:
      # type: info
      percent: false
      exporter_name: null  # Ignored (redundant with processor-name)
      exporter_desc: null
    processor-usage:
      percent: true
      exporter_name: usage_ratio
      exporter_desc: Usage ratio of the processor
    smt-usage:
      percent: false
      exporter_name: smt_mode_percent
      exporter_desc: Percentage of time the processor was in SMT mode - -1 if not supported
    thread-0-usage:
      percent: true
      exporter_name: smt_thread0_usage_ratio
      exporter_desc: Usage ratio of thread 0 of the processor when in SMT mode - -1 if not supported
    thread-1-usage:
      percent: true
      exporter_name: smt_thread1_usage_ratio
      exporter_desc: Usage ratio of thread 1 of the processor when in SMT mode - -1 if not supported
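
As a rough illustration of how an entry in the metrics section might translate into an exported sample, the following sketch combines a group prefix with exporter_name and converts a percent value into a ratio. The underscore separator, the division by 100, the prefix value zhmc_processor, and the helper name export_sample are assumptions made for this example and are not taken from the exporter's code. Entries whose exporter_name is null are skipped, matching the "Ignored" comments in the sample file.

```python
# Minimal sketch, not the exporter's implementation.
# Assumptions: names are joined with "_" and percent values are divided by 100.

def export_sample(prefix, metric_def, hmc_value):
    """Return (exported_name, exported_value), or None for ignored metrics."""
    name = metric_def["exporter_name"]
    if name is None:
        # Entries with exporter_name: null are not exported.
        return None
    value = hmc_value / 100 if metric_def["percent"] else hmc_value
    return "{}_{}".format(prefix, name), value


# Two definitions copied from the zcpc-processor-usage group above.
usage_def = {
    "percent": True,
    "exporter_name": "usage_ratio",
    "exporter_desc": "Usage ratio of the processor",
}
name_def = {"percent": False, "exporter_name": None, "exporter_desc": None}

# "zhmc_processor" is a hypothetical prefix chosen for illustration.
print(export_sample("zhmc_processor", usage_def, 50))
print(export_sample("zhmc_processor", name_def, "CP00"))
```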

Demo setup with Grafana

This section describes a demo setup with a Prometheus server and with the Grafana frontend for visualizing the metrics.

The Prometheus server scrapes the metrics from the exporter. The Grafana server provides an HTML-based web interface that visualizes the metrics in a dashboard.

The following diagram shows the demo setup:

Demo setup

Perform these steps for setting it up:

  • Download and install Prometheus from the Prometheus download page or using your OS-specific package manager.

    Copy the sample Prometheus configuration file (examples/prometheus.yaml in the Git repo) as prometheus.yaml into some directory where you will run the Prometheus server. The host:port for contacting the exporter is already set to localhost:9291 and it can be changed as needed.

    Run the Prometheus server as follows:

    $ prometheus --config.file=prometheus.yaml
    

    For details, see the Prometheus guide.

  • Download and install Grafana from the Grafana download page or using your OS-specific package manager.

    Run the Grafana server as follows:

    $ grafana-server -homepath {homepath} web
    

    Where:

    • {homepath} is the path name of the directory with the conf and data directories, for example /usr/local/Cellar/grafana/7.3.4/share/grafana on macOS when Grafana was installed using Homebrew.

    By default, the web interface will be on localhost:3000. This can be changed as needed. For details, see the Prometheus guide on Grafana.

  • Direct your web browser at http://localhost:3000 and log on using admin/admin.

    Create a data source in Grafana with:

    Create a dashboard in Grafana by importing the sample dashboard (examples/grafana.json in the Git repo). It will use the data source ZHMC_Prometheus.
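
Before wiring up Prometheus and Grafana, it can help to confirm that the exporter endpoint returns parseable metrics. The sketch below parses text in the Prometheus exposition format using only the standard library. The sample metric name and label are made up for illustration, and the commented-out fetch assumes the exporter is reachable at localhost:9291 as in the sample configuration.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus text exposition lines into (series, value) pairs.

    Simplification: assumes no optional timestamp follows the value.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, value = line.rsplit(" ", 1)
        samples.append((series, float(value)))
    return samples


# Hypothetical sample output; real metric names depend on your metrics.yaml.
sample = """\
# HELP zhmc_cpc_processor_usage_ratio Usage ratio across all processors of the CPC
# TYPE zhmc_cpc_processor_usage_ratio gauge
zhmc_cpc_processor_usage_ratio{cpc="CPC1"} 0.25
"""
print(parse_metrics(sample))

# To check a running exporter (URL assumed from the sample configuration):
# print(parse_metrics(urlopen("http://localhost:9291/metrics").read().decode()))
```

An empty result from a running exporter would suggest that no metric groups are being fetched or that the HMC has not returned data yet.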

Troubleshooting

This section describes some issues and how to resolve them. If you encounter an issue that is not covered here, see Reporting issues.

Permission error

Example:

$ zhmc_prometheus_exporter
Permission error. Make sure you have appropriate permissions to read from
  /etc/zhmc-prometheus-exporter/hmccreds.yaml.

You don’t have permission to read from a YAML file. Change the permissions with chmod; see man chmod if you are unfamiliar with it.

File not found

Example:

$ zhmc_prometheus_exporter
Error: File not found. It seems that /etc/zhmc-prometheus-exporter/hmccreds.yaml does not exist.

A required YAML file (hmccreds.yaml or metrics.yaml) does not exist. Make sure that you specify the paths, relative or absolute, with -c or -m if the files are not in /etc/zhmc-prometheus-exporter/. You have to copy the HMC credentials file from the examples folder and fill in your own credentials; see Quickstart for more information.
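
Both this error and the permission error can be checked for before starting the exporter. The following sketch classifies a path as missing, unreadable, or readable using only the standard library. The function name check_yaml_file is a hypothetical helper, and the directory is the default location shown in the messages above.

```python
import os


def check_yaml_file(path):
    """Return 'missing', 'unreadable', or 'ok' for the given file path."""
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.R_OK):
        return "unreadable"
    return "ok"


for name in ("hmccreds.yaml", "metrics.yaml"):
    path = os.path.join("/etc/zhmc-prometheus-exporter", name)
    print(path, check_yaml_file(path))
```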

Section not found

Example:

$ zhmc_prometheus_exporter
Section metric_groups not found in file /etc/zhmc-prometheus-exporter/metrics.yaml.

At least one of the sections metric_groups and metrics in your metrics.yaml, or the section metrics in hmccreds.yaml, is missing entirely. See chapter Metric definition file for more information.

Doesn’t follow the YAML syntax

Example:

$ zhmc_prometheus_exporter
/etc/zhmc-prometheus-exporter/metrics.yaml does not follow the YAML syntax

A YAML file you specified breaks the syntax rules of the YAML specification. If you derive your YAML files from the existing examples (see chapter Quickstart), this error should not occur. You can also check the YAML specification.

You did not specify

Example:

$ zhmc_prometheus_exporter
You did not specify the IP address of the HMC in /etc/zhmc-prometheus-exporter/hmccreds.yaml.

There is a lot of mandatory information in the two YAML files that might be missing if you filled in the credentials file improperly (see Quickstart) or made incorrect changes to the metrics file (see Metric definition file).

All of these values could in some way be missing or incorrect:

In the credentials YAML file, in the section “metrics”

  • hmc, the IP address of the HMC (it must be a correct IP address as well!)
  • userid, a username for the HMC
  • password, the respective password

In the metrics YAML file, in the section “metric_groups”, for each metric group

  • prefix, the prefix for the metrics to be exported
  • fetch, specifying whether the group should be fetched (it must be one of True or False as well!)

In the metrics YAML file, in the section “metrics”, for each metric group

  • The group must also exist in the metric_groups section
  • percent, specifying whether the metric is a percent value (it must be one of True or False as well!)
  • exporter_name, the name for the exporter (minus the prefix)
  • exporter_desc, the mandatory description for the exporter
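
The checks listed above can be approximated in a few lines of Python if you want to validate the files before starting the exporter. This is a sketch that operates on already-parsed YAML (plain dictionaries). The function names and the returned messages are hypothetical and do not reproduce the exporter's own validation.

```python
def check_creds(creds):
    """Return the mandatory keys missing from the 'metrics' section of hmccreds.yaml."""
    section = creds.get("metrics", {})
    return [key for key in ("hmc", "userid", "password") if key not in section]


def check_metrics(defs):
    """Return a list of problems found in a parsed metric definition file."""
    problems = []
    groups = defs.get("metric_groups", {})
    for group, props in groups.items():
        if "prefix" not in props:
            problems.append("{}: prefix missing".format(group))
        if not isinstance(props.get("fetch"), bool):
            problems.append("{}: fetch must be True or False".format(group))
    for group, metrics in defs.get("metrics", {}).items():
        if group not in groups:
            problems.append("{}: not in metric_groups".format(group))
        for metric, props in metrics.items():
            if not isinstance(props.get("percent"), bool):
                problems.append("{}.{}: percent must be True or False".format(group, metric))
            for key in ("exporter_name", "exporter_desc"):
                if key not in props:
                    problems.append("{}.{}: {} missing".format(group, metric, key))
    return problems


# Example with made-up values; "channel-usage" lacks its metric_groups entry.
creds = {"metrics": {"hmc": "192.0.2.10", "userid": "user"}}
defs = {
    "metric_groups": {"cpc-usage-overview": {"prefix": "cpc", "fetch": True}},
    "metrics": {
        "channel-usage": {
            "channel-usage": {"percent": True, "exporter_name": "usage_ratio"},
        },
    },
}
print(check_creds(creds))
print(check_metrics(defs))
```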

Time out

Example:

$ zhmc_prometheus_exporter
Time out. Ensure that you have access to the HMC and that you have stored
  the correct IP address in /etc/zhmc-prometheus-exporter/hmccreds.yaml.

There is a certain timeout threshold if the HMC cannot be found. Check that you have access to the HMC on the IP address that you specified in the HMC credentials file.
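
A quick way to tell a network problem from a configuration problem is to test whether a TCP connection to the HMC can be opened at all. The sketch below uses only the standard library. Port 6794 is assumed here as the HMC Web Services API port, and can_connect is a hypothetical helper name.

```python
import socket


def can_connect(host, port=6794, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable hosts.
        return False


# Usage: replace the placeholder with the address from your HMC credentials file.
# print(can_connect("192.0.2.10"))
```

If this returns False, the problem most likely lies in the network path or the address rather than in the exporter.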

Authentication error

Example:

$ zhmc_prometheus_exporter
Authentication error. Ensure that you have stored a correct user ID-password
  combination in /etc/zhmc-prometheus-exporter/hmccreds.yaml.

Wrong username or password in the HMC credentials file. Check if you can regularly access the HMC with this username-password combination.

Warning: Skipping metric or metric group

Example:

$ zhmc_prometheus_exporter
...: UserWarning: Skipping metric group 'new-metric-group' returned by the HMC that is
  not defined in the 'metric_groups' section of metric definition file metrics.yaml
  warnings.warn(warning_str % (metric, filename))

$ zhmc_prometheus_exporter
...: UserWarning: Skipping metric 'new-metric' of metric group 'new-metric-group'
  returned by the HMC that is not defined in the 'metrics' section of metric
  definition file metrics.yaml
  warnings.warn(warning_str % (metric, filename))

If the HMC implements new metrics, or if the metric definition file lacks a metric or metric group, the exporter issues this warning to make you aware of that.

Development

This page covers the relevant aspects for developers.

Repository

The Git repository for the exporter project is GitHub: https://github.com/zhmcclient/zhmc-prometheus-exporter

Code of Conduct

Contributions must follow the Code of Conduct as defined by the Contributor Covenant.

Contributing

Third party contributions to this project are welcome!

In order to contribute, create a Git pull request, considering this:

  • Tests are required.
  • Each commit should only contain one “logical” change.
  • A “logical” change should be put into one commit, and not split over multiple commits.
  • Large new features should be split into stages.
  • The commit message should not only summarize what you have done, but explain why the change is useful.
  • The commit message must follow the format explained below.

What comprises a “logical” change is subject to sound judgement. Sometimes, it makes sense to produce a set of commits for a feature (even if not large). For example, a first commit may introduce a (presumably) compatible API change without exploitation of that feature. With only this commit applied, it should be demonstrable that everything is still working as before. The next commit may be the exploitation of the feature in other components.

For further discussion of good and bad practices regarding commits, see:

Format of commit messages

A commit message must start with a short summary line, followed by a blank line.

Optionally, the summary line may start with an identifier that helps identify the type of change or the component that is affected, followed by a colon.

It can include a more detailed description after the summary line. This is where you explain why the change was done, and summarize what was done.

It must end with the DCO (Developer Certificate of Origin) sign-off line in the format shown in the example below, using your name and a valid email address of yours. The DCO sign-off line certifies that you followed the rules stated in DCO 1.1. In short, you certify that you wrote the patch or otherwise have the right to pass it on as an open-source patch.

We use GitCop during creation of a pull request to check whether the commit messages in the pull request comply with this format. If the commit messages do not comply, GitCop will add a comment to the pull request with a description of what was wrong.

Example commit message:

cookies: Add support for delivering cookies

Cookies are important for many people. This change adds a pluggable API for
delivering cookies to the user, and provides a default implementation.

Signed-off-by: Random J Developer <random@developer.org>

Use git commit --amend to edit the commit message, if you need to.

Use the --signoff (-s) option of git commit to append a sign-off line to the commit message with your name and email as known by Git.
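
The format rules above lend themselves to a quick local check before pushing. The following sketch tests a message for a summary line, a separating blank line, and a final Signed-off-by line. It is a simplified, hypothetical helper and does not reproduce the checks that GitCop performs.

```python
import re

SIGNOFF = re.compile(r"^Signed-off-by: .+ <[^<>@\s]+@[^<>@\s]+>$")


def check_commit_message(message):
    """Return a list of format problems; an empty list means the message passes."""
    lines = message.strip("\n").split("\n")
    problems = []
    if not lines[0].strip():
        problems.append("summary line is empty")
    if len(lines) > 1 and lines[1].strip():
        problems.append("summary line must be followed by a blank line")
    if not SIGNOFF.match(lines[-1]):
        problems.append("last line must be a Signed-off-by line")
    return problems


message = """cookies: Add support for delivering cookies

Cookies are important for many people. This change adds a pluggable API for
delivering cookies to the user, and provides a default implementation.

Signed-off-by: Random J Developer <random@developer.org>
"""
print(check_commit_message(message))
```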

If you like filling out the commit message in an editor instead of using the -m option of git commit, you can automate the presence of the sign-off line by using a commit template file:

  • Create a file outside of the repo (say, ~/.git-signoff.template) that contains, for example:

    <one-line subject>
    
    <detailed description>
    
    Signed-off-by: Random J Developer <random@developer.org>
    
  • Configure Git to use that file as a commit template for your repo:

    git config commit.template ~/.git-signoff.template
    

Releasing a version

This section shows the steps for releasing a version to PyPI.

Switch to your work directory of the zhmc-prometheus-exporter Git repo (this is where the Makefile is), and perform the following steps in that directory:

  1. Set a shell variable for the version to be released, e.g.:

    MNU='0.11.0'
    
  2. Verify that your working directory is in a Git-wise clean state:

    git status
    
  3. Check out the master branch, and update it from upstream:

    git checkout master
    git pull
    
  4. Create a topic branch for the release, based upon the master branch:

    git checkout -b release_$MNU
    
  5. Edit the version file and set the version to be released:

    vi zhmc_prometheus_exporter/_version.py
    

    __version__ = 'M.N.U'

    Where M.N.U is the version to be released.

  6. Edit the change log and perform the following changes in the top-most section (that is the section for the version to be released):

    vi docs/changes.rst
    
    • If needed, change the version in the section heading to the version to be released, e.g.:

      Version 0.11.0
      ^^^^^^^^^^^^^^
      
    • Change the release date to today’s date, e.g.:

      Released: 2018-08-20
      
    • Make sure that the change log entries reflect all changes since the previous version, and make sure they are relevant for and understandable by users.

    • In the “Known issues” list item, remove the link to the issue tracker and add any known issues you want users to know about. Just linking to the issue tracker quickly becomes incorrect for released versions:

      **Known issues:**
      
      * ...
      
    • Remove all empty list items in the change log section for this release.

  7. Commit your changes and push them upstream:

    git add zhmc_prometheus_exporter/_version.py docs/changes.rst
    git commit -sm "Release $MNU"
    git push --set-upstream origin release_$MNU
    
  8. On GitHub, create a pull request for branch release_$MNU.

  9. Perform a complete test:

    make test
    

    This should not fail because the same tests have already been run in the Travis CI. However, run it for additional safety before the release.

    If this test fails, fix any issues until the test succeeds.

  10. Once the CI tests on GitHub are complete, merge the pull request.

  11. Update your local master branch:

    git checkout master
    git pull
    
  12. Tag the master branch with the release label and push the tag upstream:

    git tag $MNU
    git push --tags
    
  13. On GitHub, edit the new tag, and create a release description on it. This will cause it to appear in the Release tab.

    You can see the tags in GitHub via Code -> Releases -> Tags.

  14. Upload the package to PyPI:

    make upload
    

    This will show the package version and will ask for confirmation.

    Attention! This only works once for each version. You cannot release the same version twice to PyPI.

  15. Verify that the released version is shown on PyPI.

  16. On GitHub, close milestone M.N.U.
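
The release steps rely on the version string having the plain M.N.U form at release time. A small check along the following lines can catch a leftover development suffix before tagging. The regular expression and the function name are illustrative assumptions and are not part of the project's Makefile.

```python
import re


def is_release_version(version):
    """Return True for a plain M.N.U version such as '0.11.0'."""
    return re.fullmatch(r"\d+\.\d+\.\d+", version) is not None


print(is_release_version("0.11.0"))
print(is_release_version("0.11.1.dev1"))
```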

Starting a new version

This section shows the steps for starting development of a new version.

Switch to your work directory of the zhmc-prometheus-exporter Git repo (this is where the Makefile is), and perform the following steps in that directory:

  1. Set a shell variable for the version to be started, e.g.:

    MNU='0.11.1'
    
  2. Check out the branch the new version is based on, make sure it is up to date with upstream, and create a topic branch for the new version:

    git checkout master
    git pull
    git checkout -b start_$MNU
    
  3. Edit the version file and set the version to the new version and to development:

    vi zhmc_prometheus_exporter/_version.py
    

    __version__ = 'M.N.U.dev1'

    Where M.N.U is the version to be started.

  4. Edit the change log:

    vi docs/changes.rst
    

    and insert the following section before the top-most section, replacing the example version in the heading with the version to be started:

    Version 0.20.0
    ^^^^^^^^^^^^^^
    
    Released: not yet
    
    **Incompatible changes:**
    
    **Deprecations:**
    
    **Bug fixes:**
    
    **Enhancements:**
    
    **Cleanup:**
    
    **Known issues:**
    
    * See `list of open issues`_.
    
    .. _`list of open issues`: https://github.com/zhmcclient/zhmc-prometheus-exporter/issues
    
  5. Commit your changes and push them upstream:

    git add zhmc_prometheus_exporter/_version.py docs/changes.rst
    git commit -sm "Start $MNU"
    git push --set-upstream origin start_$MNU
    
  6. On GitHub, create a Pull Request for branch start_M.N.U.

  7. On GitHub, create a milestone for the new version M.N.U.

    You can create a milestone in GitHub via Issues -> Milestones -> New Milestone.

  8. On GitHub, go through all open issues and pull requests that still have milestones for previous releases set, and either set them to the new milestone, or to have no milestone.

  9. On GitHub, once the checks for this Pull Request succeed:

    • Merge the Pull Request (no review is needed)
    • Delete the branch of the Pull Request (start_M.N.U)
  10. Check out the branch the new version is based on, update it from upstream, and delete the local topic branch you created:

    git checkout master
    git pull
    git branch -d start_$MNU
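
The version string and the change log heading edited in the steps above follow a simple pattern, sketched below. This is an illustration only: the helper names dev_version and changelog_heading are hypothetical and not part of the package. The heading helper keeps the RST underline the same length as the title, which is easy to get wrong when editing by hand.

```python
def dev_version(mnu, dev_number=1):
    """Return the development version string for version M.N.U."""
    return "{}.dev{}".format(mnu, dev_number)


def changelog_heading(version, underline_char="^"):
    """Return an RST heading whose underline matches the title length."""
    title = "Version {}".format(version)
    return "{}\n{}".format(title, underline_char * len(title))


# Example with the version used in step 1 of this section
print(dev_version("0.11.1"))
print(changelog_heading("0.11.1"))
```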
    

Building the distribution archives

You can build a binary (wheel) distribution archive and a source distribution archive (a more minimal version of the repository) with:

$ make build

You will find the files zhmc_prometheus_exporter-VERSION_NUMBER-py2.py3-none-any.whl and zhmc_prometheus_exporter-VERSION_NUMBER.tar.gz in the dist folder, the former being the binary and the latter being the source distribution archive.
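
As a quick check after the build, the expected file names can be derived from the version number. The sketch below is an assumption for illustration, not something the Makefile provides; the function names expected_archives and missing_archives are hypothetical, and the file name patterns are the ones stated above.

```python
import os


def expected_archives(version, dist_dir="dist"):
    """Return the paths of the wheel and source archives for a version."""
    wheel = "zhmc_prometheus_exporter-{}-py2.py3-none-any.whl".format(version)
    sdist = "zhmc_prometheus_exporter-{}.tar.gz".format(version)
    return [os.path.join(dist_dir, wheel), os.path.join(dist_dir, sdist)]


def missing_archives(version, dist_dir="dist"):
    """Return the expected archive paths that do not exist as files."""
    return [path for path in expected_archives(version, dist_dir)
            if not os.path.isfile(path)]
```

An empty list returned by missing_archives() means both archives were found in the dist folder.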

The binary distribution archive can be installed with:

$ pip install zhmc_prometheus_exporter-VERSION_NUMBER-py2.py3-none-any.whl

The source distribution archive can be installed with:

$ tar -xzf zhmc_prometheus_exporter-VERSION_NUMBER.tar.gz
$ pip install zhmc_prometheus_exporter-VERSION_NUMBER

Building the documentation

You can build the HTML documentation with:

$ make builddoc

The root file for the built documentation will be build_docs/index.html.

Testing

You can perform unit tests with:

$ make test

You can perform a flake8 check with:

$ make check

You can perform a pylint check with:

$ make pylint

Appendix

Glossary

Exporter
A server application for exposing metrics to Prometheus
IBM Z
IBM’s mainframe product line
Prometheus
A server application for monitoring and alerting
Z HMC
Hardware Management Console for IBM Z

Change log

Version 0.6.0

Released: 2020-12-07

Bug fixes:

  • Docs: Fixed the names of the Prometheus metrics of the line cord power metrics. (see issue #89)
  • Added missing dependency to ‘urllib3’ Python package.
  • README: Fixed the links to the metric definition and HMC credentials files (see issue #88).
  • Dockerfile: Fixed that all files from the package are included in the Docker image (see issue #91).

Enhancements:

  • Added support for specifying a new optional property ‘if’ in the definition of metric groups in the metric definition file, which specifies a Python expression representing a condition under which the metric group is fetched. The HMC version can be referenced in the expression as the hmc_version variable. (see issue #77)
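
How such a condition could be evaluated is sketched below. This is an assumption for illustration and not the exporter's actual code: the function name eval_condition is hypothetical, and the sketch assumes that hmc_version is provided as a string such as '2.15.0'.

```python
def eval_condition(condition, hmc_version):
    """Evaluate a condition expression with hmc_version as its only variable.

    The builtins are removed from the evaluation namespace to limit what the
    expression can reference. This is not a security sandbox; the expression
    should still come from a trusted file.
    """
    namespace = {"hmc_version": hmc_version}
    return bool(eval(condition, {"__builtins__": {}}, namespace))


# Example: fetch a metric group only on HMC version 2.14 or higher
fetch_group = eval_condition("hmc_version >= '2.14'", "2.15.0")
```

Under the string assumption made here, comparisons are lexical, so '2.9' would compare greater than '2.14'. A real implementation may need to parse the version into numbers first.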

Cleanup:

  • The metric definition and HMC credentials YAML files are now validated using a schema definition (using JSON schema). This improves the ability to enhance these files, and made it possible to remove error-prone manual validation code. The schema validation files are part of the installed Python package. This adds a dependency on the ‘jsonschema’ package. (see issue #81)

Version 0.5.0

Released: 2020-12-03

Incompatible changes:

  • The sample metric definition file has changed the metric names that are exported, and also the labels. This is only a change if you choose to use the new sample metric definition file; if you continue using your current metric definition file, the exported metrics will be as before.

Enhancements:

  • The packages needed for installation are now properly reflected in the package metadata (part of issue #55).

  • Improved the metric labels published along with metric values in multiple ways. The sample metric definition file has been updated to exploit all these new capabilities:

    • The type of resource to which a metric value belongs is now identified in the label name e.g. by showing a label ‘cpc’ or ‘adapter’ instead of the generic label ‘resource’.
    • Resources that are inside a CPC (e.g. adapters, partitions) now can show their parent resource (the CPC) as an additional label, if the metric definition file specifies that.
    • Metrics that identify the resource (e.g. ‘channel-id’ in the ‘channel-usage’ metric group) now can be used as additional labels on the actual metric value, if the metric definition file specifies that.

    Note that these changes will only become active if you pick them up in your metric definition file, e.g. by using the updated sample metric definition file. If you continue to use your current metric definition file, nothing will change regarding the labels.

  • The published metrics no longer contain empty HELP/TYPE comments.

  • Metric values of -1, which the HMC returns for some metrics when the resource does not exist, are now suppressed.

  • Disabled the Platform and Python specific additional metrics so that they are not collected or published (see issue #66).

  • Overhauled the complete documentation (triggered by issue #57).

  • Added a cache for looking up HMC resources from their resource URIs to avoid repeated lookup on the HMC. This speeds up large metric retrievals from over a minute to sub-seconds (see issue #73).

  • Added a command line option -v / --verbose to show additional verbose messages (see issue #54).

  • The HMC API version is now shown as a verbose message.

  • Removed ensemble/zBX related metrics from the sample metric definition file.

  • Added all missing metrics up to z15 to the sample metric definition file.

  • Added support for additional labels to be shown in every metric that is exported, by specifying them in a new extra_labels section of the HMC credentials file. This allows providing some identification of the HMC environment, if needed. (see issue #80)
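
The resource cache mentioned in the enhancements above can be pictured as a dictionary keyed by resource URI. The following is a minimal sketch under stated assumptions, not the exporter's implementation: the class name ResourceCache and the lookup callable are hypothetical, with lookup standing in for whatever call retrieves a resource from the HMC.

```python
class ResourceCache(object):
    """Cache of resource objects keyed by resource URI (illustrative only)."""

    def __init__(self, lookup):
        # 'lookup' is a hypothetical callable that fetches a resource from
        # the HMC given its URI. It is only called on a cache miss.
        self._lookup = lookup
        self._resources = {}

    def resource(self, uri):
        """Return the resource for the URI, fetching it at most once."""
        try:
            return self._resources[uri]
        except KeyError:
            res = self._lookup(uri)
            self._resources[uri] = res
            return res
```

Because each URI is looked up on the HMC only once, repeated metric retrievals for the same resources avoid the round trips that made large retrievals slow.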

Cleanup:

  • Removed the use of ‘pbr’ to simplify installation and development (see issue #55).

Version 0.4.1

Released: 2020-11-29

Bug fixes:

  • Fixed the error that only a subset of the possible exceptions that can be raised by the zhmcclient package were handled (i.e. only ConnectionTimeout and ServerAuthError). This led to lengthy and confusing tracebacks being shown when the other exceptions occurred. Now, they are all handled and result in a proper error message.
  • Added metadata to the PyPI package declaring a development status of 4 - Beta, and requiring the supported Python versions (3.4 and higher).

Enhancements:

  • Migrated from Travis and Appveyor to GitHub Actions. This required several changes in package dependencies for development.
  • Added options --help-creds and --help-metrics that show brief help for the HMC credentials file and for the metric definition file, respectively.
  • Improved all exception and warning messages to be easier to understand and to provide the context for any issues with content in the HMC credentials or metric definition files.
  • Expanded the supported Python versions to 3.4 and higher.
  • Expanded the supported operating systems to Linux, macOS, Windows.
  • Added the sample HMC credentials file and the sample metric definition file to the appendix of the documentation.
  • The sample metric definition file ‘examples/metrics.yaml’ has been completed so that it now defines all metrics of all metric groups supported by HMC 2.15 (z15). Note that some metric values have been renamed for clarity and consistency.

Version 0.4.0

Released: 2019-08-21

Bug fixes:

  • Avoided an exception during the error handling of a connection drop.
  • Replace yaml.load() by yaml.safe_load(). In PyYAML before 5.1, the yaml.load() API could execute arbitrary code if used with untrusted data (CVE-2017-18342).

Version 0.3.0

Released: 2019-08-11

Bug fixes:

  • Reconnect in case of a connection drop.

Version 0.2.0

Released: 2018-08-24

Incompatible changes:

  • All metrics now have a zhmc_ prefix.

Bug fixes:

  • Uses Grafana 5.2.2.

Version 0.1.2

Released: 2018-08-23

Enhancements:

  • The description now instructs the user to pip3 install zhmc-prometheus-exporter instead of running a local install from the cloned repository. It also links to the stable version of the documentation rather than to the latest build.

Version 0.1.1

Released: 2018-08-23

Initial PyPI release (0.1.0 was for testing purposes)

Version 0.1.0

Released: Only on GitHub, never on PyPI

Initial release