Node monitoring

Abstract

This subsection describes the key elements of hardware monitoring and of monitoring node uptime and performance. It does not cover the monitoring of system logs.

Motivation

Due to the decentralized nature of blockchain infrastructure, it is challenging to find an appropriate method for monitoring the status of all running nodes. Checking whether a node is running is not sufficient to determine its overall health: the node, as a virtual or physical appliance, could be running while the blockchain-specific process on it is not operational. Only once each node's performance is adequately monitored can the overall health of the network be determined.

Elaboration

The following parameters are prerequisites for effective hardware monitoring and are blockchain agnostic. However, the monitoring required may differ qualitatively depending on the type of node (e.g. validator vs. read-only nodes).

General

It is recommended to start with an overview of:

Selection of components to be monitored
Selection of the measured variables

What should be monitored

The status of the following node properties could be monitored:

Operating system
System voltage & Uninterruptible power supply (UPS)
Liveness and readiness of node process
Connectivity of public ports
CPU available/used
Memory available/used
Disk available/used
Bandwidth available/used
Link speed
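A node can be reachable while its blockchain process is not, so a host-level probe should check the node's public port as well as resource usage. The following is a minimal sketch, using only the Python standard library, that collects a few of the properties listed above: the 1-minute load average (where `os.getloadavg()` is available), disk usage of the filesystem holding the node's data directory, and whether a TCP connection to the node's public port succeeds. `HostSnapshot`, `take_snapshot` and their parameters are hypothetical names chosen for illustration. Memory and bandwidth are omitted because the standard library has no portable way to read them; a third-party package such as `psutil` is one option.

```python
import os
import shutil
import socket
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostSnapshot:
    """Point-in-time values for some of the node properties listed above."""
    load_avg_1m: Optional[float]  # None where os.getloadavg() is unavailable
    cpu_count: int
    disk_used_ratio: float        # used / total for the filesystem holding data_dir
    port_open: bool               # did a TCP connection to the public port succeed?


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` s."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


def take_snapshot(data_dir: str, host: str, port: int) -> HostSnapshot:
    usage = shutil.disk_usage(data_dir)
    try:
        load: Optional[float] = os.getloadavg()[0]
    except (AttributeError, OSError):  # e.g. not provided on Windows
        load = None
    return HostSnapshot(
        load_avg_1m=load,
        cpu_count=os.cpu_count() or 1,
        disk_used_ratio=usage.used / usage.total if usage.total else 0.0,
        port_open=tcp_port_open(host, port),
    )
```

For example, `take_snapshot("/var/lib/node", "127.0.0.1", 30303)` would probe the default Ethereum P2P port on a hypothetical data directory. A failed port check on a host that otherwise answers is exactly the "appliance up, process down" case described in the Motivation.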

How should it be monitored?

The node properties could be monitored as follows:

Use of a watchdog (hardware or software)
Sensor technology
Storage technology for metrics
Transmission protocols and architecture (push vs pull)
Display of the measured values and visualisation
Selection of the analysis method
Alerting
- what to alert: this can be inferred from standard operational practice and might be blockchain agnostic, e.g. the selection of relevant metrics and alerting criteria
- how to alert
- who to alert
Definition of sampling rates (per metric)
Selection of the transmission protocols
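The alerting and sampling-rate items above can be combined into a per-metric rule that records its own sampling interval and fires only when a threshold breach persists for a minimum duration, similar in spirit to the `for` clause of Prometheus alerting rules. The sketch below is hypothetical: the `AlertRule` type, metric names, thresholds and notification targets are illustrative assumptions, not recommendations of this RFC.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical per-metric alert rule: what, how often, how long, who."""
    metric: str                                 # what to alert on
    comparator: Callable[[float, float], bool]  # e.g. operator.gt or operator.lt
    threshold: float
    sample_interval_s: int                      # sampling rate for this metric
    for_duration_s: int                         # breach must persist this long
    notify: str                                 # who to alert (illustrative name)

    def samples_needed(self) -> int:
        # Consecutive samples covering for_duration_s (ceiling division).
        return max(1, -(-self.for_duration_s // self.sample_interval_s))

    def should_fire(self, samples: Sequence[float]) -> bool:
        """`samples` are ordered oldest to newest, one per sample_interval_s."""
        n = self.samples_needed()
        if len(samples) < n:
            return False
        return all(self.comparator(v, self.threshold) for v in samples[-n:])


def firing(rules: List[AlertRule], history: Dict[str, Sequence[float]]) -> List[str]:
    """Return 'metric -> notify' for every rule whose condition currently holds."""
    return [f"{r.metric} -> {r.notify}" for r in rules
            if r.should_fire(history.get(r.metric, []))]


RULES = [
    # Disk more than 90 % full for 10 minutes, sampled once a minute.
    AlertRule("disk_used_ratio", operator.gt, 0.90, 60, 600, "node-operator"),
    # Fewer than 3 peers for 5 minutes, sampled every 30 seconds.
    AlertRule("peer_count", operator.lt, 3, 30, 300, "network-oncall"),
]
```

For instance, `firing(RULES, {"disk_used_ratio": [0.95] * 10, "peer_count": [5] * 10})` returns `['disk_used_ratio -> node-operator']`. Requiring several consecutive breaching samples trades some detection latency for fewer false alarms caused by short spikes.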

Drawbacks are:

Monitoring the infrastructure layer introduces overhead caused by reporting processes
Monitoring the blockchain layer requires resources. The process involves parsing the blockchain and loading the parsed information into a regular database, which may use 10 times as much disk space as the blockchain itself. The disk, CPU and memory requirements for monitoring the blockchain layer grow together with the blockchain's disk footprint
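To make the factor of 10 mentioned above concrete, assume purely for illustration a chain that occupies $S$ on disk and an index database of $10\,S$. The chain size of 500 GB is a hypothetical value, not a measurement:

```latex
D_{\text{index}} = 10\,S, \qquad
D_{\text{total}} = S + D_{\text{index}} = 11\,S
\quad\Longrightarrow\quad
S = 500\ \text{GB} \;\Rightarrow\; D_{\text{total}} = 5.5\ \text{TB}
```

Because $D_{\text{index}}$ scales linearly with $S$, every additional gigabyte of chain data adds roughly eleven gigabytes of total storage on a node that also hosts blockchain-layer monitoring.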

Unresolved questions are:

Defining a process to ingest and visualize the data reported by the core nodes
Defining a process for reporting hardware resource availability and usage
How to report the information needed for monitoring without granting more permissions than are required to perform the tasks
Establishing an alerting strategy when multiple node owners are involved

Internal references and dependencies

'How to alert' and 'who to alert' are covered in the section Collaboration.

References to best practices, examples

Which nodes count as core nodes differs between blockchain infrastructure architectures.

Ethereum Public Network

For the public Ethereum network, the bootnodes are the core nodes, as described in the documentation.

The way the Ethereum team monitors its bootnodes was presented by Péter Szilágyi at Devcon 5 in the talk "Monitoring an Ethereum infrastructure". The video describes monitoring at the infrastructure layer. Péter mentions that they used Datadog, Grafana, Prometheus and InfluxDB.

The PoW mining nodes can also be monitored individually by each node owner.

European Blockchain Services Infrastructure

EBSI apparently uses a mixture of monitoring tools for the infrastructure layer. One tool mentioned is phpservermon.

For the blockchain layer, this service category is mentioned in EBSI V1, although the category may include further monitoring tools. As of V1, the existing monitoring tools are block explorers.

Finding an appropriate way of monitoring the infrastructure layer is challenging and is highly dependent on the infrastructure architecture. Security considerations have to be taken into account when the core nodes are owned and controlled by various parties.

Monitoring of the blockchain layer can easily be adapted from a working solution if the underlying blockchain technology is the same (i.e. an Ethereum-based blockchain).
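Because Ethereum-based clients share a standard JSON-RPC API, a blockchain-layer probe written against that API can be reused across clients. The sketch below distinguishes liveness (the RPC endpoint answers at all) from readiness (the node has peers and is not far behind the chain head) using the standard `eth_syncing` and `net_peerCount` methods. The function names, the `min_peers` and `max_lag` thresholds and the example endpoint `http://127.0.0.1:8545` (a common default HTTP-RPC port) are assumptions for illustration. The node must have HTTP-RPC enabled, ideally exposing only the namespaces the probe needs, which also bears on the permissions question listed under the unresolved questions.

```python
import http.client
import json
import urllib.request
from typing import Any, Dict, List, Optional


def rpc_call(url: str, method: str, params: Optional[List[Any]] = None,
             timeout: float = 5.0) -> Any:
    """Send one JSON-RPC 2.0 request and return its `result` field."""
    payload = json.dumps({"jsonrpc": "2.0", "id": 1, "method": method,
                          "params": params or []}).encode()
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.loads(resp.read())
    if "error" in body:
        raise RuntimeError(f"{method} failed: {body['error']}")
    return body["result"]


def assess_readiness(syncing: Any, peer_count_hex: str,
                     min_peers: int = 1, max_lag: int = 10) -> Dict[str, Any]:
    """Interpret eth_syncing / net_peerCount results (hex-encoded quantities)."""
    peers = int(peer_count_hex, 16)
    if syncing is False:  # node reports that it is not syncing
        lag = 0
    else:                 # syncing object: compare current and highest block
        lag = int(syncing["highestBlock"], 16) - int(syncing["currentBlock"], 16)
    return {"peers": peers, "block_lag": lag,
            "ready": peers >= min_peers and lag <= max_lag}


def probe(url: str) -> Dict[str, Any]:
    """Liveness: the RPC endpoint answers. Readiness: peers and sync lag look OK."""
    try:
        syncing = rpc_call(url, "eth_syncing")
        peers = rpc_call(url, "net_peerCount")
    except (OSError, ValueError, KeyError, RuntimeError,
            http.client.HTTPException) as exc:
        return {"live": False, "ready": False, "error": str(exc)}
    return {"live": True, **assess_readiness(syncing, peers)}
```

A probe reporting `live: True` but `ready: False` identifies a node whose process is up yet not useful to the network, a state that host-level checks alone cannot detect.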

Hyperledger Besu Monitor Metrics

RFC-0831
Authors: Iosif Peterfi, David Maas, Chinmay Khandekar, Kevin Wittek, Andrei Ionita
Status: work in progress
Last modified: 2021-03-17