Milvus collects monitoring data and pushes it to Pushgateway. Meanwhile, Prometheus Server periodically pulls the data from Pushgateway and saves it to its time series database (TSDB). When an alarm is triggered, Prometheus Server pushes the alarm information to Alertmanager. Grafana can be used to visualize the collected data.

1. Prometheus

2. Alertmanager

3. Grafana

First, Prometheus is used to collect the Milvus monitoring metrics. The steps below then show how to connect Alertmanager to Prometheus for the alarm mechanism and how to use Grafana to visualize the data.

Install Prometheus

Download the Prometheus binary archive and extract it:

tar xvfz prometheus-*.tar.gz
cd prometheus-*

Install Pushgateway

Download the Pushgateway binary archive and extract it:

tar xvfz pushgateway-*.tar.gz
cd pushgateway-*

Start Pushgateway:

./pushgateway
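Once Pushgateway is running, a short scripted check can confirm that it answers before Milvus is pointed at it. The following is only a sketch and not part of the Milvus setup: the function name check_pushgateway is hypothetical, and it assumes that curl is installed and that Pushgateway listens on its default port 9091 on the local machine. It calls Pushgateway's /-/healthy endpoint.

```shell
# Hypothetical helper: returns 0 if Pushgateway answers its health endpoint.
# Assumption: Pushgateway runs at http://127.0.0.1:9091 unless a URL is passed in.
check_pushgateway() {
  base_url="${1:-http://127.0.0.1:9091}"
  if curl --silent --fail --max-time 5 "${base_url}/-/healthy" > /dev/null; then
    echo "Pushgateway is healthy at ${base_url}"
  else
    echo "Pushgateway is not reachable at ${base_url}" >&2
    return 1
  fi
}
```

Run check_pushgateway from a second terminal while ./pushgateway is still running in the first one.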

Enable Prometheus monitoring in server_config.yaml and set the address and port number of Pushgateway:

metric:
  enable: true       # Set the value to true to turn on Prometheus monitoring
  address: 127.0.0.1 # Set the IP address of Pushgateway
  port: 9091         # Set the port number of Pushgateway
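To verify that the address and port configured above accept pushes, one option is to push a throwaway sample by hand, roughly the way any client pushes to Pushgateway. This is a sketch under assumptions: the function names, the job name connectivity_check, and the metric name connectivity_check_value are all hypothetical and unrelated to the metrics Milvus itself pushes. The /metrics/job/<job> path is Pushgateway's push endpoint.

```shell
# Build the Pushgateway URL for one job group (host, port, job name).
pushgateway_job_url() {
  printf 'http://%s:%s/metrics/job/%s\n' "$1" "$2" "$3"
}

# Push a single throwaway sample in the Prometheus text format.
# Defaults mirror the address and port from server_config.yaml.
push_test_metric() {
  url="$(pushgateway_job_url "${1:-127.0.0.1}" "${2:-9091}" connectivity_check)"
  echo "connectivity_check_value 1" |
    curl --silent --fail --max-time 5 --data-binary @- "$url"
}

# Delete the throwaway group again so it does not linger in Pushgateway.
delete_test_metric() {
  url="$(pushgateway_job_url "${1:-127.0.0.1}" "${2:-9091}" connectivity_check)"
  curl --silent --fail --max-time 5 -X DELETE "$url"
}
```

Calling push_test_metric and then delete_test_metric should both return exit status 0 when Pushgateway is reachable at the configured address.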

Download the Milvus Prometheus configuration file:

$ wget https://raw.githubusercontent.com/milvus-io/docs/master/v0.10.3/assets/monitoring/prometheus.yml -O prometheus.yml

Download the Milvus alarm rules file into the rules directory under the Prometheus root directory:

$ wget -P rules https://raw.githubusercontent.com/milvus-io/docs/master/v0.10.3/assets/monitoring/alert_rules.yml

Edit the Prometheus configuration file according to your actual requirements.
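For orientation, the settings that usually need adjusting are the scrape target (Pushgateway), the Alertmanager address, and the path to the rules file. The sketch below writes an illustrative configuration to a separate file so that the downloaded prometheus.yml is not overwritten. It is not the content of the Milvus file: the file name prometheus.example.yml, the job name, the intervals, and the localhost addresses with default ports 9091 and 9093 are all assumptions.

```shell
# Write an illustrative Prometheus configuration (assumed values throughout).
cat > prometheus.example.yml <<'EOF'
global:
  scrape_interval: 15s        # how often Prometheus pulls from its targets
  evaluation_interval: 15s    # how often alarm rules are evaluated

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']   # assumed Alertmanager address

rule_files:
  - "rules/alert_rules.yml"             # rules file downloaded with wget -P rules

scrape_configs:
  - job_name: 'pushgateway'
    honor_labels: true                  # keep the labels pushed by the client
    static_configs:
      - targets: ['localhost:9091']     # assumed Pushgateway address
EOF
```

The promtool binary shipped in the Prometheus archive can validate a configuration before Prometheus is started, for example with ./promtool check config prometheus.yml.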

Start Prometheus:

./prometheus --config.file=prometheus.yml

Open http://:9090 in a browser to go to the Prometheus web UI.
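The same check can be scripted against the Prometheus HTTP API instead of the browser. The sketch below wraps the instant-query endpoint /api/v1/query. The function name prom_query is hypothetical, and the default address http://localhost:9090 is an assumption that should be replaced with the host where Prometheus actually runs.

```shell
# Hypothetical helper: run an instant PromQL query and print the JSON reply.
# $1 = PromQL expression, $2 = Prometheus base URL (assumed default below).
prom_query() {
  curl --silent --fail --max-time 5 --get \
    --data-urlencode "query=$1" \
    "${2:-http://localhost:9090}/api/v1/query"
}
```

For example, prom_query 'up' returns one sample per scrape target, with value 1 for targets that Prometheus could scrape and 0 for those it could not.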

Configure Alertmanager

Alertmanager is primarily used to receive alarm messages sent by Prometheus. The following events require alarm rules.

Alarm rule: Send an alarm when the Milvus server goes down.

How to tell: When the Milvus server goes down, the indicators on the monitoring dashboard show No Data.

Alarm rule: Send an alarm when the CPU/GPU temperature exceeds 80 °C.

How to tell: Check CPU Temperature and GPU Temperature on the monitoring dashboard.
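The two rules above could be expressed in the Prometheus rule file format roughly as follows. This is a sketch, not the content of the downloaded alert_rules.yml. The first rule swaps in a different detection technique from the "No Data" check described above: it uses push_time_seconds, a metric that Pushgateway exposes with the time of the last push per group, and assumes a hypothetical job label milvus. The metric name example_cpu_temperature_celsius is a placeholder as well and must be replaced with the real metric name from the downloaded rules file.

```shell
# Write illustrative alarm rules to a separate file (all names are assumptions).
cat > alert_rules.example.yml <<'EOF'
groups:
  - name: milvus-example
    rules:
      # Fires when nothing has been pushed for the assumed job in 2 minutes.
      - alert: MilvusPushStale
        expr: time() - push_time_seconds{job="milvus"} > 120
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No metrics pushed by Milvus for more than 2 minutes"
      # Fires when the placeholder temperature metric stays above 80 degrees C.
      - alert: CpuTemperatureHigh
        expr: example_cpu_temperature_celsius > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CPU temperature above 80 degrees C"
EOF
```

A rules file can be syntax-checked with ./promtool check rules alert_rules.example.yml before it is referenced from the Prometheus configuration.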

Download the Alertmanager binary archive and extract it:

tar xvfz alertmanager-*.tar.gz
cd alertmanager-*

Create the configuration file alertmanager.yml following the Alertmanager configuration documentation, specify the mailbox that receives the alarm notifications, and place the file in the Alertmanager root directory.
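A minimal email-only configuration might look like the sketch below. Every address, host name, and credential is a placeholder, and the receiver name ops-email and the file name alertmanager.example.yml are hypothetical. The grouping intervals are example values rather than recommendations.

```shell
# Write an illustrative Alertmanager configuration (placeholder values).
cat > alertmanager.example.yml <<'EOF'
global:
  smtp_smarthost: 'smtp.example.com:587'       # placeholder SMTP server
  smtp_from: 'alertmanager@example.com'        # placeholder sender address
  smtp_auth_username: 'alertmanager@example.com'
  smtp_auth_password: 'change-me'

route:
  receiver: 'ops-email'          # default receiver for all alarms
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 3h

receivers:
  - name: 'ops-email'
    email_configs:
      - to: 'ops@example.com'    # mailbox that receives the notifications
EOF
```

The amtool binary shipped in the Alertmanager archive can validate the file with ./amtool check-config alertmanager.example.yml. After the placeholders are replaced, the file can be saved as alertmanager.yml.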

Start the Alertmanager service and specify the configuration file:

./alertmanager --config.file=alertmanager.yml

Open http://:3000 in a browser and log in to the Grafana web UI.

Grafana

From the Grafana web UI, click Configuration > Data Sources > Prometheus, and set the following data source properties:

Data source configuration

Name: Prometheus

Default: True

URL: http://:9090

Access: Browser
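As an alternative to clicking through the UI, Grafana can also load data sources from provisioning files. The sketch below writes such a file with the same four properties. It rests on several assumptions: the file name is hypothetical, the host localhost stands in for the host elided in the URL above, and access: direct is the provisioning value corresponding to Browser access, which newer Grafana releases may no longer support (proxy, shown in the UI as Server, is the alternative).

```shell
# Write an illustrative Grafana data source provisioning file.
# It would be placed in Grafana's provisioning/datasources directory.
cat > milvus-datasource.example.yaml <<'EOF'
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: direct                 # "Browser" in the UI; assumption, see above
    url: http://localhost:9090     # assumed Prometheus address
    isDefault: true
EOF
```

Grafana reads provisioning files at startup, so a restart is needed after the file is copied into place.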

Milvus performance metrics

Insert per Second: The number of vectors inserted per second.

Queries per Minute: The number of queries run per minute.

Query Time per Vector: Single vector query time = query time / number of vectors.

Query Service Level: Query service level = number of queries within a certain time threshold / total number of queries.

Uptime: How long the Milvus server has been up (minutes).
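The two ratio metrics above can be illustrated with a small worked example. The helper names are hypothetical and the input numbers are made up for illustration. The time unit (seconds) is an assumption, since the unit is not stated above.

```shell
# Query Time per Vector = query time / number of vectors.
# $1 = total query time (assumed seconds), $2 = number of vectors.
query_time_per_vector() {
  awk -v t="$1" -v n="$2" 'BEGIN {
    if (n <= 0) { print "number of vectors must be positive" > "/dev/stderr"; exit 1 }
    printf "%.4f\n", t / n
  }'
}

# Query Service Level = queries within the time threshold / total queries.
# $1 = queries within the threshold, $2 = total number of queries.
query_service_level() {
  awk -v ok="$1" -v total="$2" 'BEGIN {
    if (total <= 0) { print "total queries must be positive" > "/dev/stderr"; exit 1 }
    printf "%.2f\n", ok / total
  }'
}

query_time_per_vector 2.5 1000   # prints 0.0025
query_service_level 950 1000     # prints 0.95
```

With these made-up inputs, a 2.5-second query over 1000 vectors costs 0.0025 seconds per vector, and 950 of 1000 queries meeting the threshold gives a service level of 0.95.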

System performance metrics

GPU Utilization: GPU utilization rate (%).

GPU Memory Usage: Amount of GPU memory (GB) currently used by Milvus.

CPU Utilization: CPU utilization (%) = server task execution time / server total elapsed time.

Memory Usage: Current amount of memory used by Milvus (GB).

Cache Utilization: Cache utilization (%).

Network IO: Read/write speed of the network port (GB/s).

Disk Read Speed: Disk read speed (GB/s).

Disk Write Speed: Disk write speed (GB/s).

Hardware storage metrics

Data Size: Total amount of data stored by Milvus (GB).

Total File: The total number of data files stored in Milvus.