A Collection of Open-Falcon Monitoring Plugins for Personal Use

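The monitoring system for this site's three hosts is now largely in place. Basic system metrics are collected with Open-Falcon, data is displayed with Grafana, and storage uses the RRD backend that ships with Falcon. I originally wanted to set up OpenTSDB for Falcon to use, but all of my hosts are on the lowest student-tier configuration and can't handle the memory consumption of too many applications.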

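I recently wrote a few monitoring plugins, mainly to collect and display business metrics. I'm recording them here and will keep improving them.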

  • Additional basic metrics, such as latency between the three servers, uptime, and so on
  • Metric collection for web services
  • Shadowsocks server traffic/connection monitoring

As for installing Falcon and Grafana, I won't go into detail in this post; the official sites have fairly complete installation guides~

Reporting Metrics

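In this post, metrics are reported to Falcon by scripts that generate the data and push it directly to the agent's HTTP interface. For details, see the Open-Falcon document on pushing custom data (自定义push数据到open-falcon).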

The format Falcon expects for reported metrics:

payload = [
    {
        "endpoint": "test-endpoint",
        "metric": "test-metric",
        "timestamp": ts,
        "step": 60,
        "value": 1,
        "counterType": "GAUGE",
        "tags": "idc=lg,loc=beijing",
    },
    {
        "endpoint": "test-endpoint",
        "metric": "test-metric2",
        "timestamp": ts,
        "step": 60,
        "value": 2,
        "counterType": "GAUGE",
        "tags": "idc=lg,loc=beijing",
    },
]

Push it to the interface http://127.0.0.1:1988/v1/push using the HTTP POST method.
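As a minimal sketch of the format and endpoint described above, the following Python builds metric items and posts a list of them to the agent. The helper names `build_item` and `push` are hypothetical, and the standard library's `urllib` is used here only to keep the sketch free of dependencies (the scripts later in this post use `requests`). The field names and the URL are the ones shown above.

```python
import json
import time
import urllib.request

PUSH_URL = "http://127.0.0.1:1988/v1/push"  # agent push interface from the text above


def build_item(endpoint, metric, value, tags="", step=60,
               counter_type="GAUGE", ts=None):
    """Build one metric item in the reporting format (hypothetical helper)."""
    return {
        "endpoint": endpoint,
        "metric": metric,
        "timestamp": int(time.time()) if ts is None else ts,
        "step": step,
        "value": value,
        "counterType": counter_type,
        "tags": tags,  # comma-separated key=value pairs
    }


def push(items, url=PUSH_URL, timeout=5):
    """POST a list of items as a JSON array and return the response body."""
    body = json.dumps(items).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Build the same two items as the example payload above (nothing is sent here).
payload = [
    build_item("test-endpoint", "test-metric", 1, tags="idc=lg,loc=beijing"),
    build_item("test-endpoint", "test-metric2", 2, tags="idc=lg,loc=beijing"),
]
print(json.dumps(payload, indent=2))
```

On a host with a running agent, calling `push(payload)` would send the items; it is left out of the sketch so that the code runs without an agent.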

Per-Port Traffic Statistics

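Here some numeric metrics of ss (Shadowsocks) are collected and reported. My ss server currently sits in Hong Kong and the speed is decent. I've opened a few ports for friends, so I also collect the traffic of each opened port and report it to Falcon.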

A screenshot of the result, click to view~

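Note that the traffic has to be matched and counted by a rule on the iptables INPUT chain. For example, to add accounting for port 1003:
sudo iptables -I INPUT -d 10.144.x.x -p tcp --dport 1003
Note: the rule above counts traffic as it is processed by the INPUT chain, so only inbound packets are counted.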

The counters can then be viewed:

[zhxfei@hk ~]$ sudo iptables -L -n -x -v
Chain INPUT (policy ACCEPT 13246710 packets, 1587188191 bytes)
    pkts      bytes target     prot opt in     out     source               destination
    1458     117508            tcp  --  *      *       0.0.0.0/0            10.144.x.x           tcp dpt:1003
   77134    8936582            tcp  --  *      *       0.0.0.0/0            10.144.x.x           tcp dpt:1001
   93425    6541935            tcp  --  *      *       0.0.0.0/0            10.144.x.x           tcp dpt:1002
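To make the structure of that output concrete, here is a small sketch that pulls the packet and byte counters out of captured `iptables -L -n -x -v` text, keyed by destination port. The function name `parse_port_counters` and the regular expression are my own for illustration; the sample text is the output shown above. If several rules share a port, this sketch keeps only the last one.

```python
import re

# A rule line starts with the packet and byte counters and ends with "dpt:<port>".
RULE_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+.*\bdpt:(\d+)\s*$")


def parse_port_counters(output):
    """Return {port: (packets, bytes)} for every rule line with a dpt match."""
    counters = {}
    for line in output.splitlines():
        m = RULE_RE.match(line)
        if m:
            pkts, nbytes, port = (int(g) for g in m.groups())
            counters[port] = (pkts, nbytes)
    return counters


SAMPLE = """\
Chain INPUT (policy ACCEPT 13246710 packets, 1587188191 bytes)
pkts bytes target prot opt in out source destination
1458 117508 tcp -- * * 0.0.0.0/0 10.144.x.x tcp dpt:1003
77134 8936582 tcp -- * * 0.0.0.0/0 10.144.x.x tcp dpt:1001
93425 6541935 tcp -- * * 0.0.0.0/0 10.144.x.x tcp dpt:1002
"""

print(parse_port_counters(SAMPLE))
```

Matching on the `dpt:` field rather than on the bare port number avoids accidentally matching a counter value that happens to contain the same digits.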

The collection script is as follows:

#!/usr/bin/python
# coding: utf-8
import os
import time
import requests
import json
import re

ts = int(time.time())
payload_lst = []


def get_mon_by_port(port):
    # Read the counters of the rule that matches this destination port.
    # Matching on "dpt:<port>" avoids hitting a counter value that merely
    # contains the same digits as the port number.
    command = '''iptables -L -n -v -x | grep -w "dpt:{}"'''
    res = os.popen(command.format(port)).read().strip()
    if res:
        # The first two columns of a rule line are the packet and byte counters.
        pkt, byte = map(int, re.split(r'\s+', res)[:2])
        return {
            'packet-received': pkt,
            'byte-received': byte
        }
    # No matching rule (or iptables could not be run): nothing to report.
    return {}


def get_hostname():
    res_command = os.popen('hostname').read().strip()
    return res_command if res_command else 'unknown'


def get_send_json(port, user):
    info_dict = get_mon_by_port(port)
    for k, v in info_dict.items():
        payload = {
            "endpoint": get_hostname(),
            "metric": k,
            "timestamp": ts,
            "step": 60,
            "value": v,
            "counterType": "COUNTER",
            # Tags are comma-separated key=value pairs.
            "tags": "port={port},user={user}".format(port=port, user=user)
        }
        payload_lst.append(payload)


def main():
    # Ports opened to each user.
    monitor_port_user = {
        1001: u'small partner',
        1002: 'own',
        1003: 'taolei'
    }
    for port, user in monitor_port_user.items():
        get_send_json(port, user)
    print(payload_lst)


main()
r = requests.post("http://127.0.0.1:1988/v1/push", data=json.dumps(payload_lst))
print(r.text)
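The script reports cumulative totals with the counterType COUNTER. As I understand it, Open-Falcon then derives a per-second rate from consecutive samples instead of displaying the raw total. The arithmetic can be sketched as below; `counter_rate` is a hypothetical helper for illustration, not part of Open-Falcon, and returning `None` on a counter reset is an assumption of this sketch. The sample values are made up.

```python
def counter_rate(prev, curr):
    """Per-second rate between two (timestamp, value) samples of a counter.

    Returns None when no rate can be computed: timestamps that do not
    increase, or a value that went backwards (for example after the
    iptables counters were reset).
    """
    (t0, v0), (t1, v1) = prev, curr
    if t1 <= t0 or v1 < v0:
        return None
    return (v1 - v0) / float(t1 - t0)


# Two made-up samples of byte-received taken one step (60 s) apart.
rate = counter_rate((1500000000, 8936582), (1500000060, 8996582))
print(rate)  # 1000.0 bytes per second
```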

Nginx Monitoring

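I did not use the plugin recommended on the Falcon website. Instead, I use the information collected by Tengine's reqstat module: a script collects it and then reports it. The main metrics are: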

  • req / connect rate
  • Byte IN/OUT rate
  • http 2xx/3xx/4xx/5xx rate

A screenshot of the result, click to view. The script is as follows:

import requests
import time
import json
import os

# For the meaning of each field, see:
# http://tengine.taobao.org/document_cn/http_reqstat_cn.html
status_counter = {
    'kv': 'www.zhxfei.com',
    'bytes_in': 0,
    'bytes_out': 0,
    'conn_total': 0,
    'req_total': 0,
    'http_2xx': 0,
    'http_3xx': 0,
    'http_4xx': 0,
    'http_5xx': 0,
    'http_other_status': 0,
    'rt': 0,
    'ups_req': 0,
    'ups_rt': 0,
    'ups_tries': 0,
    'http_200': 0,
    'http_206': 0,
    'http_302': 0,
    'http_304': 0,
    'http_403': 0,
    'http_404': 0,
    'http_416': 0,
    'http_499': 0,
    'http_500': 0,
    'http_502': 0,
    'http_503': 0,
    'http_504': 0,
    'http_508': 0,
    'http_other_detail_status': 0,
    'http_ups_4xx': 0,
    'http_ups_5xx': 0
}


def proc_status(lines):
    # Field order of a status line once its first two fields
    # (the kv key and the server address) have been dropped.
    status = [
        'bytes_in',
        'bytes_out',
        'conn_total',
        'req_total',
        'http_2xx',
        'http_3xx',
        'http_4xx',
        'http_5xx',
        'http_other_status',
        'rt',
        'ups_req',
        'ups_rt',
        'ups_tries',
        'http_200',
        'http_206',
        'http_302',
        'http_304',
        'http_403',
        'http_404',
        'http_416',
        'http_499',
        'http_500',
        'http_502',
        'http_503',
        'http_504',
        'http_508',
        'http_other_detail_status',
        'http_ups_4xx',
        'http_ups_5xx'
    ]
    res_dict_lst = [dict(zip(status, line.split(','))) for line in lines]
    for res_dict in res_dict_lst:
        dict_key_value_sum(res_dict)


def dict_key_value_sum(res_dict):
    # Accumulate one line's values into the module-level totals.
    for k, v in res_dict.items():
        status_counter[k] += int(v)


def get_info():
    req_headers = {
        'user-agent': 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1b2) Gecko/20060823 SeaMonkey/1.1',
        'Host': 'zhxfei.com',
    }
    res = requests.get("http://localhost/monitor/nginx_status",
                       headers=req_headers)
    if res.status_code == 200:
        # Keep only this site's lines served on 10.144.89.245:80,
        # then drop the first two fields (kv key and server address).
        info_lines = [','.join(line.split(',')[2:]) for line in res.text.split('\n')
                      if 'zhxfei' in line and line.split(',')[1] == '10.144.89.245:80']
        return info_lines
    # Status page not available: nothing to report.
    return []


def get_hostname():
    res_command = os.popen('hostname').read().strip()
    return res_command if res_command else 'unknown'


def get_ip_port():
    command_ip = "ip addr show eth0|grep 'inet '| awk '{print $2}'"
    res_ip_addr = os.popen(command_ip).read().strip()
    return {
        'ip': res_ip_addr,
        'port': 80
    }


def main():
    payload_lst = []
    res = get_info()
    if res:
        ts = int(time.time())
        proc_status(res)
        endpoint = get_hostname()
        # Tags are comma-separated key=value pairs.
        tags = "ip={ip},port={port}".format(**get_ip_port())
        for k, v in status_counter.items():
            if k == 'kv':
                # 'kv' holds the site name, not a numeric counter.
                continue
            payload = {
                "endpoint": endpoint,
                "metric": k,
                "timestamp": ts,
                "step": 60,
                "value": v,
                "counterType": "COUNTER",
                "tags": tags
            }
            payload_lst.append(payload)
    return payload_lst


payload = main()
if payload:
    r = requests.post("http://127.0.0.1:1988/v1/push", data=json.dumps(payload))
    print(r.text)
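The core step of the script is turning each comma-separated status line into named counters and summing them across lines. Here is a self-contained sketch of just that step. The field list is shortened for illustration, the sample lines are made up, and `sum_status_lines` is a hypothetical name; the real field order depends on how the Tengine status output is configured.

```python
# Shortened field list for illustration; the script above uses the full list.
FIELDS = ["bytes_in", "bytes_out", "conn_total", "req_total"]


def sum_status_lines(lines, fields=FIELDS):
    """Sum the named numeric fields over several comma-separated lines."""
    totals = dict.fromkeys(fields, 0)
    for line in lines:
        values = line.strip().split(",")
        # zip() stops at the shorter sequence, so extra trailing
        # fields on a line are simply ignored.
        for name, value in zip(fields, values):
            totals[name] += int(value)
    return totals


# Made-up sample lines (not real measurements), with the leading
# key and server-address fields already removed.
sample = ["100,200,3,4", "10,20,1,2"]
print(sum_status_lines(sample))
```

Summing per line keeps the totals correct when the status page lists the same site more than once, which is the case the original script handles with its filter on the server address.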

Latency Collection

From hk.zhxfei.com, ping the two hosts qd.zhxfei.com and sh.zhxfei.com, take the RTT, and report it.

#!/bin/bash
#
IDC_HOST="qd.zhxfei.com sh.zhxfei.com"
ts=`date +%s`;
for host in ${IDC_HOST}
do
    # Average RTT: the 5th "/"-separated field of ping's summary line (min/avg/max/mdev).
    RTT_TIME=`ping -c 1 ${host}| tail -n 1| awk -F'/' '{print $5}'`
    # Skip this host if ping failed; an empty value would produce invalid JSON.
    [ -z "${RTT_TIME}" ] && continue
    curl -X POST -d "[{\"metric\": \"net.latency.hk.zhxfei.com-${host}\", \"endpoint\": \"hk.zhxfei.com\", \"timestamp\": $ts,\"step\": 60,\"value\": ${RTT_TIME},\"counterType\": \"GAUGE\",\"tags\": \"idc=hk,project=latency_test\"}]" http://127.0.0.1:1988/v1/push
    # echo "[{\"metric\": \"net.latency.hk.zhxfei.com-${host}\", \"endpoint\": \"hk.zhxfei.com\", \"timestamp\": $ts,\"step\": 60,\"value\": ${RTT_TIME},\"counterType\": \"GAUGE\",\"tags\": \"idc=hk,project=latency_test\"}]"
done
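The awk step relies on the shape of ping's summary line. For comparison, the same extraction can be sketched in Python as below. It assumes the Linux iputils summary format (`rtt min/avg/max/mdev = ... ms`); `parse_avg_rtt` is a hypothetical helper and the sample text is made up.

```python
import re

# Summary line printed by Linux iputils ping, for example:
#   rtt min/avg/max/mdev = 35.100/36.200/37.300/0.900 ms
SUMMARY_RE = re.compile(r"=\s*([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)\s*ms")


def parse_avg_rtt(ping_output):
    """Return the average RTT in milliseconds, or None if there is no summary."""
    m = SUMMARY_RE.search(ping_output)
    return float(m.group(2)) if m else None


# Made-up sample output, not a real measurement.
ok = "rtt min/avg/max/mdev = 35.100/36.200/37.300/0.900 ms"
lost = "1 packets transmitted, 0 received, 100% packet loss, time 0ms"
print(parse_avg_rtt(ok))
print(parse_avg_rtt(lost))
```

Returning `None` when the summary is missing makes the packet-loss case explicit, so the caller can skip the push instead of sending an empty value.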

More updates to come…
