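Preparation before installation
Three machines running RHEL 6.2 x64: one acts as the Nagios server (192.168.5.203) and the other two are monitored hosts (192.168.5.204, with the HTTP service installed and running, and 192.168.5.206, with the MySQL service installed and running).
Note: the HTTP service is monitored on 192.168.5.204 and the MySQL service on 192.168.5.206.
Packages needed on the server:
nagios-3.2.3.tar.gz
nagios-plugins-1.4.14.tar.gz
httpd-2.2.23.tar.bz2
php-5.4.10.tar.gz
nrpe-2.12.tar.gz
Download link: http://pan.baidu.com/s/1c0lHEH6
Packages needed on the two monitored hosts:
nagios-plugins-1.4.14.tar.gz
nrpe-2.12.tar.gz
On the server:
1) Create the nagios user and group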
[root@Nagios-Server ~]# pwd
/root
[root@Nagios-Server ~]# useradd -s /sbin/nologin nagios
[root@Nagios-Server ~]# mkdir /usr/local/nagios
[root@Nagios-Server ~]# chown -R nagios.nagios /usr/local/nagios/
2) Start the system sendmail service
[root@Nagios-Server ~]# /etc/init.d/sendmail start
Sendmail only needs to be running; no further configuration is required.
2. Compile and install Nagios
[root@Nagios-Server ~]# tar zxvf nagios-3.2.3.tar.gz
[root@Nagios-Server ~]# cd nagios-3.2.3
[root@Nagios-Server nagios-3.2.3]# ./configure --prefix=/usr/local/nagios
[root@Nagios-Server nagios-3.2.3]# make all
[root@Nagios-Server nagios-3.2.3]# make install
[root@Nagios-Server nagios-3.2.3]# make install-init
[root@Nagios-Server nagios-3.2.3]# make install-commandmode
[root@Nagios-Server nagios-3.2.3]# make install-config
[root@Nagios-Server nagios-3.2.3]# chkconfig --add nagios
[root@Nagios-Server nagios-3.2.3]# chkconfig --level 35 nagios on
#echo "/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg">>/etc/rc.local
3. Install the Nagios plugins
[root@Nagios-Server ~]# tar zxvf nagios-plugins-1.4.14.tar.gz
[root@Nagios-Server ~]# cd nagios-plugins-1.4.14
[root@Nagios-Server nagios-plugins-1.4.14]# ./configure --prefix=/usr/local/nagios
[root@Nagios-Server nagios-plugins-1.4.14]# make
[root@Nagios-Server nagios-plugins-1.4.14]# make install
4. Install Apache and PHP
[root@Nagios-Server ~]# tar jxvf httpd-2.2.23.tar.bz2
[root@Nagios-Server ~]# cd httpd-2.2.23
[root@Nagios-Server httpd-2.2.23]# ./configure --prefix=/usr/local/apache2
[root@Nagios-Server httpd-2.2.23]# make &&make install
[root@Nagios-Server ~]# tar zxvf php-5.4.10.tar.gz
[root@Nagios-Server ~]# cd php-5.4.10
[root@Nagios-Server php-5.4.10]# ./configure --prefix=/usr/local/php \
> --with-gd --with-zlib --with-apxs2=/usr/local/apache2/bin/apxs
[root@Nagios-Server php-5.4.10]# make && make install
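Configure Apache
1) In /usr/local/apache2/conf/httpd.conf, change the user and group that the Apache processes run as to nagios
Change to (around line 67):
User nagios
Group nagios
2) Then find DirectoryIndex (around line 168) and make sure index.php is listed:
<IfModule dir_module>
DirectoryIndex index.html index.php
</IfModule>
3) Add the following line (around line 311):
AddType application/x-httpd-php .php
4) Access to the Nagios web interface must be authenticated. Append the following to the end of httpd.conf: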
ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
AuthType Basic
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>
Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
AuthType Basic
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "nagios Access"
AuthUserFile /usr/local/nagios/etc/htpasswd
Require valid-user
</Directory>
5) Create the Apache authentication file htpasswd (any user name and password will do; this guide uses ixdba)
[root@Nagios-Server ~]# /usr/local/apache2/bin/htpasswd \
> -c /usr/local/nagios/etc/htpasswd ixdba
New password:
Re-type new password:
Adding password for user ixdba
6) Start the Apache service
[root@Nagios-Server ~]# /usr/local/apache2/bin/apachectl start
#echo "/usr/local/apache2/bin/apachectl start" >>/etc/rc.local
3. On the server (192.168.5.203), install the NRPE add-on used to monitor remote hosts
[root@Nagios-Server ~]# tar zxvf nrpe-2.12.tar.gz
[root@Nagios-Server ~]# cd nrpe-2.12
[root@Nagios-Server nrpe-2.12]# ./configure
[root@Nagios-Server nrpe-2.12]# make all
[root@Nagios-Server nrpe-2.12]# make install-plugin
#echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d">>/etc/rc.local
4. Install the Nagios plugins and NRPE on the two monitored hosts
1) On the monitored host (192.168.5.204), install nagios-plugins
[root@localhost ~]# useradd -s /sbin/nologin nagios
[root@localhost ~]# tar zxvf nagios-plugins-1.4.14.tar.gz
[root@localhost ~]# cd nagios-plugins-1.4.14
[root@localhost nagios-plugins-1.4.14]# ./configure
[root@localhost nagios-plugins-1.4.14]# make
[root@localhost nagios-plugins-1.4.14]# make install
[root@localhost nagios-plugins-1.4.14]# chown nagios.nagios /usr/local/nagios/
[root@localhost nagios-plugins-1.4.14]# chown -R nagios.nagios /usr/local/nagios/libexec/
2) On the monitored host (192.168.5.204), install NRPE
[root@localhost ~]# tar zxvf nrpe-2.12.tar.gz
[root@localhost ~]# cd nrpe-2.12
[root@localhost nrpe-2.12]# ./configure
[root@localhost nrpe-2.12]# make all
[root@localhost nrpe-2.12]# make install-plugin
[root@localhost nrpe-2.12]# make install-daemon
[root@localhost nrpe-2.12]# make install-daemon-config
#echo "/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d">>/etc/rc.local
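Note: repeat steps 1) and 2) on 192.168.5.206.
3) On the monitored host (192.168.5.204), edit /usr/local/nagios/etc/nrpe.cfg and change line 79 to:
allowed_hosts=127.0.0.1,192.168.5.203 (separate the addresses with a comma and no spaces)
Then start the nrpe daemon. Output like the following means it started successfully; the default port is 5666.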
[root@Nagios-Linux ]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
[root@localhost nrpe-2.12]# ps -ef | grep nrpe
nagios 21885 1 0 Sep09 ? 00:00:08 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
[root@localhost nrpe-2.12]# netstat -tunl | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN
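The allowed_hosts rule from step 3) (comma-separated, no spaces) is easy to get wrong when editing by hand. The following is a minimal sketch of a sanity check; the function name check_allowed_hosts is hypothetical and not part of NRPE, and it only inspects the text of the line.

```shell
#!/bin/bash
# Hypothetical helper (not part of NRPE): validate one allowed_hosts line.
# Succeeds only if the line starts with "allowed_hosts=" and its value is
# non-empty and contains no whitespace.
check_allowed_hosts() {
    local line="$1"
    case "$line" in
        allowed_hosts=*) ;;
        *) return 1 ;;
    esac
    local value="${line#allowed_hosts=}"
    [ -n "$value" ] || return 1
    case "$value" in
        *[[:space:]]*) return 1 ;;
    esac
    return 0
}

if check_allowed_hosts "allowed_hosts=127.0.0.1,192.168.5.203"; then
    echo "allowed_hosts line looks fine"
fi
```

On a monitored host the line could be taken from the real file, for example with check_allowed_hosts "$(grep '^allowed_hosts=' /usr/local/nagios/etc/nrpe.cfg)".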
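4) On the server (192.168.5.203), test communication with the clients by running the commands below. If the NRPE version is printed, the server can reach the client.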
[root@Nagios-Server nrpe-2.12]# /usr/local/nagios/libexec/check_nrpe -H 192.168.5.204
NRPE v2.12
[root@Nagios-Server nrpe-2.12]# /usr/local/nagios/libexec/check_nrpe -H 192.168.5.206
NRPE v2.12
5) On the server (192.168.5.203), define a check_nrpe command
[root@Nagios-Server ~]# vim /usr/local/nagios/etc/objects/commands.cfg
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
6) On the monitored host (192.168.5.204), add the new check to nrpe.cfg
The new command check_tcp80 (around line 204) runs the plugin /usr/local/nagios/libexec/check_tcp against the local host (-H 127.0.0.1) on port 80 (-p 80) with a 10-second timeout (-t 10).
[root@localhost ~]# vim /usr/local/nagios/etc/nrpe.cfg
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_tcp80]=/usr/local/nagios/libexec/check_tcp -H 127.0.0.1 -p 80 -t 10
Note: changes to nrpe.cfg only take effect after the nrpe daemon is restarted: kill the process, then start it again.
[root@Nagios-Linux nagios]# ps -ef | grep nrpe
nagios 6508 1 0 09:32 ? 00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
[root@Nagios-Linux nagios]# kill 6508
[root@Nagios-Linux nagios]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
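The kill-and-start sequence above can be wrapped in a small script. This is only a sketch: the function names nrpe_pids and restart_nrpe are hypothetical, the sample ps line is illustrative, and restart_nrpe is defined but not called here because it needs root and an installed nrpe.

```shell
#!/bin/bash
# Print the PID(s) of the nrpe daemon from `ps -ef` output read on stdin.
# Field 8 of `ps -ef` is the first word of the command line, so the
# "grep nrpe" line itself is not matched.
nrpe_pids() {
    awk '$8 ~ /\/nrpe$/ { print $2 }'
}

# Hypothetical restart helper: kill any running nrpe, then start it again
# with the same command line used in this guide.
restart_nrpe() {
    local pids
    pids=$(ps -ef | nrpe_pids)
    [ -n "$pids" ] && kill $pids
    /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
}

# Demonstration on sample ps output (no process is touched).
sample='nagios 6508 1 0 09:32 ? 00:00:00 /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
root 6601 6400 0 09:40 pts/0 00:00:00 grep nrpe'
printf '%s\n' "$sample" | nrpe_pids
```

Matching on the command field rather than on the whole line avoids the usual problem of grep finding its own process.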
On the monitored host (192.168.5.206), define the command check_tcp3306 (around line 204), which runs /usr/local/nagios/libexec/check_tcp against the local host (-H 127.0.0.1) on port 3306 (-p 3306) with a 10-second timeout (-t 10).
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_tcp3306]=/usr/local/nagios/libexec/check_tcp -H 127.0.0.1 -p 3306 -t 10
7) On the server (192.168.5.203), run the checks to confirm that they work; TCP OK means the setup is correct.
[root@Nagios-Server etc]# /usr/local/nagios/libexec/check_nrpe -H 192.168.5.204 -c check_tcp80
TCP OK - 0.000 second response time on port 80|time=0.000421s;;;0.000000;10.000000
[root@Nagios-Server etc]# /usr/local/nagios/libexec/check_nrpe -H 192.168.5.206 -c check_tcp3306
TCP OK - 0.000 second response time on port 3306|time=0.000431s;;;0.000000;10.000000
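The plugin output above has two parts separated by a pipe character: the human-readable status text and the performance data, whose fields follow the Nagios plugin convention label=value;warn;crit;min;max. The sketch below splits one output line into those parts; the function names plugin_text and plugin_perfdata are hypothetical.

```shell
#!/bin/bash
# Print the status text (everything before the first "|").
plugin_text() {
    printf '%s\n' "${1%%|*}"
}

# Print the performance data (everything after the first "|"),
# or an empty line if the plugin emitted none.
plugin_perfdata() {
    case "$1" in
        *"|"*) printf '%s\n' "${1#*|}" ;;
        *)     printf '\n' ;;
    esac
}

out='TCP OK - 0.000 second response time on port 80|time=0.000421s;;;0.000000;10.000000'
plugin_text "$out"
plugin_perfdata "$out"
```

In the sample line the warn and crit fields are empty because no -w or -c thresholds were given.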
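4. On the server (192.168.5.203), add the monitored hosts and services
1) templates.cfg (default definitions; no editing needed)
Location: /usr/local/nagios/etc/objects/templates.cfg
2) resource.cfg (only one active line, around line 26; the default is shown below)
#vim /usr/local/nagios/etc/resource.cfg
$USER1$=/usr/local/nagios/libexec
3) commands.cfg (the check_nrpe command was already defined above; no further editing needed)
4) hosts.cfg (does not exist by default and must be created by hand; it defines the names and IP addresses of the monitored hosts. Do not forget the opening and closing braces.)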
[root@Nagios-Server objects]# pwd
/usr/local/nagios/etc/objects
[root@Nagios-Server ~]# vim /usr/local/nagios/etc/objects/hosts.cfg
define host{
use linux-server ; use linux-server, which is defined by default in templates.cfg
host_name web ; host name; any name will do
alias ixdba-web ; alias; any name will do
address 192.168.5.204 ; address of the monitored host
}
define host{
use linux-server
host_name mysql
alias ixdba-mysql
address 192.168.5.206
}
define hostgroup{ ; defines a host group
hostgroup_name sa-server ; host group name; any name will do
alias sa server ; host group alias
members web,mysql ; the two hosts defined above
}
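When more hosts are added later, the host blocks in hosts.cfg all follow the same pattern. As a rough sketch, a small generator can print such a block; the function name make_host is hypothetical, and it assumes every host inherits from linux-server as in the definitions above.

```shell
#!/bin/bash
# Hypothetical helper: print a Nagios host definition.
# Arguments: host_name alias address
make_host() {
    printf 'define host{\n'
    printf 'use linux-server\n'
    printf 'host_name %s\n' "$1"
    printf 'alias %s\n' "$2"
    printf 'address %s\n' "$3"
    printf '}\n'
}

make_host web ixdba-web 192.168.5.204
```

The output can be reviewed and then appended to /usr/local/nagios/etc/objects/hosts.cfg by hand; a new host also has to be added to the members line of the host group if it should belong to it.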
5) services.cfg (does not exist by default and must be created by hand; it defines the services monitored on each host)
[root@Nagios-Server ~]# vim /usr/local/nagios/etc/objects/services.cfg
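define service{
use local-service ; use the default local-service template defined in templates.cfg
host_name web ; the web host (192.168.5.204) defined in hosts.cfg
service_description PING ; description of the check; any name close to the service's meaning will do
check_command check_ping!100.0,20%!500.0,60%
} ; uses the server-side check_ping. Fields from left to right: command!warning round-trip time (ms),packet loss!critical round-trip time (ms),packet loss
define service{
use local-service ; use the default local-service template defined in templates.cfg
host_name web ; the web host (192.168.5.204) defined in hosts.cfg
service_description web80 ; description of the check; any name close to the service's meaning will do
check_command check_nrpe!check_tcp80 ; check_tcp80 is defined in nrpe.cfg on the monitored host
}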
define service{
use local-service
host_name mysql
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use local-service
host_name mysql
service_description mysql3306
check_command check_nrpe!check_tcp3306
}
define servicegroup{ ; defines a service group (optional, not essential here)
servicegroup_name servergroup
alias server-group
members web,PING,web,web80,mysql,PING,mysql,mysql3306
}
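In the check_command values of services.cfg, the exclamation mark separates the command name from its arguments, which Nagios passes to the command definition as $ARG1$, $ARG2$ and so on. The sketch below only illustrates that split; describe_check_command is a hypothetical helper, not a Nagios tool.

```shell
#!/bin/bash
# Hypothetical helper: show how a check_command value is split on "!".
describe_check_command() {
    local spec="$1" old_ifs="$IFS"
    set -f                 # disable globbing while splitting
    IFS='!'
    set -- $spec           # split into command name and arguments
    IFS="$old_ifs"
    set +f
    echo "command: $1"
    shift
    local n=1 arg
    for arg in "$@"; do
        printf '$ARG%s$: %s\n' "$n" "$arg"
        n=$((n + 1))
    done
}

describe_check_command 'check_ping!100.0,20%!500.0,60%'
```

For check_nrpe!check_tcp80, the single argument check_tcp80 becomes $ARG1$, which the check_nrpe command defined in commands.cfg passes to -c.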
6) contacts.cfg (defines contacts and contact groups)
[root@Nagios-Server ~]# vim /usr/local/nagios/etc/objects/contacts.cfg
define contact{
contact_name nagiosadmin ; contact name; the default is fine
use generic-contact ; inherit the generic-contact template defined in templates.cfg
alias Nagios Admin ; Full name of user
email 15901392876@139.com ; e-mail address (a China Mobile 139 mailbox is suggested so that SMS alerts can be enabled)
}
define contactgroup{
contactgroup_name admins ; contact group name; the default is fine
alias Nagios Administrators
members nagiosadmin
}
7) timeperiods.cfg (defines the monitoring time periods; already defined by default, no changes needed)
[root@Nagios-Server ~]# vim /usr/local/nagios/etc/objects/timeperiods.cfg
define timeperiod{
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}
8) cgi.cfg (controls the CGI scripts; only the permissions for the web user need to be added here)
[root@Nagios-Server ~]# vim /usr/local/nagios/etc/cgi.cfg
default_user_name=ixdba
authorized_for_system_information=ixdba
authorized_for_configuration_information=ixdba
authorized_for_system_commands=ixdba
authorized_for_all_services=ixdba
authorized_for_all_hosts=ixdba
authorized_for_all_service_commands=ixdba
authorized_for_all_host_commands=ixdba
9) nagios.cfg (the main Nagios configuration file)
[root@Nagios-Server ~]# vim /usr/local/nagios/etc/nagios.cfg
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
cfg_file=/usr/local/nagios/etc/objects/hosts.cfg (add this line)
cfg_file=/usr/local/nagios/etc/objects/services.cfg (add this line)
use_authentication=1 # this directive lives in cgi.cfg, not nagios.cfg; change 0 to 1 there, around line 78
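5. Verify the Nagios configuration
[root@Nagios-Server ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are errors, the output names the file and line number; fix them accordingly. A correct configuration ends with 0 warnings and 0 errors: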
Checking for circular paths between hosts...
Checking for circular host and service dependencies...
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
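The two totals at the end of the verification output can also be checked by a script before Nagios is started. The following is a minimal sketch; config_is_clean is a hypothetical helper that reads the verification output on stdin, and the sample text stands in for real output.

```shell
#!/bin/bash
# Hypothetical helper: read `nagios -v` output on stdin and succeed only
# if both "Total Warnings" and "Total Errors" are present and equal to 0.
config_is_clean() {
    awk '
        /^Total Warnings:/ { w = $3; seen_w = 1 }
        /^Total Errors:/   { e = $3; seen_e = 1 }
        END { exit !(seen_w && seen_e && w == 0 && e == 0) }
    '
}

sample='Checking misc settings...
Total Warnings: 0
Total Errors: 0'

if printf '%s\n' "$sample" | config_is_clean; then
    echo "configuration is clean"
fi
```

On the server the same function could gate the start command, for example: /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | config_is_clean && service nagios start.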
[root@Nagios-Server ~]# service nagios start
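6. Log in to the monitoring interface at http://192.168.5.203/nagios with the user name ixdba and its password
Click Services to see the monitored services. localhost is the server itself, which is monitored by default; the services for mysql (192.168.5.206) and web (192.168.5.204) are listed as well.
7. Simulate a failure of the HTTP service on web (192.168.5.204) and wait for the alert
#service httpd stop
An alert e-mail and an SMS notification are sent.
8. Simulate recovery of web
#service httpd start
A recovery e-mail and an SMS notification are sent.
Note: service monitoring, alerting, and e-mail and SMS notification now all work, but the alert (12:01) arrived about 30 minutes after the web failure occurred (11:29). A delay that long is unacceptable,
so the Nagios check settings need some tuning.
Edit templates.cfg: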
[root@Nagios-Server ~]# vim /usr/local/nagios/etc/objects/templates.cfg
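Line 72, check_interval: interval between host checks; change to 1 (minute)
Line 73, retry_interval: interval between host check retries; change to 1 (minute)
Line 74, max_check_attempts: maximum number of host check attempts; change to 1
Line 76, notification_period: time period during which failure notifications are sent; change to 24x7
Line 169, max_check_attempts: maximum number of service check attempts; change to 2
Line 170, normal_check_interval: interval between service checks; change to 1 (minute)
Line 171, retry_check_interval: interval between service check retries; change to 1 (minute)
Line 185, max_check_attempts: maximum number of service check attempts; change to 2
Line 186, normal_check_interval: interval between service checks; change to 1 (minute)
Line 187, retry_check_interval: interval between service check retries; change to 1 (minute)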
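To see why these values shorten the delay, a rough estimate helps: a failure can go unnoticed for up to one normal check interval, and the service then needs the remaining attempts, spaced by the retry interval, before Nagios treats the problem as confirmed and notifies. The sketch below computes that estimate in minutes; worst_case_delay is a hypothetical helper, and the formula is an approximation that ignores scheduling latency and notification settings.

```shell
#!/bin/bash
# Approximate worst-case minutes from failure to a confirmed problem:
#   normal_check_interval + (max_check_attempts - 1) * retry_check_interval
# Arguments: normal_check_interval retry_check_interval max_check_attempts
worst_case_delay() {
    local normal="$1" retry="$2" attempts="$3"
    echo $(( normal + (attempts - 1) * retry ))
}

# Tuned service values from the list above: interval 1, retry 1, attempts 2.
worst_case_delay 1 1 2
```

With the tuned values the estimate is 2 minutes; larger intervals or more attempts increase it proportionally.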
9. Simulate the failure again; the alert now arrives much sooner.