Added README and a flowchart
Parent: 0bba5cdbce
Commit: 76af47e699

README.md (normal file, 40 lines added)
@@ -0,0 +1,40 @@

# Nagios Failover #

PHP script to help set up Nagios with failover hosts. The idea is to run this script as a cron job on the failover host. While the remote (default) Nagios host is up, the script touches a "check file" to update its timestamp. When the remote Nagios host goes down, the script stops touching the file. If the remote Nagios has been down for more than a predefined number of seconds, the script activates the notifications on the local Nagios. See the flowchart (nagios-failover.png) in this directory for more information on how the script works.
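The check-file mechanism described above can be sketched as follows. This is a minimal sketch, not the actual script: it assumes the remote Nagios is probed by some external means (represented here by the hypothetical `$remoteIsUp` argument), and the function names are hypothetical. The check file's age is read with PHP's `filemtime()`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true when the local (failover) Nagios should have its
 * notifications enabled, i.e. the check file has not been touched for more
 * than $maxAge seconds.
 */
function failoverShouldBeActive(int $checkFileMtime, int $now, int $maxAge): bool
{
    return ($now - $checkFileMtime) > $maxAge;
}

/**
 * Hypothetical cron entry point. $remoteIsUp stands in for whatever probe
 * the real script uses against the remote Nagios host (assumption).
 */
function runCheck(string $checkFile, bool $remoteIsUp, int $maxAge): bool
{
    if ($remoteIsUp) {
        // Remote Nagios answered: refresh the check file's timestamp.
        touch($checkFile);
    }
    // PHP caches stat() results, so clear them before reading the mtime.
    clearstatcache(true, $checkFile);
    // A missing check file counts as "never touched" (mtime 0).
    $mtime = file_exists($checkFile) ? (int) filemtime($checkFile) : 0;
    return failoverShouldBeActive($mtime, time(), $maxAge);
}
```

The caller would then enable or disable the local notifications depending on the returned value.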

## Usage ##

Change the variables at the top of the script to suit your environment. Then add the script to a crontab (for example root's, or that of another user with Nagios permissions) and let it run, for example, every fourth minute. If `$maxAge` in the script is 1200 seconds (20 minutes) and the script runs every fourth minute, the check can fail 5 times (5 × 4 = 20 minutes) before the local Nagios notifications are activated.
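The arithmetic in the example above can be written as a small helper. The function name and `$cronInterval` are hypothetical; `$maxAge` mirrors the variable named in this README. The crontab line in the comment is only an illustration, and the PHP binary and script paths in it are assumptions.

```php
<?php
declare(strict_types=1);

// Example crontab entry running the script every fourth minute
// (hypothetical paths):
//   */4 * * * * /usr/bin/php /usr/local/bin/nagios-failover.php

/**
 * Roughly how many consecutive cron runs can miss the remote Nagios before
 * the local notifications are activated (hypothetical helper; the exact
 * boundary depends on how the real script compares the age with $maxAge).
 */
function toleratedFailures(int $maxAge, int $cronInterval): int
{
    return intdiv($maxAge, $cronInterval);
}

$maxAge = 1200;          // 20 minutes, as in the README example
$cronInterval = 4 * 60;  // cron runs every fourth minute, in seconds

echo toleratedFailures($maxAge, $cronInterval), PHP_EOL; // 5
```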

## Misc thoughts ##

Nowadays I only disable notifications on the failover hosts. I used to disable host checks, service checks and notifications. Over the years this turned out to generate a lot of false alarms. The false alarms are generated when, for example, the failover host disables its checks after having been active while the monitored hosts were experiencing problems. The checks might then have stopped in a soft state of, say, 3/4 (the third of four check attempts). When the failover host activates its checks again, it starts from this state (3/4), and if something such as a flaky internet connection makes the check fail one more time, it reaches a hard state and an alarm is sent.
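A toy model of the soft/hard behaviour in this paragraph, assuming `max_check_attempts` is 4 (the "3/4" above). It is a deliberate simplification, not Nagios's actual state machine: each failed check increments a counter of consecutive failures, and the state becomes HARD (which is when an alarm goes out) once the counter reaches the maximum. The function name is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Simplified model of Nagios soft/hard states (not the real implementation).
 * Returns [consecutiveFailures, stateType].
 */
function afterCheck(int $failures, int $maxCheckAttempts, bool $ok): array
{
    if ($ok) {
        // A successful check clears the problem.
        return [0, 'OK'];
    }
    $failures = min($failures + 1, $maxCheckAttempts);
    $state = $failures >= $maxCheckAttempts ? 'HARD' : 'SOFT';
    return [$failures, $state];
}

// Counter frozen at 3/4 when checks were disabled on the failover host:
[$failures, $state] = afterCheck(3, 4, false);
echo "$failures/4 $state", PHP_EOL; // 4/4 HARD, so an alarm would be sent

// The same single failure starting from a clean state stays soft:
[$failures, $state] = afterCheck(0, 4, false);
echo "$failures/4 $state", PHP_EOL; // 1/4 SOFT
```

This is why keeping the checks running on the failover host, so the counters stay current, avoids the false alarm.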

Another problem can occur if a host or service was flapping the last time the failover Nagios was active and then had its host and service checks disabled. The next time the failover Nagios is activated, the host or service that was flapping is still considered to be in a flapping state, even though this might have been two years ago, which suppresses any alarms until the flapping state goes away.

For these reasons I now let the failover host run both host and service checks even when no failover is necessary.

When the main Nagios then goes down, the failover Nagios already has an accurate, up-to-date state of all the hosts and services being monitored.
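Since only notifications are toggled, the script needs just two Nagios external commands, `ENABLE_NOTIFICATIONS` and `DISABLE_NOTIFICATIONS`, written to the Nagios command file in the `[timestamp] COMMAND` format. A sketch, assuming the command file is at `/usr/local/nagios/var/rw/nagios.cmd` (a common location for source installs; check `command_file` in your `nagios.cfg`) and that the cron user may write to it. The function names are hypothetical.

```php
<?php
declare(strict_types=1);

/** Format one Nagios external command line: "[timestamp] COMMAND". */
function externalCommand(string $command, int $timestamp): string
{
    return sprintf("[%d] %s\n", $timestamp, $command);
}

/**
 * Enable or disable notifications on the local (failover) Nagios.
 * Host and service checks are left running either way.
 */
function setNotifications(string $commandFile, bool $enable): bool
{
    $command = $enable ? 'ENABLE_NOTIFICATIONS' : 'DISABLE_NOTIFICATIONS';
    $written = file_put_contents(
        $commandFile,
        externalCommand($command, time()),
        FILE_APPEND
    );
    return $written !== false;
}

// Assumed path; see command_file in nagios.cfg.
// setNotifications('/usr/local/nagios/var/rw/nagios.cmd', true);
```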

## Copyright ##

This script was written by Jack-Benny Persson and is released under the GNU GPL version 2.

@@ -22,7 +22,7 @@
Simple script to turn on/off Nagios notification/checks etc for use with
Nagios failover hosts.
Default is to turn on/off notifications.
-Version 0.2
+Version 0.3
*/

// Variables to set for your environment

nagios-failover.png (normal file, binary, not shown; 31 KiB)