diff --git a/README.md b/README.md new file mode 100644 index 0000000..3d23595 --- /dev/null +++ b/README.md @@ -0,0 +1,40 @@ +# Nagios Failover # +PHP script to help setting up Nagios with failover hosts. The whole idea of this +script is to run in a cronjob on the failover host. When the remote, default, +Nagios host goes down, the script will stop touching a "check file" to update +it's timestamp. If the remote Nagios is down for more than the predefined number +of seconds, the script activates the local Nagios notification. See the flow +chart in this directory for more information on how the scripts works. + +## Usage ## +Change the variables at the top of the script to suit your environment. Then +place the script in a crontab (root for example, or someone with Nagios +permissions) and let it run for example every fourth minute. If the $maxAge in +the script is 1200 seconds (20 minutes) and the script is being run every fourth +minute, then the check can fail 5 times (5x4=20) before activating the local +Nagios notifications. + +## Misc thoughts ## +Nowadays I only disable notifications on the failover hosts. I used to disable +both host checks, service checks and notifications. Over the years this has +turned out to generate a lot of false alarms. The false alarms are generate when +the failover host for example disables it's checks after being active when a +hosts being monitored where experiencing problems. The checks might than have +stopped in a soft state with lets say 3/4. When the failover host is then once +again activating it's checks, it's starts from this state (3/4) and if something +then, for example, a flaky internet connection makes it fail one more time it +has reached a hard state and an alarm is being sent. +Another problem might be if a host or service was flapping the last time the +failover Nagios was active and then had it's host and service checks disabled. +The next time the failover Nagios is activated, the host/service which was +flapping, is now still considered being in a flapping state, even tough this +might have been two years ago, and thus preventing any alarms until the flapping +state goes away. +For these reasons I now let the failover host run both host and service checks +even when no failover is necessary. +When the main Nagios then goes down, the failover Nagios has an up-to-date +accurate state of all the hosts and services being monitored. + +## Copyright ## +This was script was written by Jack-Benny Persson and is released under GNU GPL +version 2. diff --git a/failover.php b/failover.php index 15d979c..8b6a93c 100755 --- a/failover.php +++ b/failover.php @@ -22,7 +22,7 @@ Simple script to turn on/off Nagios notification/checks etc for use with Nagios failover hosts. Default is to turn on/off notifications. - Version 0.2 + Version 0.3 */ // Variables to set for your environment diff --git a/nagios-failover.png b/nagios-failover.png new file mode 100644 index 0000000..14bdb41 Binary files /dev/null and b/nagios-failover.png differ