
Nagiosgraph for Postgresql replication

5 Jun

To track PostgreSQL replication lag with Nagios, you need to create a plugin that reports the lag. I initially tried to read the ‘slave_lag’ directly from PostgreSQL, but permissions etc. were a pain, so I just created a cron job that dumps it to /tmp/postgres_lag.txt every 5 minutes, and the plugin reads that file. The command to read the lag from PostgreSQL itself is left commented out:

#!/bin/bash

# too hard with permissions
#delay=$( sudo -u postgresql psql -h127.0.0.1 -p5433 -c "SELECT extract(epoch from now() - pg_last_xact_replay_timestamp()) AS slave_lag;" 2>/dev/null | tail -n 3 | head -n 1 | awk '{$1=$1};1')

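# Read the latest value written by the cron job (4th field of the last line)
delay=$(tail -n 1 /tmp/postgres_lag.txt 2>/dev/null | awk '{print $4}')

# No value, or not a number: report UNKNOWN rather than failing a comparison
if ! [[ "$delay" =~ ^-?[0-9]*\.?[0-9]+$ ]]
then
 echo "UNKNOWN- Replication Delay could not be read"
 exit 3
fi

delay_int=$(printf "%.0f" "$delay")
output="Replication Delay: $delay seconds"

if [ "$delay_int" -le 300 ]
then
 echo "OK- $output"
 exit 0
elif [ "$delay_int" -le 2000 ]
then
 echo "WARNING- $output"
 exit 1
else
 echo "CRITICAL- $output"
 exit 2
fi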
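
The cron job that writes /tmp/postgres_lag.txt is not shown above. A minimal sketch of what it could look like follows. The line layout (date, time, a label, then the lag, so that the lag is the 4th field the plugin reads), the function names, the LAG_FILE variable and the --run switch are all assumptions made for illustration, not the original job. It is meant to run from the crontab of a user that is allowed to query the database.

```shell
#!/bin/bash
# Hypothetical cron helper: appends one line per run to the lag file.
# Assumed line layout: <date> <time> lag_seconds <value>  (value = field 4)
LAG_FILE="${LAG_FILE:-/tmp/postgres_lag.txt}"

# Format one log line for a given lag value
format_lag_line() {
  printf '%s lag_seconds %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1"
}

# Ask the standby how far behind it is (-A unaligned, -t tuples only)
query_lag() {
  psql -h127.0.0.1 -p5433 -At \
    -c "SELECT extract(epoch from now() - pg_last_xact_replay_timestamp())"
}

# Only touch the database and the file when called with --run, e.g. from
# a crontab entry such as (the script path is an assumption):
# */5 * * * * /usr/local/bin/dump_postgres_lag.sh --run
if [ "${1:-}" = "--run" ]; then
  lag=$(query_lag 2>/dev/null)
  # An empty result (e.g. nothing replayed yet) is not written
  if [ -n "$lag" ]; then
    format_lag_line "$lag" >> "$LAG_FILE"
  fi
fi
```

Appending rather than overwriting keeps a small history, and the plugin only ever looks at the last line; you would probably want to rotate or truncate the file now and then.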

You then need to edit your nagiosgraph ‘map’ file (called ‘map’) and add this:

# Replication delay
/output:.*eplication Delay: ([.\d]+)\sseconds/
and push @s, [ 'seconds',
 [ 'data', GAUGE, $1 ] ];
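
If you want to sanity-check what that pattern will capture before touching nagiosgraph, a rough shell approximation is sketched below. It is not how nagiosgraph evaluates the rule (that is Perl); sed has no \d or \s, so they are swapped for [.0-9] and [[:space:]], the leading "output:" part is left out, and the function name extract_delay is made up for the example.

```shell
#!/bin/bash
# Approximate the map rule's capture group with sed (ERE).
# Reads plugin output on stdin, prints the captured number if the line matches.
extract_delay() {
  sed -n -E 's/.*eplication Delay: ([.0-9]+)[[:space:]]seconds.*/\1/p'
}

# Example: feed it a line in the format the plugin prints
echo "OK- Replication Delay: 12.5 seconds" | extract_delay   # prints 12.5
```

A line that does not match (for example an UNKNOWN result without a number) prints nothing, which mirrors the rule simply not firing.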

Tripwire

8 Nov

I found this old howto I wrote somewhere and thought I’d add it.
================================================================================

INSTALL TRIPWIRE
^^^^^^^^^^^^^^^^
[if re-installing, you’ll need to delete /etc/site.key]

$> cd /etc/tripwire
$> ./twinstall.sh // will install tripwire

$> /usr/sbin/twadmin --create-polfile twpol.txt // will create a policy file

[you can edit the twpol.txt policy file now, or wait until after the next step so you can see what is wrong with it]

$> /usr/sbin/tripwire --init // initialise the database from the policy file - this will show any errors etc

[you should probably delete the twpol.txt file now - you can always recreate it from the encoded db as long as you know your password]

UPDATE POLICY
^^^^^^^^^^^^^
If the twpol.txt file does not exist, recreate it:

$> /usr/sbin/twadmin --print-polfile > /etc/tripwire/twpol.txt // create readable policy file from encoded db

now edit twpol.txt to your liking

then create the new encoded policy file:

$> /usr/sbin/twadmin --create-polfile -S site.key /etc/tripwire/twpol.txt

then delete the old encoded db:

$> rm /var/lib/tripwire/imvs$.twd

recreate the encoded database from the new twpol.txt file

$> /usr/sbin/tripwire --init // recreate encoded db

(To make sure changes took effect, run tripwire again: /usr/sbin/tripwire --check)
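
The policy update steps above can be strung together in a small script, sketched here. Everything that is not in the steps themselves is an assumption: the TW_DIR, TW_DB and DRY_RUN variables, the run and update_policy helpers, the absolute path to site.key, and the guess that the database file is named after the host (check what is actually in /var/lib/tripwire). By default it only prints what it would do.

```shell
#!/bin/bash
# Sketch of the UPDATE POLICY steps; prints commands unless DRY_RUN=0.
TW_DIR="${TW_DIR:-/etc/tripwire}"
# Assumption: the encoded db is named after the host
TW_DB="${TW_DB:-/var/lib/tripwire/$(uname -n).twd}"
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or just show it when in dry-run mode
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

update_policy() {
  # 1. encode the edited twpol.txt with the site key
  run /usr/sbin/twadmin --create-polfile -S "$TW_DIR/site.key" "$TW_DIR/twpol.txt" || return 1
  # 2. delete the old encoded db
  run rm -f "$TW_DB" || return 1
  # 3. recreate the encoded db from the new policy
  run /usr/sbin/tripwire --init || return 1
  # 4. check that the changes took effect
  run /usr/sbin/tripwire --check
}

# Usage, after editing twpol.txt:
#   update_policy              # show the commands
#   DRY_RUN=0 update_policy    # actually run them (asks for passphrases)
```

Stopping at the first failed step matters here: you do not want to delete the old database if encoding the new policy file failed.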

RUN TRIPWIRE
^^^^^^^^^^^^
Run tripwire

$> /usr/sbin/tripwire --check

UPGRADE POLICY (Required if tripwire caught anything)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If changes have been found, you can update your policy db in two ways:

$> /usr/sbin/tripwire --update --twrfile /var/lib/tripwire

or you can run a check interactively

$> /usr/sbin/tripwire --check --interactive

Monitoring DRBD using Nagios and SNMP

9 Oct

Wrote a Nagios script to monitor DRBD using SNMP (I don’t really understand why Nagios made up its own plugin system when you can just use SNMP??). Anyway, to make this work you’ll need to:

1. Add a new ‘Check Command’ into Nagios’ checkcommands.cfg, something like this (the check_snmp_drbd.pl script is below):


define command{
command_name check_snmp_drbd
command_line $USER1$/check_snmp_drbd.pl -h $HOSTADDRESS$
}

2. Add a new ‘service’ definition to your services/**.cfg, something like:


define service{
use generic-service
host_name host_to_monitor
service_description DISK STATUS
is_volatile 0
check_period workhours
max_check_attempts 10
normal_check_interval 5
retry_check_interval 2
contact_groups infoservices-admins
notification_interval 120
notification_period 24x7
notification_options c,r
check_command check_snmp_drbd
}

3. On the target machine, you’ll need to make sure your snmpd daemon is sending you what you want. For Net-SNMP I just changed /etc/snmpd/snmpd.conf, appending one exec line per DRBD resource:


exec drbd_data /sbin/drbdadm state data
exec drbd_home /sbin/drbdadm state home
exec drbd_share /sbin/drbdadm state share

4. Make sure your firewall on the target machine allows snmpd through. You might want to put snmpd in your startup scripts too.

5. The script to actually do the monitoring from the Nagios machine is:


#!/usr/bin/perl -w
use strict;
use Getopt::Long;

my %ERRORS=('OK'=>0,'WARNING'=>1,'CRITICAL'=>2,'UNKNOWN'=>3,'DEPENDENT'=>4);
my ($hostname, $snmp_resources, $snmp_values);
my (@resources, @values);
my (@tmp, @tmp2, $tmp3);
my ($key, $value);
my %status = ();
my $i;
my $x;
my $error;

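# Get the command line options
# only "-h Hostname"
Getopt::Long::Configure ("bundling");
GetOptions(
'h=s' => \$hostname);

# No usable hostname - nothing to query. The character check also keeps
# shell metacharacters out of the snmpwalk command lines below.
if (!defined $hostname || $hostname !~ /^[\w.:-]+$/)
{
print "Usage: $0 -h <hostname>\n";
exit $ERRORS{"UNKNOWN"};
}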

# Grab the snmp details - note this should
# probably use Net::SNMP
$snmp_resources=`/usr/bin/snmpwalk -v 1 -c snmponly $hostname 1.3.6.1.4.1.2021.8.1.2 2>/dev/null`;
$snmp_values=`/usr/bin/snmpwalk -v 1 -c snmponly $hostname 1.3.6.1.4.1.2021.8.1.101 2>/dev/null`;

# Didn't get any output - error
if ($snmp_resources eq "" )
{
print "Unknown host: $hostname\n";
exit $ERRORS{"CRITICAL"};
}

@resources = split(/\n/,$snmp_resources);
@values = split(/\n/,$snmp_values);
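for ($i=0;$i< $#resources+1;$i++)
{
# Resource name is the 4th field of "<oid> = STRING: <name>"
@tmp = split(/\s+/,$resources[$i]);
# Strip the OID prefix, leaving just the drbdadm output
$tmp3 = defined $values[$i] ? $values[$i] : "";
$tmp3 =~ s/UCD-SNMP-MIB::extOutput..* = STRING: //g;
$status{$tmp[3]} = $tmp3;
}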

# Check for "Primary/Secondary" or "Secondary/Primary"
while(($key, $value) = each(%status))
{
if (!($value eq "Primary/Secondary") && !($value eq "Secondary/Primary"))
{
print "ERROR: $key says: $value\n";
$error = 1;
}
}

# Send out status
if ($error)
{
exit $ERRORS{"CRITICAL"};
} else {
print "DRBD OK: ";
foreach $key (keys %status)
{
print $key . ",";
}
print "\n";
exit $ERRORS{"OK"};
}
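
To try the matching logic without a DRBD box to hand, the same check can be sketched in shell and fed canned snmpwalk output. This is only an illustration of the idea in the Perl script, not a replacement for it: the function name drbd_bad_resources and the sample lines in the usage note are made up, and it assumes the two walks return their rows in the same order.

```shell
#!/bin/bash
# $1 = output of the extNames walk, $2 = output of the extOutput walk.
# Prints "name: state" for every resource that is neither
# Primary/Secondary nor Secondary/Primary; prints nothing if all is well.
drbd_bad_resources() {
  printf '%s\n---\n%s\n' "$1" "$2" | awk '
    /^---$/ { second = 1; n = 0; next }      # separator between the two walks
    { sub(/^.* = STRING: /, "") }            # drop the "<oid> = STRING: " prefix
    !second { name[++n] = $0; next }         # first walk: remember the names
    { n++
      if ($0 != "Primary/Secondary" && $0 != "Secondary/Primary")
        print name[n] ": " $0 }              # second walk: report bad states
  '
}
```

Called with names such as "UCD-SNMP-MIB::extNames.1 = STRING: drbd_data" and matching extOutput lines, it lists only the resources in an unexpected state; an empty result corresponds to the "DRBD OK" branch.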