Archive | June, 2017

Nagiosgraph for PostgreSQL replication

5 Jun

To track PostgreSQL replication lag with Nagios, you need a plugin that reports the lag. I initially tried to read the ‘slave_lag’ directly from PostgreSQL, but permissions etc. were a pain, so I just created a cron job that dumps it to a file every 5 minutes, and the plugin reads that file. The command to read the lag from PostgreSQL itself is commented out:

#!/bin/bash

# too hard with permissions
#delay=$( sudo -u postgresql psql -h127.0.0.1 -p5433 -c "SELECT extract(epoch from now() - pg_last_xact_replay_timestamp()) AS slave_lag;" 2>/dev/null | tail -n 3 | head -n 1 | awk '{$1=$1};1')

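# Last line of the cron dump; the lag in seconds is the 4th field
delay=$(tail -n 1 /tmp/postgres_lag.txt 2>/dev/null | awk '{print $4}')

# No usable number (missing file, empty line, NULL lag): report UNKNOWN
if ! [[ "$delay" =~ ^[0-9]*\.?[0-9]+$ ]]
then
 echo "UNKNOWN- Replication Delay: could not read lag from /tmp/postgres_lag.txt"
 exit 3
fi

# Round to a whole number, since [ ... -le ... ] only compares integers
delay_int=$(printf "%.0f" "$delay")
output="Replication Delay: $delay seconds"

if [ "$delay_int" -le 300 ]
then
 echo "OK- $output"
 exit 0
elif [ "$delay_int" -le 2000 ]
then
 echo "WARNING- $output"
 exit 1
else
 echo "CRITICAL- $output"
 exit 2
fi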
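
The cron job that writes /tmp/postgres_lag.txt isn't shown in this post. Here is a rough sketch of what it could look like, not the original script. The function names, the `--run` switch and the line format ("date time lag seconds") are all assumptions; the only real constraint is that the lag ends up in the 4th field, because the plugin reads it with awk '{print $4}'. The host and port are taken from the commented-out command above.

```shell
#!/bin/bash
# Hypothetical cron helper: appends one line per run to the lag file.
# Assumed line format: "<date> <time> lag <seconds>", so the lag is field 4.

LAG_FILE="${LAG_FILE:-/tmp/postgres_lag.txt}"

# Ask the standby for its replay lag in seconds.
# -A (unaligned) and -t (tuples only) make psql print just the value.
query_lag() {
    psql -h127.0.0.1 -p5433 -At -c \
        "SELECT extract(epoch from now() - pg_last_xact_replay_timestamp());"
}

# Format one log line; the lag is passed in so this part works without psql.
format_lag_line() {
    printf '%s lag %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1"
}

# Append a line only when the query succeeded and returned a value
# (the result is empty when pg_last_xact_replay_timestamp() is NULL).
record_lag() {
    local lag
    lag=$(query_lag) || return 1
    [ -n "$lag" ] || return 1
    format_lag_line "$lag" >> "$LAG_FILE"
}

# Only touch the database when explicitly asked to (assumed switch).
if [ "${1:-}" = "--run" ]; then
    record_lag
fi
```

Run it from the crontab of a user that can connect with psql, e.g. `*/5 * * * * /usr/local/bin/record_postgres_lag.sh --run` (the path is made up). Because the file is only appended to, you would probably want to rotate or truncate it now and then.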

You then need to edit your nagiosgraph map file (named ‘map’) and add this rule:

# Replication delay
/output:.*eplication Delay: ([.\d]+)\sseconds/
and push @s, [ 'seconds',
 [ 'data', GAUGE, $1 ] ];
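
Before reloading anything, it can help to check that the pattern really picks the number out of the plugin output. This is only a rough sketch: it assumes nagiosgraph passes the rule a text block containing "output:" followed by the plugin output. The sample line is made up, `extract_delay` is a hypothetical helper, and the Perl pattern has been translated to sed's extended regex syntax (\d becomes 0-9, \s becomes [[:space:]]).

```shell
#!/bin/bash
# Made-up sample of what the map rule is assumed to see for this service.
sample='output:OK- Replication Delay: 12.345 seconds'

# Print the captured delay, or nothing if the line doesn't match.
extract_delay() {
    printf '%s\n' "$1" |
        sed -nE 's/.*output:.*eplication Delay: ([.0-9]+)[[:space:]]seconds.*/\1/p'
}

extract_delay "$sample"   # prints 12.345
```

If this prints nothing for your real plugin output, the map rule won't match either and no graph data will be recorded.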