Archive | November, 2005

gdb

30 Nov

debugger

To run bacula-fd in the debugger:

$> gdb bacula-fd

At this point bacula-fd is not running yet. Start it from the gdb prompt:

(gdb) run -s -f (where -s/-f are the command-line options for bacula-fd)

Set breakpoints like:

(gdb) break backup.c:42 (where 42 is the line number)

Other commands:

‘s’ – step through code
‘p foo’ – print out variable ‘foo’
‘c’ – continue until the next break point
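
If you set the same breakpoint every time, the commands above can be kept in a gdb command file. This is only a sketch: write_gdb_cmds is a made-up helper name and the file path in the example is an assumption. The -x and --args options are standard gdb options.

```shell
#!/bin/sh
# Sketch only: write a gdb command file that sets one breakpoint and
# then starts the program. write_gdb_cmds is a hypothetical helper.
write_gdb_cmds() {
    # $1 = command file to write, $2 = source file, $3 = line number
    {
        printf 'break %s:%s\n' "$2" "$3"
        printf 'run\n'
    } > "$1"
}

# Example (not executed here; the path is an assumption):
#   write_gdb_cmds /tmp/bacula.gdb backup.c 42
#   gdb -x /tmp/bacula.gdb --args bacula-fd -s -f
```

With --args, the plain 'run' in the command file starts bacula-fd with the -s -f options given on the gdb command line.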


Bounce SSH using NetCat

30 Nov

You can bounce an SSH connection using Netcat (actually you can bounce anything).

e.g.
+----------+      +----------+      +----------+
| internal | ---> | firewall | ---> | external |
+----------+      +----------+      +----------+

You want to get from internal to external, but have to go through a firewall or some other intermediate host.

On Internal:
– edit ~/.ssh/config and enter in:
Host external
Hostname external
HostKeyAlias external
ProxyCommand /root/nc-tunnel firewall external.domain

Then create the /root/nc-tunnel script:

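#!/bin/sh
# Usage: nc-tunnel <bouncehost> <target>
bouncehost=$1
target=$2
# Log in to the bounce host and relay stdin/stdout to the target's SSH port
ssh "$bouncehost" nc -w 1 "$target" 22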

That's it. To connect from internal, just run:
$> ssh external
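
A variation on the above (not the setup described in this post) is to skip the helper script and let ssh fill in the target itself through the %h and %p tokens of ProxyCommand. The sketch below just prints such a stanza; bounce_stanza is a hypothetical helper, and it assumes nc is installed on the bounce host.

```shell
#!/bin/sh
# Sketch only: print a ~/.ssh/config stanza that bounces through a host
# using netcat. bounce_stanza is a hypothetical helper name.
bounce_stanza() {
    # $1 = alias to type after "ssh"
    # $2 = bounce host (e.g. the firewall)
    # $3 = target host name as seen from the bounce host
    printf 'Host %s\n' "$1"
    printf '    HostName %s\n' "$3"
    printf '    HostKeyAlias %s\n' "$1"
    # ssh replaces %h with the HostName value and %p with the port
    printf '    ProxyCommand ssh %s nc -w 1 %%h %%p\n' "$2"
}

# Example: bounce_stanza external firewall external.domain >> ~/.ssh/config
```

Review the printed stanza before appending it to ~/.ssh/config.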

Recovery

29 Nov

DRBD – What to do when something goes wrong

iisjsoo 6 Jan 04
#########################################
This document details the steps required when ONE machine goes down in a DRBD-Heartbeat failover
configuration.

1. A machine (either the Primary or Secondary) has died.

This situation requires immediate action.

If the Primary has died:
In a Heartbeat-failover situation, the Secondary Machine *should have* taken over the Primary Machine’s
services. In this scenario DRBD should still be running on the Secondary Machine, and the shared partition
should be mounted there.

You should log into the Secondary Machine and manually check to see that all required services are running.
The services that should be running are documented in /etc/ha.d/haresources
– it is likely that these include (but check the haresources file to be sure):
1. httpd : Apache webserver
2. pks : Public Key Server – required by the EDI
3. named : Bind – the nameserver
4. cvsd : CVS server for external use
5. squid : Squid working as a Reverse Proxy Accelerator (for tunneling IMVS Webmail)
[NOTE: the mailserver, Qmail, runs regardless of Heartbeat, i.e. it is running on the Primary
and Secondary all the time]
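
To get the list of things to check straight from the file, something like the sketch below could help. It assumes the usual haresources layout of a node name followed by its resources on one line, and it does not handle lines continued with a backslash. list_ha_resources is a made-up name. Note that the output also includes non-service entries such as IP addresses.

```shell
#!/bin/sh
# Sketch only: print every resource named in a haresources file, one per
# line. Assumes "nodename resource1 resource2 ..." on a single line;
# backslash-continued lines are NOT handled.
list_ha_resources() {
    # $1 = haresources file (defaults to /etc/ha.d/haresources)
    awk '$1 !~ /^#/ && NF > 1 { for (i = 2; i <= NF; i++) print $i }' \
        "${1:-/etc/ha.d/haresources}"
}

# Example: list_ha_resources | while read -r r; do echo "check: $r"; done
```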

It is important to realise that the shared partition (which these services require) could be mounted
*with or without* DRBD being loaded. If it has been mounted *with* DRBD, then all data written to this
partition also passes through DRBD, so DRBD knows that the data has changed. However,
if the partition was mounted *without* DRBD, then DRBD knows nothing about any data written to the
partition by the Secondary Machine.

The real difference between mounting the partition with or without DRBD shows up when the dead machine comes
back online. By mounting the partition without DRBD you are effectively circumventing DRBD. This means
that DRBD’s metadata might be incorrect and indicate that the wrong machine has the good data. In this
event, reconnecting the Primary Machine may lead to a sync *in the wrong direction*, destroying
your good data!!!

Now the good news: this is easy to avoid as long as you don’t mind a total resync of data (10 GB takes
about 30 minutes on a 100Mbps network). IN FACT, BY DOING A TOTAL RESYNC YOU CIRCUMVENT ALL THE PROBLEMS
DESCRIBED ABOVE (ALTHOUGH **IT IS IMPORTANT TO KNOW WHAT IS GOING ON**). Please see step (2).
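
As a rough sanity check on that figure: taking the size as 10 gigabytes in decimal units (an assumption), that is 80,000 megabits, which needs 800 seconds (about 13 minutes) at a full 100Mbps. A 30 minute resync therefore corresponds to an average of roughly 44Mbps. The two helper names below are made up for this sketch.

```shell
#!/bin/sh
# Sketch only: back-of-the-envelope resync arithmetic in decimal units
# (1 GB = 8000 megabits). Integer arithmetic, so results are rounded down.

# resync_seconds SIZE_GB RATE_MBPS -> seconds needed at that rate
resync_seconds() {
    echo $(( $1 * 8 * 1000 / $2 ))
}

# effective_mbps SIZE_GB MINUTES -> average rate implied by a duration
effective_mbps() {
    echo $(( $1 * 8 * 1000 / ($2 * 60) ))
}

# Example: resync_seconds 10 100   (full line rate)
#          effective_mbps 10 30    (rate implied by a 30 minute resync)
```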

If the Secondary has died:
No changes will need to be made – the Primary Machine should continue operating okay and will only need
to be stopped in order to begin the resync of data. Please see step (2).

2. Resyncing data
It is recommended that a full resync is done whenever a machine goes down for any length of time or for
hardware maintenance.

Partial resyncs *are* possible, however, a ‘clean slate’ approach is recommended.

To do this, you must delete ALL the metadata on the recovered machine BEFORE it comes back online.
This will force it to resync against the other machine. Note: do this ONLY on the machine with the
bad data.

$> rm /var/lib/drbd/drbd[0-9]*

These files will be recreated with zeroed contents, so the machine loses any election as a sync source and
becomes a full sync target (i.e. resyncs in the wrong direction are now impossible!).

Now you will need to restart DRBD on both machines in order to start the sync. Follow the directions
below. The boxen are named Good and Bad after the data they contain.

Good> # stop services
Good> umount /data
Good> drbd start
      # IFF you don’t have STONITH, you may start heartbeat here.
      # It cannot see the Bad node, and will do the following two steps on your behalf.
Good> datadisk start
Good> # start services if you want
      # It probably won’t hurt to confirm that /proc/drbd says WFConnect (wait for connection)

Bad> # remove metadata, effectively setting all counters to zero
     # This makes sure that it will lose any and all elections about
     # “who has the good data”
Bad> rm /var/lib/drbd/drbd[0-9]*
Bad> drbd start

This now should connect and start to sync, and show you every now and then the progress of the sync.
(You can also: ‘cat /proc/drbd’ to see what is happening).

If a replication is NOT happening, then you can trigger a full-resync manually:
Bad> drbdsetup replicate
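
The "look at /proc/drbd" checks in the steps above can be scripted with a plain grep. This is a sketch only: drbd_state_seen is a hypothetical name, and the exact state strings in /proc/drbd depend on your DRBD version, so pass in whatever your own /proc/drbd actually prints.

```shell
#!/bin/sh
# Sketch only: succeed if a given state string appears in the DRBD status
# file. The state names themselves vary between DRBD versions.
drbd_state_seen() {
    # $1 = state string to look for
    # $2 = status file (defaults to /proc/drbd)
    grep -q -- "$1" "${2:-/proc/drbd}"
}

# Example, on Good before Bad is started:
#   drbd_state_seen WFConnect && echo "Good is waiting for Bad"
```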

3. Useful tools

3a.
To find out what is going on with DRBD:
$> /etc/init.d/drbd status

This will return one of:

1. “drbd: /proc/drbd not found. Is DRBD in the kernel?”
DRBD is not loaded into the kernel.
You should not be running a machine in this state. If you have a partition mounted that is
meant to be under the control of DRBD this means that you have circumvented DRBD and mounted
the partition directly. You should unmount the partition, load DRBD into the kernel and then
remount the partition using the datadisk script:

$> umount /data
$> /etc/init.d/drbd start
$> /etc/ha.d/resource.d/datadisk start

(Note: the partitions should have been properly set up in /etc/drbd.conf)

2. “drbd0: stopped”
DRBD is loaded into the kernel but this machine is currently NOT connected to another machine.
This is the correct state a machine should be in when one machine is down.

3. “drbd0: running”
DRBD is loaded and the machine is currently in sync with another.
This is the correct state a machine should be in when both machines are functioning correctly.
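
The three cases above map onto a simple case statement. This is a sketch that matches on the message texts quoted in this post; explain_drbd_status is a made-up name, and your init script may word its output differently.

```shell
#!/bin/sh
# Sketch only: translate the drbd status messages listed above into a
# one-line explanation. Matches on the message texts quoted in this post.
explain_drbd_status() {
    # $1 = output of the status command
    case "$1" in
        *"/proc/drbd not found"*)
            echo "DRBD is not loaded into the kernel" ;;
        *stopped*)
            echo "DRBD is loaded but not connected to a peer" ;;
        *running*)
            echo "DRBD is loaded and in sync with a peer" ;;
        *)
            echo "unrecognised status: $1" ;;
    esac
}

# Example: explain_drbd_status "$(/etc/init.d/drbd status 2>&1)"
```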

3b.
To find out what is going on with datadisk:
$> /etc/ha.d/resource.d/datadisk status

This will return one of:

1. “datadisk: /proc/drbd not found. Try drbd start first”
This means that drbd is not loaded into the kernel. There is no way datadisk can do anything.

2. “drbd0: stopped”
NOTE: this is confusing because it says ‘drbd0’ but is really referring to datadisk!
This means that datadisk is not running – hence your partition has not been mounted through
datadisk (if your partition is visible it means that you mounted it manually and circumvented
DRBD/datadisk – which is bad!)

3. “drbd0: running”
NOTE: this is confusing because it says ‘drbd0’ but is really referring to datadisk!
This means that DRBD is running and datadisk has successfully mounted a partition.
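
The "mounted behind datadisk's back" case can be spotted by comparing the datadisk status with the mount table. This is a sketch under assumptions: it reads /proc/mounts (Linux), it treats any status text not containing "running" as "not mounted through datadisk", and both function names are made up.

```shell
#!/bin/sh
# Sketch only: flag a partition that is mounted although datadisk does
# not report "running", i.e. it was probably mounted by hand.

# is_mounted MOUNTPOINT [MOUNTS_FILE] -> succeeds if MOUNTPOINT is mounted
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# bypass_suspected MOUNTPOINT DATADISK_STATUS [MOUNTS_FILE]
bypass_suspected() {
    case "$2" in
        *running*) return 1 ;;   # datadisk says it mounted the partition
    esac
    is_mounted "$1" "$3"
}

# Example:
#   st=$(/etc/ha.d/resource.d/datadisk status 2>&1)
#   bypass_suspected /data "$st" && echo "WARNING: /data mounted without datadisk"
```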

Hello world!

29 Nov

Moving to WebPress as Movable Type is too s l o w …