To ensure that the cluster software has been correctly configured, use the following tools located in the /usr/sbin directory:
Test the shared partitions and ensure that they are accessible.
Invoke the /usr/sbin/shutil utility with the -v option to test the accessibility of the shared partitions. See Section 3.11.1 Testing the Shared Partitions for more information.
Test the operation of the power switches.
If power switches are used in the cluster hardware configuration, run the clufence command on each member to ensure that it can remotely power-cycle the other member. Do not run this command while the cluster software is running. See Section 3.11.2 Testing the Power Switches for more information.
Ensure that all members are running the same software version.
Invoke the rpm -q clumanager command and rpm -q redhat-config-cluster command on each member to display the revision of the installed cluster software RPMs.
The following sections explain the cluster utilities in further detail.
The shared partitions must refer to the same physical device on all members. Invoke the /usr/sbin/shutil utility with the -v option to test the shared partitions and verify that they are accessible.
If the command succeeds, run the /usr/sbin/shutil -p /cluster/header command on all members to display a summary of the header data structure for the shared partitions. If the output is different on the members, the shared partitions do not point to the same devices on all members. Check to make sure that the raw devices exist and are correctly specified in the /etc/sysconfig/rawdevices file. See Section 2.4.4.3 Configuring Shared Cluster Partitions for more information.
The following example shows that the shared partitions refer to the same physical device on cluster members clu1.example.com and clu2.example.com via the /usr/sbin/shutil -p /cluster/header command:
/cluster/header is 140 bytes long
SharedStateHeader {
        ss_magic = 0x39119fcd
        ss_timestamp = 0x000000003ecbc215 (14:14:45 May 21 2003)
        ss_updateHost = clu1.example.com
All fields in the output from the /usr/sbin/shutil -p /cluster/header command should be the same when run on all cluster members. If the output is not the same on all members, perform the following:
Examine the /etc/sysconfig/rawdevices file on each member and ensure that the raw character devices and block devices for the primary and backup shared partitions have been accurately specified (a sample entry appears after this list). If they are not the same, edit the file and correct any mistakes. Then re-run the Cluster Configuration Tool. See Section 3.5 Editing the rawdevices File for more information.
Ensure that you have created the raw devices for the shared partitions on each member. See Section 2.4.4.3 Configuring Shared Cluster Partitions for more information.
To determine the bus configuration on each member, examine the system startup messages by running dmesg | less to the point where the system probes the SCSI subsystem. Verify that all members identify the same shared storage devices and assign them the same name.
Verify that a member is not attempting to mount a file system on the shared partition. For example, make sure that the actual device (for example, /dev/sdb1) is not included in an /etc/fstab file.
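The following quick checks illustrate the rawdevices, dmesg, and fstab items above; run them on each member. The device names are examples only, so substitute the devices used by your shared storage. A typical pair of /etc/sysconfig/rawdevices entries binds a raw character device to the block device of each shared partition:

/dev/raw/raw1 /dev/sdb1
/dev/raw/raw2 /dev/sdb2

# Confirm that the raw bindings are active and point at the expected block devices:
raw -qa
# Confirm that the SCSI subsystem detected the shared storage devices:
dmesg | grep -i scsi
# Confirm that the shared partition is not listed in /etc/fstab (expect no output):
grep sdb1 /etc/fstab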
After performing these tasks, re-run the /usr/sbin/shutil utility with the -p option.
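A quick way to compare that output across members is to capture it over ssh and diff the results. This is a minimal sketch, which assumes passwordless ssh between the members and uses example hostnames:

ssh clu1.example.com /usr/sbin/shutil -p /cluster/header > /tmp/header.clu1
ssh clu2.example.com /usr/sbin/shutil -p /cluster/header > /tmp/header.clu2
# No output from diff means both members see the same shared partition header.
diff /tmp/header.clu1 /tmp/header.clu2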
If either network-attached or serial-attached power switches are employed in the cluster hardware configuration, install the cluster software and invoke the clufence command to test the power switches. Invoke the command on each member to ensure that it can remotely power-cycle the other member. If testing is successful, then the cluster can be started.
The clufence command can accurately test a power switch only if the cluster software is not running, because for serial-attached switches only one program at a time can access the serial port that connects a power switch to a member. When the clufence command is invoked, it checks the status of the cluster software. If the cluster software is running, the command exits with a message to stop the cluster software.
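For example, stop the cluster software on the member before testing a switch. This is only a sketch, which assumes the cluster software is managed by the standard clumanager init script:

# Stop the cluster software on this member before running clufence:
service clumanager stop
# Confirm that it is no longer running:
service clumanager status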
The clufence command-line options are as follows; a usage sketch follows the list:
-d — Turn on debugging
-f — Fence (power off) member
-u — Unfence (power on) member
-r — Reboot (power cycle) member
-s — Check status of all switches controlling member
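For example, with the cluster software stopped, the status of the switches controlling a member can be checked as follows. The member name is an example, and the exact argument form is an assumption based on the option summary above:

/usr/sbin/clufence -s clumember2.example.com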
When testing power switches, the first step is to ensure that each cluster member can successfully communicate with its attached power switch. The following output of the clufence command shows that the cluster member is able to communicate with its power switch:
[27734] info: STONITH: rps10 at /dev/ttyS0, port 0 controls clumember1.example.com
[27734] info: STONITH: rps10 at /dev/ttyS0, port 1 controls clumember2.example.com
In the event of an error in the clufence output, check the following:
For serial-attached power switches:
Verify that the device special file for the remote power switch connection serial port (for example, /dev/ttyS0) is specified correctly in the cluster configuration file; in the Cluster Configuration Tool, display the Power Controller dialog box to check the serial port value. If necessary, use a terminal emulation package such as minicom to test if the cluster member can access the serial port.
Ensure that a non-cluster program (for example, a getty program) is not using the serial port for the remote power switch connection. You can use the lsof command to perform this task, as shown in the sketch after this list.
Check that the cable connection to the remote power switch is correct. Verify that the correct type of cable is used (for example, an RPS-10 power switch requires a null modem cable), and that all connections are securely fastened.
Verify that any physical DIP switches or rotary switches on the power switch are set properly.
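The serial-port checks above can be performed with commands such as the following; /dev/ttyS0 is an example device:

# List any process currently holding the serial port open:
lsof /dev/ttyS0
# Open the port interactively to confirm the member can access it:
minicom -D /dev/ttyS0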
For network-based power switches:
Verify that the network connection to network-based power switches is operational. Most switches have a link light that indicates connectivity.
It should be possible to ping the network power switch; if not, then the switch may not be properly configured for its network parameters.
Verify that the correct password and login name (depending on switch type) have been specified in the cluster configuration file (as established by running the Cluster Configuration Tool and viewing the properties specified in the Power Controller dialog box). A useful diagnostic approach is to verify Telnet access to the network switch using the same parameters as specified in the cluster configuration.
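The network checks above reduce to a couple of commands. The address is an example; substitute the IP address or hostname configured for the switch:

# Confirm basic network reachability of the power switch:
ping -c 3 192.168.1.50
# Confirm that the login and password in the cluster configuration work:
telnet 192.168.1.50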
After successfully verifying communication with the switch, attempt to power cycle the other cluster member. Before doing so, verify that the other cluster member is not actively performing any important functions (such as serving cluster services to active clients). Running the command clufence -f clumember2.example.com displays the following output upon a successful shutdown and fencing operation (which means that the system does not receive power from the power switch until the system has been unfenced):
[7397] info: STONITH: rps10 at /dev/ttyS0, port 0 controls clumember1.example.com
[7397] info: STONITH: rps10 at /dev/ttyS0, port 1 controls clumember2.example.com
[7397] notice: STONITH: clumember2.example.com has been fenced!
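Because a fenced member does not receive power until it is unfenced, restore power afterwards with the -u option described above. The member name is an example:

/usr/sbin/clufence -u clumember2.example.com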
Ensure that all members in the cluster are running the same version of the Red Hat Cluster Manager software.
To display the version of the Cluster Configuration Tool and the Cluster Status Tool, use either of the following methods:
Choose Help => About. The About dialog includes the version numbers.
Invoke the following commands:
rpm -q redhat-config-cluster
rpm -q clumanager
The version of the clumanager package can also be determined by invoking the clustat -v command.
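To confirm at a glance that both members run the same version, these checks can be run over ssh. This is a minimal sketch using example hostnames and assuming ssh access between the members:

for host in clu1.example.com clu2.example.com; do
    echo "== $host =="
    ssh $host "rpm -q clumanager redhat-config-cluster; clustat -v"
done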