Wednesday 24 June 2015

RAC Health Check OR Cluster Health Check - Oracle 11GR2 RAC







This document walks through a RAC health check (cluster health check) in Oracle 11gR2 RAC, along with common errors encountered and how they were resolved:



Switch to Oracle user:

[H21401113@a1212testingracdb1d ~]$ su - oracle
Password:

First, check which database instances are available on the database server:

[oracle@a1212testingracdb1d ~]$ ps -ef|grep pmon
grid     32509     1  0 Jun07 ?        00:09:20 asm_pmon_+ASM1
oracle   37038 36979  0 12:09 pts/4    00:00:00 grep pmon
oracle   42969     1  0 Jun07 ?        00:11:53 ora_pmon_RACTESTDB1
oracle   43326     1  0 Jun07 ?        00:53:52 ora_pmon_REMARKSDBPRD1
oracle   43743     1  0 Jun07 ?        00:12:01 ora_pmon_SALPRDDB1
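
The "grep pmon" row in the listing above is the search command itself, not an instance. A minimal sketch that prints only the instance names follows; list_pmon_instances is a hypothetical helper name (not an Oracle tool), and it assumes PMON process names look like ora_pmon_<SID> or asm_pmon_<SID>, as in the output shown:

```shell
#!/bin/sh
# list_pmon_instances: hypothetical helper, not part of Oracle.
# Reads `ps -ef` output on stdin and prints one instance name per line.
list_pmon_instances() {
    awk '{
        name = $NF                          # last column holds the process name
        if (name ~ /^(ora|asm)_pmon_/) {    # the "grep pmon" row does not match
            sub(/^(ora|asm)_pmon_/, "", name)
            print name
        }
    }'
}

# Usage on a database node:
#   ps -ef | list_pmon_instances
```

For the listing above this should leave just the ASM instance and the three database instances.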


Switch to Grid user:

[oracle@a1212testingracdb1d ~]$ su - grid
Password:



Go to GRID_HOME/bin directory:


[grid@a1212testingracdb1d ~]$ cd $GRID_HOME/bin




Check all cluster resources from a top-level view (not detailed):


Use Command: ./crsctl check cluster -all


[grid@a1212testingracdb1d bin]$ ./crsctl check cluster -all
**************************************************************
a1212testingracdb1d:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
a1212testingracdb2d:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
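
On a cluster with many nodes it can help to print only the services that are not online. The sketch below assumes the layout shown above (a "nodename:" row followed by CRS- rows that end in "is online" when healthy); find_offline_services is a hypothetical helper name, and any CRS- row with different wording is simply treated as a problem:

```shell
#!/bin/sh
# find_offline_services: hypothetical helper, not part of Oracle.
# Reads `crsctl check cluster -all` output on stdin.
# Prints "node: message" for every CRS row that does not end in "is online".
# Exit status is 0 only when every reported service was online.
find_offline_services() {
    awk '
        /^\*+$/ { next }                          # separator rows
        /:[ \t]*$/ && !/^CRS-/ {                  # "nodename:" row
            node = $0
            sub(/:[ \t]*$/, "", node)
            next
        }
        /^CRS-/ && !/is online[ \t]*$/ { print node ": " $0; bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage as the grid user:
#   ./crsctl check cluster -all | find_offline_services && echo "all services online"
```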





Check the status of all cluster resources:

Use Command: ./crsctl stat res -t


[grid@a1212testingracdb1d bin]$ ./crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.CARDONE.dg
               ONLINE  ONLINE       a1212testingracdb1d
               ONLINE  ONLINE       a1212testingracdb2d
ora.FRA_CARDONE.dg
               ONLINE  ONLINE       a1212testingracdb1d
               ONLINE  ONLINE       a1212testingracdb2d
ora.LISTENER.lsnr
               ONLINE  ONLINE       a1212testingracdb1d
               ONLINE  ONLINE       a1212testingracdb2d
ora.OCRDATA.dg
               ONLINE  ONLINE       a1212testingracdb1d
               ONLINE  ONLINE       a1212testingracdb2d
ora.asm
               ONLINE  ONLINE       a1212testingracdb1d          Started
               ONLINE  ONLINE       a1212testingracdb2d          Started
ora.gsd
               OFFLINE OFFLINE      a1212testingracdb1d
               OFFLINE OFFLINE      a1212testingracdb2d
ora.net1.network
               ONLINE  ONLINE       a1212testingracdb1d
               ONLINE  ONLINE       a1212testingracdb2d
ora.ons
               ONLINE  ONLINE       a1212testingracdb1d
               ONLINE  ONLINE       a1212testingracdb2d
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       a1212testingracdb1d
ora.LISTENER_SCAN2.lsnr
      1        ONLINE  ONLINE       a1212testingracdb2d
ora.LISTENER_SCAN3.lsnr
      1        ONLINE  ONLINE       a1212testingracdb2d
ora.c1proddb.db
      1        ONLINE  ONLINE       a1212testingracdb1d          Open
      2        ONLINE  ONLINE       a1212testingracdb2d          Open
ora.cvu
      1        ONLINE  ONLINE       a1212testingracdb2d
ora.a1212testingracdb1d.vip
      1        ONLINE  ONLINE       a1212testingracdb1d
ora.a1212testingracdb2d.vip
      1        ONLINE  ONLINE       a1212testingracdb2d
ora.oc4j
      1        ONLINE  ONLINE       a1212testingracdb1d
ora.scan1.vip
      1        ONLINE  ONLINE       a1212testingracdb1d
ora.scan2.vip
      1        ONLINE  ONLINE       a1212testingracdb2d
ora.scan3.vip
      1        ONLINE  ONLINE       a1212testingracdb2d
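
The table is long, and the rows that matter are the ones where TARGET is ONLINE but STATE is not. A rough sketch that pulls out only those rows is given here; find_down_resources is a hypothetical helper name, and it assumes resource names start with "ora." and that the columns are laid out as in the output above:

```shell
#!/bin/sh
# find_down_resources: hypothetical helper, not part of Oracle.
# Reads `crsctl stat res -t` output on stdin and prints
# "resource state server" for every row whose TARGET is ONLINE
# but whose STATE is anything else.
find_down_resources() {
    awk '
        /^ora\./ { res = $1; next }            # resource name row
        /^[ \t]+[0-9A-Z]/ {                    # indented status row
            i = ($1 ~ /^[0-9]+$/) ? 2 : 1      # skip the instance number column
            if ($i == "ONLINE" && $(i + 1) != "ONLINE")
                print res, $(i + 1), $(i + 2)
        }
    '
}

# Usage as the grid user:
#   ./crsctl stat res -t | find_down_resources
```

Rows such as ora.gsd, where both TARGET and STATE are OFFLINE, are deliberately not reported, because the resource is down on purpose.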





Please Note:


If ora.gsd is offline:

ora.gsd OFFLINE OFFLINE
ora.labsx86-1.gsd OFFLINE OFFLINE
ora.labsx86-2.gsd OFFLINE OFFLINE

You only need to enable GSD if you are running Oracle 9i RAC databases in the cluster. Otherwise there is nothing to worry about.



If ora.cvu is offline:

If you don't want to use the Cluster Verification Utility, this is fine. Otherwise, enable it if you want to use the cluvfy utility.


Check if all cluster nodes are online and visible to all nodes:


Run the following command from each cluster node:

Use Command: ./olsnodes

[grid@a1212testingracdb1d bin]$ ./olsnodes
a1212testingracdb1d
a1212testingracdb2d
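
To make the comparison less error-prone when it is repeated on every node, the output can be checked against the list of nodes you expect to see. This is only a sketch: missing_nodes is a hypothetical helper name, and the expected node names are whatever you pass as arguments:

```shell
#!/bin/sh
# missing_nodes: hypothetical helper, not part of Oracle.
# Reads `olsnodes` output on stdin and prints every expected node
# (given as arguments) that is absent from it.
# Exit status is 0 only when no expected node is missing.
missing_nodes() {
    seen=$(cat)
    rc=0
    for n in "$@"; do
        # -x: whole-line match, -F: treat the name as a fixed string
        printf '%s\n' "$seen" | grep -qxF -- "$n" || { echo "$n"; rc=1; }
    done
    return $rc
}

# Usage as the grid user, on each node:
#   ./olsnodes | missing_nodes a1212testingracdb1d a1212testingracdb2d
```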


CLUVFY:

If you suspect a cluster configuration issue, you can check your RAC configuration settings with the following command:

Use Command: cluvfy stage -post crsinst -n a1212testingracdb1d,a1212testingracdb2d



[grid@a1212testingracdb1d bin]$ cluvfy stage -post crsinst -n a1212testingracdb1d,a1212testingracdb2d
Performing post-checks for cluster services setup
Checking node reachability...
Node reachability check passed from node "a1212testingracdb1d"

Checking user equivalence...
User equivalence check passed for user "grid"
Checking node connectivity...
Checking hosts config file...
Verification of the hosts config file successful
Check: Node connectivity for interface "bond0"
Node connectivity passed for interface "bond0"
TCP connectivity check passed for subnet "3.143.50.0"

Check: Node connectivity for interface "bond1"
Node connectivity passed for interface "bond1"
TCP connectivity check passed for subnet "194.168.7.0"
Checking subnet mask consistency...
Subnet mask consistency check passed for subnet "3.143.50.0".
Subnet mask consistency check passed for subnet "194.168.7.0".
Subnet mask consistency check passed.
Node connectivity check passed
Checking multicast communication...
Checking subnet "1.***.50.0" for multicast communication with multicast group "200.0.1.0"...
Check of subnet "1.***.50.0" for multicast communication with multicast group "200.0.1.0" passed.
Checking subnet "111.***.7.0" for multicast communication with multicast group "200.0.1.0"...
Check of subnet "111.***.7.0" for multicast communication with multicast group "200.0.1.0" passed.
Check of multicast communication passed.
Time zone consistency check passed
Checking Oracle Cluster Voting Disk configuration...
ASM Running check passed. ASM is running on all specified nodes
Oracle Cluster Voting Disk configuration check passed
Checking Cluster manager integrity...

Checking CSS daemon...
Oracle Cluster Synchronization Services appear to be online.
Cluster manager integrity check passed

UDev attributes check for OCR locations started...
UDev attributes check passed for OCR locations

UDev attributes check for Voting Disk locations started...
UDev attributes check passed for Voting Disk locations
Default user file creation mask check passed
Checking cluster integrity...

Cluster integrity check passed

Checking OCR integrity...
Checking the absence of a non-clustered configuration...
All nodes free of non-clustered, local-only configurations

ASM Running check passed. ASM is running on all specified nodes
Checking OCR config file "/etc/oracle/ocr.loc"...
OCR config file "/etc/oracle/ocr.loc" check successful

Disk group for ocr location "+OCRDATA" available on all the nodes

NOTE:
This check does not verify the integrity of the OCR contents. Execute 'ocrcheck' as a privileged user to verify the contents of OCR.
OCR integrity check passed
Checking CRS integrity...
Clusterware version consistency passed
CRS integrity check passed
Checking node application existence...
Checking existence of VIP node application (required)
VIP node application check passed
Checking existence of NETWORK node application (required)
NETWORK node application check passed
Checking existence of GSD node application (optional)
GSD node application is offline on nodes "a1212testingracdb1d,a1212testingracdb2d"
Checking existence of ONS node application (optional)
ONS node application check passed

Checking Single Client Access Name (SCAN)...
Checking TCP connectivity to SCAN Listeners...
TCP connectivity to SCAN Listeners exists on all cluster nodes
Checking name resolution setup for "cardonedb-scan.r5.money.ge.com"...
Verification of SCAN VIP and Listener setup passed
Checking OLR integrity...
Checking OLR config file...
OLR config file check successful

Checking OLR file attributes...
OLR file check successful

WARNING:
This check does not verify the integrity of the OLR contents. Execute 'ocrcheck -local' as a privileged user to verify the contents of OLR.
OLR integrity check passed
User "grid" is not part of "root" group. Check passed
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
CTSS resource check passed

Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed
Check CTSS state started...
CTSS is in Observer state. Switching over to clock synchronization checks using NTP

Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
NTP Configuration file check passed
Checking daemon liveness...
Liveness check failed for "ntpd"
Check failed on nodes:
        a1212testingracdb1d,a1212testingracdb2d
PRVF-5494 : The NTP Daemon or Service was not alive on all nodes
PRVF-5415 : Check to see if NTP daemon or service is running failed
Clock synchronization check using Network Time Protocol(NTP) failed

PRVF-9652 : Cluster Time Synchronization Services check failed
Checking VIP configuration.
Checking VIP Subnet configuration.
Check for VIP Subnet configuration passed.
Checking VIP reachability
Check for VIP reachability passed.
Post-check for cluster services setup was unsuccessful.
Checks did not pass for the following node(s):
        a1212testingracdb1d,a1212testingracdb2d
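
The failures are easy to miss in a report this long. A small sketch that reduces saved cluvfy output to its distinct error rows is shown here; summarize_cluvfy_errors is a hypothetical helper name, and it only looks for the PRVF- prefix seen in the output above, so messages with other prefixes would not be listed:

```shell
#!/bin/sh
# summarize_cluvfy_errors: hypothetical helper, not part of Oracle.
# Reads cluvfy output on stdin and prints each distinct PRVF-nnnn row once,
# in the order of first appearance.
summarize_cluvfy_errors() {
    awk '/^PRVF-[0-9]+ / && !seen[$1]++'
}

# Usage as the grid user:
#   cluvfy stage -post crsinst -n a1212testingracdb1d,a1212testingracdb2d | summarize_cluvfy_errors
```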



----------------------------------------------------------------------------------------------------------------------

The post-check reports "INFO: PRVF-9652 : Cluster Time Synchronization Services check failed".

The clock synchronization check can also be run on its own, and it fails in the same way, as shown below.

Use command: ./cluvfy comp clocksync



[grid@node-01 bin]$ ./cluvfy comp clocksync

Verifying Clock Synchronization across the cluster nodes

Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed

Checking if CTSS Resource is running on all nodes...
CTSS resource check passed


Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed

Check CTSS state started...
CTSS is in Observer state. Switching over to clock synchronization checks using NTP


Starting Clock synchronization checks using Network Time Protocol(NTP)...

NTP Configuration file check started...
PRVF-5402 : Warning: Could not find NTP configuration file "/etc/ntp.conf" on node "node-01"
PRVF-5405 : The NTP configuration file "/etc/ntp.conf" does not exist on all nodes
PRVF-5414 : Check of NTP Config file failed on all nodes. Cannot proceed further for the NTP tests

Checking daemon liveness...
Liveness check failed for "ntpd"
Check failed on nodes:
        node-01
PRVF-5494 : The NTP Daemon or Service was not alive on all nodes
PRVF-5415 : Check to see if NTP daemon or service is running failed
Clock synchronization check using Network Time Protocol(NTP) failed


--------------------------------------------------------------------------------------------------------------------------


Resolution:


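CTSS stays in Observer state as long as it finds an NTP configuration on the node. Since ntpd is not running here, move the NTP configuration files aside so that CTSS can switch to Active state and synchronize the clocks itself. The files live under /etc, so run these commands as root on each node:

mv /etc/sysconfig/ntpd /etc/sysconfig/ntpd_bk

mv /etc/ntp.conf /etc/ntp.conf_bk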
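
Since the same renames have to be repeated on every node, they can be wrapped in a small function that skips files that are not present. This is a sketch only: backup_ntp_config is a hypothetical helper name, the _bk suffix follows the commands in this resolution, and the optional path arguments are an assumption added so the function can be rehearsed on copies first:

```shell
#!/bin/sh
# backup_ntp_config: hypothetical helper, not part of Oracle.
# Renames each file that exists by appending _bk and skips the rest.
# With no arguments it uses the two NTP files from this resolution,
# which requires root. Pass other paths to rehearse on copies.
backup_ntp_config() {
    [ "$#" -gt 0 ] || set -- /etc/sysconfig/ntpd /etc/ntp.conf
    for f in "$@"; do
        if [ -e "$f" ]; then
            mv -- "$f" "${f}_bk" && echo "moved $f -> ${f}_bk"
        else
            echo "skipped $f (not present)"
        fi
    done
}

# Usage as root, on each node:
#   backup_ntp_config
```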


Then run "cluvfy comp clocksync" on both nodes.


[grid@node-01 ~]$ cd $GRID_HOME/bin
[grid@node-01 ~]$ cluvfy comp clocksync

Verifying Clock Synchronization across the cluster nodes

Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed

Checking if CTSS Resource is running on all nodes...
CTSS resource check passed


Querying CTSS for time offset on all nodes...
Query of CTSS for time offset passed

Check CTSS state started...
CTSS is in Active state. Proceeding with check of clock time offsets on all nodes...
Check of clock time offsets passed


Oracle Cluster Time Synchronization Services check passed

Verification of Clock Synchronization across the cluster nodes was successful.
[grid@node-01 ~]$
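
The line that changes between the failing and the passing run is the CTSS state (Observer before, Active after). A minimal sketch for pulling that state out of the report, assuming the "CTSS is in ... state" wording shown in the output above; ctss_state is a hypothetical helper name:

```shell
#!/bin/sh
# ctss_state: hypothetical helper, not part of Oracle.
# Reads `cluvfy comp clocksync` output on stdin and prints the CTSS state
# word (for example Observer or Active) taken from the "CTSS is in ..." row.
ctss_state() {
    sed -n 's/^CTSS is in \([A-Za-z]*\) state.*/\1/p'
}

# Usage as the grid user, on each node:
#   ./cluvfy comp clocksync | ctss_state
```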


