Sunday, 9 August 2015

Oracle RAC 11gR2 Interconnect Performance Tuning

We are going to discuss RAC interconnect performance tuning today.
This post assumes Linux as the operating system and UDP (User Datagram Protocol) as the interconnect protocol. The cluster nodes communicate with each other over the interconnect, and load balancing depends on it completely, so interconnect performance is a major part of overall cluster performance.

First of all, we have to choose hardware that supports a high-speed interconnect.

Interconnect hardware

To implement a faster interconnect we need faster hardware, such as 10 Gigabit Ethernet (10 GigE) or InfiniBand.
The first option for increasing the throughput of your interconnect is 10 GigE technology, which represents the next level of Ethernet. Although it is becoming increasingly common, note that 10 GigE requires specific certification on a platform-by-platform basis, so check with Oracle Support for your platform.

InfiniBand is available and supported with two options. Reliable Datagram Sockets (RDS) protocol is the preferred option, because it offers up to 30 times the bandwidth advantage and 30 times the latency reduction over Gigabit Ethernet. IP over InfiniBand (IPoIB) is the other option, which does not do as well as RDS, since it uses the standard UDP or TCP, but it does still provide much better bandwidth and much lower latency than Gigabit Ethernet.


UDP buffers

We should pick the fastest possible network for the interconnect. To maximize speed and efficiency on the interconnect, we should also ensure that the User Datagram Protocol (UDP) buffers are set to the correct values. On Linux, you can check this via the following command:
sysctl net.core.rmem_max net.core.wmem_max net.core.rmem_default net.core.wmem_default
net.core.rmem_max = 4194300
net.core.wmem_max = 1048500
net.core.wmem_default = 262100
You can get the correct values for your platform from Oracle Support. We can also verify these values directly from the files in the directory /proc/sys/net/core. We can reset these values to the correct numbers using sysctl commands:
sysctl -w net.core.rmem_max=4194304
sysctl -w net.core.wmem_max=1048576
sysctl -w net.core.rmem_default=262144
sysctl -w net.core.wmem_default=262144
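
To make the check against /proc/sys/net/core repeatable, the comparison can be scripted. The following is only a minimal sketch: the function name check_udp_buffers and the OK/LOW/MISSING labels are made up for illustration, and the target numbers are simply the four values from the sysctl commands above, so replace them with whatever Oracle Support recommends for your platform.

```shell
# Compare kernel UDP buffer settings against the values used in this post.
# The directory is a parameter so the check can be pointed at a test copy.
check_udp_buffers() {
    core_dir="${1:-/proc/sys/net/core}"
    status=0
    for entry in rmem_max=4194304 wmem_max=1048576 \
                 rmem_default=262144 wmem_default=262144; do
        name="${entry%%=*}"     # parameter name, e.g. rmem_max
        want="${entry##*=}"     # target value in bytes
        if [ ! -r "$core_dir/$name" ]; then
            echo "MISSING net.core.$name"
            status=1
            continue
        fi
        have="$(cat "$core_dir/$name")"
        if [ "$have" -lt "$want" ]; then
            echo "LOW net.core.$name = $have (want >= $want)"
            status=1
        else
            echo "OK net.core.$name = $have"
        fi
    done
    return "$status"
}

# Typical use on a cluster node:
#   check_udp_buffers || echo "UDP buffers need attention"
```

The function returns a non-zero status when any parameter is missing or below its target, so it can be dropped into a health-check script.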

The numbers in this example are the recommended values for Oracle RAC on Linux and are more than sufficient for the majority of configurations. The values set by rmem_max and wmem_max apply on a “per-socket” basis. So if you set rmem_max to 8MB and you have 800 processes running, each with a socket open for communication over the interconnect, then each of these 800 processes could potentially use 8MB, meaning that the total memory usage could be 6.4GB just for this UDP buffer space. If rmem_default is set to 1MB and rmem_max is set to 8MB, you can be sure that at least 800MB will be allocated (1MB per socket). Anything more than that is allocated only as needed, up to the max value. So the total memory usage depends on rmem_default, rmem_max, the number of open sockets, and the variable piece of how much buffer space each process is actually using.
That variable piece can depend on network latency, on other characteristics of how well the network is performing, and on how much network load there is altogether. To count the Oracle-related open UDP sockets:

netstat -anp --udp | grep ora | wc -l
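
The buffer arithmetic is easy to get wrong by hand, so here is a small worked sketch. The helper name udp_buffer_mb is hypothetical, and the figures (800 sockets, 1MB default, 8MB max) are the illustrative ones from the paragraph above, not measurements.

```shell
# Estimate UDP receive-buffer memory in MB for a given number of sockets.
# Usage: udp_buffer_mb <open_sockets> <buffer_mb_per_socket>
udp_buffer_mb() {
    echo $(( $1 * $2 ))
}

sockets=800   # illustrative; on a live node use the netstat count above
echo "floor (rmem_default = 1MB): $(udp_buffer_mb "$sockets" 1) MB"
echo "ceiling (rmem_max = 8MB):   $(udp_buffer_mb "$sockets" 8) MB"
```

With these inputs the floor is 800 MB and the ceiling is 6400 MB, which is about 6.4GB. Actual usage falls somewhere between the two, depending on how much buffer space each process really needs.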


Jumbo Frames:

Now let us check the performance of an interconnect that is already in place, assuming that our RAC is running and has been released to business users:

If we have NIC errors, or if one of the NICs fails, in 11gR2 it does not mean a complete loss of the interconnect between the nodes, thanks to link aggregation.

Prior to Oracle 11gR2, a system designed with a single private network had a single point of failure, and link aggregation was the usual remedy. (NIC bonding, NIC teaming, and port trunking are other names for the same concept.) The central idea behind link aggregation is to have two private networks act as one. The two private networks are combined so that they appear to the operating system as one unit. To the OS, the network adapters look like one adapter. If one of the physical network adapters were to fail, the OS would hardly notice and network traffic would proceed through the remaining adapter.
Oracle Grid Infrastructure now provides RAC HAIP, which is link aggregation moved to the clusterware level. Instead of bonding the network adapters on the OS side, Grid Infrastructure is instructed to use multiple network adapters. Grid Infrastructure will still start HAIP even if the system is configured with only one private network adapter.


To find out whether we have NIC-related issues:
We can check for the "gc cr block lost" wait event in Automatic Workload Repository (AWR)/sysstat.

If we find the "gc cr block lost" wait event, we need to check for the following errors on the NIC:

Dropped packets/fragments
Buffer overflows
Packet reassembly failures or timeouts
TX/RX errors

We will use the following commands to find any errors:

netstat -s
ifconfig -a
ORADEBUG
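
The output of netstat -s is long, so it helps to filter it down to the counters that matter. The sketch below is an assumption-laden example: the helper name nic_error_counters is hypothetical, and the match patterns assume the counter wording printed by common net-tools builds (for example "packet reassembles failed" or "receive buffer errors"), which can differ on your system.

```shell
# Print netstat -s counters that look like errors and are non-zero.
# Reads netstat -s text on stdin; the patterns are assumptions about its wording.
nic_error_counters() {
    awk '
        BEGIN { pat = "reassembl.*failed|dropped|buffer errors|receive errors|overflow" }
        $1 ~ /^[0-9]+$/ && $1 > 0 && tolower($0) ~ pat {
            sub(/^[ \t]+/, "")   # trim the indentation netstat adds
            print
        }
    '
}

# Typical use on a cluster node:
#   netstat -s | nic_error_counters
```

Run it twice a few minutes apart: the counters are cumulative since boot, so it is growth between runs that points to a current problem.
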

Now we are going to assess interconnect performance from the AWR.
These wait events from AWR/sysstat can indicate contention related to RAC:

gc current block busy
gc cr block busy
gc current buffer busy
gc buffer busy acquire/release


The wait events listed below, when seen in the AWR, indicate that there might be a hot block. The AWR Segment Statistics section shows which objects are involved.


enq: TX - index contention
gc buffer busy
gc current block busy
gc current split



In the AWR report we will check the following sections:

Under Global Cache and Enqueue Services – Workload Characteristics:
Avg global cache cr block receive time (ms): should be <=15 ms
Under Global Cache and Enqueue Services – Messaging Statistics:
Avg message sent queue time on ksxp (ms): should be <1 ms
Under Interconnect Ping Latency Stats:
Avg Latency 8K msg should be close to Avg Latency 500B msg.
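
The two numeric thresholds above can be checked mechanically once the averages have been read from an AWR report. This is a rough sketch only: the helper name check_awr_interconnect and the OK/HIGH labels are hypothetical, and the values passed in are typed by hand rather than pulled from the database. The ping latency comparison is left out because "close" has no fixed threshold here.

```shell
# Compare two AWR averages against the thresholds listed above.
# Usage: check_awr_interconnect <cr_block_receive_ms> <ksxp_sent_queue_ms>
check_awr_interconnect() {
    awk -v cr="$1" -v q="$2" 'BEGIN {
        # cr block receive time should be <= 15 ms
        print "cr block receive time " cr " ms: " (cr + 0 <= 15 ? "OK" : "HIGH")
        # message sent queue time on ksxp should be < 1 ms
        print "ksxp sent queue time " q " ms: " (q + 0 < 1 ? "OK" : "HIGH")
    }'
}

# Example with made-up values from a report:
#   check_awr_interconnect 2.3 1.4
```

In that example call the first metric passes and the second is flagged, since 1.4 ms is not below the 1 ms limit.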


Hot-block contention on an index shows up when multiple sessions are inserting into a single object or are using a sequence, and the indexed column is sequentially increasing. To address this specific issue:


Identify the indexes and Global Hash Partition them
Increase the Sequence Cache if ordering is not a problem.

