Oracle RAC 11gR2 Interconnect Performance Tuning
Today we are going to discuss RAC interconnect performance tuning.
We assume here that Linux is the operating system and UDP (User Datagram Protocol) is the interconnect protocol. The nodes communicate with each other through the interconnect, and load balancing across the cluster depends completely on it, so interconnect performance is one of the major components of overall cluster performance.
So first of all we have to choose the hardware wisely, so that it can support a high-speed interconnect.
Interconnect hardware
To implement a faster interconnect we need faster hardware, such as 10 Gigabit Ethernet (10 GigE) or InfiniBand.
Let us discuss the first option to increase the throughput of your interconnect: the implementation of 10 GigE technology, which represents the next level of Ethernet. Although it is becoming increasingly common, note that 10 GigE does require specific certification on a platform-by-platform basis, so check with Oracle Support for your platform.
InfiniBand is available and supported with two options. Reliable Datagram Sockets (RDS) protocol is the preferred option, because it offers up to 30 times the bandwidth advantage and 30 times the latency reduction over Gigabit Ethernet. IP over InfiniBand (IPoIB) is the other option, which does not do as well as RDS, since it uses the standard UDP or TCP, but it does still provide much better bandwidth and much lower latency than Gigabit Ethernet.
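Before tuning anything, it is worth confirming which IPC protocol your database is actually using for the interconnect (UDP, RDS, and so on). A quick check, assuming the skgxpinfo utility is present in your 11gR2 Oracle home:
# Reports the interconnect protocol the Oracle binary is linked with (e.g. udp or rds)
$ORACLE_HOME/bin/skgxpinfo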
UDP buffers
We should pick the fastest possible network to be used for the interconnect. To maximize speed and efficiency on the interconnect, we should ensure that the User Datagram Protocol (UDP) buffers are set to the correct values. On Linux, you can check this via the following command:
sysctl net.core.rmem_max net.core.wmem_max net.core.rmem_default net.core.wmem_default
net.core.rmem_max = 4194300
net.core.wmem_max = 1048500
net.core.rmem_default = 262100
net.core.wmem_default = 262100
You can get the correct values for your platform from Oracle Support. We can also verify these values directly from the files in the directory /proc/sys/net/core. We can reset these values to the correct numbers using sysctl commands:
sysctl -w net.core.rmem_max=4194304
sysctl -w net.core.wmem_max=1048576
sysctl -w net.core.rmem_default=262144
sysctl -w net.core.wmem_default=262144
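The sysctl -w changes take effect immediately but are lost on reboot. A minimal sketch for persisting them, assuming the standard /etc/sysctl.conf mechanism on Linux:
# Append the recommended values to /etc/sysctl.conf so they survive a reboot
cat >> /etc/sysctl.conf <<'EOF'
net.core.rmem_max = 4194304
net.core.wmem_max = 1048576
net.core.rmem_default = 262144
net.core.wmem_default = 262144
EOF
# Reload the kernel parameters from /etc/sysctl.conf
sysctl -p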
The numbers in this example are the recommended values for Oracle RAC on Linux and are more than sufficient for the majority of configurations. The values determined by rmem_max and wmem_max are on a “per-socket” basis. So if you set rmem_max to 8MB, and you have 800 processes running, each with a socket open for communications on the interconnect, then each of these 800 processes could potentially use 8MB, meaning that the total memory usage could be 6.4GB just for this UDP buffer space. If rmem_default is set to 1MB and rmem_max is set to 8MB, you are sure that at least 800MB will be allocated (1MB per socket). Anything more than that will be allocated only as needed, up to the max value. So the total memory usage depends on rmem_default, rmem_max, the number of open sockets, and the variable piece of how much buffer space each process is actually using.
How much buffer space each process actually uses could depend on the network latency, on other characteristics of how well the network is performing, and on how much network load there is altogether. To count the total number of Oracle-related open UDP sockets:
netstat -anp --udp | grep ora | wc -l
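A rough back-of-the-envelope check of the baseline buffer memory this implies (a sketch only; as described above, actual usage varies per socket and can grow up to rmem_max):
# Multiply the number of Oracle UDP sockets by the default receive buffer size
SOCKETS=$(netstat -anp --udp | grep ora | wc -l)
RMEM_DEFAULT=$(sysctl -n net.core.rmem_default)
echo "Baseline UDP receive buffer space: $(( SOCKETS * RMEM_DEFAULT / 1024 / 1024 )) MB"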
Jumbo Frames:
To increase the performance of your interconnect, you may make use of jumbo frames. With Ethernet, a variable frame size of 46–1500 bytes is the transfer unit used across the Ethernet network; the upper bound is an MTU (Maximum Transmission Unit) of 1500 bytes. Jumbo frames allow the Ethernet frame to exceed the MTU of 1500 bytes, up to a maximum of 9000 bytes (on most platforms, though platforms will vary). In Oracle RAC, the setting of DB_BLOCK_SIZE multiplied by DB_FILE_MULTIBLOCK_READ_COUNT determines the maximum size of a message for the global cache, and PARALLEL_EXECUTION_MESSAGE_SIZE determines the maximum size of a message used in Parallel Query. These message sizes can range from 2K to 64K or more, and hence will be fragmented more with the lower, default MTU. Increasing the frame size (by enabling jumbo frames) can improve the performance of the interconnect by reducing the fragmentation when shipping large amounts of data across that wire.
Please note: Not all hardware supports jumbo frames. Because of the differences in specific server and network hardware requirements, jumbo frames must be thoroughly tested before implementation in a production environment.
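As a rough sketch of how to check and test a larger MTU (eth1 here is only an assumed name for the private interconnect interface):
# Show the current MTU of the private interconnect NIC
ip link show eth1
# Temporarily raise the MTU to 9000 for testing (lost on reboot; make it
# permanent in the interface configuration, e.g. MTU=9000 in
# /etc/sysconfig/network-scripts/ifcfg-eth1 on RHEL/OEL)
ip link set eth1 mtu 9000
# Verify that jumbo frames really pass end to end between the nodes:
# 8972 = 9000 bytes minus 28 bytes of IP/ICMP headers; -M do forbids fragmentation
ping -M do -s 8972 -c 4 <remote-node-private-ip>
Remember that every switch port and NIC on the private network path must be configured for the same jumbo MTU, otherwise packets will be dropped or fragmented.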
Now let us start checking the performance of an interconnect that is already implemented, assuming that our RAC is running and has been released to the business users.
If we have NIC errors, or one of the NICs fails, in 11gR2 that no longer means a complete loss of the interconnect between the nodes, thanks to link aggregation. Prior to Oracle 11gR2, systems were often designed with a single point of failure. (Link aggregation is also referred to as NIC bonding, NIC teaming, or port trunking; they all describe the same concept.) The central idea behind link aggregation is to have two private networks act as one. The two private networks are combined together to appear to the operating system as one unit; to the OS, the network adapters look like one adapter. If one of the physical network adapters were to fail, the OS would hardly notice and network traffic would proceed through the remaining adapter.
Oracle Grid Infrastructure now provides RAC Highly Available IP (HAIP), which is link aggregation moved to the clusterware level. Instead of bonding the network adapters on the OS side, Grid Infrastructure is instructed to use multiple network adapters. Grid Infrastructure will still start HAIP even if the system is configured with only one private network adapter.
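A quick way to see which interfaces Grid Infrastructure is using for the private interconnect and whether the HAIP resource is online (a sketch; run as the Grid Infrastructure owner, with $GRID_HOME assumed to point at the Grid Infrastructure home):
# List the cluster interfaces registered with Grid Infrastructure
$GRID_HOME/bin/oifcfg getif
# Check the state of the HAIP resource in the lower (init) stack
$GRID_HOME/bin/crsctl stat res ora.cluster_interconnect.haip -init
# HAIP addresses are assigned from the 169.254.0.0/16 link-local range
ip addr show | grep 169.254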
To find out whether we have NIC-related issues, look for the following:
"gc cr block lost" / "gc current block lost" wait events in the Automatic Workload Repository (AWR) / sysstat
Dropped packets/fragments
Buffer overflows
Packet reassembly failures or timeouts
TX/RX errors
We will use the following commands to find any errors:
netstat -s
ifconfig -a
ORADEBUG
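A sketch of how these can be used to narrow things down (eth1 again assumed to be the private interconnect interface):
# Look for IP fragmentation/reassembly problems and UDP receive errors/overflows
netstat -s | grep -iE 'reassembl|fragment|overflow|receive errors'
# Check the interconnect NIC for dropped packets and TX/RX errors
ifconfig eth1 | grep -iE 'errors|dropped|overruns'
# From SQL*Plus as SYSDBA, ORADEBUG can dump the interconnect IPC information
# (which interfaces and protocol the instance is actually using):
#   SQL> oradebug setmypid
#   SQL> oradebug ipc
# The output goes to a trace file in the instance's diagnostic trace directory.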
Now we are going to identify interconnect performance problems from AWR.
The following wait events from AWR/sysstat can indicate contention related to RAC:
GC current block busy
GC cr block busy
GC current buffer busy
GC buffer busy acquire/release
These wait events in the AWR indicate that there might be a hot block causing them. From the AWR Segment Statistics (or the query shown after this list), you can find the objects involved:
Enq: TX - index contention
Gc buffer busy
Gc current block busy
Gc current split
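A sketch of how to list the busiest segments for global cache buffer busy activity directly from the data dictionary (statistic names as of 11gR2; run as SYSDBA or with access to the GV$ views):
# Top 10 segments by "gc buffer busy" activity across all instances
sqlplus -s / as sysdba <<'EOF'
set lines 200 pages 50
select * from (
  select inst_id, owner, object_name, object_type, value
  from   gv$segment_statistics
  where  statistic_name = 'gc buffer busy'
  order  by value desc
) where rownum <= 10;
EOF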
Under Global Cache and Enqueue Services – Workload Characteristics, we will check:
Avg global cache cr block receive time (ms): should be <=15 ms
Global Cache and Enqueue Services – Messaging Statistics
Avg message sent queue time on ksxp (ms): should be <1 ms
Under Interconnect Ping Latency Stats
Avg Latency 8K msg should be close to Avg Latency 500B msg.
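Between AWR snapshots, the average CR block receive time can also be checked on the fly from gv$sysstat (a sketch; the receive-time statistic is recorded in hundredths of a second, hence the ×10 to convert to milliseconds):
# Average global cache CR block receive time per instance, in ms
sqlplus -s / as sysdba <<'EOF'
select t.inst_id,
       r.value "cr blocks received",
       t.value "receive time (cs)",
       round(t.value * 10 / r.value, 2) "avg receive time (ms)"
from   gv$sysstat t, gv$sysstat r
where  t.name = 'gc cr block receive time'
and    r.name = 'gc cr blocks received'
and    t.inst_id = r.inst_id
and    r.value > 0;
EOF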
We will see these wait events if multiple sessions are inserting into a single object or are using a sequence, and the indexed column is sequentially increasing. To address these specific issues (see the sketch after this list):
Identify the indexes and Global Hash Partition them
Increase the Sequence Cache if ordering is not a problem.
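A minimal sketch of both remedies, assuming a hypothetical hot index APP.ORDERS_IDX on ORDERS(ORDER_ID) and a sequence APP.ORDERS_SEQ identified from the AWR Segment Statistics (the Partitioning option is required for a global hash-partitioned index):
sqlplus -s / as sysdba <<'EOF'
-- Recreate the hot right-hand-growth index as a global hash-partitioned index
-- (object names here are hypothetical; use the objects identified in AWR)
drop index app.orders_idx;
create index app.orders_idx on app.orders (order_id)
  global partition by hash (order_id) partitions 16;
-- Increase the sequence cache if strict ordering of the values is not required
alter sequence app.orders_seq cache 1000;
EOF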