Oracle Rac Taf

  • 7/30/2019 Oracle Rac Taf

    Oracle RAC and Hardware Failover

    To detect a node failure, the Cluster Manager uses a background process, the Global Enqueue Service Monitor (LMON), to monitor the health of the cluster. When a node fails, the Cluster Manager reports the change in the cluster's membership to the Global Cache Service (GCS) and the Global Enqueue Service (GES). These services are then re-mastered based on the current membership of the cluster. To successfully re-master the cluster services, Oracle RAC keeps track of all resources and resource states on each node and then uses this information to restart these resources on a backup node.

    These processes also manage the state of in-flight transactions and work with TAF to either restart or resume the transactions on the new node. Now let's see how Oracle RAC and TAF work together to ensure that a server failure does not cause an unplanned service interruption.
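The re-mastering step can be pictured with a toy model. The Python below is not Oracle's GCS/GES algorithm, which is internal to RAC; it is a sketch that assumes each resource is mastered by exactly one live node and uses rendezvous (highest-random-weight) hashing as a stand-in placement rule. The node names, resource names, and the functions `weight`, `assign_masters`, and `remaster` are all hypothetical. The point it illustrates is that after a membership change only the resources the failed node mastered need a new master.

```python
import hashlib
from typing import Dict, Iterable, List


def weight(node: str, resource: str) -> int:
    """Deterministic pseudo-random score for a (node, resource) pair."""
    digest = hashlib.sha256(f"{node}:{resource}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign_masters(live_nodes: List[str], resources: Iterable[str]) -> Dict[str, str]:
    """Rendezvous hashing: each resource is mastered by the live node with the top score."""
    return {r: max(live_nodes, key=lambda n: weight(n, r)) for r in resources}


def remaster(masters: Dict[str, str], live_nodes: List[str]) -> Dict[str, str]:
    """Recompute masters for the new membership; only orphaned resources change owner."""
    return assign_masters(live_nodes, masters)


if __name__ == "__main__":
    resources = [f"block_{i}" for i in range(12)]
    before = assign_masters(["node1", "node2", "node3"], resources)
    after = remaster(before, ["node1", "node3"])  # node2 has failed
    moved = sorted(r for r in resources if before[r] != after[r])
    print("re-mastered after node2 failed:", moved)
```

Rendezvous hashing was chosen for the sketch because removing a node never moves a resource whose master survived, which matches the idea of re-mastering only what the failed node owned.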

    Using Transparent Application Failover

    After an Oracle RAC node crashes (usually from a hardware failure), all new application transactions are automatically rerouted to a specified backup node. The challenge in rerouting is to avoid losing transactions that were "in flight" at the exact moment of the crash. One of the requirements of continuous availability is the ability to restart in-flight application transactions, allowing the work of a failed node to resume on another server without interruption. Oracle's answer to application failover is a new Oracle Net mechanism dubbed Transparent Application Failover (TAF). TAF allows the DBA to configure the type and method of failover for each Oracle Net client.

    For an application to use TAF, it must use failover-aware API calls from the Oracle Call Interface (OCI). Inside OCI are TAF callback routines that can be used to make any application failover-aware.
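The callback idea can be sketched without an Oracle client. The Python below is a hypothetical model, not the OCI API: `Session`, `FailoverEvent`, `register_failover_callback`, `simulate_failover`, and `make_failover_callback` are invented names, and the two events only loosely mirror the begin/end notifications an OCI failover callback receives. It shows a common use of such a callback: log that failover is under way, then re-apply session settings once the new connection is up, since ALTER SESSION settings do not fail over.

```python
from enum import Enum, auto
from typing import Callable, Dict, List


class FailoverEvent(Enum):
    # Hypothetical event names; only a loose analogy to OCI's begin/end notifications.
    BEGIN = auto()
    END = auto()


FailoverCallback = Callable[["Session", FailoverEvent], None]


class Session:
    """Hypothetical client session used only to illustrate the callback pattern."""

    def __init__(self, instance: str) -> None:
        self.instance = instance
        self.settings: Dict[str, str] = {}
        self._callbacks: List[FailoverCallback] = []

    def register_failover_callback(self, callback: FailoverCallback) -> None:
        self._callbacks.append(callback)

    def simulate_failover(self, backup_instance: str) -> None:
        # Notify, switch to the backup instance, drop session state, notify again.
        self._notify(FailoverEvent.BEGIN)
        self.instance = backup_instance
        self.settings = {}  # session settings do not survive the reconnect
        self._notify(FailoverEvent.END)

    def _notify(self, event: FailoverEvent) -> None:
        for callback in self._callbacks:
            callback(self, event)


def make_failover_callback(required: Dict[str, str], log: List[str]) -> FailoverCallback:
    """Build a callback that logs each event and restores settings once failover ends."""

    def on_failover(session: Session, event: FailoverEvent) -> None:
        log.append(f"{event.name} on {session.instance}")
        if event is FailoverEvent.END:
            session.settings.update(required)

    return on_failover


if __name__ == "__main__":
    events: List[str] = []
    prefs = {"NLS_DATE_FORMAT": "YYYY-MM-DD"}
    session = Session("node1")
    session.settings.update(prefs)
    session.register_failover_callback(make_failover_callback(prefs, events))
    session.simulate_failover("node2")
    print(events)            # ['BEGIN on node1', 'END on node2']
    print(session.settings)  # {'NLS_DATE_FORMAT': 'YYYY-MM-DD'}
```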

    While the concept of failover is simple, providing an apparently instant failover can be extremely complex, because there are many ways to restart in-flight transactions. The TAF architecture offers the ability to restart transactions at either the transaction (SELECT) or session level:

    SELECT failover. With SELECT failover, Oracle Net keeps track of all SELECT statements issued during the transaction, tracking how many rows have been fetched back to the client for each cursor associated with a SELECT statement. If the connection to the instance is lost, Oracle Net establishes a connection to another Oracle RAC node and re-executes the SELECT statements, repositioning the cursors so the client can continue fetching rows as if nothing had happened. The SELECT failover approach is best for data warehouse systems that perform complex and time-consuming transactions.

    SESSION failover.

    When the connection to an instance is lost, SESSION failover results only in the establishment of a new connection to another Oracle RAC node; any work in progress is lost. SESSION failover is ideal for online transaction processing (OLTP) systems, where transactions are small.

    Oracle TAF also offers choices on how to restart a failed transaction. The Oracle DBA may choose one of the following failover methods:

    BASIC failover.

    In this approach, the application connects to a backup node only after the primary connection fails. This approach has low overhead, but the end user experiences a delay while the new connection is created.

    PRECONNECT failover.

    In this approach, the application simultaneously connects to both a primary and a backup node. This offers faster failover, because a pre-spawned connection is ready to use. But the extra connection adds everyday overhead by duplicating connections.

    Currently, TAF will fail over standard SQL SELECT statements that were in flight when a node crashed. In the current release of TAF, however, some types of transactions must be restarted from the beginning. The following types of transactions do not automatically fail over and must be restarted by the application:

    Transactional statements. Transactions involving INSERT, UPDATE, or DELETE statements are not supported by TAF; uncommitted changes must be resubmitted.

    ALTER SESSION statements. ALTER SESSION and SQL*Plus SET statements do not fail over.

    The following do not fail over and cannot be restarted:

    Temporary objects. Transactions using temporary segments in the TEMP tablespace and global temporary tables do not fail over.

    PL/SQL package states. PL/SQL package states are lost during failover.

    Using Oracle RAC and TAF Together

    The continuous availability features of Oracle RAC and TAF come together when these products cooperate in restarting failed transactions. Let's take a closer look at how this works.

    Within each connected Oracle Net client, tnsnames.ora file parameters define the failover types and methods for that client. The parameters direct Oracle RAC and TAF on how to restart any transactions that may be in flight during a hardware failure on the node.

    It is important to note that TAF failover control is external to the Oracle RAC cluster, and each Oracle Net client may have unique failover types and methods, depending on processing requirements.
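How a client's failover type and method combine can be illustrated with a small simulation. This is a toy model under stated assumptions, not how Oracle Net is implemented: a query result is just a list of rows that is assumed to come back identical when re-executed, SELECT failover re-runs it and skips the rows the client already fetched, SESSION failover leaves the client with a new connection and no open cursor, and the only difference between BASIC and PRECONNECT modeled here is whether a connection must be opened after the crash. `Cursor`, `FailoverResult`, and `fail_over` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cursor:
    rows: List[str]   # result set; assumed identical when the query is re-run
    fetched: int = 0  # rows already returned to the client

    def fetch(self, n: int) -> List[str]:
        batch = self.rows[self.fetched:self.fetched + n]
        self.fetched += len(batch)
        return batch


@dataclass
class FailoverResult:
    cursor: Optional[Cursor]   # None: the open cursor (in-flight work) was lost
    connects_at_failure: int   # connections that must be opened after the crash


def fail_over(cursor: Cursor, failover_type: str, method: str) -> FailoverResult:
    if method not in ("BASIC", "PRECONNECT"):
        raise ValueError(f"unknown method: {method}")
    # PRECONNECT already holds a backup connection; BASIC has to open one now.
    connects = 0 if method == "PRECONNECT" else 1
    if failover_type == "SELECT":
        # Re-execute the query on the backup node and skip rows the client already has.
        replay = Cursor(rows=list(cursor.rows))
        replay.fetch(cursor.fetched)
        return FailoverResult(replay, connects)
    if failover_type == "SESSION":
        # Only a new connection is made; the cursor and its position are gone.
        return FailoverResult(None, connects)
    raise ValueError(f"unknown failover type: {failover_type}")


if __name__ == "__main__":
    orders = Cursor(rows=[f"order {i}" for i in range(1, 7)])
    print(orders.fetch(2))                 # ['order 1', 'order 2']
    resumed = fail_over(orders, "SELECT", "PRECONNECT")
    print(resumed.cursor.fetch(2))         # ['order 3', 'order 4']
    print(fail_over(orders, "SESSION", "BASIC").cursor)  # None
```

In this model the SELECT/PRECONNECT client resumes mid-result with no reconnect delay, while the SESSION/BASIC client pays for a new connection and must reissue its work, which is the trade-off the article describes for data warehouse versus OLTP clients.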

    The following is a client tnsnames.ora file entry for a node, including its current TAF failover parameters:

# Client-side TAF entry. BACKUP refers to another net service name (cletus) defined in this file.
bubba.world =
  (DESCRIPTION_LIST =
    (FAILOVER = true)
    (LOAD_BALANCE = true)
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = redneck)(PORT = 1521))
      (CONNECT_DATA =
        (SERVICE_NAME = bubba)
        (SERVER = dedicated)
        (FAILOVER_MODE =
          (BACKUP = cletus)
          (TYPE = select)
          (METHOD = preconnect)
          (RETRIES = 20)
          (DELAY = 3)
        )
      )
    )
  )

    The FAILOVER_MODE section of the tnsnames.ora file lists the parameters and their values:

    BACKUP=cletus. This names the backup connection that will take over failed connections when a node crashes; cletus must itself be a net service name defined in the client's tnsnames.ora. In this example, the primary service is bubba (on host redneck), and TAF will reconnect failed transactions to the cletus instance in case of server failure.

    TYPE=select. This tells TAF to re-execute in-flight SELECT statements on the backup connection and reposition each open cursor, so the client can continue fetching rows where it left off.

    METHOD=preconnect. This directs TAF to create two connections when the client first connects: one to the primary bubba database and a backup connection to the cletus database. In case of instance failure, the cletus database will be ready to resume the failed transaction.

    RETRIES=20. This directs TAF to retry a failover connection up to 20 times.

    DELAY=3. This tells TAF to wait three seconds between connection retries.

    Remember, you must set these TAF parameters in every tnsnames.ora file on every Oracle Net client that needs transparent failover.

    Putting It All Together

    An Oracle Net client can be a single PC or a huge application server. In the architectures of giant Oracle RAC systems, each application server has a customized tnsnames.ora file that governs the failover method for all connections that are routed to that application server.

    Watching TAF in Action

    The transparency of TAF operation is a tremendous advantage to application users, but DBAs need to quickly see what has happened and where failover traffic is going, and they need to be able to get the status of failover transactions. To provide this capability, Oracle includes several new columns in the V$SESSION dynamic performance view that give the current status of failover transactions.

    The following query selects the FAILOVER_TYPE, FAILOVER_METHOD, and FAILED_OVER columns of the V$SESSION view. Note that the query is restricted to nonsystem sessions by excluding the SYS, SYSTEM, and PERFSTAT users. Keep in mind also that in-progress data definition language (DDL) and data manipulation language (DML) are not recoverable with TAF.

-- Sessions that TAF has reconnected, excluding administrative accounts
select username,
       sid,
       serial#,
       failover_type,
       failover_method,
       failed_over
from   v$session
where  username not in ('SYS', 'SYSTEM', 'PERFSTAT')
and    failed_over = 'YES';

    You can run this script against the backup node after an instance failure to see the sessions that have been reconnected with TAF. Remember, TAF redirects transactions quickly, so run the query soon after the failover; entries appear only while the failed-over sessions remain connected. A backup node can have a variety of concurrent failover transactions, because the tnsnames.ora file on each Oracle Net client specifies the backup node, the failover type, and the failover method.
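Because each client carries its own FAILOVER_MODE settings, it can help to inventory them across clients. The Python below is a hedged sketch: it uses regular expressions rather than a real tnsnames.ora parser, so it only handles simple `(KEY=VALUE)` pairs inside a single FAILOVER_MODE clause. The bubba entry uses the article's values; the `oltp_app.world` entry is hypothetical. `retry_wait_s` is a rough figure, RETRIES × DELAY, for time spent waiting between attempts; it ignores how long each connection attempt itself takes.

```python
import re
from typing import Dict, NamedTuple, Optional

# Matches "(FAILOVER_MODE = (K=V)(K=V)...)"; a regex sketch, not a full tnsnames.ora parser.
FAILOVER_MODE_RE = re.compile(
    r"\(\s*FAILOVER_MODE\s*=(?P<body>(?:\s*\([^()]*\))+)\s*\)", re.IGNORECASE
)
PARAM_RE = re.compile(r"\(\s*(\w+)\s*=\s*([^()\s]+)\s*\)")


class FailoverSettings(NamedTuple):
    backup: Optional[str]
    failover_type: Optional[str]
    method: Optional[str]
    retry_wait_s: int  # approximate: RETRIES * DELAY


def failover_mode(descriptor: str) -> Dict[str, str]:
    """Return the FAILOVER_MODE parameters of a connect descriptor, keys upper-cased."""
    match = FAILOVER_MODE_RE.search(descriptor)
    if match is None:
        return {}
    return {key.upper(): value for key, value in PARAM_RE.findall(match.group("body"))}


def summarize(descriptor: str) -> FailoverSettings:
    params = failover_mode(descriptor)
    retries = int(params.get("RETRIES", "0"))
    delay = int(params.get("DELAY", "0"))
    return FailoverSettings(
        backup=params.get("BACKUP"),
        failover_type=params.get("TYPE", "").upper() or None,
        method=params.get("METHOD", "").upper() or None,
        retry_wait_s=retries * delay,
    )


CLIENTS = {
    # Values from the article's bubba.world entry.
    "bubba.world": "(FAILOVER_MODE = (BACKUP=cletus)(TYPE=select)(METHOD=preconnect)(RETRIES=20)(DELAY=3))",
    # Hypothetical OLTP client, not from the article.
    "oltp_app.world": "(FAILOVER_MODE = (BACKUP=cletus)(TYPE=session)(METHOD=basic)(RETRIES=10)(DELAY=5))",
}

if __name__ == "__main__":
    for name, descriptor in CLIENTS.items():
        print(name, summarize(descriptor))
```

A backup node such as cletus could then expect SELECT/PRECONNECT sessions from the first client and SESSION/BASIC sessions from the second, which is the kind of mix the V$SESSION query above would reveal.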