Oracle Golden Gate Presentation
Oracle Golden Gate: an overview
Oracle Golden Gate delivers low-impact, real-time data acquisition, distribution, and delivery across heterogeneous systems, enabling cost-effective, low-impact real-time data integration and continuous-availability solutions.
Oracle Golden Gate is the leading real-time data integration software available in the industry. The software moves transactional data across heterogeneous database, hardware, and operating system environments with minimal impact. The platform captures, routes, transforms, and delivers data in real time, enabling organizations to maintain continuous uptime for critical applications during planned and unplanned outages. Additionally, it moves data from transaction processing environments to read-only reporting databases and analytical applications for accurate, timely reporting and improved business intelligence across the enterprise.
Features of Golden Gate
High Performance: Continuous capture and delivery of data from sources to targets with sub-second end-to-end latency. High performance and low overhead even at high volumes.
Flexibility / Extensibility: Capture and delivery of data between a variety of relational, open systems/open source, and legacy databases on all major platforms. It can be deployed unidirectionally or bi-directionally in multiple topologies, including one-to-many, many-to-many, many-to-one, and cascading. It can also feed third-party ETL solutions.
Low Impact: Log-based capture and queueing of changed data outside the DBMS result in negligible overhead on the source system. Moving only committed transactions, with compression, minimizes network overhead.
Transaction Integrity: Maintains the ACID properties (Atomicity, Consistency, Isolation, and Durability) of transactions during data movement. Applies the data in the same order it was committed in the source database.
Reliability : Enables guaranteed delivery and data integrity after interruptions / failures.
Improved Business Insight: Enables fresh data for better decision making by feeding analytical systems from OLTP systems with sub-second latency.
Lowered Integration Costs: Reads database log files and moves only committed transactions, minimizing overhead on the infrastructure while augmenting existing data integration investments.
Continuous System Availability : Eliminates planned and unplanned outages for mission critical systems to allow uninterrupted business operations.
Heterogeneity : Supports all major databases and platforms allowing companies to use the same product for all their real-time data integration and continuous data availability needs.
Reduced Risks : Offers data integrity and reliability between source and target systems while providing resilience against network / site outages.
OGG Supported processing methods and databases:
Golden Gate enables the exchange and manipulation of data at the transaction level among multiple, heterogeneous platforms across the enterprise. Its modular architecture gives you the flexibility to extract and replicate selected data records and transactional changes across a variety of topologies.
With this flexibility, and the filtering, transformation, and custom processing features of Golden Gate, you can support numerous business requirements, such as:
• Business Continuance and high availability
• Initial Load and database migration
• Data Integration
• Decision Support and data warehousing
Golden Gate is composed of the following components:
• Extract
• Data Pump
• Replicat
• Trails or Extract Files
• Checkpoints
• Manager
• Collector
Logical Architecture of Golden Gate
Overview of Extract
The Extract process runs on the source system and is the capture mechanism of Golden Gate.
Extract can be configured in two ways as follows:
Initial Loads: For initial data loads, Extract extracts a current set of data directly from the source objects.
Change Synchronization : To keep source data synchronized with another set of data, Extract extracts transactional changes made to data (i.e., inserts, updates and deletes) after the initial synchronization has taken place.
When processing transactional data changes, Extract obtains the data from a data source that can be one of the following:
1) The database transaction logs (such as Oracle redo logs). This method is also known as log-based extraction.
2) A Golden Gate Vendor Access Module (VAM). The VAM is a communication layer that passes data changes and transaction metadata to the Extract process.
Multiple Extract processes can operate on different objects at the same time. For example, one process could continuously extract transactional data changes and stream them to a decision-support database, while another performs batch extracts for periodic reporting. Or, two Extract processes could extract and transmit in parallel to two Replicat processes to minimize target latency when the databases are large.
Sample Extract parameter file
EXTRACT capt
USERID ggs, PASSWORD *********
DISCARDFILE /ggs/capt.dsc, PURGE
RMTHOST sysb, MGRPORT 7809
RMTTRAIL /ggs/dirdat/aa
TABLE fin.*;
TABLE sales.*;
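Before a parameter file like this can run, the Extract group and its trail must be registered in GGSCI. A typical sequence (group and trail names follow the sample above; verify the exact options against the documentation for your release) might look like:

```
GGSCI> ADD EXTRACT capt, TRANLOG, BEGIN NOW
GGSCI> ADD RMTTRAIL /ggs/dirdat/aa, EXTRACT capt
GGSCI> START EXTRACT capt
```

TRANLOG tells Extract to capture from the database transaction logs (the log-based extraction method described earlier), and BEGIN NOW sets the starting position in those logs.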
Overview of Data Pumps:
A Data pump is a secondary extract group within the source Golden Gate configuration. If a data pump is not used, Extract must send data to a remote trail on the target. In a typical configuration that includes a data pump, however the primary Extract group writes to a trail on the source system. The data pump reads this trail and sends the data across the network to a remote trail on the target.
Like a primary extract group, a data pump can be configured for either online or batch processing.
Using a data pump is considered a best practice. Reasons for using a data pump include:
1) Protection against network and target failures: In a basic Golden Gate configuration, with only a trail on the target system, there is nowhere on the source system to store the data that Extract continuously extracts into memory. If the network or the target system becomes unavailable, the primary Extract could run out of memory and abend. With a trail and data pump on the source system, however, captured data can be moved to disk, preventing the abend. When connectivity is restored, the data pump extracts the data from the source trail and sends it to the target system.
2) Implementing several phases of data filtering or transformation: When using complex filtering or data transformation configurations, you can configure a data pump to perform the first transformation, either on the source or on the target system, and then use another data pump or the Replicat group to perform the second transformation.
3) Consolidating data from many sources to a central target: When synchronizing multiple source databases with a central target database, you can store extracted data on each source system and use data pumps on each of those systems to send the data to a trail on the target system. Dividing the storage load between the source and target systems reduces the need for massive amounts of space on the target system to accommodate data arriving from multiple sources.
4) Synchronizing one source with multiple targets: When sending data to multiple target systems, you can configure a data pump on the source system for each target. If network connectivity to any of the targets fails, data can still be sent to the other targets.
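A data pump parameter file looks much like a primary Extract's, but it reads a local trail instead of the transaction logs. A minimal sketch (the group name "pump" and the local trail /ggs/dirdat/aa are assumptions for illustration):

```
EXTRACT pump
PASSTHRU
RMTHOST sysb, MGRPORT 7809
RMTTRAIL /ggs/dirdat/bb
TABLE fin.*;
TABLE sales.*;
```

PASSTHRU indicates that no filtering or transformation is performed, so the pump simply forwards trail data. In GGSCI, the pump is associated with the source trail with something like ADD EXTRACT pump, EXTTRAILSOURCE /ggs/dirdat/aa.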
Overview of Replicat:
The Replicat process runs on the target system. Replicat reads the extracted data changes that are specified in the Replicat configuration and replicates them to the target database. Replicat can be configured in one of the following ways:
1) Initial Loads: For initial loads, Replicat can apply data to target objects or route it to a high-speed bulk-load utility.
2) Change Synchronization: To maintain synchronization, Replicat applies extracted transactional changes to target objects using native database calls, statement caching, and local database access. Multiple Replicat processes can be used with multiple Extract processes in parallel to increase throughput.
You can delay Replicat so that it waits a specific amount of time before applying data to the target database. A delay may be desirable, for example, to control data arrival across different time zones.
Sample Replicat parameter file
REPLICAT deliv
USERID ggs, PASSWORD ****
SOURCEDEFS /ggs/dirdef/defs
DISCARDFILE /ggs/deliv.dsc, PURGE
GETINSERTS
MAP fin.account, TARGET fin.acctab,
COLMAP (account = acct, balance = bal, branch = branch);
MAP fin.teller, TARGET fin.telltab,
WHERE (branch = "NY");
IGNOREINSERTS
MAP fin.teller, TARGET fin.telltab,
WHERE (branch = "LA");
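The apply delay mentioned above is controlled by the DEFERAPPLYINTERVAL parameter. A minimal sketch (group name and interval are illustrative; check the exact syntax and allowed units in the reference guide for your release):

```
REPLICAT deliv
USERID ggs, PASSWORD ****
DEFERAPPLYINTERVAL 10 MINS
MAP fin.*, TARGET fin.*;
```

With this setting, Replicat waits ten minutes after a transaction is committed on the source before applying it to the target.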
Overview of Trail: To support the continuous extraction and replication of database changes, Golden Gate stores those changes temporarily on disk in a series of files called a trail. A trail can exist on the source or target system, depending on how you configure Golden Gate. The use of a trail also allows extraction and replication to occur independently of each other. With those processes decoupled, you have more choices for how data is delivered. For example, instead of extracting and replicating changes continuously, you can extract changes and store them in the trail for replication to the target later, whenever the target application needs them.
Reading from and writing to a trail:
The primary Extract process writes to a trail. Only one Extract process can write to a given trail.
Processes that read the trail are:
Data pump Extract: Extracts data from a local trail for further processing, if needed, and transfers it to the target system or to the next Golden Gate process downstream in the configuration.
Replicat: Reads a trail to apply change data to the target database.
Trail files are created as needed during processing. By default, each file in a trail is 10 MB in size. All file names in a trail begin with the same two characters. As the files are created, each name is appended with a unique, six-digit serial number from 000000 to 999999, for example D:\ggs\dirdat\tr000009. By default, trails are stored in the dirdat sub-directory of the Golden Gate directory.
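The two-character prefix and the file size are both set when the trail is added in GGSCI. For example, the following sketch (trail path and Extract group are illustrative) creates a local trail with the prefix "aa" and 50 MB files instead of the 10 MB default:

```
GGSCI> ADD EXTTRAIL /ggs/dirdat/aa, EXTRACT capt, MEGABYTES 50
```

For a trail on a remote system, ADD RMTTRAIL is used instead, with the same MEGABYTES option.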
How processes write to a trail: To maximize throughput and minimize I/O load on the system, extracted data is sent into and out of a trail in large blocks. Transactional order is preserved. By default, Golden Gate writes data to the trail in a proprietary universal data format that allows it to be exchanged rapidly and accurately among heterogeneous databases. However, the data can be written in other formats that are compatible with different applications.
When writing to a trail, Extract operates in one of two modes: append mode and overwrite mode.
Trail file format: As of Golden Gate version 10.0, each file of a trail contains a file header record stored at the beginning of the file. The file header contains information about the trail file itself.
Each data record in a Golden Gate trail file contains a header area and a data area. The header contains information about the transaction environment, and the data area contains the actual data values that were extracted.
Overview of Extract Files:
When processing a one-time run, such as an initial load or a batch run that synchronizes transactional changes, Golden Gate stores the extracted changes in an extract file instead of a trail. The extract file typically is a single file but can be configured to roll over into multiple files in anticipation of limitations on file size imposed by the operating system. In this sense, it is similar to a trail, except that checkpoints are not recorded. The file or files are created automatically during the run. The same versioning features that apply to trails also apply to extract files.
Overview of Checkpoints:
Checkpoints store the current read and write positions of a process to disk for recovery purposes. Checkpoints ensure that database changes marked for synchronization are processed by Extract and Replicat, and they prevent redundant processing.
The read checkpoint of a process is always synchronized with its write checkpoint. Thus, if Golden Gate needs to re-read data that was already sent to the target system (for example, after a process failure), checkpoints enable accurate recovery to the point where a new transaction starts, and Golden Gate resumes processing.
Checkpoints work with inter-process acknowledgements to prevent messages from being lost in the network.
Extract creates checkpoints for its positions in the data source and in the trail. Replicat creates checkpoints for its position in the trail.
A checkpoint system is used for Extract and Replicat processes that operate continuously, but it is not required for batch mode. A batch process can be re-run from its start point, whereas continuous processing requires the support for planned or unplanned interruptions that is provided by checkpoints.
Checkpoint information is maintained in checkpoint files within the dirchk sub-directory of the Golden Gate directory. Optionally Replicat checkpoints can be maintained in a checkpoint table within the target database, in addition to a standard checkpoint file.
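A Replicat checkpoint table is created from GGSCI after logging in to the target database. A typical sequence (the schema and table name ggs.ggschkpt are assumptions for illustration):

```
GGSCI> DBLOGIN USERID ggs, PASSWORD ****
GGSCI> ADD CHECKPOINTTABLE ggs.ggschkpt
```

The table can then be named in the GLOBALS file (CHECKPOINTTABLE ggs.ggschkpt) so that all Replicat groups use it by default. Because the checkpoint is stored in the target database, it participates in Replicat's own transactions, which makes recovery more precise than with a checkpoint file alone.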
Overview of Manager:
Manager is the control process of Golden Gate. Manager must be running on each system in the Golden Gate configuration before Extract or Replicat can be started, and it must remain running while those processes are running so that resource management functions are performed.
Manager performs the following functions:
• Monitors and restarts Golden Gate processes.
• Issues threshold reports, for example when throughput slows down or synchronization latency increases.
• Maintains trail files and logs.
• Allocates data storage space.
• Reports errors and events.
• Receives and routes requests from the user interface.
One Manager process can control many Extract or Replicat processes.
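Several of the functions above map directly to Manager parameters. A minimal sketch of a Manager parameter file (mgr.prm), with the port, retry counts, and retention settings chosen for illustration:

```
PORT 7809
AUTOSTART ER *
AUTORESTART EXTRACT *, RETRIES 3, WAITMINUTES 5
PURGEOLDEXTRACTS /ggs/dirdat/*, USECHECKPOINTS, MINKEEPDAYS 3
```

PORT is the TCP/IP port on which Manager listens, AUTOSTART and AUTORESTART cover process monitoring and restarting, and PURGEOLDEXTRACTS implements trail file maintenance, here keeping trail files at least three days and only purging those already processed (USECHECKPOINTS).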
Overview of Collector:
Collector is a process that runs in the background on the target system. Collector receives extracted database changes sent across the TCP/IP network and writes them to a trail or extract file. Manager starts Collector automatically when a network connection is required; a Collector started this way is known as a dynamic Collector. Collector can also be run manually, in which case it is known as a static Collector.
When a dynamic Collector is used, it can receive information from only one Extract process, so there must be a dynamic Collector for each Extract that you use. When a static Collector is used, several Extract processes can share one Collector; however, a one-to-one ratio is optimal.
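A static Collector is started manually on the target with the server program, and the sending Extract or data pump then addresses that fixed port directly. A sketch, with the port number chosen for illustration (check the server options for your release):

```
$ server -p 7819
```

On the source side, the parameter file would then use RMTHOST with the PORT option (for example, RMTHOST sysb, PORT 7819) instead of MGRPORT, since Manager is not being asked to spawn a dynamic Collector.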