Talend Presentation · 3 Fast Facts •Founded in 2006 •Open source •Almost 500 employees in 8...


©2015 Talend Inc

Talend Presentation


Connecting the Data-Driven Enterprise

Data-Driven companies…

• 23 times greater customer acquisition

• 6 times greater customer retention

• 19 times more profitability


Fast Facts

• Founded in 2006

• Open source

• Almost 500 employees in 8 countries (50+ new in Q2’15)

• 1,700+ customers

• Raised over $100M from tier-one funds (incl. Silver Lake & Balderton Capital)

Talend Overview

[Chart: revenue growth by year, 2007 to 2014]


Unified Integration Platform

• Lowest Cost of Ownership

• Open, Standards Based

• Run in the Cloud, On-premises or Hybrid

• Big Data Leadership


A broad customer base across industry and segments

Financial Services

IT Services

Media

Manufacturing

Retail and Consumer Goods

Travel and Transportation

Technology


Market trends

Digital is transforming the way IT is consumed and designed

• Open and connected apps rather than centralized and siloed.

• Information architecture has to be revisited for real time, big data, agility, etc.

• Cloud as a must, not an option

The integration market is hot

• Equipment rate in mid-market still low, demand is rising

• Disruptive trends like Big Data and Cloud drive new opportunities at large accounts

• Expertise matters, knowledge can be valued at high rates

Project revenues are shrinking; new growth drivers are needed

• From Big Bang to Land and Expand

• From Capex to Opex

• From implementation to build, run and extend


• Innovate with the Cloud and Big Data

• Roll-out new projects with MDM, BI/DI, and Application Integration

• Industrialize and standardize with shared services for Data Quality, Data and Application Integration

• Refresh your legacy systems

Why Change?

[Diagram stages: Market Start, Zenith of Industrialization, Commoditization]

Talend enables you to position yourself at every step of your customer cycle

Data Quality

Custom Coding to ETL

Custom solution to MDM

Ent. Standards for integration

iPaaS

Big Data

BI Relaunch

ETL

Offloading

Migrations

New Apps project With ESB

Master Data Mgmt

New BI project with ETL

Cloud

Re-platforming


Why now?

$2.1B

$3.9B

$5.5B

Source: Gartner, IDC, Data Warehousing Institute

Data Integration and Quality

App. Integration

MDM

Big Data

REVENUE (BILLIONS)

iPaaS

• Market Growth: 13%

• Talend Growth: 33%

Talend MDM grew 86% in 2014

Talend in Gartner’s Magic Quadrant for MDM and in the Forrester Wave

Talend as a visionary in Gartner’s Magic Quadrant for On Premises Application Integration Suite

Talend Big Data grew 126% in 2014

Only native integration platform on Hadoop

31% market growth between 2012 and 2018

Launch of Talend Integration Cloud starting in March 2015

$0.4B

$0.3B


Why Talend?

• Ramp up and fuel your volume business

• Low learning curve, familiar to your teams

• Flexible pricing models across your different markets

• Open platform allows you to adapt to any customer context

• Nurture your customers for life

• Subscription-based licensing helps you build a sustainable relationship with your customer

• Large portfolio of products to expand over time across your customers' processes and applications

• Innovate

• Get ready for Big Data

• Turn existing processes into real-time, data-driven ones

• Connecting The Enterprise


Talend’s proposition for VAR

[Diagram: four panels (Upsell to existing accounts; Open new accounts; Expand accounts; Nurture accounts and drive recurring revenue) showing combinations of the labels DI, AI, DM, MDM, ESB, Big Data, BI, CRM and Apps, plus the label Platform]

• Open new accounts by leveraging Talend’s brand and open source model

• Grow your project margin by embedding the integration layer as a building block of your projects

• Drive recurring revenue at your accounts through Talend’s subscription-based model

• Expand your customer’s share of wallet by leveraging Talend’s unified platform capabilities across integration domains


Evaluating the business benefits


About important Components

• Logs & Errors Components

• The Logs & Errors family of components allows you to log information about the execution of your Job. With the exception of tDie, these components play no functional part in the task-specific processing of your Job; however, they play an important part in debugging your Jobs and helping to ensure their smooth running.

• This article gives an overview of each of these components, providing a good understanding of where each of these may help. In later articles, we'll look at each of these components in more detail.


About important Components cont…

• tLogRow

• The component tLogRow allows you to write row data to the Job log file, or console window, if you're running your Job from within Talend Studio.

• If you're running your Job from within Talend Studio, remember that writing a large volume of data to the console window can make Talend Studio very unresponsive.


About important Components cont…

• tAssert / tAssertCatcher

• This pair of components allows you to send and catch non-blocking trigger messages.

• tAssert sends the message, and tAssertCatcher catches it.


About important Components cont…

• tChronometerStart / tChronometerStop

• tChronometerStop records and displays the elapsed time since the start of a SubJob or an associated tChronometerStart.

• tChronometerStart Component Reference

• The tChronometerStart component is part of the Logs & Errors family of components and is used in conjunction with tChronometerStop.

• For more information on the usage of this component, read our tutorial on tChronometerStop.

• tChronometerStart Global Variables

• The following global variables are available for use within your Job (Flow).

• (Long) globalMap.get("tChronometerStart_1_STARTTIME")


• tChronometerStop Component Reference

• The tChronometerStop component is part of the Logs & Errors family of components.

• tChronometerStop records and displays the processing time of a SubJob. tChronometerStop may be used on its own, to record the processing time of a whole SubJob, or in conjunction with tChronometerStart to record from a specific SubJob.

• tChronometerStop Example

• The following example shows a simple Job with two tChronometerStop components, each recording a specific duration of execution time.


Chronometer demo displays the elapsed time since the start of a SubJob or an associated tChronometerStart


• tChronometerStop_1

• This component records and displays the elapsed time since the invocation of tChronometerStart_1.


• tChronometerStop_3

• This component records and displays the elapsed time since the beginning of the whole SubJob.

• Note. There appear to be some inconsistencies in Talend Studio's sequencing of the tChronometer components; hence the apparent jump in the sequences shown in this tutorial.


• tChronometerStop Basic Settings

• As well as specifying the Since options, you also have control over the presentation of the results.

• Execution Result

• The execution timings of this sample Job can be seen below.

Starting job tChronometerStopExample at 18:10 28/08/2013.
[statistics] connecting to socket on port 3918
[statistics] connected
[ tChronometerStop_1 ] 3001 milliseconds
[ tChronometerStop_3 ] 10002 milliseconds
[statistics] disconnected
Job tChronometerStopExample ended at 18:10 28/08/2013. [exit code=0]


Log cont..

• tChronometerStop Global Variables

• The following global variables are available for use within your Job (Flow).

• (Long) globalMap.get("tChronometerStop_1_STOPTIME")

• (Long) globalMap.get("tChronometerStop_1_DURATION")

• tChronometerStart Global Variables

• The following global variables are available for use within your Job (Flow).

• (Long) globalMap.get("tChronometerStart_1_STARTTIME")
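As a rough illustration of how these variables might be used together, the sketch below reads the start and stop times and works out the elapsed time. It is a stand-alone approximation: the HashMap stands in for the globalMap that Talend supplies inside a Job, the timestamp values are made up, and it assumes both variables hold millisecond timestamps.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone sketch: in a real Job, globalMap is supplied by Talend and the
// chronometer components populate these keys themselves.
final class ChronometerGlobals {

    // Elapsed milliseconds between the recorded start and stop times
    // (assumption: both values are millisecond timestamps).
    static long elapsedMillis(Map<String, Object> globalMap) {
        Long start = (Long) globalMap.get("tChronometerStart_1_STARTTIME");
        Long stop = (Long) globalMap.get("tChronometerStop_1_STOPTIME");
        if (start == null || stop == null) {
            throw new IllegalStateException("chronometer values not set yet");
        }
        return stop - start;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // Hypothetical values, chosen only for illustration.
        globalMap.put("tChronometerStart_1_STARTTIME", 1_000L);
        globalMap.put("tChronometerStop_1_STOPTIME", 4_001L);
        System.out.println("elapsed ms = " + elapsedMillis(globalMap));
    }
}
```

Inside a tJava component, only the two globalMap.get calls and the subtraction would be needed.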


• tDie

• This component sends a message to a tLogCatcher and allows you to terminate the Job, with a specified Exit Code, once the message has been processed.


• tFlowMeter / tFlowMeterCatcher

• This pair of components allows you to record and catch the data-flow metrics of your Job.


• tStatCatcher

• The tStatCatcher component catches statistics that are generated by a Job or individual components. Statistics are always collected for Jobs; components must have tStatCatcher Statistics enabled (Advanced settings).

• The tStatCatcher demo displays the tStatCatcher output, using a tLogRow component.


• tLogCatcher

• This component catches messages from tDie and tWarn.




Custom Code Components

• The Talend Custom Code components allow you to extend the functionality of Talend beyond what is available by simply connecting other components together. This ranges from basic functionality provided by the tJava component through to more complex processing available through components such as tJavaFlex, tJavaRow and tLibraryLoad.


See inside each component for more details


tSetGlobalVar Component

• tSetGlobalVar Component

• The tSetGlobalVar Component is a convenient method for adding Global Variables to globalMap.

• In the following screenshot, you can see that a simple Job has been created to define two new Global Variables which are added to globalMap using tSetGlobalVar.

• This is equivalent to using a tJava component to make the following assignments.

• globalMap.put("myString", "Hello World!");

• globalMap.put("myInteger", 999);

• The tJava Component shown in this example simply prints the values of the newly created variables.

• System.out.println("myString=" + globalMap.get("myString"));

• System.out.println("myInteger=" + globalMap.get("myInteger"));
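Because globalMap stores every value as an Object, reading a variable back for anything other than printing needs a cast. The following sketch, which assumes a plain HashMap in place of the globalMap provided by Talend and uses hypothetical helper names (getString, getInt), shows one way to read the two variables with a fallback value.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-alone sketch; GlobalVarDemo, getString and getInt are illustrative
// names, not part of Talend.
final class GlobalVarDemo {

    // Reads a String variable, returning the fallback when the key is absent.
    static String getString(Map<String, Object> globalMap, String key, String fallback) {
        Object value = globalMap.get(key);
        return value == null ? fallback : (String) value;
    }

    // Reads an Integer variable, returning the fallback when the key is absent.
    static int getInt(Map<String, Object> globalMap, String key, int fallback) {
        Object value = globalMap.get(key);
        return value == null ? fallback : (Integer) value;
    }

    public static void main(String[] args) {
        // Stand-in for the globalMap that Talend provides inside a Job.
        Map<String, Object> globalMap = new HashMap<>();
        globalMap.put("myString", "Hello World!");
        globalMap.put("myInteger", 999);

        String greeting = getString(globalMap, "myString", "");
        int doubled = getInt(globalMap, "myInteger", 0) * 2;
        System.out.println(greeting + " / " + doubled);
    }
}
```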


tSetGlobalVar Component


Orchestration Components

• Orchestration Components

• The Talend Orchestration components allow you to control the behaviour and execution of your Job.

• tPreJob & tPostJob

• Add some pre and post-processing to your Jobs with this pair of Orchestration components.

• tRunJob

• Organise your Jobs using tRunJob. Modularise your Job or call reusable Jobs.


Processing Components

• Processing Components

• The Talend Processing components are the family of components that allow you to work with and transform your data.

• tBufferInput

• Read your buffered data, using the tBufferInput component.

• tBufferOutput

• Buffer your data for output, using the tBufferOutput component.

• tMap

• The tMap component is at the core of everything you do. Map, Join, Transform your data and more.


Talend null Handling

• Refer to the documentation


• Talend String Handling

• Talend Date Handling

• Talend Data Validation

• Talend Schema Reference

• Talend Java Tips

• Talend Routines Tutorial

• Talend Job Deployment & Scheduling

• Working with Databases

• Text Files


Talend Data Generation

• Talend Data Generation

• Talend provides the tRowGenerator component, for generating data. This component allows you to specify an arbitrary number of rows that should be generated, define a Schema, and then assign values to the columns that have been defined. Usually, random values are assigned, using the methods provided by Repository->Code->Routines->System->TalendDataGeneration; however, you may assign data using Routines of your choice.

• My preference is to use tRowGenerator for row generation and, perhaps, the assignment of a Primary Key; but to map the remainder of my data using one or more tMap components. This offers maximum flexibility.

• The methods provided by Talend's TalendDataGeneration Routines will give you a good start to your data generation needs; however, you may find them limiting. As part of this tutorial, I have built a set of Routines, TBEDataGeneration, which offer a greater breadth of data (for Address and Person). I will add to these, from time to time. These routines have a UK slant; however, you may modify these to suit your own needs.
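To make the idea concrete, here is a small stand-alone Java sketch of what row generation amounts to: a fixed number of rows, a simple schema, a sequential key and random values for the remaining columns. It does not use tRowGenerator or the TalendDataGeneration routines; the Row class, the column names and the city list are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative only: a plain-Java analogue of generating test rows.
final class RowGeneratorSketch {

    // Hypothetical three-column schema: id, city, amount.
    static final class Row {
        final int id;
        final String city;
        final int amount;

        Row(int id, String city, int amount) {
            this.id = id;
            this.city = city;
            this.amount = amount;
        }

        @Override
        public String toString() {
            return id + "|" + city + "|" + amount;
        }
    }

    private static final String[] CITIES = {"London", "Leeds", "Bristol"};

    // Generates rowCount rows; the seed makes a run repeatable.
    static List<Row> generate(int rowCount, long seed) {
        Random random = new Random(seed);
        List<Row> rows = new ArrayList<>();
        for (int id = 1; id <= rowCount; id++) {
            String city = CITIES[random.nextInt(CITIES.length)];
            int amount = random.nextInt(1000); // value between 0 and 999
            rows.add(new Row(id, city, amount));
        }
        return rows;
    }

    public static void main(String[] args) {
        for (Row row : generate(5, 42L)) {
            System.out.println(row);
        }
    }
}
```

Keeping the key sequential and the other columns random mirrors the preference stated above: generate the rows and the Primary Key first, then map the remaining data separately.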


Talend Performance

• Talend Performance

• As with building any software, performance (usually meaning speed of execution) is a key input to your design and development.

• Reducing the load on tMap by limiting lookup components

• Using multithreading

• Using parallel execution

• Using tSortRow with the “Sort on disk” option

• Using tParallelize, which allows you to synchronize the execution of a subjob with the execution of other subjobs in your main Job. tParallelize helps you manage complex Job systems.
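The idea of running several subjobs in parallel and waiting for all of them to finish can be sketched in plain Java with an ExecutorService. This is only an analogy, not how tParallelize is implemented; the class name and the two sample tasks are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Analogy only: parallel tasks followed by a synchronization point.
final class ParallelSubjobsSketch {

    // Runs all tasks concurrently and returns their results, in task order,
    // once every task has completed.
    static List<String> runAll(List<Callable<String>> subjobs) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subjobs.size()));
        try {
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task has finished.
            for (Future<String> future : pool.invokeAll(subjobs)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for subjobs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a subjob failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> subjobs = new ArrayList<>();
        subjobs.add(() -> "load customers done"); // hypothetical subjob
        subjobs.add(() -> "load orders done");    // hypothetical subjob
        System.out.println(runAll(subjobs));
    }
}
```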


Metadata

• A set of data that describes and gives information about other data.

• In Talend, metadata holds schema information so that its properties can be reused.

• The examples below illustrate this:


Metadata about a DB connection


Metadata about a delimited file


Metadata about XML


Metadata about .xlsx


Metadata about a JSON file


Routine/User-defined function

• Talend lets you create your own user-defined functions (Routines) in Java code; once created, they become available in expressions.
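As an illustration, a routine is essentially a Java class exposing static methods. The sketch below uses a hypothetical class and method name; in Talend Studio the class would normally be declared public in the routines package, which is left out here so that the example compiles on its own.

```java
// Hypothetical routine; in Talend Studio this would be a public class in the
// "routines" package.
final class MyStringRoutines {

    // Returns the trimmed value, or defaultValue when the input is null or blank.
    public static String trimOrDefault(String value, String defaultValue) {
        if (value == null) {
            return defaultValue;
        }
        String trimmed = value.trim();
        return trimmed.isEmpty() ? defaultValue : trimmed;
    }

    public static void main(String[] args) {
        System.out.println(trimOrDefault("  Talend  ", "UNKNOWN"));
        System.out.println(trimOrDefault(null, "UNKNOWN"));
    }
}
```

In a tMap expression this could then be called as MyStringRoutines.trimOrDefault(row1.name, "UNKNOWN"), where row1.name is an assumed input column.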


Talend CDC

OLAP (Online Analytical Processing) relies on the extraction and transportation of data from one or more databases into a target system or systems for analysis. But this involves the extraction and transportation of huge volumes of data and is very expensive in both resources and time.

The ability to capture only the changed source data and to move it from a source to a target system(s) in real time is known as Change Data Capture (CDC). Capturing changes reduces traffic across a network and thus helps reduce ETL time.

The CDC feature, introduced in Talend Studio, simplifies the process of identifying the change data since the last extraction. CDC in Talend Studio quickly identifies and captures data that has been added to, updated in, or removed from database tables and makes this change data available for future use by applications or individuals. The CDC feature is available for Oracle, MySQL, DB2, PostgreSQL, Sybase, MS SQL Server, Informix, Ingres, Teradata, and AS/400.

Warning

The CDC feature works only with database systems running on the same server.

Three different CDC modes are available in Talend Studio:

• Trigger: this mode is the default mode used by CDC components.

• Redo/Archive log: this mode is used with Oracle v11 and previous versions and AS/400.

• XStream: this mode is used only with Oracle v12 with OCI.

For detailed information on these three modes, see the following sections.

Trigger mode

This mode is available for the following databases: MySQL, Oracle, DB2, PostgreSQL, Sybase, MS SQL Server, Informix, Ingres, and Teradata.

The Trigger mode places a trigger that launches change data capture on every monitored source table. This, in turn, requires only small modifications to the database structure.

With this mode, data extraction takes place at the same time the Insert, Update, or Delete operations occur in the source tables, and the change data is stored inside the database in change tables. The changed data, thus captured, is then made available to the target system(s) in a controlled manner, using subscriber views.

In Trigger mode, CDC can have only one publisher but many subscribers. CDC creates subscriber tables to control accessibility of the change table data by the target system(s). A target system is any application that wants to use the data captured from the source system.

The figure below shows the basic architecture of a CDC environment in Trigger mode in Talend Studio.

In this example, CDC monitors the changes made to a Product table. The changes are caught and published in a change table to which two subscribers have access: a CRM application and an Accounting application. These two systems fetch the changes and use them to update their data.

CDC Redo/Archive log mode

The Redo/Archive log mode is only available for Oracle v11 and previous versions and AS/400 databases. It is equivalent to the archive log mode for Oracle and to the journal mode for AS/400.

In an Oracle database, a Redo log is a file which logs the history of changes made to data. In an AS/400 database, these changes are logged automatically in the database's internal logbook (journal). These changes include the insert, update and delete operations which data may undergo.

Redo/Archive log mode is less intrusive than Trigger mode because, in contrast to Trigger mode, it does not require modifications to the database structure.

When setting up Redo/Archive log mode for Oracle, only one subscriber can have access rights to the change table. This subscriber must be a database user who holds the subscription rights. Also, there is a subscription table which controls access to the subscriber change table. The subscription change table is a comprehensive, internal table which reflects the state of the Oracle database at the moment at which the Redo/Archive log option was activated.

When setting up this mode for AS/400, a save file, called fitcdc.savf and provided in your Studio, is restored on AS/400 and used to install a program called RUNCDC. When the subscriber views the changes made (View all changes) or consumes them for reuse (using a tAS400CDC component), the RUNCDC program reads and analyzes the logbook (journal) and the attached receiver from the source table and updates the change table accordingly. The AS/400 CDC Redo/Archive log mode (journal) creates subscription tables to prevent unauthorized target systems from accessing the data in the change tables. A target system means any application which tries to use data captured in the source system.

In this example, the CDC monitors the changes made to a Product table, thanks to the data contained in the database's logbook (journal). The CDC reads the logbook and records the changes which have been made to the data. These changes are collected and published in a table of changes to which two subscribers have access, a CRM application and an Accounting application. These two systems fetch the changes and use them to update their data.


XStream mode

XStream Out provides Oracle Database components and application programming interfaces that enable you to share data changes made to an Oracle database with other systems. It also provides a transaction-based interface for streaming the changes captured from the redo log of the Oracle database to client applications with an outbound server. An outbound server is an optional Oracle background process that sends data changes to a client application.

XStream In provides Oracle Database components and application programming interfaces that enable you to share data changes made to other systems with an Oracle database. It also provides a transaction-based interface for sending information to an Oracle database from client applications with an inbound server. An inbound server is an optional Oracle background process that receives data changes from a client application.

The XStream mode is only available for Oracle v12 with OCI in Talend Studio. For more information about the XStream mode, see http://docs.oracle.com/cd/E11882_01/server.112/e16545/toc.htm

CDC: a publish/subscribe principle

The CDC architecture is based on the publisher/subscriber model.

The publisher captures the change data and makes it available to the subscribers. The subscribers utilize the change data obtained from the publisher.

The main tasks performed by the publisher are:

• identifying the source tables from which the change data needs to be captured.

• capturing the change data and storing it in specially created change tables.

• allowing subscribers controlled access to the change data.

In Trigger mode, or the AS/400 Redo/Archive log mode (journal), the subscriber is a table that only lists the applications that have access rights to the change tables. In the Oracle Redo/Archive log mode, the subscriber is a user of the database. The subscriber may not be interested in all the data that is published by the publisher.
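The publisher/subscriber principle can be sketched in a few lines of Java. The model below is a deliberate simplification held in memory: Talend's CDC keeps change data in database change tables, whereas here a list plays the role of the change table and a per-subscriber offset plays the role of controlled access. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration of publish/subscribe; not Talend's implementation.
final class CdcSketch {

    // One captured change: operation type (I, U or D) and the affected key.
    static final class Change {
        final char operation;
        final int productId;

        Change(char operation, int productId) {
            this.operation = operation;
            this.productId = productId;
        }
    }

    private final List<Change> changeTable = new ArrayList<>();
    private final Map<String, Integer> subscriberOffsets = new HashMap<>();

    // Registers a subscriber; it will see changes published from now on.
    void subscribe(String subscriber) {
        subscriberOffsets.put(subscriber, changeTable.size());
    }

    // Publisher side: records a change in the change table.
    void publish(char operation, int productId) {
        changeTable.add(new Change(operation, productId));
    }

    // Subscriber side: returns the changes not yet consumed by this
    // subscriber and marks them as consumed. Unknown subscribers are refused.
    List<Change> fetch(String subscriber) {
        Integer offset = subscriberOffsets.get(subscriber);
        if (offset == null) {
            throw new IllegalArgumentException("unknown subscriber: " + subscriber);
        }
        List<Change> pending = new ArrayList<>(changeTable.subList(offset, changeTable.size()));
        subscriberOffsets.put(subscriber, changeTable.size());
        return pending;
    }

    public static void main(String[] args) {
        CdcSketch cdc = new CdcSketch();
        cdc.subscribe("CRM");
        cdc.subscribe("Accounting");
        cdc.publish('I', 101);
        cdc.publish('U', 101);
        cdc.publish('D', 102);
        System.out.println("CRM fetched " + cdc.fetch("CRM").size() + " changes");
        System.out.println("Accounting fetched " + cdc.fetch("Accounting").size() + " changes");
    }
}
```

Each subscriber consumes the same change table independently, which is why one publisher can serve both the CRM and the Accounting application in the examples above.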

Setting up a CDC environment

The CDC feature is part of Talend Studio; you do not need to install any software other than Talend Studio to use CDC.

However, if you want to use CDC in Redo/Archive log mode for Oracle, you must first of all configure the database so that it generates the redo records that hold all insert, update or delete changes made in datafiles. For further information, see Prerequisites for the Oracle Redo/Archive log mode.

If you want to use CDC in Redo/Archive log mode for AS/400, you must verify that the prerequisites on your AS/400 are all met. For further information, see The prerequisites on AS/400.

Note

For the time being, CDC is only available in Java and is for Oracle, MySQL, DB2, PostgreSQL, Sybase, MS SQL Server, Informix, Ingres, and Teradata in Trigger mode, for Oracle and AS/400 databases in Redo/Archive log mode, and for Oracle in XStream mode.

Note

To set up a CDC environment you must understand the basics involved in designing a Job in Talend Studio, and particularly the definition of metadata items.

Note

When setting up a CDC environment, make sure that the database connection for CDC is on the same server as the source data whose changes are to be captured.

How to set up CDC in Trigger mode

The following two sections provide a two-step guide to set up the CDC environment in Trigger mode in Talend Studio: the first step explains how to configure your system for CDC and the second step explains how to extract the modified data.


How to configure CDC in Trigger mode


Below are configuration steps that need to be set up just once for a given publisher/subscriber scenario.


STEP 1: SET UP A PUBLISHER


To set up a publisher, you need to:

1. Set up a database connection dedicated to CDC.

2. Set up a connection to the database where data is located.

Note: For instance, if you work with MS SQL Server, you must set the two connections to the same database but using two or more different schemas, and use the correct version of Talend Studio.


STEP 2: IDENTIFY THE SOURCE TABLE

Create the connection in Talend for the CRM schema, in the same way as any other connection, and create the customer table as mentioned above.

Create the connection in Talend for the DWH (data warehouse) schema, in the same way as any other connection, and create the customer table as mentioned above.


For monitoring, I have created a schema named Talend in MySQL.

The next step is to create the CDC in the CRM connection.


Next, click "Create subscription". A pop-up containing the query appears automatically; click "Execute".


Once the script is executed, this process will create a table in the Talend schema.

It has created the TSUBSCRIBERS table in the Talend schema.


Once you click Add CDC, the pop-up below appears. Change the Subscriber Name to the table name, "Customers", which is in the CRM schema.


Place your source file, create the Job, and run it.


The CDC entry is now recorded in the TALEND schema; as usual, the Insert, Update and Delete operations take place in the CRM schema, and the data is dumped into the DWH table schema.
