PowerCenter Basic Concepts

1 PowerCenter Basic Concepts Ale Ribeiro June 6, 2006


Transcript of PowerCenter Basic Concepts

Page 1: PowerCenter Basic Concepts


PowerCenter Basic Concepts

Ale Ribeiro, June 6, 2006

Page 2: PowerCenter Basic Concepts


Agenda

• What is PowerCenter?

• PowerCenter Client Applications

• Demo

• PowerCenter – Designer, Workflow Manager, Workflow Monitor

• PowerCenter Architecture

• Where do we use PowerCenter in IT?

• Q&A

Page 3: PowerCenter Basic Concepts


PowerCenter

• Is a single, unified enterprise data integration platform that allows companies and government organizations of all sizes to access, discover, and integrate data from virtually any business system, in any format, and deliver that data throughout the enterprise at any speed

• An ETL Tool (Extract, Transform and Load)

Page 4: PowerCenter Basic Concepts


PowerCenter Client Applications

Development

• Designer: create ETL mappings

• Workflow Manager: create and start workflows

• Workflow Monitor: monitor and control workflows

Administration

• Repository Manager: manage repository connections, folders, objects, users and groups

• Administration Console (browser-based): perform domain and repository service tasks: create/configure nodes and repository services, upgrade/delete, start/stop, backup/restore

Page 5: PowerCenter Basic Concepts


Designer Tools – Create mappings

Source Analyzer: create source objects

Target Designer: create target objects

Transformation Developer: create reusable transformations

Mapplet Designer: create mapplets

Mapping Designer: create mappings

Page 6: PowerCenter Basic Concepts


Mapping

Logically Defines the ETL Process:

• Reads data from sources

• Applies transformation logic to data

• Writes transformed data to targets

Source → Transformations → Target

Note: Sources and targets can be flat files, relational tables, XML files, application systems, message queues, etc.


Page 7: PowerCenter Basic Concepts


Mapping (cont’d)

• A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation. Mappings represent the data flow between sources and targets. When the Integration Service runs a session, it uses the instructions configured in the mapping to read, transform, and write data.

• Every mapping must contain the following components:

♦ Source definition. Describes the characteristics of a source table or file.

♦ Transformation. Modifies data before writing it to targets. Use different transformation objects to perform different functions.

♦ Target definition. Defines the target table or file.

♦ Links. Connect sources, targets, and transformations so the Integration Service can move the data as it transforms it.

• A mapping can also contain one or more mapplets. A mapplet is a set of transformations that you build in the Mapplet Designer and can use in multiple mappings.

Page 8: PowerCenter Basic Concepts


Example

• Give me an Excel file with Total Order Amount per Customer. I also need to know when this data was extracted (date) and the customer type initial (first letter of the customer type)

• Define the sources: Orders and Customers

• Define any required transformations: sum of order amount, extraction date, first letter of customer type

• Create the file
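As a rough illustration only, the request above can be sketched in plain Python rather than in PowerCenter. The sample rows, the field names (`customer_id`, `amount`, `customer_type`) and the CSV output are assumptions made for this sketch; in PowerCenter the same steps would be built from source definitions, transformations and a target definition.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical sample rows standing in for the Orders and Customers sources.
ORDERS = [
    {"customer_id": 1, "amount": 100.0},
    {"customer_id": 1, "amount": 50.0},
    {"customer_id": 2, "amount": 75.5},
]
CUSTOMERS = [
    {"customer_id": 1, "name": "Acme", "customer_type": "Retail"},
    {"customer_id": 2, "name": "Globex", "customer_type": "Wholesale"},
]


def build_report(orders, customers, extracted_on):
    """Return one output row per customer, following the steps on the slide."""
    # Aggregate step: sum of order amount per customer.
    totals = defaultdict(float)
    for order in orders:
        totals[order["customer_id"]] += order["amount"]

    rows = []
    for customer in customers:
        rows.append({
            "customer_id": customer["customer_id"],
            "name": customer["name"],
            "total_order_amount": totals.get(customer["customer_id"], 0.0),
            # Row-level expressions: extraction date and customer type initial.
            "extracted_date": extracted_on.isoformat(),
            "customer_type_initial": customer["customer_type"][:1],
        })
    return rows


def write_csv(rows):
    """Render the rows as CSV text (a format Excel can open)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


report = build_report(ORDERS, CUSTOMERS, date(2006, 6, 6))
print(write_csv(report))
```

The aggregation and the two row-level fields correspond loosely to what an Aggregator and an Expression transformation would do in a mapping.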

Page 9: PowerCenter Basic Concepts


Transformations

• Generate, modify, or pass data

• Data passes into and out of transformations through ports that you link in a mapping

• Passive transformations do not change the number of rows received

• Active transformations can change the number of rows received
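The passive/active distinction can be illustrated with a small Python analogy. This is not PowerCenter code: the function names and the 10% uplift are made up for the sketch. The first function behaves like a passive transformation (one row out per row in), the second like an active one (the row count can change).

```python
def expression(rows):
    """Passive: emits exactly one output row for every input row."""
    return [{**row, "amount_with_tax": round(row["amount"] * 1.1, 2)} for row in rows]


def filter_rows(rows, condition):
    """Active: may emit fewer rows than it receives."""
    return [row for row in rows if condition(row)]


rows = [{"amount": 10.0}, {"amount": 250.0}, {"amount": 40.0}]
passive_out = expression(rows)
active_out = filter_rows(rows, lambda row: row["amount"] >= 40.0)
print(len(rows), len(passive_out), len(active_out))
```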


Page 10: PowerCenter Basic Concepts


PowerCenter Transformations (partial list)

Source Qualifier: reads data from flat file and relational sources

Expression: performs row-level calculations

Filter: drops rows conditionally

Sorter: sorts data

Aggregator: performs aggregate calculations

Joiner: joins heterogeneous sources

Lookup: looks up values and passes them to other objects

Update Strategy: tags rows for insert, update, delete, reject

Router: routes rows conditionally

Transaction Control: allows data-driven commits and rollbacks

Page 11: PowerCenter Basic Concepts


Advanced PowerCenter Transformations

Union: merges rows from two or more data streams, like a SQL UNION ALL

Java: allows Java syntax to be used within PowerCenter

Midstream XML Parser: reads XML from anywhere in a mapping

Midstream XML Generator: writes XML anywhere in a mapping

More Source Qualifiers: read from XML, message queues and applications

Page 12: PowerCenter Basic Concepts


Mapplet – Set of transformations that can be reused

Mapplet Input and Output transformations (pass data from or to the mapping)

Mapplet Designer Tool


Page 13: PowerCenter Basic Concepts


Example: Data Sources Defined Outside Mapplet

Source data defined outside the Mapplet

Mapplet

Mapplet Input transformation

Mapping

Mapplet Output transformation


Page 14: PowerCenter Basic Concepts


Recap

1. ETL

2. Designer

3. Mapping

4. Transformation

5. Mapplet

a. Extract, transform and load data

b. Create mapping objects

c. Logically defines the ETL process

d. Generates or manipulates data

e. Set of transformations that can be reused in multiple mappings

Page 15: PowerCenter Basic Concepts


Workflow Manager Tools – Create and Start Workflow

• Task Developer: create reusable tasks

• Worklet Designer: create worklets

• Workflow Designer: create workflows

Page 16: PowerCenter Basic Concepts


Task

• An executable set of actions, functions or commands

• Examples:

Session task runs a mapping

Command task runs a shell script

Email task sends an email

Decision task branches workflow conditionally

Timer task waits for a specified period

Page 17: PowerCenter Basic Concepts


Session

• Task that executes a mapping

• Define Log Options, Error handling, Connections

Page 18: PowerCenter Basic Concepts


Decision Task

• Tests for a condition during the workflow and sets a flag based on the condition

• Use a link condition (or a Control task) downstream to test the flag and control execution flow

• Can use workflow variables in the condition

Options on all tasks to fail parent and disable

Treat inputs as AND/OR
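The flag-then-branch pattern can be sketched in Python as an analogy. Nothing here is PowerCenter syntax: the task names, the `rows_loaded` variable and the condition itself are hypothetical. The decision function plays the role of the Decision task, and the lambdas play the role of link conditions that test its result.

```python
def decision_task(workflow_vars):
    """Evaluate a condition and return the flag the task would set."""
    # Hypothetical condition: every upstream session loaded at least one row.
    return all(count > 0 for count in workflow_vars["rows_loaded"].values())


def run_workflow(workflow_vars):
    """Run the decision, then follow whichever link condition holds."""
    executed = ["decision"]
    flag = decision_task(workflow_vars)
    # Link conditions: a downstream task runs only if its condition is true.
    links = [
        ("load_warehouse", lambda: flag),
        ("notify_failure", lambda: not flag),
    ]
    for task_name, condition in links:
        if condition():
            executed.append(task_name)
    return executed


print(run_workflow({"rows_loaded": {"s_orders": 10, "s_customers": 3}}))
print(run_workflow({"rows_loaded": {"s_orders": 10, "s_customers": 0}}))
```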


Page 19: PowerCenter Basic Concepts


Email Task

• Sends an email within a workflow

Note: emails can also be sent post-session in a Session task

• Can be used with a link condition to notify success or failure of prior tasks


Page 20: PowerCenter Basic Concepts


Event Wait Task

• Pauses processing of the pipeline until a specified event occurs

• Events can be pre-defined (file watch) or user-defined (created by an Event Raise task elsewhere in the workflow)


Page 21: PowerCenter Basic Concepts


Event Wait Task (cont’d)

Events Tab

User-defined events must be declared in the workflow Events tab

Specify either a pre-defined or user-defined event

Page 22: PowerCenter Basic Concepts


Event Raise Task

• Sets the location of a user-defined event in the workflow

• User-defined events are triggered when the Integration Service executes the Event Raise Task

• User-defined events must be declared in the workflow Events tab

Used with the Event Wait Task
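How Event Raise and Event Wait pair up can be sketched with Python threads. This is only an analogy: `threading.Event` stands in for a user-defined event declared in the workflow, and the branch names and log messages are invented for the example.

```python
import threading

log = []
files_ready = threading.Event()  # stands in for a declared user-defined event


def waiting_branch():
    """Event Wait: block this branch until the event has been raised."""
    files_ready.wait(timeout=5)
    log.append("downstream task runs")


def raising_branch():
    """Event Raise: marks the point in the workflow where the event fires."""
    log.append("upstream task finished")
    files_ready.set()


waiter = threading.Thread(target=waiting_branch)
waiter.start()
raising_branch()
waiter.join()
print(log)
```

The waiting branch cannot proceed until the other branch reaches the raise point, which is the ordering guarantee the two tasks provide together.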

Page 23: PowerCenter Basic Concepts


Command Task

• Specifies one or more UNIX commands or shell scripts, or DOS commands or batch files, for the Integration Service to run during a workflow

Note: UNIX and DOS commands can also be run pre- or post-session in a Session task

• Command task status (success or failure) is held in the task-specific variable $command_task_name.STATUS
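A loose Python analogy for a Command task is a function that runs a list of commands and reports one overall status. The status strings and the choice to stop at the first failing command are assumptions of this sketch, not a description of PowerCenter's behavior. The commands invoke the Python interpreter only so that the example runs on any platform.

```python
import subprocess
import sys


def run_command_task(commands):
    """Run commands in order and return a hypothetical task status."""
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            return "FAILED"  # assumption: stop at the first failing command
    return "SUCCEEDED"


ok_command = [sys.executable, "-c", "pass"]
bad_command = [sys.executable, "-c", "import sys; sys.exit(3)"]

print(run_command_task([ok_command, ok_command]))
print(run_command_task([ok_command, bad_command]))
```

A downstream link condition would then test the returned status, much as a workflow tests the task's STATUS variable.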

Page 24: PowerCenter Basic Concepts


Command Task (cont’d)

Add Cmd

Remove Cmd

Page 25: PowerCenter Basic Concepts


Reusable Tasks

• Session, Email and Command tasks can be reusable

• Use the Task Developer to create reusable tasks

• Reusable tasks appear in the Navigator Tasks node and can be dragged and dropped into any workflow

In a workflow, a reusable task is indicated by a special symbol


Page 26: PowerCenter Basic Concepts


Worklet

• An object representing a set or grouping of Tasks

• Can contain any Task available in the Workflow Manager

• Worklets expand and execute inside a Workflow

• A Workflow which contains a Worklet is called the “parent Workflow”

• Worklets CAN be nested

• Reusable Worklets – create in the Worklet Designer

• Non-reusable Worklets – create in the Workflow Designer


Page 27: PowerCenter Basic Concepts


Workflow

• A collection of ordered tasks

• Tasks can be linked sequentially, concurrently and/or combined

• Links can be conditional on previous tasks completing
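One way to picture sequential and concurrent links is as a dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The task names are hypothetical, and conditional links are left out for simplicity. Tasks in the same batch have no links between them and could run concurrently.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it is linked from.
links = {
    "s_load_customers": {"start"},
    "s_load_orders": {"start"},
    "s_build_summary": {"s_load_customers", "s_load_orders"},
    "email_done": {"s_build_summary"},
}

sorter = TopologicalSorter(links)
sorter.prepare()
batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose upstream links are all done
    batches.append(ready)
    sorter.done(*ready)
print(batches)
```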


Page 28: PowerCenter Basic Concepts


Workflow Structure

• Workflow 1

• Session 1

• Worklet A

• Session A1

• Session A2

• Session A3

• Worklet B

• Session B1

• Session B2

• Worklet C

• Session C1

• Session C2

Page 29: PowerCenter Basic Concepts


Workflow Schedule

• A workflow can be scheduled to run continuously, repeat at a given time or interval, or start manually.

• The Integration Service runs a scheduled workflow unless the prior workflow run fails.

• When a workflow fails, the Integration Service removes the workflow from the schedule, and you must reschedule it.
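The failure rule can be restated as a tiny simulation. This is a sketch of the rule as worded on the slide, not of the Integration Service itself; the function name and the list-of-booleans input are assumptions.

```python
def run_schedule(run_results):
    """Replay scheduled runs; return (runs_executed, still_scheduled)."""
    scheduled = True
    executed = 0
    for succeeded in run_results:
        if not scheduled:
            break  # an unscheduled workflow stays idle until it is rescheduled
        executed += 1
        if not succeeded:
            scheduled = False  # a failed run removes the workflow from the schedule
    return executed, scheduled


print(run_schedule([True, True, False, True]))
```

In the example, the fourth run never happens because the third one failed and nobody rescheduled the workflow.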

Page 30: PowerCenter Basic Concepts


Workflow Monitor

• Check Workflow Status

• Recover Workflow

• Get session log

Page 31: PowerCenter Basic Concepts


Recap

1. Workflow

2. Worklet

3. Task

4. Workflow Manager

5. Workflow Monitor

a. A collection of ordered tasks

b. Set of tasks

c. An executable mapping, functions or commands

d. Create and start workflows

e. Monitor and control workflows


Page 32: PowerCenter Basic Concepts


PowerCenter Architecture

Sources Targets

Repository

Integration Service

Repository Service Process

Repository Service

Domain

Administration Console

PowerCenter Client

Page 33: PowerCenter Basic Concepts


Architecture – Components

• A domain is a collection of nodes and services. It is the primary unit of administration.

• The Repository Service manages connections to the PowerCenter repository from client applications. The Repository Service is a separate, multi-threaded process that retrieves, inserts, and updates metadata in the repository database tables. The Repository Service ensures the consistency of metadata in the repository.

• The Integration Service reads mapping and session information from the repository. It extracts data from the mapping sources and stores the data in memory while it applies the transformation rules that you configure in the mapping. The Integration Service loads the transformed data into the mapping targets.

• The Administration Console is a web application that you use to manage a PowerCenter domain. If you have a user login to the domain, you can access the Administration Console. Use the Administration Console to perform administrative tasks such as managing logs, user accounts, and domain objects. Domain objects include services, nodes, and licenses.

• The PowerCenter repository resides in a relational database. The repository database tables contain the instructions required to extract, transform, and load data. PowerCenter Client applications access the repository database tables through the Repository Service.

Page 34: PowerCenter Basic Concepts


Metadata

• Defines data and processes

• Examples:

• Source and target definitions

• Type (flat file, database table, XML file, etc.)

• Datatype (character string, integer, decimal, etc.)

• Other attributes (length, precision, etc.)

• Mapping logic

• Workflow logic

• Stored in a metadata repository
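To make the idea concrete, a source definition can be pictured as a small record that describes the data without holding any of it. The Python dataclasses below are purely illustrative. The class names, fields and the ORDERS example are assumptions and do not reflect the actual repository schema.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    datatype: str          # e.g. "string", "integer", "decimal"
    length: int = 0
    precision: int = 0


@dataclass
class SourceDefinition:
    name: str
    source_type: str       # e.g. "flat file", "database table", "XML file"
    columns: list = field(default_factory=list)


orders = SourceDefinition(
    name="ORDERS",
    source_type="database table",
    columns=[
        Column("ORDER_ID", "integer"),
        Column("AMOUNT", "decimal", precision=10),
        Column("STATUS", "string", length=20),
    ],
)
print([column.name for column in orders.columns])
```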

Page 35: PowerCenter Basic Concepts


Recap

Match the terms and explanations:

1. Metadata

2. Repository

3. Repository Manager

4. Integration Service

a. Defines data and processes

b. Collection of tables that contains PowerCenter metadata

c. Repository organization and security

d. ETL processing engine


Page 36: PowerCenter Basic Concepts


Where do we use PowerCenter?

• Data Warehouse (SalesVision) and Data Mart (Horizon) Loads

• Customer Hub Load

• Interfaces –

• PowerCafe Orders → PeopleSoft

• Magic Leads → PowerCafe

• Customer Portal Online Support Access → Atlas

• ADS Sales Rep Accounts → SalesPortal LDAP

Page 37: PowerCenter Basic Concepts


PowerCenter Connect Options

• Packaged Applications and Systems: Hyperion Essbase, Lotus Notes, PeopleSoft, SAP Netweaver BW, SAS, Siebel

• Databases and Flat Files: DB2, Flat files, Informix, Netezza, SQL Server, Sybase, Teradata, Web logs

• Messaging and Standards: HTTP, IBM MQSeries, JMS, LDAP, MSMQ, ODBC, TIBCO Rendezvous, webMethods, Web Services, XML

• Hierarchical*: Adabas, C-ISAM, Complex flat files, Datacom, IDMS, IMS, VSAM

• Software as a Service (SaaS): salesforce.com

Page 38: PowerCenter Basic Concepts


Questions?