PowerCenter Basic Concepts

Ale Ribeiro, June 6, 2006

Agenda
- What is PowerCenter?
- PowerCenter Client Applications
- Demo: PowerCenter Designer, Workflow Manager, Workflow Monitor
- PowerCenter Architecture
- Where do we use PowerCenter in IT?
- Q&A

PowerCenter
- Is a single, unified enterprise data integration platform that allows companies and government organizations of all sizes to access, discover, and integrate data from virtually any business system, in any format, and deliver that data throughout the enterprise at any speed
- Is an ETL (Extract, Transform and Load) tool

PowerCenter Client Applications

Administration
- Repository Manager: manage repository connections, folders, objects, users and groups
- Administration Console (browser-based): perform domain and repository service tasks:
  - Create/configure nodes and repository services
  - Upgrade/delete
  - Start/stop
  - Backup/restore

Development
- Designer: create ETL mappings
- Workflow Manager: create and start workflows
- Workflow Monitor: monitor and control workflows

Designer Tools (create mappings)
- Source Analyzer: create source objects
- Target Designer: create target objects
- Transformation Developer: create reusable transformations
- Mapplet Designer: create mapplets
- Mapping Designer: create mappings

Mapping
Logically defines the ETL process:
- Reads data from sources
- Applies transformation logic to data
- Writes transformed data to targets

Source → Transformations → Target

Note: Sources and targets can be flat files, relational tables, XML files, application systems, message queues, etc.

Mapping (contd)
A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation. Mappings represent the data flow between sources and targets. When the Integration Service runs a session, it uses the instructions configured in the mapping to read, transform, and write data.

Every mapping must contain the following components:
- Source definition. Describes the characteristics of a source table or file.
- Transformation. Modifies data before writing it to targets. Use different transformation objects to perform different functions.
- Target definition. Defines the target table or file.
- Links. Connect sources, targets, and transformations so the Integration Service can move the data as it transforms it.

A mapping can also contain one or more mapplets. A mapplet is a set of transformations that you build in the Mapplet Designer and can use in multiple mappings.
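
As a rough analogy only (plain Python, not PowerCenter's API), the components above can be modeled as a source callable, a chain of transformation functions joined by "links", and a target callable. The names `Mapping`, `upper_name` and `load` are hypothetical, and a real mapping is a graph that can contain several pipelines rather than the single linear chain assumed here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List

Row = dict
Transformation = Callable[[Iterable[Row]], Iterator[Row]]


@dataclass
class Mapping:
    # Hypothetical model: source and target definitions plus the
    # transformations that sit between them.
    source: Callable[[], Iterable[Row]]      # source definition: produces rows
    transformations: List[Transformation]    # applied in the order they are linked
    target: Callable[[Iterable[Row]], int]   # target definition: writes rows, returns count

    def run(self) -> int:
        rows = self.source()
        # Each "link" feeds the output of one transformation into the next.
        for transform in self.transformations:
            rows = transform(rows)
        return self.target(rows)


def upper_name(rows: Iterable[Row]) -> Iterator[Row]:
    # Example transformation: modifies data before it reaches the target.
    for row in rows:
        yield {**row, "name": row["name"].upper()}


loaded: List[Row] = []


def load(rows: Iterable[Row]) -> int:
    # Example target: appends to an in-memory list instead of a table or file.
    count = 0
    for row in rows:
        loaded.append(row)
        count += 1
    return count


mapping = Mapping(
    source=lambda: [{"name": "acme"}, {"name": "globex"}],
    transformations=[upper_name],
    target=load,
)
rows_written = mapping.run()
```

Because the transformations are generators, rows stream from source to target one at a time, which loosely mirrors how data moves through linked ports rather than being materialized between steps.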

Example
Requirement: "Give me an Excel file with the total order amount per customer. I also need to know when this data was extracted (date) and the customer type initial (first letter of the customer type)."
1. Define the sources: Orders, Customers
2. Define any required transformations:
   - Sum of order amount
   - Get extracted date
   - Get first letter of customer type
3. Create the file
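
A minimal Python sketch of the same steps, assuming hypothetical column names (`customer_id`, `customer_type`, `amount`) and in-memory sample rows in place of the Orders and Customers sources. It writes CSV, which Excel can open, instead of a native Excel file. Conceptually the sum corresponds to an Aggregator, and the extraction date and type initial to row-level Expression calculations.

```python
import csv
import io
from collections import defaultdict
from datetime import date

# Hypothetical sample data standing in for the Orders and Customers sources.
customers = [
    {"customer_id": 1, "name": "Acme", "customer_type": "Retail"},
    {"customer_id": 2, "name": "Globex", "customer_type": "Wholesale"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 100.0},
    {"order_id": 11, "customer_id": 1, "amount": 50.0},
    {"order_id": 12, "customer_id": 2, "amount": 75.5},
]


def build_report(customers, orders, extract_date):
    # Aggregate step: sum of order amount per customer.
    totals = defaultdict(float)
    for order in orders:
        totals[order["customer_id"]] += order["amount"]

    report = []
    for customer in customers:
        report.append({
            "customer_id": customer["customer_id"],
            "name": customer["name"],
            # Customers without orders get 0.0 (a choice made for this sketch).
            "total_order_amount": totals.get(customer["customer_id"], 0.0),
            # Row-level expression: first letter of the customer type.
            "customer_type_initial": customer["customer_type"][:1],
            # Row-level expression: the extraction date.
            "extract_date": extract_date.isoformat(),
        })
    return report


def write_csv(report, out):
    # Target: a CSV file that Excel can open.
    writer = csv.DictWriter(out, fieldnames=list(report[0].keys()))
    writer.writeheader()
    writer.writerows(report)


report = build_report(customers, orders, date(2006, 6, 6))
buffer = io.StringIO()
write_csv(report, buffer)
```

To write a real file, pass `open("report.csv", "w", newline="")` instead of the `StringIO` buffer, and in practice use `date.today()` as the extraction date.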

Transformations
- Generate, modify, or pass data
- Data passes into and out of transformations through ports that you link in a mapping
- Passive transformations do not change the number of rows received
- Active transformations can change the number of rows received
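
To make the passive/active distinction concrete, here is a small Python analogy (not PowerCenter code): `expression` behaves like a passive transformation and `filter_rows` like an active one. Both function names are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Row = dict


def expression(rows: Iterable[Row], port: str, fn: Callable[[Row], object]) -> Iterator[Row]:
    # Passive: emits exactly one output row for every input row,
    # adding (or overwriting) the output port `port`.
    for row in rows:
        yield {**row, port: fn(row)}


def filter_rows(rows: Iterable[Row], condition: Callable[[Row], bool]) -> Iterator[Row]:
    # Active: rows that fail the condition are dropped,
    # so the output can have fewer rows than the input.
    for row in rows:
        if condition(row):
            yield row


rows = [{"amount": 10}, {"amount": 200}, {"amount": 35}]
doubled = list(expression(rows, "amount_x2", lambda r: r["amount"] * 2))
large = list(filter_rows(rows, lambda r: r["amount"] > 30))
```

Three rows go into each function; `doubled` still has three rows, while `large` has only two.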

PowerCenter Transformations (partial list)
- Source Qualifier: reads data from flat file and relational sources
- Expression: performs row-level calculations
- Filter: drops rows conditionally
- Sorter: sorts data
- Aggregator: performs aggregate calculations
- Joiner: joins heterogeneous sources
- Lookup: looks up values and passes them to other objects
- Update Strategy: tags rows for insert, update, delete, reject
- Router: routes rows conditionally
- Transaction Control: allows data-driven commits and rollbacks
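
The difference between Filter and Router is easy to miss. The hypothetical Python function below sketches Router-style behavior: each row is tested against every group condition, may land in more than one group, and falls into a default group if it matches none. The group names and the `"DEFAULT"` key are illustrative assumptions, not PowerCenter identifiers.

```python
from typing import Callable, Dict, Iterable, List

Row = dict


def route(rows: Iterable[Row], groups: Dict[str, Callable[[Row], bool]]) -> Dict[str, List[Row]]:
    # One output list per user-defined group, plus a default group.
    output: Dict[str, List[Row]] = {name: [] for name in groups}
    output["DEFAULT"] = []
    for row in rows:
        matched = False
        # Unlike an if/elif chain, every condition is tested,
        # so a row can be sent to several groups.
        for name, condition in groups.items():
            if condition(row):
                output[name].append(row)
                matched = True
        if not matched:
            output["DEFAULT"].append(row)
    return output


rows = [{"amount": 5}, {"amount": 50}, {"amount": 500}]
routed = route(rows, {
    "over_10": lambda r: r["amount"] > 10,
    "over_100": lambda r: r["amount"] > 100,
})
```

Here the row with amount 500 appears in both groups, and the row with amount 5 goes to the default group. A Filter with the condition `amount > 10` would simply discard that row instead.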

Advanced PowerCenter Transformations
- Union: merges two or more data streams into one, like a SQL UNION ALL (duplicate rows are kept)
- Java: lets you write transformation logic in Java within PowerCenter
- Midstream XML Parser: reads XML from any point in a mapping
- Midstream XML Generator: writes XML at any point in a mapping
- More Source Qualifiers: read from XML, message queues and applications

Mapplet
- A set of transformations that can be reused in multiple mappings
- Mapplet Input and Output transformations pass data from and to the mapping
- Built with the Mapplet Designer tool

Example: Data Sources Defined Outside the Mapplet
(Diagram: in the mapping, source data defined outside the mapplet enters the mapplet through a Mapplet Input transformation and leaves through a Mapplet Output transformation.)
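
In programming terms a mapplet is close to a reusable function: the Mapplet Input transformation plays the role of the parameters and the Mapplet Output transformation the return value. The sketch below shows only that analogy; `cleanup_customers`, `orders_mapping` and `marketing_mapping` are hypothetical names.

```python
from typing import Iterable, Iterator, List

Row = dict


def cleanup_customers(rows: Iterable[Row]) -> Iterator[Row]:
    # "Mapplet": rows arrive from the calling mapping (Input transformation)...
    for row in rows:
        name = row["name"].strip().title()
        initial = row["customer_type"].strip()[:1].upper()
        # ...and leave with cleaned ports (Output transformation).
        yield {**row, "name": name, "customer_type_initial": initial}


def orders_mapping(customers: Iterable[Row]) -> List[Row]:
    # First mapping: reuses the mapplet unchanged.
    return list(cleanup_customers(customers))


def marketing_mapping(customers: Iterable[Row]) -> List[str]:
    # Second mapping: reuses the same mapplet, then keeps retail customers only.
    return [r["name"] for r in cleanup_customers(customers) if r["customer_type_initial"] == "R"]


customers = [
    {"name": "  acme corp ", "customer_type": "retail"},
    {"name": "globex", "customer_type": " wholesale"},
]
```

The cleanup logic lives in one place, so a fix to it reaches both mappings, which is the main reason to build a mapplet instead of copying transformations.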

Recap
1. ETL
2. Designer
3. Mapping
4. Transformation
5. Mapplet

a. Extract, transform and load data
b. Create mapping objects
c. Logically defines the ETL process
d. Generates or manipulates data
e. Set of transformations that can be reused in multiple mappings

Workflow Manager Tools (create and start workflows)
- Task Developer: create reusable tasks
- Worklet Designer: create worklets
- Workflow Designer: create workflows

Task
An executable set of actions, functions or commands. Examples:
- Session task runs a mapping
- Command task runs a shell script
- Email task sends an email
- Decision task branches the workflow conditionally
- Timer task waits for a specified period

Session
- A task that executes a mapping
- Defines log options, error handling and connections

Decision Task
- Tests for a condition during the workflow and sets a flag based on the condition
- Use a link condition (or a Control task) downstream to test the flag and control execution flow
- Can use workflow variables in the condition

Options available on all tasks:
- Fail the parent if the task fails
- Disable the task
- Treat input links as AND or OR
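
A plain-Python analogy of the Decision task and of the AND/OR input-link option (not PowerCenter syntax; the variable `rows_staged` and both function names are hypothetical): the decision is evaluated once into a flag, and downstream link conditions read that flag.

```python
from typing import Callable, Dict, List


def decision_task(variables: Dict[str, int], condition: Callable[[Dict[str, int]], bool]) -> bool:
    # Evaluates the condition once and returns the resulting flag.
    return bool(condition(variables))


def should_start(link_results: List[bool], treat_inputs_as: str = "AND") -> bool:
    # A task with several input links starts when all (AND) or any (OR)
    # of its incoming link conditions are true.
    if treat_inputs_as == "AND":
        return all(link_results)
    if treat_inputs_as == "OR":
        return any(link_results)
    raise ValueError("treat_inputs_as must be 'AND' or 'OR'")


# Hypothetical workflow variable: number of rows staged by an earlier session.
workflow_vars = {"rows_staged": 0}
has_data = decision_task(workflow_vars, lambda v: v["rows_staged"] > 0)

# Downstream link conditions test the flag to pick a branch.
run_load_session = has_data
run_no_data_email = not has_data
```

With no staged rows, the flag is false, so only the notification branch runs. Evaluating the condition once and reusing the flag keeps both branches consistent even if the variable changes later.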

Email Task
- Sends an email within a workflow
- Can be used with a link condition to notify success or failure of prior tasks

Note: emails can also be sent post-session in a Session task.

Event Wait Task
- Pauses the tasks that follow it in the workflow until a specified event occurs
- Events can be:
  - Predefined: a file watch
  - User-defined: created by an Event Raise task elsewhere in the workflow

Event Wait Task (contd): Events Tab
- Specify either a predefined or a user-defined event
- User-defined events must be declared in the workflow Events tab
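
A predefined (file watch) event essentially means "wait until a given file appears". A minimal polling sketch in Python, assuming a hypothetical indicator-file path and timeout; it illustrates the idea and is not how the Integration Service implements file watching.

```python
import time
from pathlib import Path


def wait_for_file(path: str, timeout_s: float = 10.0, poll_s: float = 0.1) -> bool:
    # Poll until the file exists or the timeout expires.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if Path(path).exists():
            return True   # event occurred: downstream tasks may continue
        time.sleep(poll_s)
    return False          # event did not occur within the timeout
```

A typical pattern is for an upstream process to drop an empty indicator file (for example a hypothetical `orders_ready.ind`) once its data file is complete, so the waiting branch never reads a half-written file.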

Event Raise Task
- Sets the location of a user-defined event in the workflow
- User-defined events are triggered when the Integration Service executes the Event Raise task
- User-defined events must be declared in the workflow Events tab
- Used with the Event Wait task
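
Conceptually, Event Raise and Event Wait behave like a signal shared between two concurrent branches. A Python threading analogy follows; the event, branch and log names are made up for illustration.

```python
import threading
from typing import List


def run_two_branches() -> List[str]:
    log: List[str] = []
    lock = threading.Lock()
    # Stands in for a user-defined event declared on the workflow Events tab.
    orders_loaded = threading.Event()

    def branch_a() -> None:
        with lock:
            log.append("A: load session finished")
        orders_loaded.set()  # Event Raise: trigger the event

    def branch_b() -> None:
        # Event Wait: block until the event is raised (or time out).
        if orders_loaded.wait(timeout=5):
            with lock:
                log.append("B: continuing after event")
        else:
            with lock:
                log.append("B: timed out")

    waiter = threading.Thread(target=branch_b)
    raiser = threading.Thread(target=branch_a)
    waiter.start()  # start the waiting branch first to show that it blocks
    raiser.start()
    raiser.join()
    waiter.join()
    return log
```

Even though the waiting branch starts first, its log entry always comes after branch A's, because it cannot proceed until the event is set.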

Command Task
- Specifies one or more UNIX commands or shell scripts, or DOS commands or batch files, for the Integration Service to run during a workflow
- Command task status (success or failure) is held in the task-specific variable $command_task_name.STATUS

Note: UNIX and DOS commands can also be run pre- or post-session in a Session task.
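
A minimal Python sketch of the idea behind a Command task and its status, assuming (as a choice of this sketch) that the task stops and fails as soon as one command returns a non-zero exit code. The status strings only mimic the success/failure status described above; the function name is hypothetical.

```python
import subprocess
import sys
from typing import List


def run_command_task(commands: List[List[str]]) -> str:
    # Runs each command in order; stops at the first non-zero exit code.
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return "FAILED"
    return "SUCCEEDED"


# Portable stand-ins for shell scripts or batch files: small Python one-liners.
ok_commands = [[sys.executable, "-c", "print('archive source file')"]]
bad_commands = ok_commands + [[sys.executable, "-c", "import sys; sys.exit(3)"]]
```

Stopping at the first failure is a design choice here; a variant could run every command and report failure if any of them failed, which suits independent cleanup steps better.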

Command Task (contd)
(Screenshot: the list of commands, with Add Cmd and Remove Cmd buttons.)

Reusable Tasks
- Session, Email and Command tasks can be reusable
- Use the Task Developer to create reusable tasks
- Reusable tasks appear in the Navigator Tasks node and can be dragged and dropped into any workflow
- In a workflow, a reusable task is indicated by a special symbol

Worklet
- An object representing a set or grouping of tasks
- Can contain any task available in the Workflow Manager
- Worklets expand and execute inside a workflow
- A workflow that contains a worklet is called the parent workflow
- Worklets can be nested
- Reusable worklets are created in the Worklet Designer
- Non-reusable worklets are created in the Workflow Designer

Workflow
- A collection of ordered tasks
- Tasks can be linked sequentially, concurrently, or in a combination of both
- Links can be conditional on the outcome of previous tasks
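
To show how ordered tasks and conditional links fit together, here is a small Python model of a workflow. It is an analogy under simplifying assumptions: tasks are plain functions returning success or failure, the task names are hypothetical, concurrent branches run one after another, and a task with several input links requires all of them (AND).

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List, Set, Tuple

Link = Tuple[str, str, Callable[[str], bool]]  # (from_task, to_task, link condition)


def run_workflow(tasks: Dict[str, Callable[[], bool]], links: List[Link]) -> Dict[str, str]:
    predecessors: Dict[str, Set[str]] = {name: set() for name in tasks}
    for src, dst, _ in links:
        predecessors[dst].add(src)

    status: Dict[str, str] = {}
    # Tasks run in dependency order; concurrent branches are run sequentially.
    for name in TopologicalSorter(predecessors).static_order():
        incoming = [(src, cond) for src, dst, cond in links if dst == name]
        # A task with input links runs only if all its link conditions hold.
        if incoming and not all(cond(status[src]) for src, cond in incoming):
            status[name] = "NOT_RUN"
            continue
        status[name] = "SUCCEEDED" if tasks[name]() else "FAILED"
    return status


def succeeded(task_status: str) -> bool:
    return task_status == "SUCCEEDED"


def failed(task_status: str) -> bool:
    return task_status == "FAILED"


result = run_workflow(
    tasks={
        "s_extract": lambda: True,
        "s_load": lambda: True,
        "email_on_failure": lambda: True,
        "s_report": lambda: False,
    },
    links=[
        ("s_extract", "s_load", succeeded),
        ("s_extract", "email_on_failure", failed),
        ("s_load", "s_report", succeeded),
    ],
)
```

Since `s_extract` succeeds, the failure-notification link is false and `email_on_failure` is skipped, while the success path continues to `s_load` and `s_report`.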

Workflow Structure
(Diagram: a workflow containing Session 1 and Worklets A, B and C. Worklet A contains Sessions A1, A2 and A3; Worklet B contains Sessions B1 and B2; Worklet C contains Sessions C1 and C2.)

Workflow Schedule
A workflow can be scheduled to run continuously, to repeat at a given time or interval, or to start manually. The Integration Service keeps running a scheduled workflow unless a prior workflow run fails. When a workflow fails, the Integration Service removes the workflow from the schedule, and you must reschedule it.
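
The failure rule above can be modeled as a small state machine. A hypothetical Python sketch: the class, function and workflow names are illustrative, and real scheduling by time or interval is replaced by explicit "ticks".

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ScheduledWorkflow:
    name: str
    run: Callable[[], bool]  # returns True if the run succeeds
    scheduled: bool = True
    history: List[str] = field(default_factory=list)


def on_schedule_tick(wf: ScheduledWorkflow) -> None:
    # Called at each scheduled time or interval.
    if not wf.scheduled:
        wf.history.append("NOT_SCHEDULED")
        return
    if wf.run():
        wf.history.append("SUCCEEDED")
    else:
        wf.history.append("FAILED")
        wf.scheduled = False  # a failed run removes the workflow from the schedule


def reschedule(wf: ScheduledWorkflow) -> None:
    # Manual step required after a failure.
    wf.scheduled = True


outcomes = iter([True, False, True])
wf = ScheduledWorkflow("wf_daily_orders", run=lambda: next(outcomes))
for _ in range(3):
    on_schedule_tick(wf)
```

After the failed second run, the third tick does nothing until someone calls `reschedule`, matching the rule that a failed workflow must be rescheduled manually.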

Workflow Monitor
- Check workflow status
- Recover workflows
- Get session logs

Recap
1. Workflow
2. Worklet
3. Task
4. Workflow Manager
5. Workflow Monitor

a. A collection of ordered tasks
b. A set of tasks
c. An executable set of actions, functions or commands
d. Create and start workflows
e. Monitor and control workflows

PowerCenter Architecture
(Diagram components: Domain, Integration Service, Sources, Targets, Repository Service, Repository Service Process, Repository, Administration Console, PowerCenter Client.)

Architecture Components
- Domain: a collection of nodes and services; the primary unit of administration.
- Repository Service: manages connections to the PowerCenter repository from client applications. The Repository Service is a separate, multi-threaded process that retrieves, inserts, and updates metadata in the repository database tables. It ensures the consistency of metadata in the repository.
- Integration Service: reads mapping and session information from the repository. It extracts data from the mapping sources and stores the data in memory while it applies the transformation rules that you configure in the mapping. The Integration Service loads the transformed data into the mapping targets.
- The Adm