9836455 DataStage Introduction

06/09/22 shakthidhar bommireddy 1 DataStage Enterprise Edition

Transcript of 9836455 DataStage Introduction

Page 1: 9836455 DataStage Introduction

04/22/23 shakthidhar bommireddy 1

DataStage Enterprise Edition

Page 2: 9836455 DataStage Introduction

Introduction to DataStage EE, Part 1

Page 3: 9836455 DataStage Introduction

Ascential Platform

Page 4: 9836455 DataStage Introduction

What is DataStage?

o Design jobs for Extraction, Transformation, and Loading (ETL)

o Ideal tool for data integration projects, such as data warehouses, data marts, and system migrations

o Import, export, create, and manage metadata for use within jobs

o Schedule, run, and monitor jobs, all within DataStage

o Administer your DataStage development and execution environments

Page 5: 9836455 DataStage Introduction

DataStage Server and Clients

Page 6: 9836455 DataStage Introduction

DataStage Administrator

Page 7: 9836455 DataStage Introduction

DataStage Administrator

Use the Administrator to specify general server defaults, add and delete projects, and to set project properties.

Use the Administrator Project Properties window to:
o Set job monitoring limits and other Director defaults on the General tab.
o Set user group privileges on the Permissions tab.
o Enable or disable server-side tracing on the Tracing tab.
o Specify a user name and password for scheduling jobs on the Schedule tab.
o Specify hashed file stage read and write cache sizes on the Tunables tab.

Page 8: 9836455 DataStage Introduction

Client Logon

Page 9: 9836455 DataStage Introduction

DataStage Manager

Page 10: 9836455 DataStage Introduction

DataStage Manager

Use the Manager to store and manage reusable metadata for the jobs you define in the Designer. This metadata includes table and file layouts and routines for transforming extracted data.

Manager is also the primary interface to the DataStage repository. In addition to table and file layouts, it displays the routines, transforms, and jobs that are defined in the project. Custom routines and transforms can also be created in Manager.

Page 11: 9836455 DataStage Introduction

DataStage Designer

Page 12: 9836455 DataStage Introduction

DataStage Designer

The DataStage Designer allows you to use familiar graphical point-and-click techniques to develop processes for extracting, cleansing, transforming, integrating, and loading data into warehouse tables.

The Designer provides a "visual data flow" method to easily interconnect and configure reusable components.

Page 13: 9836455 DataStage Introduction

DataStage Director

Use the Director to validate, run, schedule, and monitor your DataStage jobs. You can also gather statistics as the job runs.

Page 14: 9836455 DataStage Introduction

Developing in DataStage

o Define your project's properties: Administrator
o Open (attach to) your project
o Import metadata that defines the format of data stores your jobs will read from or write to: Manager
o Design the job: Designer
  - Define data extractions (reads)
  - Define data flows
  - Define data integration
  - Define data transformations
  - Define data constraints
  - Define data loads (writes)
  - Define data aggregations
o Compile and debug the job: Designer
o Run and monitor the job: Director

Page 15: 9836455 DataStage Introduction

DataStage Projects

Page 16: 9836455 DataStage Introduction

DataStage Projects

Page 17: 9836455 DataStage Introduction

Review

o DataStage Designer is used to build and compile your ETL jobs.

o Manager is used to execute your jobs after you build them.

o Director is used to execute your jobs after you build them.

o Administrator is used to set global and project properties.

Page 18: 9836455 DataStage Introduction

Intro Part 2: Configuring Projects

Page 19: 9836455 DataStage Introduction

Module Objectives

After this module you will be able to:
- Explain how to create and delete projects
- Set project properties in Administrator
- Set EE global properties in Administrator

Page 20: 9836455 DataStage Introduction

Project Properties

o Projects can be created and deleted in Administrator.
o Project properties and defaults are set in Administrator.

Recall from module 1: In DataStage all development work is done within a project. Projects are created during installation and after installation using Administrator.

Each project is associated with a directory. The directory stores the objects (jobs, metadata, custom routines, etc.) created in the project.

Before you can work in a project you must attach to it (open it).

You can set the default properties of a project using DataStage Administrator.

Page 21: 9836455 DataStage Introduction

Setting Project Properties

Page 22: 9836455 DataStage Introduction

Licensing Tab

Page 23: 9836455 DataStage Introduction

Projects General Tab

Page 24: 9836455 DataStage Introduction

Projects General Tab

Click Properties on the DataStage Administration window to open the Project Properties window. There are nine tabs. (The Mainframe tab is only enabled if your license supports mainframe jobs.) The default is the General tab.

If you select the Enable job administration in Director box, you can perform some administrative functions in Director without opening Administrator.

When a job is run in Director, events are logged describing the progress of the job. For example, events are logged when a job starts, when it stops, and when it aborts. The number of logged events can grow very large. The Auto-purge of job log box allows you to specify conditions for purging these events.

You can limit the logged events either by number of days or number of job runs.
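The two purge conditions can be pictured with a small Python sketch. This is only an illustration of the idea, not how DataStage implements auto-purge: the `LogEvent` record, the `purge_log` function, and the assumption that run IDs increase over time are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class LogEvent:
    run_id: int            # hypothetical: which job run produced the event
    timestamp: datetime
    message: str


def purge_log(events: List[LogEvent], now: datetime,
              max_days: Optional[int] = None,
              max_runs: Optional[int] = None) -> List[LogEvent]:
    """Return the events that survive a purge by age or by number of runs."""
    kept = events
    if max_days is not None:
        # Keep only events newer than the cutoff date.
        cutoff = now - timedelta(days=max_days)
        kept = [e for e in kept if e.timestamp >= cutoff]
    if max_runs is not None:
        # Keep only events from the most recent runs (assumes run IDs increase).
        recent = sorted({e.run_id for e in kept}, reverse=True)[:max_runs]
        kept = [e for e in kept if e.run_id in recent]
    return kept


now = datetime(2023, 4, 22)
events = [
    LogEvent(1, now - timedelta(days=10), "Starting job"),
    LogEvent(2, now - timedelta(days=3), "Starting job"),
    LogEvent(3, now - timedelta(days=1), "Starting job"),
]
by_days = purge_log(events, now, max_days=7)   # drops the 10-day-old run
by_runs = purge_log(events, now, max_runs=1)   # keeps only the latest run
```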

Page 25: 9836455 DataStage Introduction

Environment Variables

Page 26: 9836455 DataStage Introduction

Permissions Tab

Page 27: 9836455 DataStage Introduction

Permissions Tab

Use this page to set user group permissions for accessing and using DataStage. All DataStage users must belong to a recognized user role before they can log on to DataStage. This helps to prevent unauthorized access to DataStage projects.

There are three roles of DataStage user:
o DataStage Developer, who has full access to all areas of a DataStage project.
o DataStage Operator, who can run and manage released DataStage jobs.
o <None>, who does not have permission to log on to DataStage.

UNIX note: In UNIX, the groups displayed are defined in /etc/group.

Page 28: 9836455 DataStage Introduction

Tracing Tab

Page 29: 9836455 DataStage Introduction

Tracing Tab

This tab is used to enable and disable server-side tracing.

The default is for server-side tracing to be disabled. When you enable it, information about server activity is recorded for any clients that subsequently attach to the project. This information is written to trace files. Users with in-depth knowledge of the system software can use it to help identify the cause of a client problem. If tracing is enabled, users receive a warning message whenever they invoke a DataStage client.

Warning: Tracing causes a lot of server system overhead. This should only be used to diagnose serious problems.

Page 30: 9836455 DataStage Introduction

Tunables Tab

On the Tunables tab, you can specify the sizes of the memory caches used when reading rows in hashed files and when writing rows to hashed files. Hashed files are mainly used for lookups and are discussed in a later module.

Page 31: 9836455 DataStage Introduction

Parallel Tab

You should enable OSH for viewing - OSH is generated when you compile a job.

Page 32: 9836455 DataStage Introduction

Intro Part 3: Managing Meta Data

Page 33: 9836455 DataStage Introduction

Module Objectives

After this module you will be able to:
- Describe the DataStage Manager components and functionality
- Import and export DataStage objects
- Import metadata for a sequential file

Page 34: 9836455 DataStage Introduction

What Is Metadata?

Metadata is "data about data" that describes the formats of sources and targets. This includes general format information such as whether the record columns are delimited and, if so, the delimiting character. It also includes the specific column definitions.
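As a rough illustration, the two layers of metadata described above (general format information plus column definitions) could be modelled like this in Python. The class names, field names, and the sample `Customers` layout are hypothetical and do not reflect DataStage's internal repository format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    sql_type: str
    length: int
    nullable: bool = False


@dataclass
class TableDefinition:
    name: str
    delimited: bool        # general format: are the record columns delimited?
    delimiter: str         # general format: the delimiting character
    columns: List[Column] = field(default_factory=list)   # column definitions


# Hypothetical layout of a comma-delimited customer file.
customers = TableDefinition(
    name="Customers",
    delimited=True,
    delimiter=",",
    columns=[
        Column("CUST_ID", "Integer", 10),
        Column("CUST_NAME", "VarChar", 50, nullable=True),
    ],
)
column_names = [c.name for c in customers.columns]
```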

Page 35: 9836455 DataStage Introduction

DataStage Manager

DataStage Manager is a graphical tool for managing the contents of your DataStage project repository, which contains metadata and other DataStage components such as jobs and routines.

The left pane contains the project tree. There are seven main branches, but you can create subfolders under each. Select a folder in the project tree to display its contents.

Page 36: 9836455 DataStage Introduction

Manager Contents

Metadata describing sources and targets: Table definitions

DataStage objects: jobs, routines, table definitions, etc.

Page 37: 9836455 DataStage Introduction

Import and Export

o Any object in Manager can be exported to a file
o Can export whole projects
o Use for backup
o Sometimes used for version control
o Can be used to move DataStage objects from one project to another
o Use to share DataStage jobs and projects with other developers

Page 38: 9836455 DataStage Introduction

Export Procedure

o In Manager, click "Export>DataStage Components"
o Select DataStage objects for export
o Specify type of export: DSX, XML
o Specify file path on client machine

Page 39: 9836455 DataStage Introduction

Review Q

You can export DataStage objects such as jobs, but you can't export metadata, such as field definitions of a sequential file. (T/F)

The directory to which you export is on the DataStage client machine, not on the DataStage server machine. (T/F)

Page 40: 9836455 DataStage Introduction

Exporting DataStage Objects

Page 41: 9836455 DataStage Introduction

Exporting DataStage Objects

Page 42: 9836455 DataStage Introduction

Import Procedure

o In Manager, click "Import>DataStage Components"
o Select DataStage objects for import

Page 43: 9836455 DataStage Introduction

Importing DataStage Objects

Page 44: 9836455 DataStage Introduction

Import Options

Page 45: 9836455 DataStage Introduction

Metadata Import

o Import format and column definitions from sequential files
o Import relational table column definitions
o Imported as "Table Definitions"
o Table definitions can be loaded into job stages

Page 46: 9836455 DataStage Introduction

Sequential File Import Procedure

o In Manager, click Import>Table Definitions>Sequential File Definitions
o Select directory containing sequential file and then the file
o Select Manager category
o Examine format and column definitions and edit if necessary
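The "examine and edit" step exists because an importer can only guess column definitions from the file's contents. The following Python sketch shows the kind of guess involved. It is not the Manager import wizard: the `infer_columns` helper, its two type names, and the assumption that the first row holds column names are all made up for illustration.

```python
import csv
import io


def infer_columns(text: str, delimiter: str = ","):
    """Guess a column name and a simple type for each field of a delimited file."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]      # assumes the first row names the columns
    columns = []
    for i, name in enumerate(header):
        values = [row[i] for row in data]
        # A column is guessed to be Integer only if every sample value is all digits.
        is_int = bool(values) and all(v.isdigit() for v in values)
        columns.append((name, "Integer" if is_int else "VarChar"))
    return columns


sample = "CUST_ID,CUST_NAME\n1,Ada\n2,Grace\n"
definitions = infer_columns(sample)
```

A guess like this can be wrong (for example, a ZIP code column that happens to be all digits), which is why the procedure ends with a manual review.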

Page 47: 9836455 DataStage Introduction

Manager Table Definition

In Manager, select the category (folder) that contains the table definition. Double-click the table definition to open the Table Definition window. Click the Columns tab to view and modify any column definitions. Select the Format tab to edit the file format specification.

Page 48: 9836455 DataStage Introduction

Importing Sequential Metadata

Page 49: 9836455 DataStage Introduction

Intro Part 4: Designing and Documenting Jobs

Page 50: 9836455 DataStage Introduction

Module Objectives

After this module you will be able to:
- Describe what a DataStage job is
- List the steps involved in creating a job
- Describe links and stages
- Identify the different types of stages
- Design a simple extraction and load job
- Compile your job
- Create parameters to make your job flexible
- Document your job

Page 51: 9836455 DataStage Introduction

What Is a Job?

o Executable DataStage program
o Created in DataStage Designer, but can use components from Manager
o Built using a graphical user interface
o Compiles into Orchestrate shell language (OSH)

Page 52: 9836455 DataStage Introduction

Job Development Overview

o In Manager, import metadata defining sources and targets
o In Designer, add stages defining data extractions and loads
o Add Transformers and other stages to define data transformations
o Add links defining the flow of data from sources to targets
o Compile the job
o In Director, validate, run, and monitor your job

Page 53: 9836455 DataStage Introduction

Designer Work Area

Page 54: 9836455 DataStage Introduction

Designer Toolbar

Page 55: 9836455 DataStage Introduction

Tools Palette

Page 56: 9836455 DataStage Introduction

Adding Stages and Links

Stages can be dragged from the tools palette or from the stage type branch of the repository view

Links can be drawn from the tools palette or by right clicking and dragging from one stage to another

Page 57: 9836455 DataStage Introduction

Sequential File Stage

Used to extract data from, or load data to, a sequential file

o Specify full path to the file
o Specify a file format: fixed width or delimited
o Specify column definitions
o Specify write action

Page 58: 9836455 DataStage Introduction

Designer - Create New Job

Page 59: 9836455 DataStage Introduction

Drag Stages and Links Using Palette

Page 60: 9836455 DataStage Introduction

Assign Meta Data

Meta data may be dragged from the repository and dropped on a link.

Page 61: 9836455 DataStage Introduction

Editing a Sequential Source Stage

Page 62: 9836455 DataStage Introduction

Editing a Sequential Source Stage

Any required properties that are not completed will appear in red.

You are defining the format of the data flowing out of the stage, that is, to the output link. Define the output link listed in the Output name box.

You are defining the file from which the job will read. If the file doesn't exist, you will get an error at run time.

On the Format tab, you specify a format for the source file.

You will be able to view its data using the View data button.

Think of a link as a pipe. What flows in one end flows out the other end (at the transformer stage).

Page 63: 9836455 DataStage Introduction

Editing a Sequential Target

Page 64: 9836455 DataStage Introduction

Editing a Sequential Target

Defining a sequential target stage is similar to defining a sequential source stage.

You are defining the format of the data flowing into the stage, that is, from the input links. Define each input link listed in the Input name box.

You are defining the file the job will write to. If the file doesn't exist, it will be created. Specify whether to overwrite or append the data in the Update action set of buttons.

On the Format tab, you can specify a different format for the target file than you specified for the source file.

If the target file doesn't exist, you will not (of course!) be able to view its data until after the job runs. If you click the View data button, DataStage will return a "Failed to open ..." error.

The column definitions you defined in the source stage for a given (output) link will appear already defined in the target stage for the corresponding (input) link.

Think of a link as a pipe. What flows in one end flows out the other end. The format going in is the same as the format going out.
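The difference between the two Update action choices for a target file is the same as the difference between write and append modes when opening a file in most languages. A minimal Python sketch, assuming a hypothetical `write_target` helper and a comma-delimited target (this is an analogy, not DataStage code):

```python
import os
import tempfile


def write_target(path: str, rows, update_action: str = "overwrite") -> None:
    """Write rows to a target file, replacing or extending its contents."""
    mode = "a" if update_action == "append" else "w"
    # In both modes the file is created if it does not exist yet.
    with open(path, mode, encoding="utf-8") as target:
        for row in rows:
            target.write(",".join(row) + "\n")


target_path = os.path.join(tempfile.mkdtemp(), "target.txt")
write_target(target_path, [("1", "Ada")])                   # file is created
write_target(target_path, [("2", "Grace")], "append")       # row is added
with open(target_path, encoding="utf-8") as f:
    after_append = f.read().splitlines()
write_target(target_path, [("3", "Edsger")], "overwrite")   # contents replaced
with open(target_path, encoding="utf-8") as f:
    after_overwrite = f.read().splitlines()
```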

Page 65: 9836455 DataStage Introduction

Transformer Stage

Used to define constraints, derivations, and column mappings

A column mapping maps an input column to an output column

In this module we will just define column mappings (no derivations)
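A plain column mapping, with no derivation, just copies an input column's value into an output column, possibly under a different name. A minimal Python sketch of that idea follows; the `apply_mapping` function and the column names are hypothetical.

```python
def apply_mapping(row: dict, mapping: dict) -> dict:
    """Build an output row by copying each mapped input column."""
    return {out_col: row[in_col] for out_col, in_col in mapping.items()}


# Hypothetical mapping: output column name -> input column name.
mapping = {"ID": "CUST_ID", "NAME": "CUST_NAME"}
input_row = {"CUST_ID": 1, "CUST_NAME": "Ada", "REGION": "EU"}
output_row = apply_mapping(input_row, mapping)   # REGION is not mapped, so it is dropped
```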

Page 66: 9836455 DataStage Introduction

Transformer Stage Elements

Page 67: 9836455 DataStage Introduction

Transformer Stage Elements

There are two Transformer stages: Transformer and Basic Transformer. Both look the same but access different routines and functions.

Notice the following elements of the transformer:
o The top, left pane displays the columns of the input links.
o The top, right pane displays the contents of the stage variables.
o The lower, right pane displays the contents of the output link. An unresolved column mapping will show the output in red.
o For now, ignore the Stage Variables window in the top, right pane. This will be discussed in a later module.
o The bottom area shows the column definitions (metadata) for the input and output links.

Page 68: 9836455 DataStage Introduction

Create Column Mappings

Page 69: 9836455 DataStage Introduction

Creating Stage Variables

Stage variables are used for a variety of purposes:
o Counters
o Temporary registers for derivations
o Controls for constraints
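The three uses listed above can be pictured as local state that a row-by-row loop carries along. The Python sketch below is an analogy only: the `transform` function, the `REGION` column, and the "at most N rows per region" rule are hypothetical.

```python
def transform(rows, max_per_region: int = 1):
    """Pass rows through, using per-stage state the way stage variables do."""
    row_count = 0        # counter
    seen = {}            # control for a constraint: rows seen per region
    output = []
    for row in rows:
        row_count += 1
        region = row["REGION"].strip().upper()   # temporary register for a derivation
        seen[region] = seen.get(region, 0) + 1
        if seen[region] <= max_per_region:       # constraint decides if the row is output
            output.append({"ROW_NUM": row_count, "REGION": region})
    return output


rows = [{"REGION": "eu "}, {"REGION": "EU"}, {"REGION": "us"}]
result = transform(rows)
```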

Page 70: 9836455 DataStage Introduction

Result

Page 71: 9836455 DataStage Introduction

Adding Job Parameters

o Makes the job more flexible
o Parameters can be:
  - Used in constraints and derivations
  - Used in directory and file names
o Parameter values are determined at run time
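To see why parameters make a job more flexible, consider a file path whose directory and date are filled in at run time. The Python sketch below is an illustration under assumptions: the `resolve` helper, the parameter names, and the `#Name#` placeholder style used here are choices made for this example.

```python
import re


def resolve(template: str, params: dict) -> str:
    """Replace each #Name# placeholder with the run-time value of that parameter."""
    def lookup(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value supplied for parameter {name}")
        return params[name]
    return re.sub(r"#(\w+)#", lookup, template)


# The same job design can read a different file on each run.
source_path = resolve("#SourceDir#/customers_#RunDate#.txt",
                      {"SourceDir": "/data/in", "RunDate": "20230422"})
```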

Page 72: 9836455 DataStage Introduction

Adding Job Documentation

o Job Properties
  - Short and long descriptions
  - Shows in Manager
o Annotation stage
  - Is a stage on the tool palette
  - Shows on the job GUI (work area)

Page 73: 9836455 DataStage Introduction

Job Properties Documentation

Page 74: 9836455 DataStage Introduction

Annotation Stage on the Palette

Page 75: 9836455 DataStage Introduction

Annotation Stage Properties

Page 76: 9836455 DataStage Introduction

Final Job Work Area with Documentation

Page 77: 9836455 DataStage Introduction

Compiling a Job

Before you can run your job, you must compile it. To compile it, click File>Compile or click the Compile button on the tool bar. The Compile Job window displays the status of the compile.

A compile will generate OSH.

Page 78: 9836455 DataStage Introduction

Errors or Successful Message

If an error occurs:
o Click Show Error to identify the stage where the error occurred. This will highlight the stage in error.
o Click More to retrieve more information about the error. This can be lengthy for parallel jobs.

Page 79: 9836455 DataStage Introduction

Intro Part 5: Running Jobs

Page 80: 9836455 DataStage Introduction

Module Objectives

After this module you will be able to:
- Validate your job
- Use DataStage Director to run your job
- Set run options
- Monitor your job's progress
- View job log messages

Page 81: 9836455 DataStage Introduction

Prerequisite to Job Execution

Page 82: 9836455 DataStage Introduction

DataStage Director

o Can schedule, validate, and run jobs
o Can be invoked from DataStage Manager or Designer: Tools > Run Director

Page 83: 9836455 DataStage Introduction

Running Your Job

This shows the Director Status view. To run a job, select it and then click Job>Run Now.

Better yet: Shift to log view from the main Director screen. Then click the green arrow to execute the job.

Page 84: 9836455 DataStage Introduction

Run Options - Parameters and Limits

Page 85: 9836455 DataStage Introduction

Run Options - Parameters and Limits

The Job Run Options window is displayed when you click Job>Run Now.

This window allows you to stop the job after:
o A certain number of rows.
o A certain number of warning messages.

You can validate your job before you run it. Validation performs some checks that are necessary in order for your job to run successfully. These include:
o Verifying that connections to data sources can be made.
o Verifying that files can be opened.
o Verifying that SQL statements used to select data can be prepared.

Click Run to run the job after it is validated. The Status column displays the status of the job run.
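The effect of the two limits can be sketched in Python as a processing loop that stops early. This is a simplified model under assumptions, not DataStage's run-time behavior: the `run_job` function, the status strings, the use of `None` to stand for a row that triggers a warning, and the exact point at which the warning limit trips are all hypothetical.

```python
def run_job(rows, row_limit=None, warning_limit=None):
    """Process rows until a row limit or a warning limit is reached."""
    processed, warnings = 0, 0
    for row in rows:
        if row_limit is not None and processed >= row_limit:
            return "Stopped at row limit", processed, warnings
        if row is None:                   # hypothetical: a bad row logs a warning
            warnings += 1
            if warning_limit is not None and warnings >= warning_limit:
                return "Aborted", processed, warnings
            continue
        processed += 1
    return "Finished", processed, warnings


limited = run_job([1, 2, 3, 4], row_limit=2)
aborted = run_job([1, None, None, 4], warning_limit=2)
```

Limits like these are useful during testing, when a short run on a few rows is enough to check a job design.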

Page 86: 9836455 DataStage Introduction

Director Log View

Click the Log button in the toolbar to view the job log. The job log records events that occur during the execution of a job.

These events include control events, such as the starting, finishing, and aborting of a job; informational messages; warning messages; error messages; and program-generated messages.
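When reviewing a log, it often helps to count events by type and pull out only the problems. A minimal Python sketch, assuming the log has already been obtained as (event type, message) pairs; the type labels and messages below are invented for illustration and are not actual Director output.

```python
from collections import Counter

# Hypothetical log entries as (event type, message) pairs.
log = [
    ("Control", "Starting job"),
    ("Info", "Environment variable settings"),
    ("Warning", "Field truncated"),
    ("Control", "Finished job"),
]

# Count events per type, then keep only the messages that need attention.
counts = Counter(event_type for event_type, _ in log)
problems = [msg for event_type, msg in log if event_type in ("Warning", "Fatal")]
```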

Page 87: 9836455 DataStage Introduction

Message Details are Available

Page 88: 9836455 DataStage Introduction

Other Director Functions

o Schedule job to run on a particular date/time
o Clear job log
o Set Director options
  - Row limits
  - Abort after x warnings