Informatica Overview - Find Skills Inc.findskills.com/files/Informatica_Overview.… · PPT...
Transcript of Informatica Overview - Find Skills Inc.findskills.com/files/Informatica_Overview.… · PPT...
Informatica Overview
Contents
• Introduction• Clients• Server(s)• Source, Target, Repository• Connectivity
What is Informatica?
• Allows you to load data into a centralized location, such as a datamart or data warehouse.
• ETL Tool– Extract data from multiple sources– Transform the data according to business logic
and needs– Load the transformed data into file and
relational targets
Example
EMP_ID
EMP_NAME
EMP_CITY
EMP_STATE
EMP_COUNTRY EMP_DATE_OF_JOINING
EMP_ID
EMP_NAME
EMP_CITY
EMP_STATE
EMP_COUNTRY
YRS_OF_SRV
EMPLOYEE EMP_DETAILS
Transform Date of Joining to Yrs of Service
Data Warehousing
Data Warehouse
ExtractTransform
LoadOperational Sources
Metadata
End UserDeveloper
Informatica Architecture Design Process Client Tool Review
– Repository Manager– Designer – Server Manager
DesignerServer ManagerRepository Manager
Server
Repository
Informatica’s Architecture
Targets 1-nSources 1-nPowerPlugs
Data Models
Informatica Design Process
5.
2.
3.
1.
Sessions
Source Def
Target Def
Mapping
4.
1. Create Repository2. Import Source Definitions3. Create Target Schema4. Create Mappings5. Load Data
Informatica Client
Repository Manager – Can view much of the metadata in the Repository through the Repository Manager.
Designer – Create Source-to-Target mappings that contain transformation instructions for the Informatica Server.
Server Manager – Create, schedule, and monitor sessions. You create a session based on a transformation and schedule it to run on the Informatica Server.
Informatica Client
Repository Manager
Metadata Repository
Information about the data mart system Catalogs the repository Directs the server Contains record of user access Can be shared Can be searched and reported Bridged through Metadata Exchange
Repository Manager
Navigator Window
Output Window
Dependency Window
Analysis Window
Folder Attributes
• FOLDER OWNER - user who serves as focal point for folder permissions
• PERMISSIONS - rights to read, write, and/or execute objects in a folder
• SHARED - property that allows you to make shortcuts to objects in a folder
• SHORTCUT - a dynamic link to an object stored in a shared folder
• VERSIONS - folder iterations that indicate development stages
Informatica Client
Designer
Designer Workspace
Output Window
Workspace
Workbook Tabs
Navigator
Status Bar
Open Folder List
Designer Options
Tablescolumns viewedcolumn sizeobject sizeobject colors
Generalworkspace optionsreload objects on opengroup source definitions
Formatworkspace colorsimport keysautomatic Source
Qualifier creation
Informatica Client
Server Manager
Server Manager
Monitor Window
Navigator
Configure Window
Output Window
Designer
• Source Analyzer• Warehouse Designer• Transformation Developer• Mapplet Designer• Mapping Designer
Source Analyzer
• Identify the sources used to build the warehouse. • Create repository definitions for these sources
Analyzing Sources Relational – Oracle, Sybase, Informix, IBM DB2,
Microsoft SQL Server, and Teradata
File – Fixed and delimited flat file, COBOL file, and XML
Other – Microsoft Excel, Microsoft Access
Extended – PeopleSoft, SAP R/3, Sieble, and IBM MQSeries (need to purchase additional products for these sources)
Mainframe – Need to purchase additional products.
Warehouse Designer
• Create relational tables in Target database• Edit target definitions• Preview relational target data
Targets
• Relational – Oracle, Sybase, Sybase IQ, Informix, IBM DB2, Microsoft SQL Server, and Teradata
• File – Fixed and delimited flat files and XML• Extended – SAP BW, IBM MQ Series (need to purchase
additional products for these targets)• Other - Microsoft Access
Mixing Sources and Targets
• You can combine data from different platforms and source types.
Sybase
Oracle
Flat File
Transformation Developer
• Generates ,modifies, passes data through ports• 12 objects for transforming data
Transformations Types• Source Qualifier represents all data queried from the
source
• Normalizer normalizes records from VSAM or relational sources
• Expression performs simple calculations• Filter serves as a conditional filter• Aggregator performs aggregate calculations• Rank limits records to top or bottom range
Transformations Types contd…• Update Strategy allows for logic to insert, update,
delete, or reject data• Lookup looks up values and passes to other
objects • Stored Procedure calls a stored procedure and
captures return values• External Procedure calls a procedure defined in a
shared library• Sequence Generator generates unique ID values• Joiner allows for heterogeneous joins
SourceQualifier Transformation
• Represents records that Informatica server reads when it runs a session
• Automatically attached when a Source is added to a mapping
Use a Source Qualifier to:
Filter Records when the Informatica Server reads source data
Specify sorted ports
– Order by clause
Select only distinct values from a source
Create a custom query for the Informatica Server to read source data
Expression Transformation
Calculate values in a single row– Adjust employee salaries, concatenate first and
last names, convert string to number… Perform Any Non-Aggregate Calculations Test conditional statements before you output to
target
Example
EMP_ID
EMP_NAME
ROLE_CODE
BASIC_SALARY
EMP_ID
EMP_NAME
ROLE_CODE
GROSS_SALARY
EMPLOYEE EMP_SALARY
Gross Salary= Basic Salary * 3.5
Aggregator Transformation
Allows you to perform aggregate calculations, such as averages and sums
While the Expression is on a row-by-row basis, the aggregator can perform calculations on groups
Example
PU_CODE
PROJECT_CODE
REVENUE
PU_CODE
MAX_REVENUE
MIN_REVENUE
AVG_REVENUE
REVENUE PU_REVNUE
Aggregator Transformation
Max Revenue = Max (Revenue) Min Revenue = Min(Revenue) Avg Revenue = Avg (Revenue)
Filter Transformation
Provides the means for filtering rows in a mapping– Employees who are currently working in the
project “NML” of “WENA” as “SE”
Only rows that meet the condition pass through the mapping.
Filter Transformation
All ports are input/output Returns TRUE or FALSE for each row passed
through the mapping based on the condition Discarded rows do not appear in the session log
or reject files The input ports must only come from one
transformation
Filter vs Source Qualifier (SQ)
SQ provides better performance
SQ only lets you filter rows from relational sources, Filter Transformation filters rows from any source
SQ only uses standard SQL, Filter can use any statement or function that returns True/False
Example
EMP_ID
EMP_NAME
PROJECT_CODE
PU_CODE
ROLE_CODE
EMP_ID
EMP_NAME
PROJECT_CODE
PU_CODE
ROLE_CODE
EMPLOYEE NM_EMP_DETAILS
Filter Transformation
Where Project = “NML” and PU = “WENA” and Role = “SE”
Router Transformation Groups data into many groups Routes rows of data that do not meet any
condition to a default group Can enter any expression that returns a single
value Condition returns True or False for each row If the condition = NULL, the row is assumed as
FALSE
Router Transformation
• One Group can be connected to One transformation or target
• One Output Port can be connected to multiple transformations or targets
• Multiple Output ports in One Group can be connected to multiple transformations or targets
• CANNOT Connect more than One Group to One Transformation or Target
Lookup Transformation Looks up data in a relational table
– Can be the Source, Target, or any database that the Informatica Client and Server can connect to
– Lookup table can be a single table or can join multiple tables
Lookups can: – Get a related value (your source include Employee_ID
and you want Employee_Name),– Perform a calculation– Update a slowly changing dimension table (check if
records exist on a target)
Lookup Transformation For each input row, the Informatica Server queries
the lookup table based on the lookup ports and the condition in the transformation– The Informatica Server can return values from that
lookup (static cache)
– The Informatica Server inserts a row into the cache to flag rows as new or existing (dynamic cache)
OR
Connected and UnconnectedLookup Transformations
CONNECTED LOOKUP UNCONNECTED LOOKUPPart of the mapping data flow Separate from the mapping data flowReturns multiple values (bylinking output ports to anothertransformation)
Returns one value (by checking theReturn (R) port option for the outputport that provides the return value)
Executed for every recordpassing through thetransformation
Only executed when the lookupfunction is called
More visible, shows where thelookup values are used
Less visible, as the lookup is calledfrom an expression within anothertransformation
Default values are used Default values are ignored
Example
EMP_ID
EMP_NAME
PROJECT_CODE
EMP_ID
EMP_NAME
PROJECT_CODE
PROJECT_DESC
EMPLOYEE_PROJECT NM_EMP_DETAILS
LOOK UP Transformation Get
PROJECT.PROJECT_DESC Where PROJECT.PROJECT_CODE = NM_EMP_DETAILS.PROJECT_CODE
PROJECT_CODE PROJECT_DESC
PROJECT
Update Strategy Two Ways Of doing
Within a Session
Within a Mapping
Update Strategy Within a Session
Instruct the Informatica Server how to treat the rows when the session is configured
Within a Mapping
Use the update strategy transformation to flag records for insert, delete, update, or reject.
Constraint for each Database Operation
Operation Constant Numeric ValueInsert DD_INSERT 0
Update DD_UPDATE 1
Delete DD_DELETE 2Reject DD_REJECT 3
Joiner Transformation Active Transformation
Join two flat files
Join two tables from different databases
Join a flat file with a relational table
Transformation Overview Three views: Iconized View -- shows transformation in relation to mapping Normal View -- shows data flow through transformation Edit View -- shows transformation properties and allows for
editing
Transformation Overview
Data passes through
I/O ports unchanged
DATE_ENTERED passes into transformation through an input port.
It is used in MTH port to extract month. Month is passed through MTH output port to
another transformation.
Normal view shows data flow through the transformation
Transformation OverviewEdit view provides flexibility in defining transformation rules
-Switch between transformation
s-Enter comments
-Make reusable
Define transformation level properties
Define port level handling
Transformations and ExpressionsCalculation or conditional statement
Used in Expression, Aggregator, Rank, Filter, Update Strategy
Performs calculation based on ports, functions, operators, variables, literals, constants, and return values from other transformations
Mapplets
• Reusable Object• Include multiple transformations• Include Source definitions• Multiple groups of output ports
Move and transform data from sources to targets Includes
source definitions target definitions transformations.
Mapping
Source TargetTransformations
Mapping
Mapping Designer
Iconized
Mapping List
Status Bar
Transformation Toolbar
Validation Three different levels of validation:
Connection validation Expression validation Mapping validation
Connection Validation Connecting ports with mismatched datatypes Connecting output ports to a source Connecting a source to anything but a Source
Qualifier or Normalizer Connecting an output to a output, or an input
to a input Connecting more than one active
transformation to another transformation Copying columns to a target definition
Expression Validation
Parse the current expression, with remote port searching (references to a port in another transformation are resolved) Parse expression attributes such as filter condition, lookup condition, SQL Query, etc. Parse default values
Mapping Validation Mapping validation will take place with menu
commands: Mapping | Validate Repository | Save
Mapping validation will: Perform connection validation Perform expression validation Check the mapping flow validation
Data from Source Qualifier mapped to a target Targets are connected to transformations
Informatica Server
• Reads information from the Repository
• Extracts data from the Sources and stores the data in memory while it applies the transformation rules you created
• Loads the transformed data into the mapping targets
Transformation Process
Source Def
Target Def Mapping
Session
Server Manager
Server
Sources Targets
Repository
Session Metadatasource informationtarget informationmappingschedulingerror handlingpre- / post-session scriptstuning parametersoutput log informationtransformation overrides
Definitions
• Session - A set of instructions that tells the Informatica Server how and when to move data from sources to targets
• Batch - A group of Sessions which are to be run together
Server Manager
Monitor Window
Navigator
Configure Window
Output Window
Process Configure server Create session Run session Monitor session View logs Tune session
Server Configuration Server Variables
Establish default directories for files and caches.
Variables are server specific. Allows for easy deployment. Can be overridden at session level. Changing the variable updates sessions. Directories must exist prior to session
launch.
Server Output
.idx
.perf
Control
E-mailTarget
Data
EventLog
ErrorLog
.bad
.dat
.log
Source Settings - Session Wizard
Source Options... Source Type: Relational
Database name Source Type: File
fixed / delimited properties file list FTP properties
Source Database Source Type: Relational Database connection
Treat rows as: Source Type: Relational, File Tells server how to treat source rows Insert, update, delete, data driven Works in conjunction with Target Options
Source Select source type File, Relational, Heterogeneous
Target Settings - Session Wizard
Target Select target type File, Relational Flat File Options:
File properties FTP properties Loader properties
Target Options: Target type: Relational Tell server how to load target
Insert, Update Delete Truncate target Bulk, Normal, Test
Target Database Target Type: Relational Database connection
Launching Sessions Manual start - Manually launch a session
from within the Server Manager Session Schedule - Schedule a session using
business cycle start, stop, and repeat intervals Batching - Use batches to run session
concurrently or sequentially Event based - Configure a session to launch
based upon the appearance of an indicator file Command Line - Launch a session from the
command line prompt
After the Session Launches...
Poll/Refresh Session Status View Session Details View Performance Monitor View Logs Tune the Session Correct Session Problems
Select Server Requests | Session Details
Number of rows loaded/failed
Read/Write throughput
Most current Server message
Audit trail in repository or session log
Monitor Session
Performance Monitor Select Server Requests | Session Performance
Details or open file <sessionname>.perf
Need to configure session properties to save the performance detail counters
Help determine where session performance can be improved
Monitor Session
Log Files - Session Wizard
Log Files... Log file path and name Reject file path and name Session log archive options
Connectivity Overview