ETLquestions


  • 8/8/2019 ETLquestions


    1. Where do we use connected and unconnected lookups?

If only one return port is needed, go for an unconnected lookup; more than one return port is not possible with an unconnected lookup. If more than one return port is needed, go for a connected lookup.

2. What are the various test procedures used to check whether the data is loaded in the backend, the performance of the mapping, and the quality of the data loaded in Informatica?

The best procedure is to take the help of the Debugger, where we can monitor each and every step of the mapping and see how the data is loaded, based on condition breakpoints.

3. What is the difference between ETL tools and OLAP tools?

An ETL tool is meant for extracting data from legacy systems and loading it into a specified database, with some process of cleansing the data.

Eg: Informatica, DataStage, etc.

OLAP is meant for reporting purposes. In OLAP, data is available in a multidimensional model, so that you can write simple queries to extract data from the database.

Eg: Business Objects, Cognos, etc.

ETL tools are used to extract the data from different sources, and OLAP tools are used to analyze the data.

ETL tools are used to extract, transform, and load the data into the data warehouse / data mart. OLAP tools are used to create cubes/reports for business analysis from the data warehouse / data mart.

4. What is an ODS (operational data store)?

    ODS - Operational Data Store.

The ODS comes between the staging area and the data warehouse. The data in the ODS will be at a low level of granularity.

Once data is populated in the ODS, aggregated data is loaded into the EDW through the ODS.


The ODS is the operational data source, which is also called transactional data. The ODS is the source of a warehouse. Data from the ODS is staged, transformed, and then moved to the data warehouse.

An updateable set of integrated operational data used for enterprise-wide tactical decision-making. It contains live data, not snapshots, and has minimal history retained.

Can we look up a table from a Source Qualifier transformation, i.e. an unconnected lookup?

You cannot look up from a Source Qualifier directly. However, you can override the SQL in the Source Qualifier to join with the lookup table to perform the lookup.
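The SQL-override approach described above can be sketched with plain SQL outside Informatica. A minimal sketch using SQLite; the table and column names (orders, cust_lkp, etc.) are made up for illustration:

```python
import sqlite3

# In-memory database standing in for the source system (names are illustrative)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE cust_lkp (cust_id INTEGER, cust_name TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 10, 99.5), (2, 20, 15.0)])
conn.executemany("INSERT INTO cust_lkp VALUES (?, ?)",
                 [(10, "Alice"), (20, "Bob")])

# Instead of a separate Lookup transformation, the Source Qualifier's SQL is
# overridden to join the source table with the lookup table in one query.
override_sql = """
SELECT o.order_id, o.amount, c.cust_name
FROM orders o
JOIN cust_lkp c ON o.cust_id = c.cust_id
ORDER BY o.order_id
"""
rows = conn.execute(override_sql).fetchall()
print(rows)  # [(1, 99.5, 'Alice'), (2, 15.0, 'Bob')]
```

The same rows then flow into the mapping already enriched, which is why no lookup transformation is needed in this case.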

    5. What are the different Lookup methods used in Informatica?

The Lookup transformation has mainly two types:

    1) connected 2)unconnected lookup

    Connected lookup:

1) It receives the value directly from the pipeline.

2) It can use both a dynamic and a static cache.

3) It can return multiple values.

4) It supports user-defined default values.

    Unconnected lookup:

1) It receives the value from a :LKP expression.

2) It can use only a static cache.

3) It returns only a single value.

4) It does not support user-defined default values.
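The contrast above can be sketched in plain Python; this is a rough analogy, not Informatica itself, and the department lookup data and port names are invented for illustration:

```python
# Lookup data keyed by dept_id (illustrative values)
dept_lkp = {10: {"dept_name": "Sales", "location": "NY"},
            20: {"dept_name": "HR", "location": "LA"}}

def connected_lookup(row):
    """Connected: sits in the pipeline, receives the whole row, and can
    return multiple ports (here dept_name and location), with user-defined
    default values when there is no match."""
    match = dept_lkp.get(row["dept_id"],
                         {"dept_name": "UNKNOWN", "location": "UNKNOWN"})
    return {**row, **match}

def unconnected_lookup(dept_id):
    """Unconnected: invoked like a function from an expression
    (:LKP.lookup(...)) and returns exactly one value."""
    return dept_lkp.get(dept_id, {}).get("dept_name")

row = {"emp_id": 1, "dept_id": 10}
enriched = connected_lookup(row)
name_only = unconnected_lookup(20)
print(enriched)   # row plus dept_name='Sales' and location='NY'
print(name_only)  # HR
```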

6. What is a mapping, session, worklet, workflow, mapplet?

Session: A session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets.

Mapplet: A mapplet is a set of transformations that we can make for reusability. It is a whole piece of logic.

Workflow: It is the pipeline that passes or flows the data from source to target.

    7. What is the difference between Power Center & Power Mart?

Power Mart is designed for:

a low range of warehouses, only local repositories, mainly a desktop environment.

Power Center is designed for:

high-end warehouses, global as well as local repositories, and ERP support.

Power Center: we can connect to single and multiple repositories; generally used in big enterprises.

Power Mart: we can connect to only a single repository.

    8. What are the various tools? - Name a few

    The various ETL tools are as follows.

Informatica
DataStage
Business Objects Data Integrator

OLAP tools are as follows:

Cognos
Business Objects

    9. What are snapshots? What are materialized views?

    Materialized view:

Answer 1: A materialized view is a view in which the data is also stored in some temporary table. That is, with the ordinary view concept in a database we only store the query, and when we call the view it extracts data from the database. But in a materialized view, the data is stored in some temporary tables.

Answer 2: A materialized view stores precalculated data; it is a physical representation and it occupies space.

    Snapshot:

  • 8/8/2019 ETLquestions

    4/21

Answer 1: A snapshot is a table that contains the results of a query of one or more tables or views, often located on a remote database.

Answer 2: A snapshot is data captured at a specific interval.
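The view-versus-materialized-view distinction can be illustrated with SQLite. SQLite has ordinary views but no native materialized views, so the snapshot below is simulated with CREATE TABLE AS SELECT; the sales table is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("east", 50.0), ("west", 70.0)])

# Ordinary view: only the QUERY is stored; it re-reads sales on every call.
conn.execute("CREATE VIEW v_sales AS SELECT region, SUM(amount) AS total "
             "FROM sales GROUP BY region")

# Simulated materialized view / snapshot: the query RESULT is stored
# physically, so it does not see later changes until it is refreshed.
conn.execute("CREATE TABLE mv_sales AS SELECT region, SUM(amount) AS total "
             "FROM sales GROUP BY region")

conn.execute("INSERT INTO sales VALUES ('east', 25.0)")

view_total = conn.execute("SELECT total FROM v_sales WHERE region='east'").fetchone()[0]
mv_total = conn.execute("SELECT total FROM mv_sales WHERE region='east'").fetchone()[0]
print(view_total, mv_total)  # 175.0 150.0  (the snapshot is stale until refreshed)
```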

    10. What is partitioning? What are the types of partitioning?

Partitioning is a part of physical data warehouse design that is carried out to improve performance and simplify stored-data management. Partitioning is done to break up a large table into smaller, independently manageable components because it:

1. Reduces the work involved with the addition of new data.
2. Reduces the work involved with purging old data.

Two types of partitioning are:

1. Horizontal partitioning.
2. Vertical partitioning (reduces efficiency in the context of a data warehouse).
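Both kinds of partitioning can be sketched on a small in-memory table; the split criteria (year as the horizontal key, measures versus descriptions as the vertical split) are illustrative assumptions:

```python
# A small fact table as a list of rows
rows = [
    {"id": 1, "year": 2018, "amount": 10.0, "note": "a"},
    {"id": 2, "year": 2019, "amount": 20.0, "note": "b"},
    {"id": 3, "year": 2019, "amount": 30.0, "note": "c"},
]

# Horizontal partitioning: split ROWS by a key (here, year), so old
# partitions can be purged and new ones added without touching the rest.
horizontal = {}
for r in rows:
    horizontal.setdefault(r["year"], []).append(r)

# Vertical partitioning: split COLUMNS into separate tables sharing the key.
measures = [{"id": r["id"], "amount": r["amount"]} for r in rows]
descriptions = [{"id": r["id"], "note": r["note"]} for r in rows]

print(sorted(horizontal))    # [2018, 2019]
print(len(horizontal[2019])) # 2
print(measures[0])           # {'id': 1, 'amount': 10.0}
```

Purging the year 2018 is now just dropping the 2018 partition, which is exactly the maintenance benefit the answer above describes.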

11. What are the modules in Power Mart?

1. Power Mart Designer
2. Server
3. Server Manager
4. Repository
5. Repository Manager

12. What is a staging area? Do we need it? What is the purpose of a staging area?

The staging area is a place where you hold temporary tables on the data warehouse server. Staging tables are connected to the work area or fact tables. We basically need a staging area to hold the data and perform data cleansing and merging before loading the data into the warehouse.

In the absence of a staging area, the data load would have to go from the OLTP system to the OLAP system directly, which would severely hamper the performance of the OLTP system. This is the primary reason for the existence of a staging area. In addition, it also offers a platform for carrying out data cleansing.

Depending on the complexity of the business rules, we may require a staging area; the basic need for a staging area is to clean the OLTP source data and gather it in one place.


    13. How to determine what records to extract?

The data modeler will provide the ETL developer with the tables that are to be extracted from the various sources. When addressing a table, some dimension key must reflect the need for a record to get extracted. Mostly it will be from the time dimension (e.g. date >= 1st of the current month) or a transaction flag (e.g. Order Invoiced Status). Foolproof would be adding an archive flag to the record, which gets reset when the record changes.

Draw the inference if it is a slowly changing dimension, based on the Type 1, 2, or 3 tables defined.

14. What are the various transformations available?

Transformation plays an important role in a data warehouse. Transformations are used when data is moved from source to destination; depending on the criteria, different transformations are applied. Some of the transformations are listed below.

The various types of transformations in Informatica:

    Source Qualifier

    Aggregate

    Sequence Generator

    Sorter

    Router

    Filter

    Lookup

    Update Strategy

    Joiner

    Normalizer

    Expression

    Rank

    Stored Procedure

15. What is a three-tier data warehouse?

A three-tier data warehouse contains three tiers: a bottom tier, a middle tier, and a top tier.

The bottom tier deals with retrieving related data or information from various information repositories by using SQL.

The middle tier contains two types of servers:
1. ROLAP server
2. MOLAP server

The top tier deals with presentation or visualization of the results.

The 3 tiers are:
1. Data tier - bottom tier - consists of the database
2. Application tier - middle tier - consists of the analytical server
3. Presentation tier - tier that interacts with the end-user

16. How can we use mapping variables in Informatica? Where do we use them?

After creating a variable, we can use it in any expression in a mapping or a mapplet. They can also be used in Source Qualifier filters, user-defined joins or extract overrides, and in the expression editor of reusable transformations. Their values can change automatically between sessions.

17. What are the various methods of getting incremental records or delta records from the source systems?

Getting incremental records from source systems to the target can be done by using incremental aggregation.

One foolproof method is to maintain a field called 'Last Extraction Date' and then impose a condition in the code saying 'current_extraction_date > last_extraction_date'.

Using mapping parameters and variables, or Type 1 logic, we can easily define where the parameter will start and how the variable will change as deltas come from the OLTP systems.
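The 'Last Extraction Date' technique above amounts to a simple filter plus a watermark that is persisted between runs. A minimal sketch; the source rows and dates are invented for illustration:

```python
from datetime import date

# Source rows, each stamped with the date it was last modified (illustrative)
source = [
    {"id": 1, "modified": date(2019, 7, 1)},
    {"id": 2, "modified": date(2019, 8, 1)},
    {"id": 3, "modified": date(2019, 8, 5)},
]

def extract_delta(rows, last_extraction_date):
    """Return only the records changed after the previous run, plus the
    new watermark to persist for the next run."""
    delta = [r for r in rows if r["modified"] > last_extraction_date]
    new_watermark = max((r["modified"] for r in delta),
                        default=last_extraction_date)
    return delta, new_watermark

delta, watermark = extract_delta(source, date(2019, 7, 15))
print([r["id"] for r in delta])  # [2, 3]
print(watermark)                 # 2019-08-05
```

In Informatica this watermark is typically held in a mapping variable so the repository carries it forward between sessions automatically.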

18. Can we use procedural logic inside Informatica? If yes, how? If no, how can we use external procedural logic in Informatica?

We can use the External Procedure transformation to use external procedures. Both COM and Informatica procedures are supported by the External Procedure transformation.

Can we override a native SQL query within Informatica? Where do we do it? How do we do it?

We can override a SQL query in the SQL override property of a Source Qualifier.

19. What is the latest version of Power Center / Power Mart?

The latest version is 9.


20. How do we call shell scripts from Informatica?

You can use a Command task to call shell scripts, in the following ways:

1. Standalone Command task. You can use a Command task anywhere in the workflow or worklet to run shell commands.

2. Pre- and post-session shell command. You can call a Command task as the pre- or post-session shell command for a Session task.

There is a task named Command task; using it you can write or call shell scripts, DOS commands, or BAT files.

21. What are active transformations / passive transformations?

Transformations can be active or passive. An active transformation can change the number of rows that pass through it, such as a Filter transformation that removes rows that do not meet the filter condition. A passive transformation does not change the number of rows that pass through it, such as an Expression transformation that performs a calculation on data and passes all rows through the transformation.

Active transformations:

Advanced External Procedure
Aggregator
Application Source Qualifier
Filter
Joiner
Normalizer
Rank
Router
Update Strategy

Passive transformations:

Expression
External Procedure
Mapplet Input
Lookup
Sequence Generator
XML Source Qualifier
Mapplet Output

    22.When do we analyze the tables? How do we do it?


When the data in the data warehouse changes frequently, we need to analyze the tables. Analyzing tables computes/updates the table statistics, which helps to boost the performance of your SQL.
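SQLite happens to support the same idea: an ANALYZE statement gathers statistics into tables that the query planner reads. A minimal sketch (the table and index names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO t (val) VALUES (?)", [("x",)] * 100)
conn.execute("CREATE INDEX ix_val ON t (val)")

# ANALYZE gathers table/index statistics into sqlite_stat1, which the
# query planner consults when choosing access paths.
conn.execute("ANALYZE")
stats = conn.execute("SELECT tbl, idx FROM sqlite_stat1").fetchall()
print(stats)  # e.g. [('t', 'ix_val')]
```

In Oracle the equivalent is ANALYZE TABLE or DBMS_STATS; the principle is the same: refresh statistics after heavy data change so the optimizer chooses good plans.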

23. Compare ETL & manual development.

There are pros and cons of both tool-based ETL and hand-coded ETL. Tool-based ETL provides maintainability, ease of development, and a graphical view of the flow. It also reduces the learning curve for the team.

Hand-coded ETL is good when there is minimal transformation logic involved. It is also good when the sources and targets are in the same environment. However, depending on the skill level of the team, this can extend the overall development time.

Can anyone please explain why and where exactly we use the Lookup transformation?

You can use the Lookup transformation to perform many tasks, including:

Get a related value. For example, your source includes an employee ID, but you want to include the employee name in your target table to make your summary data easier to read.

Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).

Update slowly changing dimension tables. You can use a Lookup transformation to determine whether rows already exist in the target.

The Lookup transformation can be used mainly for slowly changing dimensions and for getting related values.

The Lookup transformation is generally used when fixed data is not present in the mappings we use but is required in the warehouse; more importantly, a lookup is used to compare values.

Ex 1) In the transactional data we have only the name and custid, but the complete name (with first and last) is required by the business user, and there is a separate table (either in the source or target database) that has the first and last names in it.


Ex 2) You need to compare the prices of the existing goods with their previous prices (referred to as Type 3); a lookup table containing the OLAP data could be handy.

In a real-time scenario, where is the Update Strategy transformation used? If we set DML operations in the session properties, then what is the use of the Update Strategy transformation?

We can use the Update Strategy transformation in two ways:

1. Mapping level.

2. Session level.

The importance of the Update Strategy transformation in both cases is as follows.

In real time, if we want to update the existing record with the same source data, we can go for session-level update logic.

If you want to apply a different set of rules for updating or inserting a record, even when that record already exists in the warehouse table, you can go for a mapping-level Update Strategy transformation. This applies when you are using a Router transformation for performing different activities.

Ex: If the employee 'X1234' is getting a bonus, then update the allowance with 10% less; if not, insert the record with the new bonus into the warehouse table.

Let's suppose we have some 10,000-odd records in the source system, and when we load them into the target, how do we ensure that all 10,000 records loaded to the target don't contain any garbage values?

24. How do we test it? We can't check every record, as the number of records is huge.

Select COUNT(*) from both the source table and the target table and compare the results.
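The row-count reconciliation above can be scripted; a minimal sketch with SQLite, where the table names src_orders and tgt_orders are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (id INTEGER)")
conn.execute("CREATE TABLE tgt_orders (id INTEGER)")
conn.executemany("INSERT INTO src_orders VALUES (?)", [(i,) for i in range(10000)])
conn.executemany("INSERT INTO tgt_orders VALUES (?)", [(i,) for i in range(10000)])

# Compare SELECT COUNT(*) on source and target rather than checking
# every record individually.
src_count = conn.execute("SELECT COUNT(*) FROM src_orders").fetchone()[0]
tgt_count = conn.execute("SELECT COUNT(*) FROM tgt_orders").fetchone()[0]
print(src_count, tgt_count, src_count == tgt_count)  # 10000 10000 True
```

In practice the count check is often supplemented by comparing checksums or SUMs of key numeric columns, since equal counts alone do not rule out garbage values.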

25. What is an entity relation? How does it work with data warehousing ETL modeling?

An entity is nothing but an object; it has characteristics. We call it an entity in terms of the logical view; the entity is called a table in terms of the physical view.


The entity relationship is nothing but maintaining a primary key / foreign key relation between the tables, for keeping the data and satisfying normal form.

There are 4 types of entity relationships:

1. One-to-one
2. One-to-many
3. Many-to-one
4. Many-to-many

In data warehouse modeling, an entity relationship is nothing but a relationship between dimension and fact tables (i.e. primary key / foreign key relations between these tables).

The fact table gets data from the dimension tables because it contains the primary keys of the dimension tables as foreign keys, for getting summarized data for each record.

26. Where do we use connected and unconnected lookups?

If only one return port is needed, we can go for an unconnected lookup; more than one return port is not possible with an unconnected lookup. If more than one return port is needed, go for a connected lookup.

27. Explain the process of extracting data from source systems, storing it in the ODS, and how data modeling is done.

There are various ways of extracting data from source systems; for example, you can use a DATA step or an import process. It depends on your input data style and what kind of file/database the data is residing in. Storing your data in an ODS can be done through an ODS statement, an export statement, or a FILE statement, again depending on the file and data format you want your output to be in.

IDP is the portal for display of reports, stored processes, information maps, and a whole bunch of things ideally required for dashboard reporting.

IMS is the GUI to help you convert your technical data and map it to business data (change names, add filters, add new columns, etc.).

28. What is the difference between ETL tools and OLAP tools?


An ETL tool is meant for extracting data from legacy systems and loading it into a specified database, with some process of cleansing the data.

ex: Informatica, DataStage, etc.

OLAP is meant for reporting purposes. In OLAP, data is available in a multidimensional model, so that you can write simple queries to extract data from the database.

ex: Business Objects, Cognos, etc.

ETL tools are used to extract the data from different sources, and OLAP tools are used to analyze the data.

ETL tools are used to extract, transform, and load the data into the data warehouse / data mart.

OLAP tools are used to create cubes/reports for business analysis from the data warehouse / data mart.

    29. What are the various tools? - Name a few

1) ETL tools

IBM WebSphere Information Integration (Ascential DataStage)
Ab Initio
Informatica

2) OLAP tools

Business Objects
Cognos
Hyperion
Microsoft Analysis Services
MicroStrategy

3) Reporting tools

Business Objects (Crystal Reports)
Cognos
Actuate

    30. What is the difference between Power Center & Power Mart?


Power Mart is designed for:

a low range of warehouses, only local repositories, mainly a desktop environment.

Power Center is designed for:

high-end warehouses, global as well as local repositories, and ERP support.

Power Center: we can connect to single and multiple repositories; generally used in big enterprises.

Power Mart: we can connect to only a single repository.

Informatica Power Center is used to maintain the global repository, but this is not the case with Informatica Power Mart. For more, you can analyze the architecture of Informatica.

Power Mart:

We can register only local repositories
Partitioning is not available
Does not support ERP

Power Center:

We can promote repositories to global
Partitioning is available
Supports ERP

31. What is an entity relation? How does it work with data warehousing ETL modeling?

An entity is nothing but an object; it has characteristics. We call it an entity in terms of the logical view; the entity is called a table in terms of the physical view.

The entity relationship is nothing but maintaining a primary key / foreign key relation between the tables, for keeping the data and satisfying normal form.


There are 4 types of entity relationships:

1. One-to-one
2. One-to-many
3. Many-to-one
4. Many-to-many

In data warehouse modeling, an entity relationship is nothing but a relationship between dimension and fact tables (i.e. primary key / foreign key relations between these tables).

The fact table gets data from the dimension tables because it contains the primary keys of the dimension tables as foreign keys, for getting summarized data for each record.

In a real-time scenario, where is the Update Strategy transformation used? If we set DML operations in the session properties, then what is the use of the Update Strategy transformation?

We can use the Update Strategy transformation in two ways:

1. Mapping level.

2. Session level.

The importance of the Update Strategy transformation in both cases is as follows.

In real time, if we want to update the existing record with the same source data, we can go for session-level update logic.

If you want to apply a different set of rules for updating or inserting a record, even when that record already exists in the warehouse table, you can go for a mapping-level Update Strategy transformation. This applies when you are using a Router transformation for performing different activities.

Ex: If the employee 'X1234' is getting a bonus, then update the allowance with 10% less; if not, insert the record with the new bonus into the warehouse table.

    32. What is a staging area? Do we need it? What is the purpose ofa staging area?


The staging area is a place where you hold temporary tables on the data warehouse server. Staging tables are connected to the work area or fact tables. We basically need a staging area to hold the data and perform data cleansing and merging before loading the data into the warehouse.

In the absence of a staging area, the data load would have to go from the OLTP system to the OLAP system directly, which would severely hamper the performance of the OLTP system. This is the primary reason for the existence of a staging area. In addition, it also offers a platform for carrying out data cleansing.

A staging area is a temporary schema used to:

1. Do flat mapping, i.e. dumping all the OLTP data into it without applying any business rules. Pushing data into staging takes less time because no business rules or transformations are applied to it.

2. Do data cleansing and validation using First Logic.

33. What are active transformations / passive transformations?

An active transformation can change the number of rows output after the transformation, while a passive transformation does not change the number of rows and passes through the same number of rows that it was given as input.

Transformations can be active or passive. An active transformation can change the number of rows that pass through it, such as a Filter transformation that removes rows that do not meet the filter condition. A passive transformation does not change the number of rows that pass through it, such as an Expression transformation that performs a calculation on data and passes all rows through the transformation.

Active transformations:

Advanced External Procedure
Aggregator
Application Source Qualifier
Filter
Joiner
Normalizer
Rank
Router
Update Strategy

Passive transformations:

Expression
External Procedure
Mapplet Input
Lookup
Sequence Generator
XML Source Qualifier
Mapplet Output

34. How do we extract SAP data using Informatica? What is ABAP? What are IDocs?

To extract SAP data:

Go to the Source Analyzer, click on Sources, and you will get the option 'Import from SAP'.

Click on this, then give your SAP access user, client, password, and filter criteria such as the table name (so it will take less time). After connecting, import the SAP source.

One important thing: after finishing the mapping, save it and generate the ABAP code for the mapping. Only then will the workflow run fine.

35. Where do we use semi-additive and non-additive facts?

Additive: a measure that can participate in arithmetic calculations using all or any dimensions.

Ex: sales profit

Semi-additive: a measure that can participate in arithmetic calculations using only some dimensions.

Ex: sales amount

Non-additive: a measure that cannot participate in arithmetic calculations using dimensions.

Ex: temperature
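The difference shows up when you aggregate: summing a fully additive measure is meaningful across any dimension, a semi-additive measure sums only along some dimensions, and a non-additive measure should never be summed. A sketch with hypothetical figures (the balance here illustrates semi-additivity: it adds across accounts but not across days):

```python
# Fact rows: additive (sales), semi-additive (balance), non-additive (temp_c)
facts = [
    {"day": 1, "account": "A", "sales": 100, "balance": 500, "temp_c": 20},
    {"day": 1, "account": "B", "sales": 50,  "balance": 300, "temp_c": 20},
    {"day": 2, "account": "A", "sales": 70,  "balance": 520, "temp_c": 25},
    {"day": 2, "account": "B", "sales": 30,  "balance": 310, "temp_c": 25},
]

# Additive: summing sales over every dimension gives a meaningful total.
total_sales = sum(f["sales"] for f in facts)

# Semi-additive: summing balances across ACCOUNTS for one day is fine,
# but summing the same balances across days would double-count money.
balance_day2 = sum(f["balance"] for f in facts if f["day"] == 2)

# Non-additive: a sum of temperatures means nothing; only averages,
# minima, or maxima make sense.
avg_temp = sum(f["temp_c"] for f in facts) / len(facts)

print(total_sales, balance_day2, avg_temp)  # 250 830 22.5
```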

    36. What is a mapping, session, worklet, workflow, and mapplet?

Session: A session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets.

Mapplet: A mapplet is a set of transformations that we can make for reusability. It is a whole piece of logic.


Workflow: It is the pipeline that passes or flows the data from source to target.

A mapping represents the data flow from sources to targets. A mapplet creates or configures a set of transformations.

A workflow is a set of instructions that tells the Informatica Server how to execute the tasks.

    A worklet is an object that represents a set of tasks.

    A session is a set of instructions to move data from sources to targets.

Mapping - represents the flow and transformation of data from source to target.
Mapplet - a group of transformations that can be called within a mapping.
Session - a task associated with a mapping to define the connections and other configurations for that mapping.
Workflow - controls the execution of tasks such as commands, emails, and sessions.
Worklet - a workflow that can be called within a workflow.


    37. What is a three-tier data warehouse?

A three-tier data warehouse contains three tiers: a bottom tier, a middle tier, and a top tier.

The bottom tier deals with retrieving related data or information from various information repositories by using SQL.

The middle tier contains two types of servers:
1. ROLAP server
2. MOLAP server

The top tier deals with presentation or visualization of the results.

The 3 tiers are:
1. Data tier - bottom tier - consists of the database
2. Application tier - middle tier - consists of the analytical server
3. Presentation tier - tier that interacts with the end-user

38. What are the various methods of getting incremental records or delta records from the source systems?

Getting incremental records from source systems to the target can be done by using incremental aggregation.

One foolproof method is to maintain a field called 'Last Extraction Date' and then impose a condition in the code saying 'current_extraction_date > last_extraction_date'.

39. Compare ETL & manual development.

There are pros and cons of both tool-based ETL and hand-coded ETL. Tool-based ETL provides maintainability, ease of development, and a graphical view of the flow. It also reduces the learning curve for the team.

Hand-coded ETL is good when there is minimal transformation logic involved. It is also good when the sources and targets are in the same environment. However, depending on the skill level of the team, this can extend the overall development time.

40. Can Informatica load heterogeneous targets from heterogeneous sources?

Yes, Informatica can load heterogeneous targets from heterogeneous sources.

41. What are snapshots? What are materialized views, and where do we use them? What is a materialized view log?

A materialized view is a view in which the data is also stored in some temporary table. That is, with the ordinary view concept in a database we only store the query, and when we call the view it extracts data from the database. But in a materialized view, the data is stored in some temporary tables.

A snapshot is a table that contains the results of a query of one or more tables or views, often located on a remote database.


A snapshot is data captured at a specific interval.

A materialized view stores precalculated data; it is a physical representation and it occupies space.

    42. What is Full load & Incremental or Refresh load?

By full load or one-time load we mean that all the data in the source table(s) should be processed. This usually contains historical data. Once the historical data is loaded, we keep doing incremental loads to process the data that came after the one-time load.

A full load is the entire data dump, taking place the very first time. Afterwards, to synchronize the target data with the source data, there are 2 further techniques:

Refresh load - where the existing data is truncated and reloaded completely.

Incremental load - where the delta or difference between target and source data is loaded at regular intervals. The timestamp of the previous delta load has to be maintained.

Full load: completely erasing the contents of one or more tables and reloading them with fresh data.

Incremental load: applying ongoing changes to one or more tables based on a predefined schedule.
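The load styles described above can be sketched as operations on an in-memory target table; the keys and timestamps are illustrative:

```python
def full_load(target, source):
    """One-time/refresh load: truncate the target and reload everything."""
    target.clear()
    target.update({r["id"]: r for r in source})

def incremental_load(target, source, last_load_ts):
    """Apply only the delta: rows changed since the previous load."""
    for r in source:
        if r["ts"] > last_load_ts:
            target[r["id"]] = r  # insert or overwrite the changed row

source = [{"id": 1, "ts": 1}, {"id": 2, "ts": 5}]
target = {}
full_load(target, source)           # historical one-time load
source.append({"id": 3, "ts": 9})   # new record arrives later
incremental_load(target, source, last_load_ts=5)
print(sorted(target))  # [1, 2, 3]
```

A refresh load would simply call full_load again, trading load time for simplicity, which is the distinction the answer above draws.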

    43. What is the metadata extension?

Informatica allows end users and partners to extend the metadata stored in the repository by associating information with individual objects in the repository. For example, when you create a mapping, you can store your contact information with the mapping. You associate information with repository metadata using metadata extensions.

Informatica client applications can contain the following types of metadata extensions:

Vendor-defined. Third-party application vendors create vendor-defined metadata extensions. You can view and change the values of vendor-defined metadata extensions, but you cannot create, delete, or redefine them.

User-defined. You create user-defined metadata extensions using PowerCenter/PowerMart. You can create, edit, delete, and view user-defined metadata extensions. You can also change the values of user-defined extensions.


For this purpose they came out with a new product called Informatica SuperGlue.

Informatica Metadata Exchange (MX) provides a set of relational views that allow easy SQL access to the Informatica metadata repository. The Repository Manager generates these views when you create or upgrade a repository.

MX views provide information to help you analyze the following types of metadata stored in the repository:

Database definition metadata
Source metadata
Target metadata
Mapping and transformation metadata
Session and workflow metadata

SuperGlue is also a good tool to try.

44. How do we call shell scripts from Informatica?

You can use a Command task to call shell scripts, in the following ways:

1. Standalone Command task. You can use a Command task anywhere in the workflow or worklet to run shell commands.

2. Pre- and post-session shell command. You can call a Command task as the pre- or post-session shell command for a Session task.

There is a task named Command task; using it you can write or call shell scripts, DOS commands, or BAT files.

What is full load & incremental or refresh load?

By full load or one-time load we mean that all the data in the source table(s) should be processed. This usually contains historical data. Once the historical data is loaded, we keep doing incremental loads to process the data that came after the one-time load.

A full load is the entire data dump, taking place the very first time. Afterwards, to synchronize the target data with the source data, there are 2 further techniques:

Refresh load - where the existing data is truncated and reloaded completely.

Incremental load - where the delta or difference between target and source data is loaded at regular intervals. The timestamp of the previous delta load has to be maintained.


Full load: completely erasing the contents of one or more tables and reloading them with fresh data.

Incremental load: applying ongoing changes to one or more tables based on a predefined schedule.

When you load data into the data warehouse initially, it is a full load. If you load data into the data warehouse every day, it is an incremental load.

    Do we need an ETL tool? When do we go for the tools in the market?

ETL tools are meant to extract, transform, and load the data into the data warehouse for decision-making. Before the evolution of ETL tools, the above-mentioned ETL process was done manually, using SQL code created by programmers. This task was tedious and cumbersome in many cases, since it involved many resources, complex coding, and more work hours. On top of that, maintaining the code posed a great challenge to the programmers.

These difficulties are eliminated by ETL tools, since they are very powerful and they offer many advantages in all stages of the ETL process (extraction, data cleansing, data profiling, transformation, debugging, and loading into the data warehouse) when compared to the old method.

1. Normally, ETL tool stands for Extraction, Transformation, Loading.

2. An ETL tool helps you extract the data from different ODSs/databases.

3. If you have a requirement like this, you need to get an ETL tool; otherwise you do not need any ETL tool.

    What is Informatica Metadata and where is it stored?

Informatica metadata contains all the information about the source tables, the target tables, and the transformations, so that it is useful and easy to perform transformations during the ETL process.

The Informatica metadata is stored in the Informatica repository.
