DATASTAGE QUESTIONS

1. What is the flow of loading data into fact & dimension tables? A) Fact table - a table with a collection of foreign keys corresponding to the primary keys in the dimension tables; it consists of fields with numeric values (measures). Dimension table - a table with a unique primary key. Load - data should first be loaded into the dimension tables. Based on the primary key values in the dimension tables, the data is then loaded into the fact table.

2. What is the default cache size? How do you change the cache size if needed? A) The default cache size is 256 MB. We can increase it by going into Datastage Administrator, selecting the Tunables tab and specifying the cache size there.

3. What are the types of Hashed File? A) Hashed files are classified broadly into 2 types. a) Static - subdivided into 17 types based on the primary key pattern. b) Dynamic - subdivided into 2 types: i) Generic ii) Specific. Dynamic files do not perform as well as a well-designed static file, but they do perform better than a badly designed one. When creating a dynamic file you can specify a number of settings (although all of these have default values). By default a hashed file is "Dynamic - Type Random 30 D".

4. What does a Config File in parallel extender consist of? A) Config file consists of the following. a) Number of Processes or Nodes. b) Actual Disk Storage Location.

5. What are Modulus and Splitting in a Dynamic Hashed File? A) In a dynamic hashed file the size of the file changes as data is added or removed. The modulus is the number of groups in the file. When the file grows, new groups are added and records are redistributed; this is called splitting (when the file shrinks, groups are merged back together).

6. What are Stage Variables, Derivations and Constraints? A) Stage Variable - an intermediate processing variable that retains its value from row to row and is not passed to a target column. Derivation - an expression that specifies the value to be passed on to the target column. Constraint - a condition that evaluates to true or false and controls whether a row flows down an output link.
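
A minimal sketch of how these look inside a Transformer stage; the link, column and stage-variable names (InLink, OutLink, svRunningTotal) are hypothetical:

    * Stage variable derivation - a running total retained across rows
    svRunningTotal = svRunningTotal + InLink.Amount

    * Output column derivation - the value passed to the target column
    OutLink.FullName = Trim(InLink.FirstName) : " " : Trim(InLink.LastName)

    * Output link constraint - only rows satisfying it flow down the link
    InLink.Status = "ACTIVE"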

7. What are the types of views in Datastage Director? There are 3 types of views in Datastage Director: a) Job view - the jobs and the dates on which they were compiled and run. b) Status view - the status of a job's last run. c) Log view - warning messages, event messages and program-generated messages.

8. Types of Parallel Processing?


A) Parallel processing is broadly classified into 2 types. a) SMP - Symmetric Multiprocessing. b) MPP - Massively Parallel Processing.

9. Orchestrate vs Datastage Parallel Extender? A) Orchestrate is itself an ETL tool with extensive parallel processing capabilities, running on UNIX platforms. Datastage used Orchestrate with Datastage XE (the beta version of 6.0) to incorporate parallel processing capabilities. Ascential then acquired Orchestrate and integrated it with Datastage XE, releasing a new version, Datastage 6.0, i.e. Parallel Extender.

10. Importance of the Surrogate Key in Data warehousing? A) A surrogate key is the primary key of a dimension table. Its main advantage is that it is independent of the underlying database, i.e. the surrogate key is not affected by changes going on in the source database.

11. How do you run a Shell Script within the scope of a Data stage job? A) By using the "ExecSH" subroutine in the Before/After job properties.

12. How to handle Date conversions in Datastage? Convert a mm/dd/yyyy format to yyyy-dd-mm? A) We use a) "Iconv" function - Internal Conversion. b) "Oconv" function - External Conversion.

The expression to convert mm/dd/yyyy format to yyyy-dd-mm is Oconv(Iconv(Fieldname,"D/MDY[2,2,4]"),"D-YDM[4,2,2]")
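
For illustration, a hedged sketch of the two steps with a literal value (the date shown is arbitrary):

    InternalDate = Iconv("12/31/2005", "D/MDY[2,2,4]")   ;* parse mm/dd/yyyy into the internal day number
    OutDate = Oconv(InternalDate, "D-YDM[4,2,2]")        ;* format as yyyy-dd-mm, e.g. "2005-31-12"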

13. How do you execute a datastage job from the command line prompt? A) Using the "dsjob" command as follows: dsjob -run -jobstatus projectname jobname

14. Functionality of Link Partitioner and Link Collector? Link Partitioner: It actually splits data into various partitions or data flows using various partition methods. Link Collector: It collects the data coming from partitions, merges it into a single data flow and loads to target.

15. Types of Dimensional Modeling? A) Dimensional modeling is again sub divided into 2 types. a) Star Schema - Simple & Much Faster. Denormalized form. b) Snowflake Schema - Complex with more Granularity. More normalized form.

16. Differentiate Primary Key and Partition Key? A Primary Key is a combination of unique and not-null constraints. It can be a collection of key columns, called a composite primary key. A Partition Key is just a part of the Primary Key. There are several methods of partitioning such as Hash, DB2, Random etc. When using Hash partitioning we specify the Partition Key.

17. Differentiate Database data and Data warehouse data? A) Data in a database is a) detailed or transactional, b) both readable and writable, c) current. Data in a data warehouse, by contrast, is typically summarized or consolidated, read-only and historical (non-volatile).

18. Containers Usage and Types? Container is a collection of stages used for the purpose of Reusability. There are 2 types of Containers. a) Local Container: Job Specific b) Shared Container: Used in any job within a project.

19. Compare and Contrast ODBC and Plug-In stages? ODBC: a) Poor Performance. b) Can be used for Variety of Databases. c) Can handle Stored Procedures. Plug-In: a) Good Performance. b) Database specific. (Only one database) c) Cannot handle Stored Procedures.

20. Dimension Modelling types along with their significance? Data Modelling is broadly classified into 2 types. a) E-R Diagrams (Entity-Relationship). b) Dimensional Modelling.

21. Explain the Data Stage Architecture? Data Stage contains two groups of components, client components and server components. Client Components:
- Data Stage Administrator
- Data Stage Manager
- Data Stage Designer
- Data Stage Director

Server Components:
- Data Stage Engine
- Meta Data Repository
- Package Installer

Data Stage Administrator:


Used to create projects and set project-level properties:
- We can set the buffer size (128 MB by default) and increase it if needed.
- We can set the environment variables.
- Under Tunables we have in-process and inter-process row buffering: in-process reads the data sequentially, inter-process reads the data as it comes.
- The Administrator itself just interfaces to the metadata.

Data Stage Manager: We can view and edit the Meta Data Repository. We can import table definitions. We can export Data Stage components in .xml or .dsx format. We can create routines and transforms. We can compile multiple jobs.

Data Stage Designer: We can create jobs. We can compile a job. We can run a job. We can declare stage variables in a Transformer and call routines, transforms, macros and functions. We can write constraints.

Data Stage Director: We can run the jobs. We can schedule the jobs. (Schedule can be done daily, weekly, monthly, quarterly) We can monitor the jobs. We can release the jobs.

22. What is the Meta Data Repository? Meta data is data about data. The repository also contains query statistics, ETL statistics, business subject areas, source information, target information and source-to-target mapping information.

23. What is the Data Stage Engine? It is the server engine that runs in the background and executes Data Stage jobs.

24. What is Dimensional Modeling? Dimensional Modeling is a logical design technique that seeks to present the data in a standard framework that is intuitive and allows for high-performance access.


25. What is a Star Schema? A Star Schema is a de-normalized multi-dimensional model. It contains centralized fact tables surrounded by dimension tables. Dimension Table: contains a primary key and descriptive attributes. Fact Table: contains foreign keys to the dimension tables, measures and aggregates.

26. What is a surrogate Key? It is typically a 4-byte integer that replaces the transaction/business/OLTP key in the dimension table; a 4-byte integer can hold up to about 2 billion records.

27. Why do we need a surrogate key? It is used for integrating data from multiple sources and performs better than a natural primary key for index maintenance, joins, table size, key updates, disconnected inserts and partitioning.

28. What is a Snowflake schema? It is a partially normalized dimensional model in which at least one dimension is represented by two or more related tables that form a hierarchy.

29. Explain the Types of Fact Tables? Factless Fact: contains only foreign keys to the dimension tables, with no measures. Additive Fact: measures can be added across all dimensions. Semi-Additive: measures can be added across only some dimensions, e.g. percentages, discounts. Non-Additive: measures cannot be added across any dimension, e.g. averages. Conformed Fact: a measure that has exactly the same definition in two or more fact tables, so it is measured consistently across the same set of dimensions.

30. Explain the Types of Dimension Tables? Conformed Dimension: if a dimension table is connected to more than one fact table, the granularity defined in the dimension table is common across those fact tables. Junk Dimension: a dimension table which contains only flags. Monster Dimension: a very large dimension that changes rapidly. Degenerate Dimension: a dimension key (such as an order or invoice number) kept in the fact table without its own dimension table; it occurs in line-item-oriented fact table designs.

31. What are stage variables? Stage variables are declaratives in Transformer Stage used to store values. Stage variables are active at the run time. (Because memory is allocated at the run time).

32. What is sequencer? It sets the sequence of execution of server jobs.

33. What are Active and Passive stages? Active Stage: active stages model the flow of data and provide mechanisms for combining data streams, aggregating data and converting data from one data type to another, e.g. Transformer, Aggregator, Sort, Row Merger etc. Passive Stage: a passive stage handles access to databases or files for the extraction or writing of data, e.g. IPC stage, file stages, Universe, Unidata, DRS stage etc.

34. What is ODS? Operational Data Store is a staging area where data can be rolled back.

35. What are Macros? They are built from Data Stage functions and do not require arguments. A number of macros are provided in the JOBCONTROL.H file to facilitate getting information about the current job, and the links and stages belonging to the current job. These can be used in expressions (for example in Transformer stages), job control routines, filenames and table names, and before/after subroutines. These macros provide the functionality of using the DSGetProjectInfo, DSGetJobInfo, DSGetStageInfo, and DSGetLinkInfo functions with the DSJ.ME token as the JobHandle and can be used in all active stages and before/after subroutines. The macros provide the functionality for all the possible InfoType arguments of the DSGet...Info functions. See the function call help topics for more details. The available macros are: DSHostName, DSProjectName, DSJobStatus, DSJobName, DSJobController, DSJobStartDate, DSJobStartTime, DSJobStartTimestamp, DSJobWaveNo, DSJobInvocations, DSJobInvocationId, DSStageName, DSStageLastErr, DSStageType, DSStageInRowNum, DSStageVarList, DSLinkRowCount, DSLinkLastErr, DSLinkName. Examples: to obtain the name of the current job: MyName = DSJobName. To obtain the full current stage name: MyName = DSJobName : "." : DSStageName
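
A small hedged sketch of using these macros from a routine that writes to the job log (the routine name passed to DSLogInfo is hypothetical):

    * Log an informational message identifying the current job and stage
    Msg = "Processing in job " : DSJobName : ", stage " : DSStageName
    Call DSLogInfo(Msg, "MyBeforeStageSub")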

36. What is KeyMgtGetNextValue? It is a built-in transform that generates sequential numbers. Its input type is a literal string and its output type is a string.
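
A one-line hedged sketch of how it is typically called in a derivation; the sequence name "CUST_KEY" is hypothetical:

    NextKey = KeyMgtGetNextValue("CUST_KEY")   ;* returns the next sequential value for this key name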

37. What are stages?


The stages are either passive or active stages. Passive stages handle access to databases for extracting or writing data. Active stages model the flow of data and provide mechanisms for combining data streams, aggregating data, and converting data from one data type to another.

38. What index is created on Data Warehouse? Bitmap index is created in Data Warehouse.

39. What is a container? A container is a group of stages and links. Containers enable you to simplify and modularize your server job designs by replacing complex areas of the diagram with a single container stage. You can also use shared containers as a way of incorporating server job functionality into parallel jobs. DataStage provides two types of container: Local containers - these are created within a job and are only accessible by that job; a local container is edited in a tabbed page of the job's Diagram window. Shared containers - these are created separately and are stored in the Repository in the same way that jobs are; there are two types of shared container, server shared containers and parallel shared containers.

40. What is a function? (Job Control - Examples of Transform Functions) Functions take arguments and return a value. BASIC functions: a function performs mathematical or string manipulations on the arguments supplied to it and returns a value. Some functions have 0 arguments; most have 1 or more. Arguments are always in parentheses, separated by commas, as shown in this general syntax: FunctionName(argument, argument). DataStage BASIC functions: these functions can be used in a job control routine, which is defined as part of a job's properties and allows other jobs to be run and controlled from the first job. Some of the functions can also be used for getting status information on the current job; these are useful in active stage expressions and before- and after-stage subroutines.

To do this ... / Use this function ...
Specify the job you want to control - DSAttachJob
Set parameters for the job you want to control - DSSetParam
Set limits for the job you want to control - DSSetJobLimit
Request that a job is run - DSRunJob
Wait for a called job to finish - DSWaitForJob
Get the meta data details for the specified link - DSGetLinkMetaData
Get information about the current project - DSGetProjectInfo
Get buffer size and timeout value for an IPC or Web Service stage - DSGetIPCStageProps
Get information about the controlled job or current job - DSGetJobInfo
Get information about the meta bag properties associated with the named job - DSGetJobMetaBag
Get information about a stage in the controlled job or current job - DSGetStageInfo
Get the names of the links attached to the specified stage - DSGetStageLinks
Get a list of stages of a particular type in a job - DSGetStagesOfType


Get information about the types of stage in a job - DSGetStageTypes
Get information about a link in a controlled job or current job - DSGetLinkInfo
Get information about a controlled job's parameters - DSGetParamInfo
Get the log event from the job log - DSGetLogEntry
Get a number of log events on the specified subject from the job log - DSGetLogSummary
Get the newest log event, of a specified type, from the job log - DSGetNewestLogId
Log an event to the job log of a different job - DSLogEvent
Stop a controlled job - DSStopJob
Return a job handle previously obtained from DSAttachJob - DSDetachJob
Log a fatal error message in a job's log file and abort the job - DSLogFatal
Log an information message in a job's log file - DSLogInfo
Put an info message in the job log of a job controlling the current job - DSLogToController
Log a warning message in a job's log file - DSLogWarn
Generate a string describing the complete status of a valid attached job - DSMakeJobReport
Insert arguments into the message template - DSMakeMsg
Ensure a job is in the correct state to be run or validated - DSPrepareJob
Interface to the system send mail facility - DSSendMail
Log a warning message to a job log file - DSTransformError
Convert a job control status or error code into an explanatory text message - DSTranslateCode
Suspend a job until a named file either exists or does not exist - DSWaitForFile
Check if a BASIC routine is cataloged, either in VOC as a callable item or in the catalog space - DSCheckRoutine
Execute a DOS or Data Stage Engine command from a before/after subroutine - DSExecute
Set a status message for a job to return as a termination message when it finishes - DSSetUserStatus
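
A minimal hedged sketch of a job control routine using a few of these calls; the job name, parameter name and values are hypothetical:

    * Attach the job to be controlled, treating attach failures as fatal
    hJob = DSAttachJob("LoadCustomerDim", DSJ.ERRFATAL)

    * Set a parameter, then request a normal run
    ErrCode = DSSetParam(hJob, "RUN_DATE", "2005-12-31")
    ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)

    * Wait for the called job to finish and check how it ended
    ErrCode = DSWaitForJob(hJob)
    Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
    If Status <> DSJ.RUNOK And Status <> DSJ.RUNWARN Then
       Call DSLogWarn("LoadCustomerDim did not finish cleanly", "JobControl")
    End

    * Release the job handle
    ErrCode = DSDetachJob(hJob)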

41. What are Routines? Routines are stored in the Routines branch of the Data Stage Repository, where you can create, view or edit them. The following programming components are classified as routines: Transform functions, Before/After subroutines, Custom UniVerse functions, ActiveX (OLE) functions, Web Service routines.
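
For illustration, a hedged sketch of a simple transform function routine (one argument, Arg1; the cleansing rule is just an example):

    * Return the argument trimmed and upper-cased, or "UNKNOWN" when it is empty
    If Trim(Arg1) = "" Then
       Ans = "UNKNOWN"
    End Else
       Ans = Upcase(Trim(Arg1))
    End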

42. What is data stage Transform?

43. What is Meta Brokers?


44. What is usage analysis?

45. What is job sequencer?

46. What are different activities in job sequencer?

47. What are triggers in data Stages? (conditional, unconditional, otherwise)

48. Have you generated job reports?

49. What is plug-in?

50. Dimension Modelling types along with their significance: Data Modelling is broadly classified into 2 types. A) E-R Diagrams (Entity-Relationship). B) Dimensional Modelling.

51. What are the command line functions that import and export the DS jobs? Answer: dsimport.exe - imports the DataStage components; dsexport.exe - exports the DataStage components.

52. What are OConv () and Iconv () functions and where are they used? IConv() - Converts a string to an internal storage format OConv() - Converts an expression to an output format.
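
Two hedged one-line examples of these conversion functions on dates (the literals and format codes are illustrative):

    D = Iconv("1997-12-31", "D-YMD[4,2,2]")   ;* external string -> internal day number
    S = Oconv(D, "D DMY[2,A3,4]")             ;* internal day number -> "31 DEC 1997"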

53. What does a Config File in parallel extender consist of? Config file consists of the following. a) Number of Processes or Nodes. b) Actual Disk Storage Location.

54. Functionality of Link Partitioner and Link Collector? Answer: Link Partitioner: It actually splits data into various partitions or data flows using various Partition methods. Link Collector: It collects the data coming from partitions, merges it into a single data flow and loads to target.

55. Did you parameterize the jobs or hard-code values in them? Always parameterize the job. The values come either from Job Properties or from a "Parameter Manager" - a third-party tool. You should never hard-code parameters in your jobs. The most commonly parameterized variables in a job are: DB DSN name, username, password, and the dates against which the data is to be processed.

56. Have you ever been involved in updating DS versions, e.g. DS 5.X? If so, tell us some of the steps you took. Yes. The following are some of the steps:
- Definitely take a backup of the whole project(s) by exporting each project as a .dsx file.
- Use the same parent folder for the new version so that old jobs which use hard-coded file paths continue to work.
- After installing the new version, import the old project(s); you then have to compile them all again, and you can use the 'Compile All' tool for this.
- Make sure that all your DB DSNs are created with the same names as the old ones. This step is for moving DS from one machine to another.
- If you are just upgrading your DB from Oracle 8i to Oracle 9i, there is a tool on the DS CD that can do this for you.
- Do not stop the 6.0 server before the upgrade; the version 7.0 install process collects project information during the upgrade. There is no rework (recompilation of existing jobs/routines) needed after the upgrade.

57. How did you handle reject data? Typically a reject link is defined and the rejected data is loaded back into the data warehouse. So a reject link has to be defined for every output link from which you wish to collect rejected data. Rejected data is typically bad data, such as duplicates of primary keys or null rows where data is expected.


58. What other performance tunings have you done in your last project to increase the performance of slowly running jobs?
- Staged the data coming from ODBC/OCI/DB2UDB stages or any database on the server using hash/sequential files, for optimum performance and also for data recovery in case a job aborts.
- Tuned the OCI stage 'Array Size' and 'Rows per Transaction' numerical values for faster inserts, updates and selects.
- Tuned the 'Project Tunables' in Administrator for better performance.
- Used sorted data for the Aggregator.
- Sorted the data as much as possible in the DB and reduced the use of DS Sort for better performance of jobs.
- Removed data not used from the source as early as possible in the job.
- Worked with the DB admin to create appropriate indexes on tables for better performance of DS queries.
- Converted some of the complex joins/business rules in DS into stored procedures for faster execution of the jobs.
- If an input file has an excessive number of rows and can be split up, then use standard logic to run jobs in parallel.
- Before writing a routine or a transform, make sure the required functionality is not already provided by one of the standard routines supplied in the sdk or ds utilities categories.
- Constraints are generally CPU intensive and take a significant amount of time to process. This may be the case if the constraint calls routines or external macros, but if it is inline code the overhead will be minimal.
- Try to have the constraints in the 'Selection' criteria of the jobs themselves. This eliminates unnecessary records before joins are made.
- Tuning should occur on a job-by-job basis.
- Use the power of the DBMS.
- Try not to use a Sort stage when you can use an ORDER BY clause in the database.
- Using a constraint to filter a record set is much slower than performing a SELECT ... WHERE ... in the database.
- Make every attempt to use the bulk loader for your particular database. Bulk loaders are generally faster than using ODBC or OLE.

59. What are Routines and where/how are they written and have you written any routines before? Routines are stored in the Routines branch of the DataStage Repository, where you can create, view or edit. The following are different types of Routines: 1. Transform Functions 2. Before-After Job subroutines 3. Job Control Routines

60. How did you handle an 'Aborted' sequencer? In almost all cases we have to delete the data inserted by this from DB manually and fix the job and then run the job again.

61. What are Sequencers? Sequencers are job control programs that execute other jobs with preset Job parameters.

62. Read the String functions in DS. Functions like [] - the substring operator - and ':' - the concatenation operator. Syntax:


string [ [ start, ] length ]
string [ delimiter, instance, repeats ]
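
A short hedged sketch of both operators in a derivation or routine (the values are arbitrary):

    S = "DataStage"
    Part = S[1,4]                       ;* substring: "Data"
    Combined = Part : "-" : "Stage"     ;* concatenation: "Data-Stage"
    Csv = "aa,bb,cc"
    SecondField = Csv[",", 2, 1]        ;* delimiter form: "bb"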

63. What utility do you use to schedule jobs on a UNIX server, other than using Ascential Director? Use the crontab utility to invoke the dsjob command with the proper parameters passed.

64. How would you call an external Java function that is not supported by DataStage? Starting from DS 6.0 we have the ability to call external Java functions using a Java package from Ascential. In this case we can even use the command line to invoke the Java function, write the return values from the Java program (if any) to a file, and use that file as a source in a DataStage job.

65. When should we use an ODS? DWHs are typically read-only and batch-updated on a schedule, whereas an ODS is maintained in more real time and trickle-fed constantly.

66. What is the batch program and how is it generated? A batch program is generated at run time and maintained by Datastage itself, but you can easily change it on the basis of your requirements (extraction, transformation, loading). The batch program that is generated depends on the nature of your job, whether it is a simple job or a sequencer job; you can see this program under the job control option.

67. Suppose 4 jobs are controlled by a sequencer (job 1, job 2, job 3, job 4). Job 1 has 10,000 rows, but after the run only 5,000 rows have been loaded into the target table, the remaining rows are not loaded, and the job aborts. How can you sort out the problem? Answer: Suppose the job sequencer synchronizes or controls the 4 jobs but job 1 has a problem. In this situation you should go to the Director and check what type of problem is shown: a data type problem, a warning message, a job failure or a job abort. If the job fails it usually means a data type problem or a missing column action. Go to the Run window -> Tracing -> Performance, or in your target stage -> General -> Action, where there are two options: (i) On Fail - Commit/Continue, (ii) On Skip - Commit/Continue. First check how much data has already been loaded, then select the On Skip option with Continue; for the remaining data that was not loaded, select On Fail with Continue. Run the job again and you should definitely get a success message.

68. What happens if RCP is disabled? In that case OSH has to perform an import and an export every time the job runs, so the processing time of the job is increased.

69. What are Sequencers? Sequencers are job control programs that execute other jobs with preset Job parameters.

70. How did you handle an 'Aborted' sequencer? In almost all cases we have to delete the data inserted by it from the DB manually, fix the job and then run the job again.

71. What is the difference between the Filter stage and the Switch stage?


There are two main differences, and probably some minor ones as well. The two main differences are as follows. 1) The Filter stage can send one input row to more than one output link; the Switch stage cannot - the C switch construct has an implicit break in every case. 2) The Switch stage is limited to 128 output links; the Filter stage can have a theoretically unlimited number of output links. (Note: this is not a challenge!)

72. How can I achieve constraint-based loading using DataStage 7.5? My target tables have inter-dependencies, i.e. primary key / foreign key constraints. I want my primary key tables to be loaded first and then my foreign key tables, and also the primary key tables should be committed before the foreign key tables are executed. How can I go about it?

1) Create a Job Sequencer to load your tables in sequential mode. In the sequencer, call all the primary key table loading jobs first, followed by the foreign key table jobs, and trigger the foreign key table load jobs only when the primary key load jobs run successfully (i.e. with an OK trigger). 2) To improve the performance of the job, you can disable all the constraints on the tables and load them. Once loading is done, check the integrity of the data; raise whatever does not meet it as exceptional data and cleanse it. This is only a suggestion; normally, when loading with constraints enabled, performance drops drastically. 3) If you use star schema modeling, when you create the physical DB from the model you can delete all constraints, and referential integrity is maintained in the ETL process by looking up all your dimension keys while loading the fact tables. Once all dimension keys are assigned to a fact, the dimensions and fact can be loaded together; at the same time RI is maintained at the ETL process level.

73. How do you merge two files in DS? Either use the Copy command as a before-job subroutine if the metadata of the 2 files is the same, or create a job to concatenate the 2 files into one if the metadata is different.
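
A hedged sketch of the first option, calling an OS command from a before-job subroutine via DSExecute (the file names are hypothetical and a UNIX server is assumed):

    * Concatenate two files with identical metadata into one before the job runs
    Call DSExecute("UNIX", "cat /data/in/file1.txt /data/in/file2.txt > /data/in/merged.txt", Output, RetCode)
    If RetCode <> 0 Then
       Call DSLogWarn("File merge failed with return code " : RetCode, "BeforeJobMerge")
    End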

74. How do you eliminate duplicate rows? Data Stage provides a Remove Duplicates stage in the Enterprise Edition. Using that stage we can eliminate duplicates based on a key column.

75. How do you pass a filename as a parameter for a job? During job development we can create a parameter 'FILE_NAME', and the value can be passed when the job is run; within the stage, the file name property then references the parameter as #FILE_NAME#.

76. Is there a mechanism available to export/import individual DataStage ETL jobs from the UNIX command line? Try dscmdexport and dscmdimport. They won't handle the "individual job" requirement - you can only export full projects from the command line. You can find the export and import executables on the client machine, usually someplace like: C:\Program Files\Ascential\DataStage.

77. Diff. between JOIN stage and MERGE stage. JOIN: performs join operations on two or more data sets input to the stage and then outputs the resulting data set. MERGE: combines a sorted master data set with one or more sorted update data sets. The columns from the records in the master and update data sets are merged so that the output record contains all the columns from the master record plus any additional columns from each update record that are required.

A master record and an update record are merged only if both of them have the same values for the merge key column(s) that we specify. Merge key columns are one or more columns that exist in both the master and update records.

78. How do you deconstruct a shared container? To deconstruct a shared container, first you have to convert the shared container to a local container, and then deconstruct the container.

79. I am getting an input value like X = Iconv("31 DEC 1967","D"). What is the value of X? The value of X is zero. The Iconv function converts a string to an internal storage format; it treats 31 DEC 1967 as day zero and counts days from that date (31-DEC-1967).
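
A tiny hedged illustration of the internal date format:

    D0 = Iconv("31 DEC 1967", "D")    ;* 0 - day zero of the internal calendar
    D1 = Iconv("01 JAN 1968", "D")    ;* 1 - one day later
    Txt = Oconv(D1, "D-YMD[4,2,2]")   ;* "1968-01-01" - back to a display format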

80. What is the use of the Nested Condition activity? Nested Condition: allows you to further branch the execution of a sequence depending on a condition.

81. I have three jobs A, B and C which are dependent on each other. I want to run the A & C jobs daily and the B job only on Sunday. How can you do it? First, schedule the A & C jobs from Monday to Saturday in one sequence. Next, take the three jobs according to their dependency in one more sequence and schedule that sequence only on Sunday.

DATASTAGE NOTES

DataStage Tips: 1. The Aggregator stage does not support more than one stream input; if you try this you will get the error "The destination stage cannot support any more stream input links". 2. You can give any number of input links to a Transformer stage, but only one of them can be the stream (primary) link; the other links must be reference links. A sequential file stage can be used only as the primary link. If you try to use a sequential file stage as a reference link you will get the error "The destination stage cannot support any more stream input links", because a reference link represents a lookup table and a sequential file cannot be used as a lookup table, whereas a hashed file can.

Sequential file stage:
- The sequential file stage is provided by Datastage to access data from a sequential (text) file.
- The access mechanism of a sequential file is sequential order.
- We cannot use a sequential file as a lookup.
- The problem with a sequential file is that we cannot directly filter rows, and queries are not supported.

Update actions in the sequential file stage:
- Overwrite existing file (radio button).
- Append to existing file (radio button).
- Backup existing file (check box).


Hashed file stage:
- The hashed file stage is used to store data in a hash file.
- A hash file is similar to a text file, but the data is organized using a hashing algorithm.
- A hashed file is basically used for lookup purposes.
- Retrieval of data from a hashed file is faster because it uses a hashing algorithm.

Update actions in the hashed file stage:
- Clear file before writing.
- Backup existing file.
- Sequential file (all are check boxes).

DWH FAQ:

Conformed dimension:
- A dimension table that connects to more than one fact table. We present the same dimension table in both schemas and refer to it as a conformed dimension.

Conformed fact:
- When the definitions of measurements (facts) are highly consistent, we call them conformed facts.

Junk dimension:
- A convenient grouping of random flags and indicators, to get them out of the fact table and into a useful dimensional framework.

Degenerate dimension:
- Usually occurs in line-item-oriented fact table designs. Degenerate dimensions are normal, expected and useful.
- The degenerate dimension key should be the actual production order number and should sit in the fact table without a join to anything.

Time dimension:
- It contains a number of useful attributes for describing calendars and navigating through them.
- An explicit time dimension is required because SQL date semantics and functions cannot generate several important features and attributes required for analytical purposes.
- Attributes like weekdays, weekends, holidays and fiscal periods cannot be generated by SQL statements.

Factless fact table:
- Fact tables which do not have any facts are called factless fact tables; they consist only of keys.
- There are two kinds of factless fact tables, and neither has any facts at all.
- The first type of factless fact table records an event. Many event-tracking tables in dimensional data warehouses turn out to be factless. Ex: a student tracking system that records each student-attendance event each day.
- The second type of factless fact table is a coverage table. Coverage tables are frequently needed when a primary fact table in a dimensional DWH is sparse. Ex: a sales fact table that records the sales of products in stores on particular days under each promotion condition.

Types of facts:
- Additive: facts that can be involved in calculations (summed) across all dimensions when deriving summarized data.
- Semi-additive: facts that can be involved in calculations only within a particular context of time (i.e. across some dimensions).
- Non-additive: facts that cannot be involved in calculations across any dimension.

Data stage:

How many types of loading techniques are available?

Before designing jobs in Data stage, what are the preceding steps?

What is a Pivot stage? Can you explain a scenario from your project in which you used it?

What is the difference between clearing the log file and clearing the status file?

How do you schedule jobs without using Data stage?

How do you do error handling in Data stage?

What is the difference between an Active stage and a Passive stage? What are the Active and Passive stages?

How do you set environment variables in Datastage?


How to do Auto-purge in Data stage?

How do you import your source and targets? What are the types of sources and targets?

How do you decide when to use a Join stage and when to use a Lookup stage?

What is IPC Stage?

What is audit table?

If there is a large hash file and a smaller Oracle table, and you are looking up from a Transformer in different jobs, which will be faster?

Tell me about SCDs (Slowly Changing Dimensions).

What are derivations in transformer?

What is job scheduler? Have you used it? How did you do?

Have you used datastage parallel extender?


What is the Link Partitioner and link collector stage?

How do constraints in the Transformer work?

How will you declare a constraint in datastage?

How will you handle rejected data?

Where is the data stored in Datastage?

Give me some performance tips in datastage?

Can we use sequential file as a lookup?

What is the difference between SCD Type2 and SCD Type3?

What is materialized view?