NIIT: Data Warehousing & Data Mining Code 53


    Monday, 27 May 2013

    Data Warehousing & Data Mining Code 53

    FAQ

1. What are the disadvantages of a File Management System over a DBMS?

    Ans:

    Some of the disadvantages of file management system over database management system are:

- Data redundancy and inconsistency

- Difficulty in accessing data

- Difficulty in integrating data into new enterprise-level applications because of varying formats

- Lack of support for concurrent updates by multiple users

- Lack of inherent security

2. Are relational databases the only possible type of database model?

    Ans:

    No. Apart from relational, other models include network and hierarchical models. However, these two models are obsolete.

    Nowadays, relational and object-oriented models are preferred.

3. What is referential integrity and how is it achieved in a relational database?

    Ans:

    Referential integrity is a feature of DBMS that prevents the user from entering inconsistent data. This is mainly achieved by having a

    foreign key constraint on a table.
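A minimal sketch of a foreign key constraint enforcing referential integrity, using Python's built-in sqlite3 module (the table and column names are hypothetical):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

    conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("""CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id))""")

    conn.execute("INSERT INTO department VALUES (1, 'Sales')")
    conn.execute("INSERT INTO employee VALUES (1, 'Asha', 1)")  # consistent: department 1 exists

    try:
        conn.execute("INSERT INTO employee VALUES (2, 'Ravi', 99)")  # department 99 does not exist
    except sqlite3.IntegrityError as e:
        print("Rejected inconsistent row:", e)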

    4. What are the higher normal forms?

    Ans:

A normal form considered higher than 3NF is the Boyce-Codd Normal Form (BCNF). BCNF differs from 3NF when a table has more than one overlapping composite candidate key.

    ------------------------------------------------------------------------------------------------------------

    5. On which layer of application architecture does data warehouse operate?


    20. What are the different OLAP storage models?

    Ans:

Following are the different OLAP storage models: MOLAP (multidimensional OLAP), ROLAP (relational OLAP), and HOLAP (hybrid OLAP).

21. A data analysis has to be done in the fastest possible means on data stored in multidimensional format. Which storage model is best suited in this case?

    Ans:

    MOLAP.

    ---------------------------------------------------------------------------------------------------------

    FAQ

22. Name the two important parameters that decide the granularity of partitions.

Ans:

Two important factors that decide the granularity of partitions are the overall size and the manageability of the system. Both parameters have to be balanced against each other while deciding on a partitioning strategy. Suppose data containing information about the population is partitioned on the basis of state; two maintenance-related issues that could be faced by the administrator are:

- A query that needs information about all the states, such as the particular languages spoken in each state, requires all the state partitions to be scanned.

- If the definition of a state changes (a state is redefined), the entire fact table needs to be rebuilt.

    23. Are there any disadvantages of data partitioning?

    Ans:

Data partitioning is by and large an advantageous technique for improving performance. However, it increases implementation complexity and imposes constraints on query design.

    24. Can partitions be indexed?

    Ans:

Yes, partitions can be indexed if the platform supports it. For example, in Oracle 9i, you can create various types of partitioned indexes.

25. If you have a huge amount of historical data, which is too old to be frequently useful but cannot be discarded, can partitioning help?

    Ans:

Essentially, the answer to this question depends on various factors such as the availability of resources and design strategies. However, you can partition data on the basis of the date it was last accessed and keep the historical data on a separate partition. In fact, you can use striping to keep it on a separate disk, improving access speed to the more frequently used data.

    ------------------------------------------------------------------------------------------------------------

    FAQ

26. Are there any risks associated with aggregation?

    Ans:

The main risk associated with aggregates is an increase in disk storage space.

    27. Once created, is an aggregate permanent?

    Ans:

No, aggregates keep changing as per the needs of the business. In fact, they can be taken offline or put online anytime by the administrator. Aggregates which have become obsolete can also be deleted to free up disk space.

    28. Can operations such as MIN and MAX value be determined once a summary table has been created?

    Ans:

Operations such as MIN and MAX cannot be determined correctly once the summary table has been created. Their values must be calculated and stored at the time the summary table is derived from the base table.
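A small Python illustration of why MIN and MAX must be captured while the summary table is being built: the aggregate below stores them alongside SUM and COUNT, because they cannot be recovered later from the summarized rows alone (the fact rows are hypothetical):

    # base fact rows: (region, sale_amount)
    facts = [("North", 120), ("North", 80), ("South", 200), ("South", 50)]

    summary = {}
    for region, amount in facts:
        s = summary.setdefault(region, {"sum": 0, "count": 0, "min": amount, "max": amount})
        s["sum"] += amount
        s["count"] += 1
        s["min"] = min(s["min"], amount)  # captured now; unrecoverable from sums later
        s["max"] = max(s["max"], amount)

    print(summary["North"])  # {'sum': 200, 'count': 2, 'min': 80, 'max': 120}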

    29. How much storage increase might be required in the data warehouse system when using aggregates?

    Ans:

The storage needed typically increases by a factor of one, and sometimes even two, when aggregates are used.

    ---------------------------------------------------------------------------------------------------------

    FAQ

    30. What are conformed dimensions?

    Ans:

A conformed dimension is one whose meaning is independent of the fact table from which it is referenced.

    31. What are virtual data marts?

Ans:

Virtual data marts are logical views of multiple physical data marts, based on user requirements.

32. Which tool supports data mart based data warehouse architectures?

    Ans:

Informatica is commonly used for implementing data mart based data warehouse architectures.

    33. Is the data in data marts also historical like in data warehouses?

    Ans:

The data in data marts is historical only to some extent. In fact, it is not the same as the data in a data warehouse because the purpose and approach of the two differ.

    ------------------------------------------------------------------------------------------------------------

    FAQ

    34. How can you classify metadata?

    Ans:


You can classify metadata according to its use as:

- Administrative metadata: Metadata that describes the data in terms of management statistics such as time of creation, access rights, and last access time.

- Structural metadata: Metadata that describes the structure of the data.

- Descriptive metadata: Metadata that describes the purpose or functionality of the data.

    35. What is backroom metadata?

    Ans:

Backroom metadata is the metadata related to the processes of extracting, cleaning, and loading. It is of use to the DBA and business users but not to the end user.

    36. What is a metadata catalogue?

Ans:

A metadata catalogue is the same as a metadata repository. It is also called a metadatabase.

    37. Are there any tools for metadata management?

    Ans:

Yes, there are various tools that facilitate metadata management. One such Windows-based tool is Saphir. SQL Server 2000 also

    enables metadata management to some extent.

    -----------------------------------------------------------------------------------------------------------

    FAQ

38. Are system and process management devoid of any manual intervention, considering that the process manager is a tool and not a person?

    Ans:

No. Although the system and process managers are themselves tools that automate system and process management in data warehouses, they must be configured, and at times handled, through manual intervention. These tasks may be done by the database administrator.

39. Does SQL Server also provide system managers?

Ans:

Yes. SQL Server includes various components that enable system management through its management and security services.

40. What is Oracle Warehouse Builder (OWB)?

    Ans:

It is one of the commonly used data warehouse development tools, with advanced features such as support for large databases, automated summary management, and an embedded multidimensional OLAP engine. Unlike SQL Server, which is only for the Windows platform, OWB can be used on all platforms. It is also faster, more reliable, and more scalable than SQL Server.

    41. What is replication?

    Ans:

    Replication is the process of creating multiple copies of data on the same or different platform and keeping the copies in sync.

    ------------------------------------------------------------------------------------------------------------

    FAQ

    42. What is KDD Process?

    Ans:

The unifying goal of the KDD process is to extract knowledge from data in the context of large databases. It does this by using data mining methods (algorithms) to extract (identify) what is deemed knowledge, according to the specifications of measures and thresholds, using a database along with any required preprocessing, subsampling, and transformations of that database.

    43. What is Data Visualization?

    Ans:

Data visualization presents data in three dimensions and colors to help users view complex patterns. Visualization tools also provide advanced manipulation capabilities to slice, rotate, or zoom the objects to identify patterns.

44. What are the constituents of multidimensional objects?

    Ans:

    Dimensions and Measures.

45. What do levels specify within a dimension?

Ans:

Levels specify the contents and structure of the dimension's hierarchy.

    46. What is data mining?

    Ans:

Data mining is the process of finding new and potentially useful knowledge from data.

47. What does data mining software do?

Ans:

Data mining software searches large volumes of data, looking for patterns that accurately predict behavior, such as which customers are most likely to maintain a relationship with the company. Common techniques employed by data mining software include neural networks, decision trees, and standard statistical modeling.

    48. What is Oracle Data Mining?

    Ans:

    Oracle Data Mining is enterprise data mining software that combines the ease of a Windows-based client with the power of a fully

    scalable, multi-algorithmic, UNIX server-based solution. Oracle Data Mining provides comprehensive predictive modeling capabilities

    that take advantage of parallel computing techniques to rapidly extract valuable customer intelligence information. Oracle Data

    Mining can optionally generate deployable models in C, C++, or Java code, delivering the "power of prediction" to call center,

    campaign management, and Web-based applications enterprise-wide.


    49. How does data mining differ from OLAP?

    Ans:

    Simply put, OLAP compares and data mining predicts. OLAP performs roll-ups, aggregations, and calculations, and it compares

    multiple results in a clearly organized graphical or tabular display. Data mining analyzes data on historical cases to discover

    patterns and uses the patterns to make predictions or estimates of outcomes for unknown cases. An analyst may use OLAP to

    discover a business problem, and then apply data mining to make the predictions necessary for a solution. An OLAP user can

    apply data mining to discover meaningful dimensions that should be compared. OLAP may be used to perform roll-ups and

    aggregations needed by the data mining tool. Finally, OLAP can compare data mining predictions or values derived from

    predictions.

    50. What are some typical data mining applications?

    Ans:

Following are some of the data mining applications:

- Customer retention

- Cross selling

- Response modeling / target marketing

- Profitability analysis

- Product affinity analysis

- Fraud detection

    ------------------------------------------------------------------------------------------------------------

    FAQ

    51. What is Noisy Data?

    Ans:

Noise is a random error or variance in data. It can occur because of:

- Faulty data collection and data entry mistakes, such as typing errors

- Data transmission and storage problems

- Inconsistency in naming conventions

Noise makes data inaccurate for predictions and renders it futile for mining systems.

52. What are the major data mining tasks?

    Ans:

The main data mining tasks include:

- Classification

- Clustering

- Associations

- Prediction

- Characterization and discrimination

- Evolution analysis

53. What are some other data mining languages and standardizations of primitives apart from DMQL?

Ans:

Some other data mining languages and standardizations of primitives apart from DMQL include:

- MSQL

- Mine Rule

- Query flocks based on Datalog syntax

- OLE DB for DM

- CRISP-DM

    54. Which Data Mining tools are used commercially?

    Ans:

Some data mining tools used commercially are:

- Clementine

- Darwin

- Enterprise Miner

- Intelligent Miner

- MineSet

    55. How can noisy data be smoothened?

    Ans:

Noisy data can be smoothened using the following techniques (a small sketch of binning follows the list):

- Binning

- Clustering

- Computer/human inspection

- Regression
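A minimal Python sketch of smoothing by binning (equal-depth bins, smoothing by bin means); the data values are hypothetical:

    # sort the data and split it into equal-depth bins;
    # each value is then replaced by its bin's mean
    data = sorted([4, 8, 15, 21, 21, 24, 25, 28, 34])
    depth = 3

    smoothed = []
    for i in range(0, len(data), depth):
        bin_ = data[i:i + depth]
        mean = sum(bin_) / len(bin_)
        smoothed.extend([round(mean, 1)] * len(bin_))

    print(smoothed)  # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]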

    ---------------------------------------------------------------------------------------------------------

    FAQ

56. What are the variations to the Apriori algorithm?

Ans:

Following are some of the variations to the Apriori algorithm that improve the efficiency of the original algorithm (a compact sketch of the basic loop follows the list):

- Transaction reduction: Reducing the number of transactions scanned in future iterations

- Partitioning: Partitioning data to find candidate itemsets

- Sampling: Mining on a subset of the given data

- Dynamic itemset counting: Adding candidate itemsets at different points during the scan
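A compact Python sketch of the basic Apriori loop (without the optimizations listed above), on hypothetical transactions; each pass generates candidate itemsets from the previous frequent ones and prunes those below the support threshold:

    transactions = [{"milk", "bread"}, {"milk", "butter"},
                    {"milk", "bread", "butter"}, {"bread", "butter"}]
    min_support = 2  # minimum number of transactions that must contain an itemset

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # pass 1: frequent 1-itemsets
    items = {i for t in transactions for i in t}
    frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]

    k = 2
    while frequent:
        print(k - 1, sorted(map(sorted, frequent)))
        # candidate generation: unions of frequent (k-1)-itemsets that have size k
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        frequent = [c for c in candidates if support(c) >= min_support]
        k += 1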

    57. Which is the best approach when we are interested in finding all possible interactions among a set of attributes?

    Ans:


    The best approach to find all possible interactions among a set of attributes is association rule mining.

58. What is overfitting in a neural network?

    Ans:

Overfitting is a common problem in neural network design. It occurs when a network has memorized the training set but has not learned to generalize to new inputs. Overfitting produces a relatively small error on the training set but a much larger error when new data is presented to the network.

59. What is a backpropagation neural network?

    Ans:

Backpropagation is a neural network algorithm for classification that employs a method of gradient descent. It searches for a set of weights that model the data so as to minimize the mean squared distance between the network's class prediction and the actual class label of the data samples. Rules may be extracted from trained neural networks in order to help improve the interpretability of the learned network.
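A minimal Python sketch of the gradient descent idea behind backpropagation, for a single sigmoid unit on a hypothetical dataset; a full backpropagation network repeats this update layer by layer, propagating the error backwards:

    import math, random

    random.seed(0)
    data = [([0.0, 1.0], 0.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]  # (inputs, label)
    w = [random.uniform(-0.5, 0.5) for _ in range(2)]
    b, lr = 0.0, 0.5

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    for epoch in range(1000):
        for x, y in data:
            out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = out - y                   # gradient of the squared error w.r.t. the output
            grad = err * out * (1.0 - out)  # chain rule through the sigmoid
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad

    # predictions move towards the class labels as the error is descended
    print([round(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b), 2) for x, _ in data])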

    ----------------------------------------------------------------------------------------------------------

    FAQ

    60. What is the difference between KDD and data mining?

    Ans:

    KDD refers to the overall process of discovering useful knowledge from data. It also includes the choice of encoding schemes,

    preprocessing, sampling, and projections of the data prior to the data-mining step.

    Data mining refers to the application of algorithms for extracting patterns from data without the additional steps of the KDD process.

    It is essentially the modeling step in the KDD process.

61. Is the data stored for data mining different from that in other operational systems?

    Ans:

The two systems differ in the usage patterns associated with data mining. Some of the basic differences are:

- Operational systems support transactional processing whereas data mining systems support analytical processing.

- Operational systems are process-oriented whereas data mining systems are subject-oriented.

- Operational systems are concerned with current data whereas data mining systems deal with historical data.

- Operational systems are updated frequently and have volatile data whereas data mining systems have non-volatile data and are rarely changed.

- Operational systems are optimized for fast updates whereas data mining systems are optimized for fast retrievals.

62. What are the primary goals of the KDD process?

Ans:

The two primary goals of the KDD process are:

- Prediction of unknown or future values

- Description, which focuses on finding human-interpretable patterns

    63. Does KDD need to be scalable?

    Ans:

    KDD should be scalable because of the ever-increasing data in enterprises. Limiting the size of a data mining system will affect the

    accuracy of predictions.

    ------------------------------------------------------------------------------------------------------------

    FAQ

64. What are the different types of query answering mechanisms?

Ans:

Query answering can be classified into two categories based on the method of response:

- Direct query answering: The query is answered by returning exactly what is being asked.

- Intelligent query answering: The intent of the query is analyzed, and generalized, neighborhood, or associated information relevant to the query is provided.

    65. Are there any social concerns related to data mining?

    Ans:

The most important social issue is the privacy of data pertaining to individuals. KDD poses a threat to privacy: the discovered patterns often classify individuals into categories, revealing confidential personal information. Moreover, it raises very sensitive and controversial issues, such as those that involve race, gender, or religion. In addition, it may correlate and disclose confidential, sensitive facts about individuals.

    66. What is Market Basket Analysis?

Ans:

Market basket analysis is one of the most common and useful techniques of data analysis for marketing. Its purpose is to determine what products customers purchase together. This can be helpful to a retailer, merchant, manufacturer, or any other organization interested in studying consumer buying patterns.
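A small Python sketch of the counting step behind market basket analysis, tallying how often pairs of products appear in the same (hypothetical) basket:

    from itertools import combinations
    from collections import Counter

    baskets = [{"beer", "diapers", "chips"}, {"beer", "diapers"},
               {"bread", "milk"}, {"beer", "chips"}]

    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1

    # the most frequently co-purchased pairs
    print(pair_counts.most_common(2))  # e.g. [(('beer', 'chips'), 2), (('beer', 'diapers'), 2)]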

    67. What is Visual Data Mining?

    Ans:

    Visual data mining integrates data mining and data visualization in order to discover implicit and useful knowledge from large data

    sets.

    ------------------------------------------------------------------------------------------------------------

    I. Note: Answer all the questions.

68. What is Normalization? What are the different forms of Normalization?

The usual approach to normalization in database applications is to ensure that the data is divided into two or more tables, such that when the data in one of them is updated, it does not lead to anomalies of data (the student is advised to refer to any book on database management systems for details, if interested).

The idea is to ensure that when combined, the data available is consistent. However, in data warehousing, one may even tend to break a large table into several denormalized smaller tables. This may lead to a lot of extra space being used, but it helps in an indirect way: it avoids the overhead of joining the data during queries.
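A small Python/SQL sketch contrasting the two approaches with hypothetical tables: the normalized form stores each fact once and joins at query time, while the denormalized form repeats data to avoid the join:

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # normalized: customer data held once, referenced by orders
    conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, city TEXT)")
    conn.execute("CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, amount REAL)")
    conn.execute("INSERT INTO customer VALUES (1, 'Pune')")
    conn.execute("INSERT INTO orders VALUES (101, 1, 250.0), (102, 1, 90.0)")

    # the query must join the two tables
    print(conn.execute("""SELECT o.order_id, c.city FROM orders o
                          JOIN customer c ON o.cust_id = c.cust_id""").fetchall())

    # denormalized: city repeated in every row; extra space, but no join at query time
    conn.execute("CREATE TABLE orders_denorm (order_id INTEGER, city TEXT, amount REAL)")
    conn.execute("INSERT INTO orders_denorm VALUES (101, 'Pune', 250.0), (102, 'Pune', 90.0)")
    print(conn.execute("SELECT order_id, city FROM orders_denorm").fetchall())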

69. Define data warehouse. What are the roles of education in a data warehousing delivery process?

Data warehouse: In its simplest form, a data warehouse is a collection of key pieces of information used to manage and direct the business for the most profitable outcome. It would decide the amount of inventory to be held, the number of employees to be hired, the amount to be procured on loan, etc.

The above definition may not be precise, but that is how data warehouse systems are. There are different definitions given by different authors, but we keep this idea in mind and proceed. It is a large collection of data and a set of process managers that use this data to make information available. The data can be metadata, facts, dimensions, and aggregations. The process managers can be load managers, warehouse managers, or query managers. The information made available is such that it allows the end users to make informed decisions.

Roles of education in a data warehousing delivery process:

Education has two roles to play: one is to make people, especially top-level policy makers, comfortable with the concept. The second role is to aid the prototyping activity. To take care of the education concept, an initial (usually scaled-down) prototype is created and people are encouraged to interact with it. This helps achieve both the activities listed above: the users become comfortable with the use of the system, and the warehouse developer becomes aware of the limitations of the prototype, which can then be improved upon.

70. What are process managers? What are the different types of process managers?

Process managers: These are responsible for the smooth flow, maintenance, and upkeep of data into and out of the database.

The main types of process managers are:

i). Load manager: to take care of source interaction, data transformation, and data load.

ii). Warehouse manager: to take care of data movement, metadata management, and performance monitoring.

iii). Query manager: to control query scheduling and monitoring.

We shall look into each of them briefly. Before that, we look at a schematic diagram that defines the boundaries of the three types of managers.

    71. Give the architectures of data mining systems.

[The architecture diagrams from the original are not preserved in this transcript.]

72. What are the guidelines for a KDD environment?

It is customary in the computer industry to formulate rules of thumb that help information technology (IT) specialists apply new developments. In setting up a reliable data mining environment, we may follow these guidelines so that the KDD system works in the manner we desire:

i). Support extremely large data sets

ii). Support hybrid learning

iii). Establish a data warehouse

iv). Introduce data cleaning facilities

v). Facilitate working with dynamic coding

vi). Integrate with decision support systems

vii). Choose an extendible architecture

viii). Support heterogeneous databases

ix). Introduce client/server architecture

x). Introduce cache optimization

    PART - B

    II. Answer any FIVE full questions.

    73. With the help of a diagram explain architecture of data warehouse.

The architecture for a data warehouse is indicated below. Before we proceed further, we should be clear about the concept of architecture: it only gives the major items that make up a data warehouse. The size and complexity of each of these items depend on the actual size of the warehouse itself, the specific requirements of the warehouse, and the actual details of implementation.

Before looking into the details of each of the managers, we can get a broad idea of their functionality by mapping the processes that we studied in the previous chapter to the managers. The extracting and loading processes are taken care of by the load manager. The processes of cleanup and transformation of data, as also of backup and archiving, are the duties of the warehouse manager, while the query manager, as the name implies, takes care of query management.

74. Indicate the important functions of a Load Manager and a Warehouse Manager.

Important functions of the Load Manager:

i) To extract data from the source(s)

ii) To load the data into a temporary storage device

iii) To perform simple transformations to map it to the structures of the data warehouse

Important functions of the Warehouse Manager:

i) Analyze the data to confirm data consistency and data integrity.

ii) Transform and merge the source data from the temporary data storage into the warehouse.

iii) Create indexes, cross references, partition views, etc.

iv) Check for normalizations.

v) Generate new aggregations, if needed.

vi) Update all existing aggregations.

vii) Create backups of data.

viii) Archive the data that needs to be archived.

    75. Differentiate between vertical partitioning and horizontal partitioning.

In horizontal partitioning, we simply store the first few thousand entries in one partition, the second few thousand in the next, and so on. This can be done by partitioning by time, wherein all data pertaining to the first month / first year is put in the first partition, the second one in the second partition, and so on. Other alternatives are partitioning on different-sized dimensions, partitioning on other dimensions, partitioning on the size of the table, and round-robin partitioning. Each of them has certain advantages as well as disadvantages.

In vertical partitioning, some columns are stored in one partition and certain other columns of the same row in a different partition. This can be achieved either by normalization or by row splitting. We will look into their relative trade-offs.
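A minimal Python sketch of horizontal partitioning by time: hypothetical fact rows are routed into per-month partitions, so a query for one month scans only that partition:

    from collections import defaultdict

    # fact rows: (date 'YYYY-MM-DD', amount)
    facts = [("2013-01-05", 100), ("2013-01-20", 40), ("2013-02-02", 75)]

    partitions = defaultdict(list)
    for date, amount in facts:
        partitions[date[:7]].append((date, amount))  # partition key = year-month

    # a January query touches only the January partition
    print(sum(amount for _, amount in partitions["2013-01"]))  # 140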

76. What is a schema? Distinguish between facts and dimensions.

A schema, by definition, is a logical arrangement of facts that facilitates ease of storage and retrieval, as described by the end

users. The end user is not bothered about the overall arrangement of the data or the fields in it. For example, a sales executive trying to project the sales of a particular item is only interested in the sales details of that item, whereas a tax practitioner looking at the same data will be interested only in the amounts received by the company and the profits made.

The star schema looks like a good solution to the problem of warehousing. It simply states that one should identify the facts, store them in a read-only area, and have the dimensions surround that area. Whereas the dimensions are liable to change, the facts are not. But given a set of raw data from the sources, how does one identify the facts and the dimensions? It is not always easy, but the following steps can help in that direction:

i) Look for the fundamental transactions in the entire business process. These basic entities are the facts.

ii) Find out the important dimensions that apply to each of these facts. They are the candidates for dimension tables.

iii) Ensure that facts do not include those candidates that are actually dimensions, with a set of facts attached to them.

iv) Ensure that dimensions do not include those candidates that are actually facts.

77. What is an event in data warehousing? List any five events.

An event is defined as a measurable, observable occurrence of a defined action. If this definition is quite vague, it is because it encompasses a very large set of operations. The event manager is software that continuously monitors the system for the occurrence of an event and then takes any suitable action (note that the event is a measurable and observable occurrence). The action to be taken is also normally specific to the event.

A partial list of the common events that need to be monitored is as follows (a small monitoring sketch follows the list):

i). Running out of memory space

ii). A process dying

iii). A process using excessive resources

iv). I/O errors

v). Hardware failure
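A minimal Python sketch of an event manager check for the first event listed above (running out of space), using only the standard library; the path and threshold are hypothetical:

    import shutil

    def check_disk_event(path="/", min_free_ratio=0.10):
        """Fire an event when free space on `path` drops below the threshold."""
        usage = shutil.disk_usage(path)
        if usage.free / usage.total < min_free_ratio:
            print(f"EVENT: low disk space on {path}: {usage.free / usage.total:.1%} free")
            # ...take whatever action is configured for this event (alert the DBA, purge, etc.)

    check_disk_event()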

78. What is a summary table? Describe the aspects to be looked into while designing a summary table.

The main purpose of using summary tables is to cut down the time taken to execute a specific query. The main methodology involves minimizing the volume of data being scanned each time the query is to be answered; in other words, partial answers to the query are made available beforehand. For example, in the previously cited example of the mobile market, suppose one expects the potential customers to be:

i) citizens above 18 years of age,

ii) with salaries greater than 15,000, and

iii) with professions that involve traveling.

Then, every time the query is to be processed (maybe every month or every quarter), one will have to look at the entire database to compute these values and then combine them suitably to get the relevant answers. The other method is to prepare summary tables, which hold the values pertaining to each of these sub-queries beforehand, and then combine them as and when the query is raised.

Summary tables are designed by following the steps given below (a small sketch follows the list):

i) Decide the dimensions along which aggregation is to be done.

ii) Determine the aggregation of multiple facts.

iii) Aggregate multiple facts into the summary table.

iv) Determine the level of aggregation and the extent of embedding.

v) Design time into the table.

vi) Index the summary table.
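A small Python sketch of steps (i)-(iii): aggregating hypothetical base facts along the chosen dimensions into a summary table, with time designed into the key as in step (v):

    from collections import defaultdict

    # base facts: (month, region, profession, sales)
    facts = [("2013-05", "North", "travel", 3), ("2013-05", "North", "travel", 5),
             ("2013-05", "South", "office", 2), ("2013-06", "North", "travel", 4)]

    # dimensions chosen for aggregation: month and region (profession is rolled up)
    summary = defaultdict(int)
    for month, region, profession, sales in facts:
        summary[(month, region)] += sales

    print(dict(summary))
    # {('2013-05', 'North'): 8, ('2013-05', 'South'): 2, ('2013-06', 'North'): 4}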

79. List the significant issues in automatic cluster detection.

Most of the issues related to automatic cluster detection are connected to the kinds of questions we want answered in the data mining project, or to data preparation for their successful application. (A small sketch of issue (ii) follows the list.)

i). Distance measure

Most clustering techniques use the Euclidean distance formula (the square root of the sum of the squares of the distances along each attribute axis) as the distance measure. Non-numeric variables must be transformed and scaled before clustering can take place. Depending on these transformations, the categorical variables may dominate the clustering results, or they may even be completely ignored.

ii). Choice of the right number of clusters

If the number of clusters k in the k-means method is not chosen to match the natural structure of the data, the results will not be good. The proper way to alleviate this is to experiment with different values of k. In principle, the best k value will exhibit the smallest intra-cluster distances and the largest inter-cluster distances.

iii). Cluster interpretation

Once the clusters are discovered, they have to be interpreted in order to have some value for the data mining project.
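A compact Python sketch of the experiment suggested in issue (ii): running k-means with the Euclidean distance for several values of k on hypothetical 2-D points and comparing the total intra-cluster distance:

    import math, random

    random.seed(1)
    points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]  # two natural clusters

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))  # Euclidean distance

    def kmeans(points, k, iters=20):
        centers = random.sample(points, k)
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in points:  # assign each point to its nearest center
                clusters[min(range(k), key=lambda i: dist(p, centers[i]))].append(p)
            centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
                       for i, cl in enumerate(clusters)]  # recompute the centers
        return sum(dist(p, centers[min(range(k), key=lambda i: dist(p, centers[i]))])
                   for p in points)  # total intra-cluster distance

    for k in (1, 2, 3):
        print(k, round(kmeans(points, k), 2))  # the drop flattens once k matches the data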

81. Define data marting. List the reasons for data marting.

A data mart stores a subset of the data available in the warehouse, so that one need not always scan through the entire content of the warehouse. It is similar to a retail outlet. A data mart speeds up queries, since the volume of data to be scanned is much less. It also helps to have tailor-made processes for different access tools, to impose control strategies, etc.

Following are the reasons for which data marts are created:

i) Since the volume of data scanned is small, they speed up query processing.

ii) Data can be structured in a form suitable for a user access tool.

iii) Data can be segmented or partitioned so that it can be used on different platforms, and different control strategies become applicable.

82. Explain how to categorize data mining systems.

There are many data mining systems available or being developed. Some are specialized systems dedicated to a given data source or confined to limited data mining functionalities; others are more versatile and comprehensive. Data mining systems can be categorized according to various criteria, including the following:

a) Classification according to the type of data source mined: this categorizes data mining systems according to the type of data handled, such as spatial data, multimedia data, time-series data, text data, the World Wide Web, etc.

b) Classification according to the data model drawn on: this categorizes data mining systems based on the data model involved, such as relational database, object-oriented database, data warehouse, transactional, etc.

c) Classification according to the kind of knowledge discovered: this categorizes data mining systems based on the kind of knowledge discovered or the data mining functionalities, such as characterization, discrimination, association, classification, clustering, etc. Some systems tend to be comprehensive systems offering several data mining functionalities together.

d) Classification according to mining techniques used: data mining systems employ and provide different techniques. This classification categorizes data mining systems according to the data analysis approach used, such as machine learning, neural networks, genetic algorithms, statistics, visualization, database-oriented or data warehouse-oriented, etc.

83. List and explain the different kinds of data that can be mined.

The different kinds of data that can be mined are listed below:

i). Flat files: Flat files are actually the most common data source for data mining algorithms, especially at the research level.

ii). Relational databases: A relational database consists of a set of tables containing either values of entity attributes, or values of attributes from entity relationships.

iii). Data warehouses: A data warehouse, as a storehouse, is a repository of data collected from multiple data sources (often heterogeneous) and is intended to be used as a whole under the same unified schema.

iv). Multimedia databases: Multimedia databases include video, image, audio, and text media. They can be stored on extended object-relational or object-oriented databases, or simply on a file system.

v). Spatial databases: Spatial databases are databases that, in addition to usual data, store geographical information like maps and global or regional positioning.

vi). Time-series databases: Time-series databases contain time-related data such as stock market data or logged activities. These databases usually have a continuous flow of new data coming in, which sometimes causes the need for challenging real-time analysis.

vii). World Wide Web: The World Wide Web is the most heterogeneous and dynamic repository available. A very large number of authors and publishers are continuously contributing to its growth and metamorphosis, and a massive number of users are accessing its resources daily.

84. Give the syntax for task-relevant data specification.

Syntax for task-relevant data specification:

The first step in defining a data mining task is the specification of the task-relevant data, that is, the data on which mining is to be performed. This involves specifying the database and tables or data warehouse containing the relevant data, conditions for selecting the relevant data, the relevant attributes or dimensions for exploration, and instructions regarding the ordering or grouping of the data retrieved. DMQL provides clauses for the specification of such information, as follows (a combined example follows the list):

i). use database (database_name) or use data warehouse (data_warehouse_name): The use clause directs the mining task to the database or data warehouse specified.

ii). from (relation(s)/cube(s)) [where (condition)]: The from and where clauses respectively specify the database tables or data cubes involved, and the conditions defining the data to be retrieved.

iii). in relevance to (attribute_or_dimension_list): This clause lists the attributes or dimensions for exploration.

iv). order by (order_list): The order by clause specifies the sorting order of the task-relevant data.

v). group by (grouping_list): The group by clause specifies criteria for grouping the data.

vi). having (conditions): The having clause specifies the condition by which groups of data are considered relevant.
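Putting these clauses together, a hypothetical task-relevant data specification in DMQL might read as follows (the database, table, and attribute names are invented for illustration, and the clause order follows the listing above):

    use database sales_db
    from customer C, purchase P
    where C.cust_id = P.cust_id and P.year = 2013
    in relevance to age, income, city
    order by age
    group by city
    having count(*) > 100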

85. Explain the design of a GUI based on a data mining query language.

A data mining query language provides the necessary primitives that allow users to communicate with data mining systems. But novice users may find a data mining query language difficult to use and its syntax difficult to remember. Instead, users may prefer to communicate with data mining systems through a graphical user interface (GUI). In relational database technology, SQL serves as a standard core language for relational systems, on top of which GUIs can easily be designed. Similarly, a data mining query language may serve as a core language for data mining system implementations, providing a basis for the development of GUIs for effective data mining.

A data mining GUI may consist of the following functional components:

a) Data collection and data mining query composition - This component allows the user to specify task-relevant data sets and to compose data mining queries. It is similar to GUIs used for the specification of relational queries.


b) Presentation of discovered patterns - This component allows the display of the discovered patterns in various forms, including tables, graphs, charts, curves, and other visualization techniques.

c) Hierarchy specification and manipulation - This component allows for concept hierarchy specification, either manually by the user or automatically. In addition, this component should allow concept hierarchies to be modified by the user or adjusted automatically based on a given data set distribution.

d) Manipulation of data mining primitives - This component may allow the dynamic adjustment of data mining thresholds, as well as the selection, display, and modification of concept hierarchies. It may also allow the modification of previous data mining queries or conditions.

e) Interactive multilevel mining - This component should allow roll-up or drill-down operations on discovered patterns.

f) Other miscellaneous information - This component may include on-line help manuals, indexed search, debugging, and other interactive graphical facilities.

86. Explain how decision trees are useful in data mining.

Decision trees are powerful and popular tools for classification and prediction. The attractiveness of tree-based methods is due in large part to the fact that they are simple and that decision trees represent rules. Rules can readily be expressed so that humans can understand them, or in a database access language like SQL, so that records falling into a particular category may be retrieved.
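A minimal Python sketch using scikit-learn (a third-party library, assumed to be available) that fits a decision tree on hypothetical customer data and prints its decisions as readable rules:

    from sklearn.tree import DecisionTreeClassifier, export_text

    # hypothetical customer records: [age, income]; label 1 = responded to a campaign
    X = [[25, 20], [30, 60], [45, 80], [50, 30], [35, 70], [23, 25]]
    y = [0, 1, 1, 0, 1, 0]

    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

    # the tree's decisions printed as human-readable if/else rules
    print(export_text(tree, feature_names=["age", "income"]))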

87. Identify an application and explain the techniques that can be incorporated in solving the problem using data mining techniques.

Write yourself...

88. Write short notes on:

i) Data Mining Query Language

ii) Schedule Manager

iii) Data Formatting

i) Data Mining Query Language

A data mining language helps in effective knowledge discovery from data mining systems. Designing a comprehensive data mining language is challenging because data mining covers a wide spectrum of tasks, from data characterization to mining association rules, data classification, and evolution analysis. Each task has different requirements. The design of an effective data mining query language requires a deep understanding of the power, limitations, and underlying mechanisms of the various kinds of data mining tasks.

ii) Schedule Manager

Scheduling is the key to successful warehouse management. Almost all operations in the warehouse need some type of scheduling. Every operating system has its own scheduler and batch control mechanism, but these schedulers may not be capable of fully meeting the requirements of a data warehouse. Hence it is more desirable to have specially designed schedulers to manage the operations.

iii) Data Formatting

Data formatting is the final data preparation step. It represents syntactic modifications to the data that do not change its meaning but are required by the particular modelling tool chosen for the DM task. These include:

a). Reordering of the attributes or records: some modelling tools require reordering of the attributes (or records) in the dataset, such as putting the target attribute at the beginning or at the end, or randomizing the order of records (required by neural networks, for example).

b). Changes related to the constraints of modelling tools: removing commas, tabs, or special characters, trimming strings to the maximum allowed number of characters, or replacing special characters with an allowed set of characters.

    ---------------------------------------------------------------------------------------------------------------

89. With a neat diagram, explain the main parts of a computer.

A computer has three basic main parts:

i). A central processing unit (CPU) that does all the arithmetic and logical operations. This can be thought of as the heart of any computer, and computers are identified by the type of CPU that they use.

ii). The memory, which holds the programs and data. All the computers that we come across these days are what are known as stored program computers: the programs are stored beforehand in the memory, and the CPU accesses these programs line by line and executes them.

iii). The input/output devices: these devices facilitate the interaction of the users with the computer. The input devices are used to send information to the computer, while the output devices accept the processed information from the computer and make it available to the user.

Diagram:

[Diagram not preserved in the original transcript.]


91. Briefly explain the types of memories.

There are two types of memories: primary memory, which is embedded in the computer and is the main source of data to the computer, and secondary memory, like floppy disks, CDs, etc., which can be carried around and used in different computers. Secondary memories cost much less than primary memory, but the CPU can access data only from primary memory. The main advantage of computer memories, both primary and secondary, is that they can store data indefinitely and accurately.

92. Describe the basic concept of databases.

The concept of database:

We have seen in the previous section how data can be stored in a computer. Such stored data becomes a database, a collection of data. For example, if all the marks scored by all the students of a class are stored in the computer memory, it can be called a database. From such a database, we can answer questions like: Who has scored the highest marks? In which subject have the maximum number of students failed? Which students are weak in more than one subject? Of course, appropriate programs have to be written to do these computations. Also, as the database becomes very large and more and more data keeps getting included at different periods of time, there are several other problems about maintaining the data, which will not be dealt with here.

Since the handling of such databases has become one of the primary jobs of the computer in recent years, it becomes difficult for the average user to keep writing such programs. Hence, special languages, called database query languages, have been devised, which make such programming easy; these languages help in getting specific queries answered easily.
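A small Python sketch of the marks database idea (students and marks are hypothetical), answering the first question posed above:

    marks = {"Asha": 82, "Ravi": 67, "Meena": 91, "Kiran": 45}

    # who has scored the highest marks?
    topper = max(marks, key=marks.get)
    print(topper, marks[topper])  # Meena 91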

93. With examples, explain the different views of data.

Data is normally stored in tabular form. Unless storage in other formats becomes advantageous, we store data in what are technically called relations, or in simple terms, tables.

Views are mainly of two types:

i). Simple view

ii). Complex view

Simple view:

- It is created by selecting only one table.

- It does not contain functions.

- DML operations (SELECT, INSERT, UPDATE, DELETE, MERGE, CALL, LOCK TABLE) can be performed through a simple view.

Complex view:

- It is created by selecting more than one table.

- It can contain functions.

- You cannot always perform DML operations through a complex view.

(A small sketch of both kinds of view follows.)
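A minimal sketch of both kinds of view using Python's built-in sqlite3 module; the table, view, and column names are hypothetical:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (id INTEGER, name TEXT, marks INTEGER)")
    conn.execute("CREATE TABLE course (id INTEGER, title TEXT)")
    conn.execute("INSERT INTO student VALUES (1, 'Asha', 82), (2, 'Ravi', 45)")
    conn.execute("INSERT INTO course VALUES (1, 'DBMS'), (2, 'Mining')")

    # simple view: a single table, no functions
    conn.execute("CREATE VIEW passed AS SELECT name, marks FROM student WHERE marks >= 50")

    # complex view: more than one table, and it may use functions
    conn.execute("""CREATE VIEW enrolment AS
                    SELECT c.title, COUNT(*) AS n FROM student s
                    JOIN course c ON s.id = c.id GROUP BY c.title""")

    print(conn.execute("SELECT * FROM passed").fetchall())     # [('Asha', 82)]
    print(conn.execute("SELECT * FROM enrolment").fetchall())  # e.g. [('DBMS', 1), ('Mining', 1)]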

94. Briefly explain the concept of normalization.

Normalization is dealt with in several chapters of any book on database management systems. Here, we will take the simplest definition, which suffices for our purpose: no field should have subfields.

Consider the following student table. Here, under the field Marks, we have three subfields: marks for Subject1, marks for Subject2, and marks for Subject3. However, it is preferable to split these subfields into regular fields, as shown below.

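The tables referred to above appeared as images in the original and are not preserved; a hypothetical reconstruction of the idea is:

    Before normalization (Marks has subfields):

        Name  | Marks (Subject1, Subject2, Subject3)
        Asha  | (82, 74, 90)

    After splitting the subfields into regular fields:

        Name  | Subject1 | Subject2 | Subject3
        Asha  |    82    |    74    |    90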

Quite often, the original table that comes with subfields will have to be modified suitably, by the process of normalization.

95. Explain the concept of the data warehouse delivery process in detail.

The concept of the data warehouse delivery process:

This section deals with the data warehouse from a different viewpoint: how the different components that go into it enable the building of a data warehouse. The study helps us in two ways:

i) to have a clear view of the data warehouse building process.

ii) to understand the working of the data warehouse in the context of the components.

Now we look at the concepts in detail:

i). IT strategy: The company must have an overall IT strategy, and data warehousing has to be a part of the overall strategy.

ii). Business case analysis: This looks like an obvious thing, but it is most often misunderstood. An overall understanding of the business and the importance of the various components therein is a must. This will ensure that one can clearly justify the appropriate level of investment that goes into the data warehouse design and also the amount of returns accruing.

iii). Education: This has two roles to play: one is to make people, especially top-level policy makers, comfortable with the concept. The second role is to aid the prototyping activity.

iv). Business requirements: As discussed earlier, it is essential that the business requirements are fully understood by the data warehouse planner. This ensures that the warehouse is incorporated adequately in the overall setup of the organization.

v). Technical blueprints: This is the stage where the overall architecture that satisfies the requirements is delivered.

vi). Building the vision: Here the first physical infrastructure becomes available. The major infrastructure components are set up, and the first stages of loading and generation of data start.

vii). History load: Here the system is made fully operational by loading the required history into the warehouse, i.e., whatever data is available over the previous years is put into the data warehouse to make it fully operational.

viii). Ad hoc query: Now we configure a query tool to operate against the data warehouse.

ix). Automation: This phase automates the various operational processes, such as:

a) Extracting and loading data from the sources.

b) Transforming the data into a form suitable for analysis.

c) Backup, restoration, and archiving.

d) Generating aggregations.

e) Monitoring query profiles.

x). Extending scope: There is no single mechanism by which this can be achieved. As and when needed, a new set of data may be added, new formats may be included, or even major changes may be involved.

xi). Requirement evolution: Business requirements will constantly change during the life of the warehouse. Hence, the processes that support the warehouse also need to be constantly monitored and modified.

96. What are the three major activities of a data warehouse? Explain.

The three major activities of a data warehouse are:

i) Populating the warehouse (i.e., inclusion of data)

ii) Day-to-day management of the warehouse

iii) Ability to accommodate changes

i). The processes that populate the warehouse have to be able to extract the data, clean it up, and make it available to the analysis systems. This is done on a daily / weekly basis depending on the quantum of the data population to be incorporated.

ii). The day-to-day management of the data warehouse is not to be confused with maintenance and management of hardware and software. When large amounts of data are stored and new data is being continually added at regular intervals, maintenance of the quality of data becomes an important element.

iii). The ability to accommodate changes implies that the system is structured in such a way as to be able to cope with future changes without the entire system being remodeled. Based on these, we can view the processes that a typical data warehouse scheme should support as follows.


97. Explain the extract and load process of a data warehouse.

Extract and load process: This forms the first stage of the data warehouse. External physical systems, like the sales counters which give the sales data and the inventory systems that give inventory levels, constantly feed data to the warehouse. Needless to say, the format of this external data has to be monitored and modified before loading it into the warehouse. The data warehouse must extract the data from the source systems, load it into its databases, remove unwanted fields (either because they are not needed or because they are already there in the database), add new fields / reference data, and finally reconcile it with the other data. We shall see a few more details of these broad actions in the subsequent paragraphs.

i). A mechanism should be evolved to control the extraction of data, check its consistency, etc. For example, in some systems, the data is not authenticated until it is audited.

ii). Having a set of consistent data is equally important. This especially matters when several online systems are feeding the data.

iii). Once data is extracted from the source systems, it is loaded into a temporary data storage before it is cleaned and loaded into the warehouse.

98. In what ways does data need to be cleaned up and checked? Explain briefly.

Data needs to be cleaned up and checked in the following ways:

i) It should be consistent with itself.

ii) It should be consistent with other data from the same source.

iii) It should be consistent with other data from other sources.

iv) It should be consistent with the information already available in the data warehouse.

While it is easy to list out the needs of clean data, it is more difficult to set up systems that automatically clean up the data. The normal course is to suspect the quality of data if it does not meet the normal standards of common sense, or if it contradicts data from other sources or data already available in the data warehouse. Normal intuition doubts the validity of the new data, and effective measures like rechecking, retransmission, etc., are undertaken. When none of these are possible, one may even resort to ignoring the entire set of data and getting on with the next set of incoming data.

99. Explain the architecture of a data warehouse.

The architecture of a data warehouse is indicated below. Before we proceed further, we should be clear about the concept of architecture: it only gives the major items that make up a data warehouse. The size and complexity of each of these items depend on the actual size of the warehouse itself, the specific requirements of the warehouse, and the actual details of implementation.

100. Briefly explain the functions of each manager of a data warehouse.

The Warehouse Manager: The warehouse manager is the component that performs all operations necessary to support the warehouse management process. Unlike the load manager, the warehouse management process is driven by the extent to which the operational management of the data warehouse has been automated.

The warehouse manager can easily be termed the most complex of the warehouse components, and it performs a variety of tasks. A few of them are listed below.

i) Analyze the data to confirm data consistency and data integrity.


    ii) Transform and merge the source data from the temporary data storage into the ware house.

iii) Create indexes, cross-references, partitioned views, etc.

iv) Check for normalization.

v) Generate new aggregations, if needed.

vi) Update all existing aggregations.

vii) Create backups of data.

viii) Archive the data that needs to be archived.

101. Explain the star schema used to represent sales analysis.

Star schemas are database schemas that structure the data to exploit a typical decision-support enquiry. When the components of typical enquiries are examined, a few similarities stand out:

i) The queries examine a set of factual transactions - sales, for example.

ii) The queries analyze the facts in different ways - by aggregating them on different bases or graphing them in different ways.

The central concept of most such transactions is a fact table. The surrounding reference tables are called dimension tables. The combination is called a star schema. A toy example follows.
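Here is a toy star schema in Python: the fact table holds sales transactions, and each row carries foreign keys into two dimension tables. All table contents are invented for illustration.

dim_product = {1: {"name": "pen"}, 2: {"name": "book"}}
dim_outlet = {10: {"city": "Pune"}, 11: {"city": "Delhi"}}
fact_sales = [
    {"product_id": 1, "outlet_id": 10, "qty": 5},
    {"product_id": 2, "outlet_id": 10, "qty": 3},
    {"product_id": 1, "outlet_id": 11, "qty": 7},
]

# A typical decision-support enquiry: total quantity sold per city.
totals = {}
for row in fact_sales:
    city = dim_outlet[row["outlet_id"]]["city"]
    totals[city] = totals.get(city, 0) + row["qty"]
print(totals)  # {'Pune': 8, 'Delhi': 7}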

102. What do you mean by partitioning of data? Explain briefly.

Partitioning of data:

In most warehouses, the size of the fact tables tends to become very large. This leads to several problems of management, backup, processing, etc. These difficulties can be overcome by partitioning each fact table into separate partitions.

Data warehouses tend to exploit this idea by partitioning the large volume of data into data sets. For example, data can be partitioned on a weekly or monthly basis, so as to minimize the amount of data scanned before answering a query. This technique allows the data to be scanned to be kept to a minimum without the overhead of using an index, which improves the overall efficiency of the system. However, having too many partitions can be counterproductive, so choosing an optimal size and number of partitions is of vital importance. (A small sketch of date-based partitioning appears after the list below.)

Partitioning generally helps in the following ways:

i) It assists in better management of data.

ii) It eases backup and recovery, since the volumes are smaller.

iii) Star schemas with partitions produce better performance.

iv) Since several hardware architectures operate better in a partitioned environment, the overall system performance improves.
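A small Python sketch of monthly partitioning, with hypothetical rows and a "YYYY-MM" partition key, shows how a query can scan only the partition it needs:

from collections import defaultdict

fact_rows = [
    {"date": "2013-05-02", "amount": 100},
    {"date": "2013-05-17", "amount": 250},
    {"date": "2013-06-01", "amount": 75},
]
partitions = defaultdict(list)
for row in fact_rows:
    partitions[row["date"][:7]].append(row)  # monthly partition key, e.g. "2013-05"

# A May query touches only one partition instead of the whole fact table.
may_total = sum(r["amount"] for r in partitions["2013-05"])
print(may_total)  # 350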

103. Describe the terms data mart and metadata.

Data mart:

A data mart is a subset of the information content of a data warehouse, stored in its own database. The data of a data mart may have been collected through the warehouse or, in some cases, directly from the source. In a crude sense, if you consider a data warehouse to be a wholesale shop of data, a data mart can be thought of as a retailer.

Metadata:

Metadata is simply data about data. Data normally describes objects - their quantity, size, how they are stored, etc. Similarly, metadata stores data about how the data (of objects) is stored.

Metadata is useful in a number of ways. It can map data sources to the common view of information within the warehouse, and it is helpful in query management, for directing a query to the most appropriate source.

The structure of metadata is different for each process, which means that for each volume of data there are multiple sets of metadata describing the same volume. While this is a very convenient way of managing data, managing the metadata itself is not a very easy task.

104. Enlist the differences between fact and dimension.

A fact table records the transactions being analyzed, while dimension tables hold the reference data used to slice those facts; the rule of thumb is that key dimensions should not themselves be fact tables.

Consider the following example. Take a customer A. If the warehouse is building profiles of customers, then A becomes a fact - against the name A, we can list his address, purchases, debts, etc., and ask questions like "how many purchases has A made in the last 3 months?". In that case, A is a fact. On the other hand, if the data is used to answer questions like "how many customers have made more than 10 purchases in the last 6 months?", where the data of A along with that of other customers contributes to the answer, then A becomes a dimension. The rule, in such cases, is to avoid making A a candidate key. A small sketch of the two roles follows.
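The sketch below contrasts the two roles in Python; the customer data is invented.

# A as a fact: profile queries are about A itself.
customer_profile = {"name": "A", "address": "12 Main St", "purchases_3m": 14, "debts": 2}
print(customer_profile["purchases_3m"])  # how many purchases has A made lately?

# A as a dimension: aggregate queries join through a customer reference table.
dim_customer = {101: {"name": "A"}, 102: {"name": "B"}}
fact_purchases = [
    {"customer_id": 101, "purchases_6m": 14},
    {"customer_id": 102, "purchases_6m": 4},
]
heavy_buyers = [dim_customer[r["customer_id"]]["name"]
                for r in fact_purchases if r["purchases_6m"] > 10]
print(heavy_buyers)  # ['A']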

105. Explain the design of the star-flake schema in detail.

A star-flake schema, as we have defined previously, is a schema that uses a combination of denormalized star and normalized snowflake schemas. Star-flake schemas are most appropriate in decision-support data warehouses. Generally, the detailed transactions are stored within a central fact table, which may be partitioned horizontally or vertically. A series of combining database views is then created to allow the user access tools to treat the fact table partitions as a single, large table.

The key reference data is structured into a set of dimensions that can be referenced from the fact table. Each dimension is stored in a series of normalized tables (snowflakes), with an additional denormalized star dimension table.

106. What is query redirection? Explain.

Query Redirection:

One of the basic requirements for the successful operation of a star-flake schema (or any schema, for that matter) is the ability to direct a query to the most appropriate source. Note that once the available data grows beyond a certain size, partitioning becomes essential. In such a scenario, in order to optimize the time spent on querying, queries must be directed to the appropriate partitions that store the data required by the query.

The basic method is to design the access tool in such a way that it automatically determines the locality to which the query is to be redirected, as in the sketch below.
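A minimal redirection sketch in Python; the partition names and the month-based restriction are hypothetical:

# Map each month to the partition that stores its data.
partition_for_month = {
    "2013-05": "fact_sales_2013_05",
    "2013-06": "fact_sales_2013_06",
}

def redirect(query_month):
    # The access tool determines the query's locality automatically;
    # unknown months fall back to a full-table scan.
    return partition_for_month.get(query_month, "fact_sales_all")

print(redirect("2013-05"))  # fact_sales_2013_05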

107. In detail, explain the multidimensional schema.

Multidimensional schemas:

Before we close, we look at the interesting concept of multiple dimensions. This is a very convenient method of analyzing data when it goes beyond normal tabular relations.

For example, a store maintains a table of each item it sells over a month, in each of its 10 outlets.

[Figure: a 2-dimensional table of one item's monthly sales at each outlet]

This is a 2-dimensional table. On the other hand, if the company wants the data for all items sold by its outlets, this can be obtained simply by superimposing the 2-dimensional tables for each of these items one behind the other; the result is a 3-dimensional view.

A query, then, instead of looking for a 2-dimensional rectangle of data, looks for a 3-dimensional cuboid of data.

There is no reason why the dimensioning should stop at 3 dimensions. In fact, almost all queries can be thought of as extracting a multidimensional unit of data from a multidimensional volume of the schema. A small sketch follows.
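A 3-dimensional view can be sketched in Python as a mapping keyed by (item, month, outlet); the figures are invented:

cube = {
    ("pen", "May", "outlet-1"): 120,
    ("pen", "May", "outlet-2"): 80,
    ("book", "May", "outlet-1"): 45,
}

# The query carves a cuboid out of the cube: all May sales of pens, any outlet.
pens_in_may = sum(qty for (item, month, outlet), qty in cube.items()
                  if item == "pen" and month == "May")
print(pens_in_may)  # 200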

108. Why is partitioning needed in a large data warehouse?

Partitioning is needed in any large data warehouse to improve performance and manageability. It also helps query redirection send queries to the appropriate partition, thereby reducing the overall time taken for query processing.

109. Explain the types of partitioning in detail.

i). Horizontal partitioning:

This essentially means that the table is partitioned after the first few thousand entries, then the next few thousand entries, and so on. This is because, in most cases, not all the information in the fact table is needed all the time. Horizontal partitioning thus helps to reduce query access time by directly cutting down the amount of data to be scanned by the queries.

ii). Vertical partitioning:

As the name suggests, a vertical partitioning scheme divides the table vertically, i.e., each row is divided into 2 or more partitions.

iii). Hardware partitioning:

Needless to say, the data warehouse design process should try to maximize the performance of the system. One way to ensure this is to optimize the database design with respect to a specific hardware architecture.

110. Explain the mechanism of row splitting.

Row Splitting:

This method involves identifying the less frequently used fields and putting them into another table. This ensures that the frequently used fields can be accessed more often, with much less computation time.

Note that row splitting neither reduces nor increases the overall storage needed, whereas normalization may involve a change in the overall storage space needed. In row splitting, the mapping is 1-to-1, whereas normalization may produce one-to-many relationships. A sketch follows.
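A Python sketch of row splitting; the choice of "rare" fields is hypothetical:

customers = [
    {"id": 1, "name": "A", "fax": "000-1111", "legacy_code": "Z9"},
    {"id": 2, "name": "B", "fax": "000-2222", "legacy_code": "Q4"},
]
RARE_FIELDS = {"fax", "legacy_code"}  # identified as infrequently used

# 1-to-1 split: each 'hot' row has exactly one 'cold' row with the same id.
hot = [{k: v for k, v in r.items() if k not in RARE_FIELDS} for r in customers]
cold = [{k: v for k, v in r.items() if k in RARE_FIELDS or k == "id"} for r in customers]
print(hot)   # frequent queries scan only these narrow rows
print(cold)  # rarely used fields, joined back by id when needed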


i). The cost of setting up and operating data marts is quite high.

ii). Once a data strategy is put in place, the data mart formats become fixed. It may be fairly difficult to change the strategy later, because the data mart formats also have to be changed.

117. What is the role of the access control issue in data mart design?

Role of the access control issue in data mart design:

This is one of the major constraints in data mart design. Any data warehouse, with its huge volume of data, is more often than not subject to various access controls as to who can access which part of the data. The easiest case is where the data is partitioned so clearly that a user of each partition cannot access any other data. In such cases, each partition can be put in a data mart, and each user can access only his own data.

In the data warehouse, the data pertaining to all these marts is stored, but the partitioning is retained. If a super user wants to get an overall view of the data, suitable aggregations can be generated, as in the sketch below.
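The Python sketch below illustrates the idea; the marts, user groups, and figures are all hypothetical:

marts = {
    "sales_mart": [{"amount": 100}, {"amount": 250}],
    "hr_mart": [{"amount": 40}],
}
access = {"sales_user": ["sales_mart"], "super_user": list(marts)}  # who sees what

def visible_rows(user):
    return [row for mart in access[user] for row in marts[mart]]

# An ordinary user sees only his own mart; the super user aggregates across all.
print(sum(r["amount"] for r in visible_rows("sales_user")))  # 350
print(sum(r["amount"] for r in visible_rows("super_user")))  # 390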

118. Explain the purposes of using metadata in detail.

Metadata is used for the following purposes:

i). Data transformation and loading.

ii). Data management.

iii). Query generation.

119. Explain the concept of metadata management.

Metadata should be able to describe the data as it resides in the data warehouse. This helps the warehouse manager to control data movements. The purpose of the metadata is to describe the objects in the database. Some of the items described are listed here.

Tables
- Columns
  * Name
  * Type

Indexes
- Columns
  * Name
  * Type

Views
- Columns
  * Name
  * Type

Constraints
- Name
- Type
- Table
  * Columns

A dictionary-shaped sketch of such metadata follows.
120. How does the query manager use the metadata? Explain in detail.

Metadata is also required to generate queries. The query manager uses the metadata to build a history of all queries run and to generate a query profile for each user or group of users.

We simply list a few of the commonly used metadata items for a query; the names are self-explanatory.

o Query
  o Table accessed
    o Column accessed
      o Name
      o Reference identifier
  o Restrictions applied
    o Column name
    o Table name
    o Reference identifier
    o Restrictions
  o Join criteria applied
    o Column name
    o Table name
    o Reference identifier
    o Column name
    o Table name
    o Reference identifier
  o Aggregate function used
    o Column name
    o Reference identifier
    o Aggregate function
  o Group-by criteria
    o Column name
    o Reference identifier
    o Sort direction
  o Syntax
  o Resources
    o Disk
      o Read
      o Write
      o Temporary

121. Why do we need different managers for a data warehouse? Explain.

Need for managers for a data warehouse:

Data warehouses are not just large databases; they are complex environments that integrate many technologies. They are not static, but continuously change both in content and in structure. Thus, there is a constant need for maintenance and management. Since huge amounts of time, money, and effort are involved in the development of data warehouses, sophisticated management tools are always justified in their case.

When computer systems were in their initial stages of development, there used to be an army of human managers who went around doing all the administration and management. But such a scheme became both unwieldy and prone to errors as systems grew in size and complexity. Furthermore, most of the management principles were ad hoc in nature and were subject to human error and fatigue.

122. With a neat diagram, explain the boundaries of the process managers.

A schematic diagram defining the boundaries of the three types of managers is shown below.

[Figure: boundaries of the three process managers]

123. Explain the responsibilities of each manager of a data warehouse.

Warehouse Manager:

The warehouse manager is responsible for maintaining the data of the warehouse. It should also create and maintain a layer of metadata. Some of the responsibilities of the warehouse manager are:

o Data movement

o Metadata management

o Performance monitoring

o Archiving

Data movement includes the transfer of data within the warehouse, aggregation, and the creation and maintenance of tables, indexes, and other objects of importance. The warehouse manager should be able to create new aggregations as well as remove old ones. The creation of additional rows and columns, keeping track of the aggregation processes, and creating metadata are also its functions.

124. What are the different system management tools used for a data warehouse?

The different system management tools used for a data warehouse are:

i). Configuration managers

ii). Schedule managers

iii). Event managers

iv). Database managers

v). Backup and recovery managers

vi). Resource and performance monitors

    -----------------------------------------------------------------------------------------------------
