NIIT: Data Warehousing & Data Mining Code 53


    Monday, 27 May 2013

    Data Warehousing & Data Mining Code 53

    FAQ

1. What are the disadvantages of a File Management System over a DBMS?

    Ans:

    Some of the disadvantages of file management system over database management system are:

- Data redundancy and inconsistency

- Difficulty in accessing data

- Difficulty in integrating data into new enterprise-level applications because of varying formats

- Lack of support for concurrent updates by multiple users

- Lack of inherent security

2. Are relational databases the only possible type of database model?

    Ans:

    No. Apart from relational, other models include network and hierarchical models. However, these two models are obsolete.

    Nowadays, relational and object-oriented models are preferred.

3. What is referential integrity and how is it achieved in a relational database?

    Ans:

    Referential integrity is a feature of DBMS that prevents the user from entering inconsistent data. This is mainly achieved by having a

    foreign key constraint on a table.
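A minimal sketch of a foreign key constraint enforcing referential integrity, using Python's built-in sqlite3 module (the table and column names are hypothetical):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

    conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("""CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id))""")

    conn.execute("INSERT INTO department VALUES (1, 'Sales')")
    conn.execute("INSERT INTO employee VALUES (1, 'Asha', 1)")  # consistent: department 1 exists

    try:
        conn.execute("INSERT INTO employee VALUES (2, 'Ravi', 99)")  # department 99 does not exist
    except sqlite3.IntegrityError as e:
        print("Rejected inconsistent row:", e)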

    4. What are the higher normal forms?

    Ans:

A normal form considered higher than 3NF is the Boyce-Codd Normal Form (BCNF). BCNF differs from 3NF when a table has more than one overlapping composite candidate key.

    ------------------------------------------------------------------------------------------------------------

    5. On which layer of application architecture does data warehouse operate?


    20. What are the different OLAP storage models?

    Ans:

Following are the different OLAP storage models: MOLAP (multidimensional OLAP), ROLAP (relational OLAP), and HOLAP (hybrid OLAP).

21. A data analysis has to be done in the fastest possible means on data stored in multidimensional format. Which storage model is best suited in this case?

    Ans:

    MOLAP.

    ---------------------------------------------------------------------------------------------------------

    FAQ

22. Name the two important parameters that decide the granularity of partitions.

Ans:

Two important factors that decide the granularity of partitions are the overall size and the manageability of the system. Both parameters have to be balanced against each other while deciding on a partitioning strategy. Suppose data containing information about the population is partitioned on the basis of state; two maintenance-related issues that could be faced by the administrator are:

- A query that needs information about all the states, such as the particular languages spoken in each state, requires all the state partitions to be scanned.

- If the definition of a state changes (a state is redefined), the entire fact table needs to be rebuilt.

    23. Are there any disadvantages of data partitioning?

    Ans:

Data partitioning is by and large an advantageous technique for improving performance. However, it increases implementation complexity and imposes constraints on query design.

    24. Can partitions be indexed?

    Ans:

Yes, partitions can be indexed if the platform supports it. For example, in Oracle 9i, you can create various types of partitioned indexes.

25. If you have a huge amount of historical data, which is too old to be frequently useful but cannot be discarded, can partitioning help?

    Ans:

Essentially, the answer to this question depends on various factors such as the availability of resources and design strategies. However, you can partition data on the basis of the date it was last accessed and keep the historical data on a separate partition. In fact, you can use striping to keep it on a separate disk, improving access speed to the more frequently used data.

    ------------------------------------------------------------------------------------------------------------

    FAQ

26. Are there any risks associated with aggregation?

    Ans:

The main risk associated with aggregates is an increase in disk storage space.

    27. Once created, is an aggregate permanent?

    Ans:

No, aggregates keep changing as per the needs of the business. In fact, they can be taken offline or put online anytime by the administrator. Aggregates which have become obsolete can also be deleted to free up disk space.

    28. Can operations such as MIN and MAX value be determined once a summary table has been created?

    Ans:

Operations such as MIN and MAX cannot be determined correctly once the summary table has been created. Their values must be calculated and stored at the time the summary table is derived from the base table.
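A small Python illustration of why MIN and MAX must be captured while the summary table is being built: the aggregate below stores them alongside SUM and COUNT, because they cannot be recovered later from the summarized rows alone (the fact rows are hypothetical):

    # base fact rows: (region, sale_amount)
    facts = [("North", 120), ("North", 80), ("South", 200), ("South", 50)]

    summary = {}
    for region, amount in facts:
        s = summary.setdefault(region, {"sum": 0, "count": 0, "min": amount, "max": amount})
        s["sum"] += amount
        s["count"] += 1
        s["min"] = min(s["min"], amount)  # captured now; unrecoverable from sums later
        s["max"] = max(s["max"], amount)

    print(summary["North"])  # {'sum': 200, 'count': 2, 'min': 80, 'max': 120}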

    29. How much storage increase might be required in the data warehouse system when using aggregates?

    Ans:

The storage needed typically increases by a factor of one, and sometimes even two, when aggregates are used.

    ---------------------------------------------------------------------------------------------------------

    FAQ

    30. What are conformed dimensions?

    Ans:

A conformed dimension is one whose meaning is independent of the fact table from which it is referenced.

    31. What are virtual data marts?

Ans:

Virtual data marts are logical views of multiple physical data marts, based on user requirements.

32. Which tool supports data mart based data warehouse architectures?

    Ans:

Informatica is commonly used for implementing data mart based data warehouse architectures.

    33. Is the data in data marts also historical like in data warehouses?

    Ans:

The data in data marts is historical only to some extent. In fact, it is not the same as the data in a data warehouse because the purpose and approach of the two differ.

    ------------------------------------------------------------------------------------------------------------

    FAQ

    34. How can you classify metadata?

    Ans:


You can classify metadata according to its use as:

- Administrative metadata: Metadata that describes the data in terms of management statistics such as time of creation, access rights, and last access time.

- Structural metadata: Metadata that describes the structure of the data.

- Descriptive metadata: Metadata that describes the purpose or functionality of the data.

    35. What is backroom metadata?

    Ans:

Backroom metadata is the metadata related to the processes of extracting, cleaning, and loading. It is of use to the DBA and business users but not to the end user.

    36. What is a metadata catalogue?

Ans:

A metadata catalogue is the same as a metadata repository. It is also called a metadatabase.

    37. Are there any tools for metadata management?

    Ans:

Yes, there are various tools that facilitate metadata management. One such Windows-based tool is Saphir. SQL Server 2000 also

    enables metadata management to some extent.

    -----------------------------------------------------------------------------------------------------------

    FAQ

38. Are system and process management devoid of any manual intervention, considering that the process manager is a tool and not a person?

    Ans:

No. Although the system and process managers are themselves tools that automate system and process management in data warehouses, they must be configured, and at times handled, through manual intervention. These tasks may be done by the database administrator.

39. Does SQL Server also provide system managers?

Ans:

Yes. SQL Server includes various components that enable system management through its management and security services.

40. What is Oracle Warehouse Builder (OWB)?

    Ans:

It is one of the commonly used data warehouse development tools, with advanced features such as support for large databases, automated summary management, and an embedded multidimensional OLAP engine. Unlike SQL Server, which is only for the Windows platform, OWB can be used on all platforms. It is also faster, more reliable, and more scalable than SQL Server.

    41. What is replication?

    Ans:

    Replication is the process of creating multiple copies of data on the same or different platform and keeping the copies in sync.

    ------------------------------------------------------------------------------------------------------------

    FAQ

    42. What is KDD Process?

    Ans:

The unifying goal of the KDD process is to extract knowledge from data in the context of large databases. It does this by using data mining methods (algorithms) to extract (identify) what is deemed knowledge, according to the specifications of measures and thresholds, using a database along with any required preprocessing, subsampling, and transformations of that database.

    43. What is Data Visualization?

    Ans:

Data visualization presents data in three dimensions and colors to help users view complex patterns. Visualization tools also provide advanced manipulation capabilities to slice, rotate, or zoom the objects to identify patterns.

44. What are the constituents of multidimensional objects?

    Ans:

    Dimensions and Measures.

45. What do levels specify within a dimension?

Ans:

Levels specify the contents and structure of the dimension's hierarchy.

    46. What is data mining?

    Ans:

Data mining is the process of finding new and potentially useful knowledge from data.

47. What does data mining software do?

Ans:

Data mining software searches large volumes of data, looking for patterns that accurately predict behavior, such as which customers are most likely to maintain a relationship with the company. Common techniques employed by data mining software include neural networks, decision trees, and standard statistical modeling.

    48. What is Oracle Data Mining?

    Ans:

    Oracle Data Mining is enterprise data mining software that combines the ease of a Windows-based client with the power of a fully

    scalable, multi-algorithmic, UNIX server-based solution. Oracle Data Mining provides comprehensive predictive modeling capabilities

    that take advantage of parallel computing techniques to rapidly extract valuable customer intelligence information. Oracle Data

    Mining can optionally generate deployable models in C, C++, or Java code, delivering the "power of prediction" to call center,

    campaign management, and Web-based applications enterprise-wide.


    49. How does data mining differ from OLAP?

    Ans:

    Simply put, OLAP compares and data mining predicts. OLAP performs roll-ups, aggregations, and calculations, and it compares

    multiple results in a clearly organized graphical or tabular display. Data mining analyzes data on historical cases to discover

    patterns and uses the patterns to make predictions or estimates of outcomes for unknown cases. An analyst may use OLAP to

    discover a business problem, and then apply data mining to make the predictions necessary for a solution. An OLAP user can

    apply data mining to discover meaningful dimensions that should be compared. OLAP may be used to perform roll-ups and

    aggregations needed by the data mining tool. Finally, OLAP can compare data mining predictions or values derived from

    predictions.

    50. What are some typical data mining applications?

    Ans:

Following are some of the data mining applications:

- Customer retention

- Cross selling

- Response modeling / target marketing

- Profitability analysis

- Product affinity analysis

- Fraud detection

    ------------------------------------------------------------------------------------------------------------

    FAQ

    51. What is Noisy Data?

    Ans:

Noise is a random error or variance in data. It can occur because of:

- Faulty data collection and data entry mistakes, such as typing errors

- Data transmission and storage problems

- Inconsistency in naming conventions

Noise makes data inaccurate for predictions and renders it futile for mining systems.

52. What are the major data mining tasks?

    Ans:

The main data mining tasks include:

- Classification

- Clustering

- Associations

- Prediction

- Characterization and discrimination

- Evolution analysis

53. What are some other data mining languages and standardizations of primitives apart from DMQL?

Ans:

Some other data mining languages and standardizations of primitives apart from DMQL include:

- MSQL

- Mine Rule

- Query flocks based on Datalog syntax

- OLE DB for DM

- CRISP-DM

    54. Which Data Mining tools are used commercially?

    Ans:

Some data mining tools used commercially are:

- Clementine

- Darwin

- Enterprise Miner

- Intelligent Miner

- MineSet

    55. How can noisy data be smoothened?

    Ans:

Noisy data can be smoothened using the following techniques (a small sketch of binning follows the list):

- Binning

- Clustering

- Computer/human inspection

- Regression
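A minimal Python sketch of smoothing by binning (equal-depth bins, smoothing by bin means); the data values are hypothetical:

    # sort the data and split it into equal-depth bins;
    # each value is then replaced by its bin's mean
    data = sorted([4, 8, 15, 21, 21, 24, 25, 28, 34])
    depth = 3

    smoothed = []
    for i in range(0, len(data), depth):
        bin_ = data[i:i + depth]
        mean = sum(bin_) / len(bin_)
        smoothed.extend([round(mean, 1)] * len(bin_))

    print(smoothed)  # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]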

    ---------------------------------------------------------------------------------------------------------

    FAQ

56. What are the variations to the Apriori algorithm?

Ans:

Following are some of the variations to the Apriori algorithm that improve the efficiency of the original algorithm (a compact sketch of the basic loop follows the list):

- Transaction reduction: Reducing the number of transactions scanned in future iterations

- Partitioning: Partitioning data to find candidate itemsets

- Sampling: Mining on a subset of the given data

- Dynamic itemset counting: Adding candidate itemsets at different points during the scan
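A compact Python sketch of the basic Apriori loop (without the optimizations listed above), on hypothetical transactions; each pass generates candidate itemsets from the previous frequent ones and prunes those below the support threshold:

    transactions = [{"milk", "bread"}, {"milk", "butter"},
                    {"milk", "bread", "butter"}, {"bread", "butter"}]
    min_support = 2  # minimum number of transactions that must contain an itemset

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # pass 1: frequent 1-itemsets
    items = {i for t in transactions for i in t}
    frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]

    k = 2
    while frequent:
        print(k - 1, sorted(map(sorted, frequent)))
        # candidate generation: unions of frequent (k-1)-itemsets that have size k
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        frequent = [c for c in candidates if support(c) >= min_support]
        k += 1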

    57. Which is the best approach when we are interested in finding all possible interactions among a set of attributes?

    Ans:


    The best approach to find all possible interactions among a set of attributes is association rule mining.

58. What is overfitting in a neural network?

    Ans:

Overfitting is a common problem in neural network design. It occurs when a network has memorized the training set but has not learned to generalize to new inputs. Overfitting produces a relatively small error on the training set but a much larger error when new data is presented to the network.

59. What is a backpropagation neural network?

    Ans:

Backpropagation is a neural network algorithm for classification that employs a method of gradient descent. It searches for a set of weights that model the data so as to minimize the mean squared distance between the network's class prediction and the actual class label of the data samples. Rules may be extracted from trained neural networks in order to help improve the interpretability of the learned network.
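A minimal Python sketch of the gradient descent idea behind backpropagation, for a single sigmoid unit on a hypothetical dataset; a full backpropagation network repeats this update layer by layer, propagating the error backwards:

    import math, random

    random.seed(0)
    data = [([0.0, 1.0], 0.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]  # (inputs, label)
    w = [random.uniform(-0.5, 0.5) for _ in range(2)]
    b, lr = 0.0, 0.5

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    for epoch in range(1000):
        for x, y in data:
            out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = out - y                   # gradient of the squared error w.r.t. the output
            grad = err * out * (1.0 - out)  # chain rule through the sigmoid
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad

    # predictions move towards the class labels as the error is descended
    print([round(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b), 2) for x, _ in data])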

    ----------------------------------------------------------------------------------------------------------

    FAQ

    60. What is the difference between KDD and data mining?

    Ans:

    KDD refers to the overall process of discovering useful knowledge from data. It also includes the choice of encoding schemes,

    preprocessing, sampling, and projections of the data prior to the data-mining step.

    Data mining refers to the application of algorithms for extracting patterns from data without the additional steps of the KDD process.

    It is essentially the modeling step in the KDD process.

61. Is the data stored for data mining different from that in other operational systems?

    Ans:

The two systems differ in the usage patterns associated with data mining. Some of the basic differences are:

- Operational systems support transactional processing whereas data mining systems support analytical processing.

- Operational systems are process-oriented whereas data mining systems are subject-oriented.

- Operational systems are concerned with current data whereas data mining systems deal with historical data.

- Operational systems are updated frequently and have volatile data whereas data mining systems have non-volatile data and are rarely changed.

- Operational systems are optimized for fast updates whereas data mining systems are optimized for fast retrievals.

62. What are the primary goals of the KDD process?

Ans:

The two primary goals of the KDD process are:

- Prediction of unknown or future values

- Description, which focuses on finding human-interpretable patterns

    63. Does KDD need to be scalable?

    Ans:

    KDD should be scalable because of the ever-increasing data in enterprises. Limiting the size of a data mining system will affect the

    accuracy of predictions.

    ------------------------------------------------------------------------------------------------------------

    FAQ

64. What are the different types of query answering mechanisms?

Ans:

Query answering can be classified into two categories based on the method of response:

- Direct query answering: The query is answered by returning exactly what is being asked.

- Intelligent query answering: The intent of the query is analyzed, and generalized, neighborhood, or associated information relevant to the query is provided.

    65. Are there any social concerns related to data mining?

    Ans:

The most important social issue is the privacy of data pertaining to individuals. KDD poses a threat to privacy: the discovered patterns often classify individuals into categories, revealing confidential personal information. Moreover, it raises very sensitive and controversial issues, such as those that involve race, gender, or religion. In addition, it may correlate and disclose confidential, sensitive facts about individuals.

    66. What is Market Basket Analysis?

Ans:

Market basket analysis is one of the most common and useful techniques of data analysis for marketing. Its purpose is to determine what products customers purchase together. This can be helpful to a retailer, merchant, manufacturer, or any other organization interested in studying consumer buying patterns.
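A small Python sketch of the counting step behind market basket analysis, tallying how often pairs of products appear in the same (hypothetical) basket:

    from itertools import combinations
    from collections import Counter

    baskets = [{"beer", "diapers", "chips"}, {"beer", "diapers"},
               {"bread", "milk"}, {"beer", "chips"}]

    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1

    # the most frequently co-purchased pairs
    print(pair_counts.most_common(2))  # e.g. [(('beer', 'chips'), 2), (('beer', 'diapers'), 2)]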

    67. What is Visual Data Mining?

    Ans:

    Visual data mining integrates data mining and data visualization in order to discover implicit and useful knowledge from large data

    sets.

    ------------------------------------------------------------------------------------------------------------

    I. Note: Answer all the questions.

68. What is Normalization? What are the different forms of Normalization?

The usual approach to normalization in database applications is to ensure that the data is divided into two or more tables, such that when the data in one of them is updated, it does not lead to anomalies of data (the student is advised to refer to any book on database management systems for details, if interested).

The idea is to ensure that when combined, the data available is consistent. However, in data warehousing, one may even tend to break a large table into several denormalized smaller tables. This may lead to a lot of extra space being used, but it helps in an indirect way: it avoids the overhead of joining the data during queries.
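A small Python/SQL sketch contrasting the two approaches with hypothetical tables: the normalized form stores each fact once and joins at query time, while the denormalized form repeats data to avoid the join:

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # normalized: customer data held once, referenced by orders
    conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, city TEXT)")
    conn.execute("CREATE TABLE orders (order_id INTEGER, cust_id INTEGER, amount REAL)")
    conn.execute("INSERT INTO customer VALUES (1, 'Pune')")
    conn.execute("INSERT INTO orders VALUES (101, 1, 250.0), (102, 1, 90.0)")

    # the query must join the two tables
    print(conn.execute("""SELECT o.order_id, c.city FROM orders o
                          JOIN customer c ON o.cust_id = c.cust_id""").fetchall())

    # denormalized: city repeated in every row; extra space, but no join at query time
    conn.execute("CREATE TABLE orders_denorm (order_id INTEGER, city TEXT, amount REAL)")
    conn.execute("INSERT INTO orders_denorm VALUES (101, 'Pune', 250.0), (102, 'Pune', 90.0)")
    print(conn.execute("SELECT order_id, city FROM orders_denorm").fetchall())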

69. Define data warehouse. What are the roles of education in a data warehousing delivery process?

Data warehouse: In its simplest form, a data warehouse is a collection of key pieces of information used to manage and direct the business for the most profitable outcome. It would decide the amount of inventory to be held, the number of employees to be hired, the amount to be procured on loan, etc.

The above definition may not be precise, but that is how data warehouse systems are. There are different definitions given by different authors, but we keep this idea in mind and proceed. It is a large collection of data and a set of process managers that use this data to make information available. The data can be metadata, facts, dimensions, and aggregations. The process managers can be load managers, warehouse managers, or query managers. The information made available is such that it allows the end users to make informed decisions.

Roles of education in a data warehousing delivery process:

Education has two roles to play: one is to make people, especially top-level policy makers, comfortable with the concept. The second role is to aid the prototyping activity. To take care of the education concept, an initial (usually scaled-down) prototype is created and people are encouraged to interact with it. This helps achieve both the activities listed above: the users become comfortable with the use of the system, and the warehouse developer becomes aware of the limitations of the prototype, which can then be improved upon.

70. What are process managers? What are the different types of process managers?

Process managers: These are responsible for the smooth flow, maintenance, and upkeep of data into and out of the database.

The main types of process managers are:

i). Load manager: to take care of source interaction, data transformation, and data load.

ii). Warehouse manager: to take care of data movement, metadata management, and performance monitoring.

iii). Query manager: to control query scheduling and monitoring.

We shall look into each of them briefly. Before that, we look at a schematic diagram that defines the boundaries of the three types of managers.

    71. Give the architectures of data mining systems.

[The architecture diagrams from the original are not preserved in this transcript.]

72. What are the guidelines for a KDD environment?

It is customary in the computer industry to formulate rules of thumb that help information technology (IT) specialists apply new developments. In setting up a reliable data mining environment, we may follow these guidelines so that the KDD system works in the manner we desire:

i). Support extremely large data sets

ii). Support hybrid learning

iii). Establish a data warehouse

iv). Introduce data cleaning facilities

v). Facilitate working with dynamic coding

vi). Integrate with decision support systems

vii). Choose an extendible architecture

viii). Support heterogeneous databases

ix). Introduce client/server architecture

x). Introduce cache optimization

    PART - B

    II. Answer any FIVE full questions.

    73. With the help of a diagram explain architecture of data warehouse.

The architecture for a data warehouse is indicated below. Before we proceed further, we should be clear about the concept of architecture: it only gives the major items that make up a data warehouse. The size and complexity of each of these items depend on the actual size of the warehouse itself, the specific requirements of the warehouse, and the actual details of implementation.

Before looking into the details of each of the managers, we can get a broad idea of their functionality by mapping the processes that we studied in the previous chapter to the managers. The extracting and loading processes are taken care of by the load manager. The processes of cleanup and transformation of data, as also of backup and archiving, are the duties of the warehouse manager, while the query manager, as the name implies, takes care of query management.

74. Indicate the important functions of a Load Manager and a Warehouse Manager.

Important functions of the Load Manager:

i) To extract data from the source(s)

ii) To load the data into a temporary storage device

iii) To perform simple transformations to map it to the structures of the data warehouse

Important functions of the Warehouse Manager:

i) Analyze the data to confirm data consistency and data integrity.

ii) Transform and merge the source data from the temporary data storage into the warehouse.

iii) Create indexes, cross references, partition views, etc.

iv) Check for normalizations.

v) Generate new aggregations, if needed.

vi) Update all existing aggregations.

vii) Create backups of data.

viii) Archive the data that needs to be archived.

    75. Differentiate between vertical partitioning and horizontal partitioning.

In horizontal partitioning, we simply store the first few thousand entries in one partition, the second few thousand in the next, and so on. This can be done by partitioning by time, wherein all data pertaining to the first month / first year is put in the first partition, the second one in the second partition, and so on. Other alternatives are partitioning on different-sized dimensions, partitioning on other dimensions, partitioning on the size of the table, and round-robin partitioning. Each of them has certain advantages as well as disadvantages.

In vertical partitioning, some columns are stored in one partition and certain other columns of the same row in a different partition. This can be achieved either by normalization or by row splitting. We will look into their relative trade-offs.
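A minimal Python sketch of horizontal partitioning by time: hypothetical fact rows are routed into per-month partitions, so a query for one month scans only that partition:

    from collections import defaultdict

    # fact rows: (date 'YYYY-MM-DD', amount)
    facts = [("2013-01-05", 100), ("2013-01-20", 40), ("2013-02-02", 75)]

    partitions = defaultdict(list)
    for date, amount in facts:
        partitions[date[:7]].append((date, amount))  # partition key = year-month

    # a January query touches only the January partition
    print(sum(amount for _, amount in partitions["2013-01"]))  # 140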

76. What is a schema? Distinguish between facts and dimensions.

A schema, by definition, is a logical arrangement of facts that facilitates ease of storage and retrieval, as described by the end

users. The end user is not bothered about the overall arrangement of the data or the fields in it. For example, a sales executive trying to project the sales of a particular item is only interested in the sales details of that item, whereas a tax practitioner looking at the same data will be interested only in the amounts received by the company and the profits made.

The star schema looks like a good solution to the problem of warehousing. It simply states that one should identify the facts, store them in a read-only area, and have the dimensions surround that area. Whereas the dimensions are liable to change, the facts are not. But given a set of raw data from the sources, how does one identify the facts and the dimensions? It is not always easy, but the following steps can help in that direction:

i) Look for the fundamental transactions in the entire business process. These basic entities are the facts.

ii) Find out the important dimensions that apply to each of these facts. They are the candidates for dimension tables.

iii) Ensure that facts do not include those candidates that are actually dimensions, with a set of facts attached to them.

iv) Ensure that dimensions do not include those candidates that are actually facts.

77. What is an event in data warehousing? List any five events.

An event is defined as a measurable, observable occurrence of a defined action. If this definition is quite vague, it is because it encompasses a very large set of operations. The event manager is software that continuously monitors the system for the occurrence of an event and then takes any suitable action (note that the event is a measurable and observable occurrence). The action to be taken is also normally specific to the event.

A partial list of the common events that need to be monitored is as follows (a small monitoring sketch follows the list):

i). Running out of memory space

ii). A process dying

iii). A process using excessive resources

iv). I/O errors

v). Hardware failure
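A minimal Python sketch of an event manager check for the first event listed above (running out of space), using only the standard library; the path and threshold are hypothetical:

    import shutil

    def check_disk_event(path="/", min_free_ratio=0.10):
        """Fire an event when free space on `path` drops below the threshold."""
        usage = shutil.disk_usage(path)
        if usage.free / usage.total < min_free_ratio:
            print(f"EVENT: low disk space on {path}: {usage.free / usage.total:.1%} free")
            # ...take whatever action is configured for this event (alert the DBA, purge, etc.)

    check_disk_event()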

78. What is a summary table? Describe the aspects to be looked into while designing a summary table.

The main purpose of using summary tables is to cut down the time taken to execute a specific query. The main methodology involves minimizing the volume of data being scanned each time the query is to be answered; in other words, partial answers to the query are made available beforehand. For example, in the previously cited example of the mobile market, suppose one expects the potential customers to be:

i) citizens above 18 years of age,

ii) with salaries greater than 15,000, and

iii) with professions that involve traveling.

Then, every time the query is to be processed (maybe every month or every quarter), one will have to look at the entire database to compute these values and then combine them suitably to get the relevant answers. The other method is to prepare summary tables, which hold the values pertaining to each of these sub-queries beforehand, and then combine them as and when the query is raised.

Summary tables are designed by following the steps given below (a small sketch follows the list):

i) Decide the dimensions along which aggregation is to be done.

ii) Determine the aggregation of multiple facts.

iii) Aggregate multiple facts into the summary table.

iv) Determine the level of aggregation and the extent of embedding.

v) Design time into the table.

vi) Index the summary table.
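A small Python sketch of steps (i)-(iii): aggregating hypothetical base facts along the chosen dimensions into a summary table, with time designed into the key as in step (v):

    from collections import defaultdict

    # base facts: (month, region, profession, sales)
    facts = [("2013-05", "North", "travel", 3), ("2013-05", "North", "travel", 5),
             ("2013-05", "South", "office", 2), ("2013-06", "North", "travel", 4)]

    # dimensions chosen for aggregation: month and region (profession is rolled up)
    summary = defaultdict(int)
    for month, region, profession, sales in facts:
        summary[(month, region)] += sales

    print(dict(summary))
    # {('2013-05', 'North'): 8, ('2013-05', 'South'): 2, ('2013-06', 'North'): 4}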

79. List the significant issues in automatic cluster detection.

Most of the issues related to automatic cluster detection are connected to the kinds of questions we want answered in the data mining project, or to data preparation for their successful application. (A small sketch of issue (ii) follows the list.)

i). Distance measure

Most clustering techniques use the Euclidean distance formula (the square root of the sum of the squares of the distances along each attribute axis) as the distance measure. Non-numeric variables must be transformed and scaled before clustering can take place. Depending on these transformations, the categorical variables may dominate the clustering results, or they may even be completely ignored.

ii). Choice of the right number of clusters

If the number of clusters k in the k-means method is not chosen to match the natural structure of the data, the results will not be good. The proper way to alleviate this is to experiment with different values of k. In principle, the best k value will exhibit the smallest intra-cluster distances and the largest inter-cluster distances.

iii). Cluster interpretation

Once the clusters are discovered, they have to be interpreted in order to have some value for the data mining project.
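A compact Python sketch of the experiment suggested in issue (ii): running k-means with the Euclidean distance for several values of k on hypothetical 2-D points and comparing the total intra-cluster distance:

    import math, random

    random.seed(1)
    points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]  # two natural clusters

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))  # Euclidean distance

    def kmeans(points, k, iters=20):
        centers = random.sample(points, k)
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for p in points:  # assign each point to its nearest center
                clusters[min(range(k), key=lambda i: dist(p, centers[i]))].append(p)
            centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
                       for i, cl in enumerate(clusters)]  # recompute the centers
        return sum(dist(p, centers[min(range(k), key=lambda i: dist(p, centers[i]))])
                   for p in points)  # total intra-cluster distance

    for k in (1, 2, 3):
        print(k, round(kmeans(points, k), 2))  # the drop flattens once k matches the data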

81. Define data marting. List the reasons for data marting.

A data mart stores a subset of the data available in the warehouse, so that one need not always scan through the entire content of the warehouse. It is similar to a retail outlet. A data mart speeds up queries, since the volume of data to be scanned is much less. It also helps to have tailor-made processes for different access tools, to impose control strategies, etc.

Following are the reasons for which data marts are created:

i) Since the volume of data scanned is small, they speed up query processing.

ii) Data can be structured in a form suitable for a user access tool.

iii) Data can be segmented or partitioned so that it can be used on different platforms, and different control strategies become applicable.

82. Explain how to categorize data mining systems.

There are many data mining systems available or being developed. Some are specialized systems dedicated to a given data source or confined to limited data mining functionalities; others are more versatile and comprehensive. Data mining systems can be categorized according to various criteria, including the following:

a) Classification according to the type of data source mined: this categorizes data mining systems according to the type of data handled, such as spatial data, multimedia data, time-series data, text data, the World Wide Web, etc.

b) Classification according to the data model drawn on: this categorizes data mining systems based on the data model involved, such as relational database, object-oriented database, data warehouse, transactional, etc.

c) Classification according to the kind of knowledge discovered: this categorizes data mining systems based on the kind of knowledge discovered or the data mining functionalities, such as characterization, discrimination, association, classification, clustering, etc. Some systems tend to be comprehensive systems offering several data mining functionalities together.

d) Classification according to mining techniques used: data mining systems employ and provide different techniques. This classification categorizes data mining systems according to the data analysis approach used, such as machine learning, neural networks, genetic algorithms, statistics, visualization, database-oriented or data warehouse-oriented, etc.

83. List and explain the different kinds of data that can be mined.

The different kinds of data that can be mined are listed below:

i). Flat files: Flat files are actually the most common data source for data mining algorithms, especially at the research level.

ii). Relational databases: A relational database consists of a set of tables containing either values of entity attributes, or values of attributes from entity relationships.

iii). Data warehouses: A data warehouse, as a storehouse, is a repository of data collected from multiple data sources (often heterogeneous) and is intended to be used as a whole under the same unified schema.

iv). Multimedia databases: Multimedia databases include video, image, audio, and text media. They can be stored on extended object-relational or object-oriented databases, or simply on a file system.

v). Spatial databases: Spatial databases are databases that, in addition to usual data, store geographical information like maps and global or regional positioning.

vi). Time-series databases: Time-series databases contain time-related data such as stock market data or logged activities. These databases usually have a continuous flow of new data coming in, which sometimes causes the need for challenging real-time analysis.

vii). World Wide Web: The World Wide Web is the most heterogeneous and dynamic repository available. A very large number of authors and publishers are continuously contributing to its growth and metamorphosis, and a massive number of users are accessing its resources daily.

84. Give the syntax for task-relevant data specification.

Syntax for task-relevant data specification:

The first step in defining a data mining task is the specification of the task-relevant data, that is, the data on which mining is to be performed. This involves specifying the database and tables or data warehouse containing the relevant data, conditions for selecting the relevant data, the relevant attributes or dimensions for exploration, and instructions regarding the ordering or grouping of the data retrieved. DMQL provides clauses for the specification of such information, as follows (a combined example follows the list):

i). use database (database_name) or use data warehouse (data_warehouse_name): The use clause directs the mining task to the database or data warehouse specified.

ii). from (relation(s)/cube(s)) [where (condition)]: The from and where clauses respectively specify the database tables or data cubes involved, and the conditions defining the data to be retrieved.

iii). in relevance to (attribute_or_dimension_list): This clause lists the attributes or dimensions for exploration.

iv). order by (order_list): The order by clause specifies the sorting order of the task-relevant data.

v). group by (grouping_list): The group by clause specifies criteria for grouping the data.

vi). having (conditions): The having clause specifies the condition by which groups of data are considered relevant.
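Putting these clauses together, a hypothetical task-relevant data specification in DMQL might read as follows (the database, table, and attribute names are invented for illustration, and the clause order follows the listing above):

    use database sales_db
    from customer C, purchase P
    where C.cust_id = P.cust_id and P.year = 2013
    in relevance to age, income, city
    order by age
    group by city
    having count(*) > 100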

85. Explain the design of a GUI based on a data mining query language.

A data mining query language provides the necessary primitives that allow users to communicate with data mining systems. But novice users may find a data mining query language difficult to use and its syntax difficult to remember. Instead, users may prefer to communicate with data mining systems through a graphical user interface (GUI). In relational database technology, SQL serves as a standard core language for relational systems, on top of which GUIs can easily be designed. Similarly, a data mining query language may serve as a core language for data mining system implementations, providing a basis for the development of GUIs for effective data mining.

A data mining GUI may consist of the following functional components:

a) Data collection and data mining query composition - This component allows the user to specify task-relevant data sets and to compose data mining queries. It is similar to GUIs used for the specification of relational queries.


b) Presentation of discovered patterns - This component allows the display of the discovered patterns in various forms, including tables, graphs, charts, curves, and other visualization techniques.

c) Hierarchy specification and manipulation - This component allows for concept hierarchy specification, either manually by the user or automatically. In addition, this component should allow concept hierarchies to be modified by the user or adjusted automatically based on a given data set distribution.

d) Manipulation of data mining primitives - This component may allow the dynamic adjustment of data mining thresholds, as well as the selection, display, and modification of concept hierarchies. It may also allow the modification of previous data mining queries or conditions.

e) Interactive multilevel mining - This component should allow roll-up or drill-down operations on discovered patterns.

f) Other miscellaneous information - This component may include on-line help manuals, indexed search, debugging, and other interactive graphical facilities.

86. Explain how decision trees are useful in data mining.

Decision trees are powerful and popular tools for classification and prediction. The attractiveness of tree-based methods is due in large part to the fact that they are simple and that decision trees represent rules. Rules can readily be expressed so that humans can understand them, or in a database access language like SQL, so that records falling into a particular category may be retrieved.
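A minimal Python sketch using scikit-learn (a third-party library, assumed to be available) that fits a decision tree on hypothetical customer data and prints its decisions as readable rules:

    from sklearn.tree import DecisionTreeClassifier, export_text

    # hypothetical customer records: [age, income]; label 1 = responded to a campaign
    X = [[25, 20], [30, 60], [45, 80], [50, 30], [35, 70], [23, 25]]
    y = [0, 1, 1, 0, 1, 0]

    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

    # the tree's decisions printed as human-readable if/else rules
    print(export_text(tree, feature_names=["age", "income"]))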

87. Identify an application and explain the techniques that can be incorporated in solving the problem using data mining techniques.

Write yourself...

88. Write short notes on:

i) Data Mining Query Language

ii) Schedule Manager

iii) Data Formatting

i) Data Mining Query Language

A data mining language helps in effective knowledge discovery from data mining systems. Designing a comprehensive data mining language is challenging because data mining covers a wide spectrum of tasks, from data characterization to mining association rules, data classification, and evolution analysis. Each task has different requirements. The design of an effective data mining query language requires a deep understanding of the power, limitations, and underlying mechanisms of the various kinds of data mining tasks.

ii) Schedule Manager

Scheduling is the key to successful warehouse management. Almost all operations in the warehouse need some type of scheduling. Every operating system has its own scheduler and batch control mechanism, but these schedulers may not be capable of fully meeting the requirements of a data warehouse. Hence it is more desirable to have specially designed schedulers to manage the operations.

iii) Data Formatting

Data formatting is the final data preparation step. It represents syntactic modifications to the data that do not change its meaning but are required by the particular modelling tool chosen for the DM task. These include:

a). Reordering of the attributes or records: some modelling tools require reordering of the attributes (or records) in the dataset, such as putting the target attribute at the beginning or at the end, or randomizing the order of records (required by neural networks, for example).

b). Changes related to the constraints of modelling tools: removing commas, tabs, or special characters, trimming strings to the maximum allowed number of characters, or replacing special characters with an allowed set of characters.

    ---------------------------------------------------------------------------------------------------------------

89. With a neat diagram, explain the main parts of a computer.

A computer has three basic main parts:

i). A central processing unit (CPU) that does all the arithmetic and logical operations. This can be thought of as the heart of any computer, and computers are identified by the type of CPU that they use.

ii). The memory, which holds the programs and data. All the computers that we come across these days are what are known as stored program computers: the programs are stored beforehand in the memory, and the CPU accesses these programs line by line and executes them.

iii). The input/output devices: these devices facilitate the interaction of the users with the computer. The input devices are used to send information to the computer, while the output devices accept the processed information from the computer and make it available to the user.

Diagram:

[Diagram not preserved in the original transcript.]


91. Briefly explain the types of memories.

There are two types of memories: primary memory, which is embedded in the computer and is the main source of data to the computer, and secondary memory, like floppy disks, CDs, etc., which can be carried around and used in different computers. Secondary memories cost much less than primary memory, but the CPU can access data only from primary memory. The main advantage of computer memories, both primary and secondary, is that they can store data indefinitely and accurately.

92. Describe the basic concept of databases.

The concept of database:

We have seen in the previous section how data can be stored in a computer. Such stored data becomes a database, a collection of data. For example, if all the marks scored by all the students of a class are stored in the computer memory, it can be called a database. From such a database, we can answer questions like: Who has scored the highest marks? In which subject have the maximum number of students failed? Which students are weak in more than one subject? Of course, appropriate programs have to be written to do these computations. Also, as the database becomes very large and more and more data keeps getting included at different periods of time, there are several other problems about maintaining the data, which will not be dealt with here.

Since the handling of such databases has become one of the primary jobs of the computer in recent years, it becomes difficult for the average user to keep writing such programs. Hence, special languages, called database query languages, have been devised, which make such programming easy; these languages help in getting specific queries answered easily.
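A small Python sketch of the marks database idea (students and marks are hypothetical), answering the first question posed above:

    marks = {"Asha": 82, "Ravi": 67, "Meena": 91, "Kiran": 45}

    # who has scored the highest marks?
    topper = max(marks, key=marks.get)
    print(topper, marks[topper])  # Meena 91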

93. With examples, explain the different views of data.

Data is normally stored in tabular form. Unless storage in other formats becomes advantageous, we store data in what are technically called relations, or in simple terms, tables.

Views are mainly of two types:

i). Simple view

ii). Complex view

Simple view:

- It is created by selecting only one table.

- It does not contain functions.

- DML operations (SELECT, INSERT, UPDATE, DELETE, MERGE, CALL, LOCK TABLE) can be performed through a simple view.

Complex view:

- It is created by selecting more than one table.

- It can contain functions.

- You cannot always perform DML operations through a complex view.

(A small sketch of both kinds of view follows.)
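A minimal sketch of both kinds of view using Python's built-in sqlite3 module; the table, view, and column names are hypothetical:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (id INTEGER, name TEXT, marks INTEGER)")
    conn.execute("CREATE TABLE course (id INTEGER, title TEXT)")
    conn.execute("INSERT INTO student VALUES (1, 'Asha', 82), (2, 'Ravi', 45)")
    conn.execute("INSERT INTO course VALUES (1, 'DBMS'), (2, 'Mining')")

    # simple view: a single table, no functions
    conn.execute("CREATE VIEW passed AS SELECT name, marks FROM student WHERE marks >= 50")

    # complex view: more than one table, and it may use functions
    conn.execute("""CREATE VIEW enrolment AS
                    SELECT c.title, COUNT(*) AS n FROM student s
                    JOIN course c ON s.id = c.id GROUP BY c.title""")

    print(conn.execute("SELECT * FROM passed").fetchall())     # [('Asha', 82)]
    print(conn.execute("SELECT * FROM enrolment").fetchall())  # e.g. [('DBMS', 1), ('Mining', 1)]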

94. Briefly explain the concept of normalization.

Normalization is dealt with in several chapters of any book on database management systems. Here, we will take the simplest definition, which suffices for our purpose: no field should have subfields.

Consider the following student table. Here, under the field Marks, we have three subfields: marks for Subject1, marks for Subject2, and marks for Subject3. However, it is preferable to split these subfields into regular fields, as shown below.

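The tables referred to above appeared as images in the original and are not preserved; a hypothetical reconstruction of the idea is:

    Before normalization (Marks has subfields):

        Name  | Marks (Subject1, Subject2, Subject3)
        Asha  | (82, 74, 90)

    After splitting the subfields into regular fields:

        Name  | Subject1 | Subject2 | Subject3
        Asha  |    82    |    74    |    90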

Quite often, the original table that comes with subfields will have to be modified suitably, by the process of normalization.

95. Explain the concept of the data warehouse delivery process in detail.

The concept of the data warehouse delivery process:

This section deals with the data warehouse from a different viewpoint: how the different components that go into it enable the building of a data warehouse. The study helps us in two ways:

i) to have a clear view of the data warehouse building process.

ii) to understand the working of the data warehouse in the context of the components.

Now we look at the concepts in detail:

i). IT strategy: The company must have an overall IT strategy, and data warehousing has to be a part of the overall strategy.

ii). Business case analysis: This looks like an obvious thing, but it is most often misunderstood. An overall understanding of the business and the importance of the various components therein is a must. This will ensure that one can clearly justify the appropriate level of investment that goes into the data warehouse design and also the amount of returns accruing.

iii). Education: This has two roles to play: one is to make people, especially top-level policy makers, comfortable with the concept. The second role is to aid the prototyping activity.

iv). Business requirements: As discussed earlier, it is essential that the business requirements are fully understood by the data warehouse planner. This ensures that the warehouse is incorporated adequately in the overall setup of the organization.

v). Technical blueprints: This is the stage where the overall architecture that satisfies the requirements is delivered.

vi). Building the vision: Here the first physical infrastructure becomes available. The major infrastructure components are set up, and the first stages of loading and generation of data start.

vii). History load: Here the system is made fully operational by loading the required history into the warehouse, i.e., whatever data is available over the previous years is put into the data warehouse to make it fully operational.

viii). Ad hoc query: Now we configure a query tool to operate against the data warehouse.

ix). Automation: This phase automates the various operational processes, such as:

a) Extracting and loading data from the sources.

b) Transforming the data into a form suitable for analysis.

c) Backup, restoration, and archiving.

d) Generating aggregations.

e) Monitoring query profiles.

x). Extending scope: There is no single mechanism by which this can be achieved. As and when needed, a new set of data may be added, new formats may be included, or even major changes may be involved.

xi). Requirement evolution: Business requirements will constantly change during the life of the warehouse. Hence, the processes that support the warehouse also need to be constantly monitored and modified.

96. What are the three major activities of a data warehouse? Explain.

The three major activities of a data warehouse are:

i) Populating the warehouse (i.e., inclusion of data)

ii) Day-to-day management of the warehouse

iii) Ability to accommodate changes

i). The processes that populate the warehouse have to be able to extract the data, clean it up, and make it available to the analysis systems. This is done on a daily / weekly basis depending on the quantum of the data population to be incorporated.

ii). The day-to-day management of the data warehouse is not to be confused with maintenance and management of hardware and software. When large amounts of data are stored and new data is being continually added at regular intervals, maintenance of the quality of data becomes an important element.

iii). The ability to accommodate changes implies that the system is structured in such a way as to be able to cope with future changes without the entire system being remodeled. Based on these, we can view the processes that a typical data warehouse scheme should support as follows.


97. Explain the extract and load process of a data warehouse.

Extract and load process: This forms the first stage of the data warehouse. External physical systems, like the sales counters which give the sales data and the inventory systems that give inventory levels, constantly feed data to the warehouse. Needless to say, the format of this external data has to be monitored and modified before loading it into the warehouse. The data warehouse must extract the data from the source systems, load it into its databases, remove unwanted fields (either because they are not needed or because they are already there in the database), add new fields / reference data, and finally reconcile it with the other data. We shall see a few more details of these broad actions in the subsequent paragraphs.

i). A mechanism should be evolved to control the extraction of data, check its consistency, etc. For example, in some systems, the data is not authenticated until it is audited.

ii). Having a set of consistent data is equally important. This especially matters when several online systems are feeding the data.

iii). Once data is extracted from the source systems, it is loaded into a temporary data storage before it is cleaned and loaded into the warehouse.

98. In what ways does data need to be cleaned up and checked? Explain briefly.

Data needs to be cleaned up and checked in the following ways:

i) It should be consistent with itself.

ii) It should be consistent with other data from the same source.

iii) It should be consistent with other data from other sources.

iv) It should be consistent with the information already available in the data warehouse.

While it is easy to list out the needs of clean data, it is more difficult to set up systems that automatically clean up the data. The normal course is to suspect the quality of data if it does not meet the normal standards of common sense, or if it contradicts data from other sources or data already available in the data warehouse. Normal intuition doubts the validity of the new data, and effective measures like rechecking, retransmission, etc., are undertaken. When none of these are possible, one may even resort to ignoring the entire set of data and getting on with the next set of incoming data.

99. Explain the architecture of a data warehouse.

The architecture of a data warehouse is indicated below. Before we proceed further, we should be clear about the concept of architecture: it only gives the major items that make up a data warehouse. The size and complexity of each of these items depend on the actual size of the warehouse itself, the specific requirements of the warehouse, and the actual details of implementation.

100. Briefly explain the functions of each manager of a data warehouse.

The Warehouse Manager: The warehouse manager is the component that performs all operations necessary to support the warehouse management process. Unlike the load manager, the warehouse management process is driven by the extent to which the operational management of the data warehouse has been automated.

The warehouse manager can easily be termed the most complex of the warehouse components, and it performs a variety of tasks. A few of them are listed below.

i) Analyze the data to confirm data consistency and data integrity.


    ii) Transform and merge the source data from the temporary data storage into the ware house.

iii) Create indexes, cross-references, partitioned views, etc.

iv) Check for normalization.

v) Generate new aggregations, if needed.

vi) Update all existing aggregations.

vii) Create backups of data.

viii) Archive the data that needs to be archived.

101. Explain the star schema used to represent sales analysis.

Star schemas are database schemas that structure the data to exploit a typical decision-support enquiry. When the components of typical enquiries are examined, a few similarities stand out:

i) The queries examine a set of factual transactions - sales, for example.

ii) The queries analyze the facts in different ways - by aggregating them on different bases or graphing them in different ways.

The central concept of most such transactions is a fact table. The surrounding reference tables are called dimension tables. The combination is called a star schema. A toy example follows.
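Here is a toy star schema in Python: the fact table holds sales transactions, and each row carries foreign keys into two dimension tables. All table contents are invented for illustration.

dim_product = {1: {"name": "pen"}, 2: {"name": "book"}}
dim_outlet = {10: {"city": "Pune"}, 11: {"city": "Delhi"}}
fact_sales = [
    {"product_id": 1, "outlet_id": 10, "qty": 5},
    {"product_id": 2, "outlet_id": 10, "qty": 3},
    {"product_id": 1, "outlet_id": 11, "qty": 7},
]

# A typical decision-support enquiry: total quantity sold per city.
totals = {}
for row in fact_sales:
    city = dim_outlet[row["outlet_id"]]["city"]
    totals[city] = totals.get(city, 0) + row["qty"]
print(totals)  # {'Pune': 8, 'Delhi': 7}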

102. What do you mean by partitioning of data? Explain briefly.

Partitioning of data:

In most warehouses, the size of the fact tables tends to become very large. This leads to several problems of management, backup, processing, etc. These difficulties can be overcome by partitioning each fact table into separate partitions.

Data warehouses tend to exploit this idea by partitioning the large volume of data into data sets. For example, data can be partitioned on a weekly or monthly basis, so as to minimize the amount of data scanned before answering a query. This technique allows the data to be scanned to be kept to a minimum without the overhead of using an index, which improves the overall efficiency of the system. However, having too many partitions can be counterproductive, so choosing an optimal size and number of partitions is of vital importance. (A small sketch of date-based partitioning appears after the list below.)

Partitioning generally helps in the following ways:

i) It assists in better management of data.

ii) It eases backup and recovery, since the volumes are smaller.

iii) Star schemas with partitions produce better performance.

iv) Since several hardware architectures operate better in a partitioned environment, the overall system performance improves.
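A small Python sketch of monthly partitioning, with hypothetical rows and a "YYYY-MM" partition key, shows how a query can scan only the partition it needs:

from collections import defaultdict

fact_rows = [
    {"date": "2013-05-02", "amount": 100},
    {"date": "2013-05-17", "amount": 250},
    {"date": "2013-06-01", "amount": 75},
]
partitions = defaultdict(list)
for row in fact_rows:
    partitions[row["date"][:7]].append(row)  # monthly partition key, e.g. "2013-05"

# A May query touches only one partition instead of the whole fact table.
may_total = sum(r["amount"] for r in partitions["2013-05"])
print(may_total)  # 350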

103. Describe the terms data mart and metadata.

Data mart:

A data mart is a subset of the information content of a data warehouse, stored in its own database. The data of a data mart may have been collected through the warehouse or, in some cases, directly from the source. In a crude sense, if you consider a data warehouse to be a wholesale shop of data, a data mart can be thought of as a retailer.

Metadata:

Metadata is simply data about data. Data normally describes objects - their quantity, size, how they are stored, etc. Similarly, metadata stores data about how the data (of objects) is stored.

Metadata is useful in a number of ways. It can map data sources to the common view of information within the warehouse, and it is helpful in query management, for directing a query to the most appropriate source.

The structure of metadata is different for each process, which means that for each volume of data there are multiple sets of metadata describing the same volume. While this is a very convenient way of managing data, managing the metadata itself is not a very easy task.

104. Enlist the differences between fact and dimension.

A fact table records the transactions being analyzed, while dimension tables hold the reference data used to slice those facts; the rule of thumb is that key dimensions should not themselves be fact tables.

Consider the following example. Take a customer A. If the warehouse is building profiles of customers, then A becomes a fact - against the name A, we can list his address, purchases, debts, etc., and ask questions like "how many purchases has A made in the last 3 months?". In that case, A is a fact. On the other hand, if the data is used to answer questions like "how many customers have made more than 10 purchases in the last 6 months?", where the data of A along with that of other customers contributes to the answer, then A becomes a dimension. The rule, in such cases, is to avoid making A a candidate key. A small sketch of the two roles follows.
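The sketch below contrasts the two roles in Python; the customer data is invented.

# A as a fact: profile queries are about A itself.
customer_profile = {"name": "A", "address": "12 Main St", "purchases_3m": 14, "debts": 2}
print(customer_profile["purchases_3m"])  # how many purchases has A made lately?

# A as a dimension: aggregate queries join through a customer reference table.
dim_customer = {101: {"name": "A"}, 102: {"name": "B"}}
fact_purchases = [
    {"customer_id": 101, "purchases_6m": 14},
    {"customer_id": 102, "purchases_6m": 4},
]
heavy_buyers = [dim_customer[r["customer_id"]]["name"]
                for r in fact_purchases if r["purchases_6m"] > 10]
print(heavy_buyers)  # ['A']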

105. Explain the design of the star-flake schema in detail.

A star-flake schema, as we have defined previously, is a schema that uses a combination of denormalized star and normalized snowflake schemas. Star-flake schemas are most appropriate in decision-support data warehouses. Generally, the detailed transactions are stored within a central fact table, which may be partitioned horizontally or vertically. A series of combining database views is then created to allow the user access tools to treat the fact table partitions as a single, large table.

The key reference data is structured into a set of dimensions that can be referenced from the fact table. Each dimension is stored in a series of normalized tables (snowflakes), with an additional denormalized star dimension table.

106. What is query redirection? Explain.

Query Redirection:

One of the basic requirements for the successful operation of a star-flake schema (or any schema, for that matter) is the ability to direct a query to the most appropriate source. Note that once the available data grows beyond a certain size, partitioning becomes essential. In such a scenario, in order to optimize the time spent on querying, queries must be directed to the appropriate partitions that store the data required by the query.

The basic method is to design the access tool in such a way that it automatically determines the locality to which the query is to be redirected, as in the sketch below.
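A minimal redirection sketch in Python; the partition names and the month-based restriction are hypothetical:

# Map each month to the partition that stores its data.
partition_for_month = {
    "2013-05": "fact_sales_2013_05",
    "2013-06": "fact_sales_2013_06",
}

def redirect(query_month):
    # The access tool determines the query's locality automatically;
    # unknown months fall back to a full-table scan.
    return partition_for_month.get(query_month, "fact_sales_all")

print(redirect("2013-05"))  # fact_sales_2013_05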

107. In detail, explain the multidimensional schema.

Multidimensional schemas:

Before we close, we look at the interesting concept of multiple dimensions. This is a very convenient method of analyzing data when it goes beyond normal tabular relations.

For example, a store maintains a table of each item it sells over a month, in each of its 10 outlets.

[Figure: a 2-dimensional table of one item's monthly sales at each outlet]

This is a 2-dimensional table. On the other hand, if the company wants the data for all items sold by its outlets, this can be obtained simply by superimposing the 2-dimensional tables for each of these items one behind the other; the result is a 3-dimensional view.

A query, then, instead of looking for a 2-dimensional rectangle of data, looks for a 3-dimensional cuboid of data.

There is no reason why the dimensioning should stop at 3 dimensions. In fact, almost all queries can be thought of as extracting a multidimensional unit of data from a multidimensional volume of the schema. A small sketch follows.
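A 3-dimensional view can be sketched in Python as a mapping keyed by (item, month, outlet); the figures are invented:

cube = {
    ("pen", "May", "outlet-1"): 120,
    ("pen", "May", "outlet-2"): 80,
    ("book", "May", "outlet-1"): 45,
}

# The query carves a cuboid out of the cube: all May sales of pens, any outlet.
pens_in_may = sum(qty for (item, month, outlet), qty in cube.items()
                  if item == "pen" and month == "May")
print(pens_in_may)  # 200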

108. Why is partitioning needed in a large data warehouse?

Partitioning is needed in any large data warehouse to improve performance and manageability. It also helps query redirection send queries to the appropriate partition, thereby reducing the overall time taken for query processing.

109. Explain the types of partitioning in detail.

i). Horizontal partitioning:

This essentially means that the table is partitioned after the first few thousand entries, then the next few thousand entries, and so on. This is because, in most cases, not all the information in the fact table is needed all the time. Horizontal partitioning thus helps to reduce query access time by directly cutting down the amount of data to be scanned by the queries.

ii). Vertical partitioning:

As the name suggests, a vertical partitioning scheme divides the table vertically, i.e., each row is divided into 2 or more partitions.

iii). Hardware partitioning:

Needless to say, the data warehouse design process should try to maximize the performance of the system. One way to ensure this is to optimize the database design with respect to a specific hardware architecture.

110. Explain the mechanism of row splitting.

Row Splitting:

This method involves identifying the less frequently used fields and putting them into another table. This ensures that the frequently used fields can be accessed more often, with much less computation time.

Note that row splitting neither reduces nor increases the overall storage needed, whereas normalization may involve a change in the overall storage space needed. In row splitting, the mapping is 1-to-1, whereas normalization may produce one-to-many relationships. A sketch follows.
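A Python sketch of row splitting; the choice of "rare" fields is hypothetical:

customers = [
    {"id": 1, "name": "A", "fax": "000-1111", "legacy_code": "Z9"},
    {"id": 2, "name": "B", "fax": "000-2222", "legacy_code": "Q4"},
]
RARE_FIELDS = {"fax", "legacy_code"}  # identified as infrequently used

# 1-to-1 split: each 'hot' row has exactly one 'cold' row with the same id.
hot = [{k: v for k, v in r.items() if k not in RARE_FIELDS} for r in customers]
cold = [{k: v for k, v in r.items() if k in RARE_FIELDS or k == "id"} for r in customers]
print(hot)   # frequent queries scan only these narrow rows
print(cold)  # rarely used fields, joined back by id when needed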


i). The cost of setting up and operating data marts is quite high.

ii). Once a data strategy is put in place, the data mart formats become fixed. It may be fairly difficult to change the strategy later, because the data mart formats also have to be changed.

117. What is the role of the access control issue in data mart design?

Role of the access control issue in data mart design:

This is one of the major constraints in data mart design. Any data warehouse, with its huge volume of data, is more often than not subject to various access controls as to who can access which part of the data. The easiest case is where the data is partitioned so clearly that a user of each partition cannot access any other data. In such cases, each partition can be put in a data mart, and each user can access only his own data.

In the data warehouse, the data pertaining to all these marts is stored, but the partitioning is retained. If a super user wants to get an overall view of the data, suitable aggregations can be generated, as in the sketch below.
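The Python sketch below illustrates the idea; the marts, user groups, and figures are all hypothetical:

marts = {
    "sales_mart": [{"amount": 100}, {"amount": 250}],
    "hr_mart": [{"amount": 40}],
}
access = {"sales_user": ["sales_mart"], "super_user": list(marts)}  # who sees what

def visible_rows(user):
    return [row for mart in access[user] for row in marts[mart]]

# An ordinary user sees only his own mart; the super user aggregates across all.
print(sum(r["amount"] for r in visible_rows("sales_user")))  # 350
print(sum(r["amount"] for r in visible_rows("super_user")))  # 390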

118. Explain the purposes of using metadata in detail.

Metadata is used for the following purposes:

i). Data transformation and loading.

ii). Data management.

iii). Query generation.

119. Explain the concept of metadata management.

Metadata should be able to describe the data as it resides in the data warehouse. This helps the warehouse manager to control data movements. The purpose of the metadata is to describe the objects in the database. Some of the items described are listed here.

Tables
- Columns
  * Name
  * Type

Indexes
- Columns
  * Name
  * Type

Views
- Columns
  * Name
  * Type

Constraints
- Name
- Type
- Table
  * Columns

A dictionary-shaped sketch of such metadata follows.
120. How does the query manager use the metadata? Explain in detail.

Metadata is also required to generate queries. The query manager uses the metadata to build a history of all queries run and to generate a query profile for each user or group of users.

We simply list a few of the commonly used metadata items for a query; the names are self-explanatory.

o Query
  o Table accessed
    o Column accessed
      o Name
      o Reference identifier
  o Restrictions applied
    o Column name
    o Table name
    o Reference identifier
    o Restrictions
  o Join criteria applied
    o Column name
    o Table name
    o Reference identifier
    o Column name
    o Table name
    o Reference identifier
  o Aggregate function used
    o Column name
    o Reference identifier
    o Aggregate function
  o Group-by criteria
    o Column name
    o Reference identifier
    o Sort direction
  o Syntax
  o Resources
    o Disk
      o Read
      o Write
      o Temporary

121. Why do we need different managers for a data warehouse? Explain.

Need for managers for a data warehouse:

Data warehouses are not just large databases; they are complex environments that integrate many technologies. They are not static, but continuously change both in content and in structure. Thus, there is a constant need for maintenance and management. Since huge amounts of time, money, and effort are involved in the development of data warehouses, sophisticated management tools are always justified in their case.

When computer systems were in their initial stages of development, there used to be an army of human managers who went around doing all the administration and management. But such a scheme became both unwieldy and prone to errors as systems grew in size and complexity. Furthermore, most of the management principles were ad hoc in nature and were subject to human error and fatigue.

122. With a neat diagram, explain the boundaries of the process managers.

A schematic diagram defining the boundaries of the three types of managers is shown below.

[Figure: boundaries of the three process managers]

123. Explain the responsibilities of each manager of a data warehouse.

Warehouse Manager:

The warehouse manager is responsible for maintaining the data of the warehouse. It should also create and maintain a layer of metadata. Some of the responsibilities of the warehouse manager are:

o Data movement

o Metadata management

o Performance monitoring

o Archiving

Data movement includes the transfer of data within the warehouse, aggregation, and the creation and maintenance of tables, indexes, and other objects of importance. The warehouse manager should be able to create new aggregations as well as remove old ones. The creation of additional rows and columns, keeping track of the aggregation processes, and creating metadata are also its functions.

124. What are the different system management tools used for a data warehouse?

The different system management tools used for a data warehouse are:

i). Configuration managers

ii). Schedule managers

iii). Event managers

iv). Database managers

v). Backup and recovery managers

vi). Resource and performance monitors

    -----------------------------------------------------------------------------------------------------
