CHALAPATHI INSTITUTE OF ENGINEERING & TECHNOLOGY lab manual/cs451.pdfDATA ENGINEERING LABORATORY 4...

DATA ENGINEERING LABORATORY

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CHALAPATHI INSTITUTE OF ENGINEERING & TECHNOLOGY

CHALAPATHI NAGAR, LAM, GUNTUR-522034 DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

Vision of the Institute: To emerge as an Institute of Excellence for Engineering and Technology and provide world-class education and research opportunities to the students, catering to the needs of society.

Mission of the Institute: Establishing a state-of-the-art Engineering Institute with continuously improving infrastructure and producing students with innovative skills and a global outlook.

Department Vision: To produce professionally competent, research-oriented and socially sensitive engineers and technocrats in the emerging technologies.

Department Mission:
DM1: State-of-the-art laboratories to meet the needs of continuous change.
DM2: Provide a research environment to address societal issues.
DM3: Facilitate collaborations/MOUs towards emerging technologies.





PEOs:

PEO1: Graduates shall excel in the computer industry profession or higher studies through quality education.

PEO2: Graduates shall analyse real-life problems and design computing systems appropriate to their solutions, exhibiting professionalism and teamwork.

PEO3: Graduates shall demonstrate their ability to adapt to a rapidly changing environment by engaging in lifelong learning.

POs:

1. ENGINEERING KNOWLEDGE: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.

2. PROBLEM ANALYSIS: Identify, formulate, research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.

3. DESIGN/DEVELOPMENT OF SOLUTIONS: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.

4. CONDUCT INVESTIGATIONS OF COMPLEX PROBLEMS: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.

5. MODERN TOOL USAGE: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modelling to complex engineering activities with an understanding of the limitations.

6. THE ENGINEER AND SOCIETY: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.



7. ENVIRONMENT AND SUSTAINABILITY: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable development.

8. ETHICS: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.

9. INDIVIDUAL AND TEAM WORK: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.

10. COMMUNICATION: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, give and receive clear instructions.

11. PROJECT MANAGEMENT AND FINANCE: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one’s own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.

12. LIFE-LONG LEARNING: Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change.

PSOs:

PSO1: Professional Skills: The ability to understand, analyze and develop computer programs in the areas related to algorithms, system software, multimedia, web design, big data analytics, and networking for efficient design of computer-based systems of varying complexity.

PSO2: Problem-Solving Skills: The ability to apply standard practices and strategies in software project development using open-ended programming environments to deliver a quality product for business success.

PSO3: Successful Career and Entrepreneurship: The ability to employ modern computer languages, environments, and platforms in creating innovative career paths to be an entrepreneur, and a zest for higher studies.





DATA ENGINEERING LAB OBJECTIVES:

1. Practical exposure to the implementation of well-known data mining tasks.

2. Exposure to real life data sets for analysis and prediction.

3. Learning performance evaluation of data mining algorithms in a supervised and an unsupervised setting.

4. Handling a small data mining project for a given practical domain.

DATA ENGINEERING LAB OUTCOMES:

1. Understand the data mining process and important issues around data cleaning, pre-processing and integration.

2. Understand the principal algorithms and techniques used in data mining, such as clustering, association mining, classification and prediction.

3. Demonstrate understanding of the functionality of the various web mining and web search components and appreciate the strengths and limitations of various web mining and web search models.

4. Able to use the tools and techniques employed in data mining for different application domains.

5. Describe different types of research and understand alternative research paradigms.

CO       PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12  PSO1  PSO2  PSO3
C451.1   3    3    2    -    -    -    -    -    -    -     -     -     1     -     1
C451.2   3    3    2    -    -    -    -    -    -    -     -     -     -     -     -
C451.3   3    3    1    -    -    -    -    -    -    -     -     -     -     -     -
C451.4   3    3    -    -    -    -    -    -    -    -     -     -     -     -     1
C451.5   2    3    -    -    -    -    -    -    -    -     -     -     1     -     -
C452     2.8  3    1.6  -    -    -    -    -    -    -     -     -     1     -     1



SYLLABUS AS PER UNIVERSITY:

Expt. No Experiment Name

1 Rollup And Cube Operations On The Following Tables

2 Cube Slicing - Come Up With a 2-D View of Data

3 Drill-down or Roll-down going from summary to more detailed data

4 Rollup - summarize data along a dimension hierarchy

5 Dicing - project 2-D view of data

6 Creating Star Schema and Snowflake Schema

7 Creating Fact Table

ADDITIONAL PROGRAMS

1 Write a program to implement the Apriori algorithm using WEKA

2 Write a program to implement FP Growth using WEKA

3 Write a program to implement DECISION TREE using WEKA



Expt. No | Experiment Name | COs attained | POs attained

1 | Rollup And Cube Operations On The Following Tables | CS461.1, CS461.2 | PO3, PO4
2 | Cube Slicing - Come Up With a 2-D View of Data | CS461.1, CS461.2 | PO3, PO4
3 | Drill-down or Roll-down going from summary to more detailed data | CS461.1, CS461.2 | PO3, PO4
4 | Rollup - summarize data along a dimension hierarchy | CS461.1, CS461.2 | PO3, PO4
5 | Dicing - project 2-D view of data | CS461.1, CS461.2 | PO3, PO4
6 | Creating Star Schema and Snowflake Schema | CS461.4 | PO3, PO4
7 | Creating Fact Table | CS461.3, CS461.4 | PO3, PO4

ADDITIONAL PROGRAMS

1 | Write a program to implement the Apriori algorithm using WEKA | CS461.3, CS461.5 | PO4, PO5
2 | Write a program to implement FP Growth using WEKA | CS461.3, CS461.5 | PO4, PO5
3 | Write a program to implement DECISION TREE using WEKA | CS461.3, CS461.5 | PO4, PO5



INDEX

S.NO  CONTENT
I     VISION AND MISSION
II    COURSE OBJECTIVES AND OUTCOMES
III   SYLLABUS
1     ROLLUP AND CUBE OPERATIONS ON THE FOLLOWING TABLES
2     CUBE SLICING - COME UP WITH A 2-D VIEW OF DATA
3     DRILL-DOWN OR ROLL-DOWN GOING FROM SUMMARY TO MORE DETAILED DATA
4     ROLLUP - SUMMARIZE DATA ALONG A DIMENSION HIERARCHY
5     DICING - PROJECT 2-D VIEW OF DATA
6     CREATING STAR SCHEMA AND SNOWFLAKE SCHEMA
7     CREATING FACT TABLE

ADDITIONAL PROGRAMS
1     WRITE A PROGRAM TO IMPLEMENT APRIORI ALGORITHM USING WEKA
2     WRITE A PROGRAM TO IMPLEMENT FP GROWTH USING WEKA
3     WRITE A PROGRAM TO IMPLEMENT DECISION TREE USING WEKA



EXPERIMENT 1: ROLLUP AND CUBE OPERATIONS ON THE FOLLOWING TABLES

AIM: Implement cube operations.

ALGORITHM:
STEP 1: Create a table and store the data in the table.
STEP 2: Write a control file with the .ctl extension.
STEP 3: Create FILENAME.CSV and write the data into it.
STEP 4: At the DOS command prompt, execute the SQL*Loader (sqlldr) command.
STEP 5: At the SQL command prompt, execute the ROLLUP / CUBE query.
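The steps above load CSV data into a table and then aggregate it with ROLLUP and CUBE. As a minimal sketch, assuming a hypothetical sales table with columns item_type, branch_city and sold_qty, the Python code below uses the standard sqlite3 module as a stand-in for Oracle and SQL*Loader. SQLite has no GROUP BY ROLLUP or CUBE, so the sketch emulates them with one GROUP BY per grouping set joined by UNION ALL; in Oracle the same rows would come from GROUP BY ROLLUP(item_type, branch_city) and GROUP BY CUBE(item_type, branch_city). A NULL in a dimension column marks a subtotal over that dimension.

```python
import csv
import io
import itertools
import sqlite3

# Hypothetical contents of FILENAME.CSV (Step 3).
CSV_TEXT = """item_type,branch_city,sold_qty
TV,Guntur,10
TV,Vijayawada,5
Phone,Guntur,7
Phone,Vijayawada,3
"""
DIMS = ("item_type", "branch_city")


def load_sales(conn):
    """Steps 1-4 in miniature: create the table and bulk-load the CSV rows."""
    conn.execute("CREATE TABLE sales (item_type TEXT, branch_city TEXT, sold_qty INTEGER)")
    rows = csv.DictReader(io.StringIO(CSV_TEXT))
    conn.executemany("INSERT INTO sales VALUES (:item_type, :branch_city, :sold_qty)", rows)


def grouping_sets_sql(sets):
    """One GROUP BY per grouping set, glued together with UNION ALL.
    Dimensions left out of a set are reported as NULL (the 'all' value)."""
    parts = []
    for s in sets:
        cols = ", ".join(d if d in s else f"NULL AS {d}" for d in DIMS)
        group_by = f" GROUP BY {', '.join(s)}" if s else ""
        parts.append(f"SELECT {cols}, SUM(sold_qty) AS total FROM sales{group_by}")
    return " UNION ALL ".join(parts)


def rollup_sets(dims):
    # ROLLUP(a, b) -> (a, b), (a), ()
    return [dims[:i] for i in range(len(dims), -1, -1)]


def cube_sets(dims):
    # CUBE(a, b) -> every subset: (a, b), (a), (b), ()
    return [c for r in range(len(dims), -1, -1) for c in itertools.combinations(dims, r)]


conn = sqlite3.connect(":memory:")
load_sales(conn)
rollup_rows = conn.execute(grouping_sets_sql(rollup_sets(DIMS))).fetchall()
cube_rows = conn.execute(grouping_sets_sql(cube_sets(DIMS))).fetchall()
for row in cube_rows:
    print(row)
```

For this data ROLLUP returns 7 rows (4 detail rows, 2 per-item subtotals and 1 grand total); CUBE adds the 2 per-city subtotals for 9 rows.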



EXPERIMENT 2: CUBE SLICING - COME UP WITH A 2-D VIEW OF DATA

AIM: Implement cube operations.

Slice: A slice is a subset of a multidimensional array corresponding to a single value for one or more members of the dimensions not in the subset. The aim is to develop a method of constraining the space requirements of the dynamic data cube below the full data cube size by deleting unnecessary data.

ALGORITHM:
STEP 1: SELECT ITEM_TYPE, PURCHASE_DATE, SOLD_QTY
STEP 2: SELECT BRANCH_CITY, PURCHASE_DATE, SOLD_QTY
STEP 3: Execute the query in SQL.
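A slice fixes one dimension to a single value and leaves a 2-D view of the remaining dimensions. The Python sketch below, assuming hypothetical cube cells of the form (item_type, branch_city, purchase_date, sold_qty), slices on branch_city and pivots the result into an item_type by purchase_date grid. It computes the same totals as SELECT item_type, purchase_date, SUM(sold_qty) ... WHERE branch_city = 'Guntur' GROUP BY item_type, purchase_date.

```python
from collections import defaultdict

# Hypothetical cube cells: (item_type, branch_city, purchase_date, sold_qty)
SALES = [
    ("TV", "Guntur", "2023-01", 10),
    ("TV", "Guntur", "2023-02", 4),
    ("Phone", "Guntur", "2023-01", 7),
    ("TV", "Vijayawada", "2023-01", 5),
    ("Phone", "Vijayawada", "2023-02", 3),
]


def slice_cube(rows, city):
    """Fix branch_city to one value; return {item_type: {purchase_date: qty}}."""
    view = defaultdict(lambda: defaultdict(int))
    for item, branch, date, qty in rows:
        if branch == city:  # the slice condition: one value of one dimension
            view[item][date] += qty
    return {item: dict(cells) for item, cells in view.items()}


def print_grid(view):
    """Print the 2-D view with item_type as rows and purchase_date as columns."""
    dates = sorted({d for cells in view.values() for d in cells})
    print("item_type", *dates, sep="\t")
    for item in sorted(view):
        print(item, *(view[item].get(d, 0) for d in dates), sep="\t")


guntur = slice_cube(SALES, "Guntur")
print_grid(guntur)
```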



Experiment 3: Drill-down or Roll-down going from summary to more detailed data

AIM: Implement cube operations.

ALGORITHM:
STEP 1: Create the tables with ITEM, BRANCH, CITY, PURCHASES, WORKS_AT, SOLD_QTY.
STEP 2: Mention the join conditions ID = ITEMS_SOLD.ITEM_ID and BRANCH_ID = WORKS_AT.BRANCH_ID.
STEP 3: Mention the join condition TRANS_ID = ITEMS_SOLD.TRANS_ID.
STEP 4: Execute the query in SQL.
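Drill-down moves from a summary to finer detail along a dimension, here from yearly totals to monthly totals per item. The sketch below uses Python's sqlite3 module and hypothetical tables item, purchases and items_sold whose join conditions follow Steps 2 and 3; the column names and data are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item       (id INTEGER PRIMARY KEY, item_type TEXT);
CREATE TABLE purchases  (trans_id INTEGER PRIMARY KEY, purchase_date TEXT);
CREATE TABLE items_sold (trans_id INTEGER, item_id INTEGER, sold_qty INTEGER);
INSERT INTO item VALUES (1, 'TV'), (2, 'Phone');
INSERT INTO purchases VALUES (100, '2023-01-15'), (101, '2023-02-03'), (102, '2024-01-20');
INSERT INTO items_sold VALUES (100, 1, 2), (100, 2, 1), (101, 1, 3), (102, 2, 4);
""")

# Join conditions from Steps 2-3: item.id = items_sold.item_id and
# purchases.trans_id = items_sold.trans_id.
JOINS = """FROM items_sold s
JOIN item i      ON i.id = s.item_id
JOIN purchases p ON p.trans_id = s.trans_id"""


def summary_by_year():
    """Summary level: one total per year."""
    sql = f"SELECT substr(p.purchase_date, 1, 4) AS yr, SUM(s.sold_qty) {JOINS} GROUP BY yr ORDER BY yr"
    return conn.execute(sql).fetchall()


def drill_down(year):
    """Detail level: split one year's total by month and item type."""
    sql = (
        f"SELECT substr(p.purchase_date, 1, 7) AS month, i.item_type, SUM(s.sold_qty) {JOINS} "
        "WHERE substr(p.purchase_date, 1, 4) = ? "
        "GROUP BY month, i.item_type ORDER BY month, i.item_type"
    )
    return conn.execute(sql, (year,)).fetchall()


years = summary_by_year()    # coarse view
detail = drill_down("2023")  # the same 2023 total, now per month and item
print(years)
print(detail)
```

The detail rows for a year always add up to that year's summary total; drill-down only changes the granularity, never the data.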



Experiment 4: Rollup - summarize data along a dimension hierarchy

AIM: Implement cube operations.

ALGORITHM:
STEP 1: SELECT ITEM, SOLD_QTY FROM ITEMS
STEP 2: SELECT PURCHASE_DATE, SOLD_QTY FROM PURCHASES
STEP 3: SELECT PURCHASE_CUST_ID FROM PURCHASES
STEP 4: SELECT SUM(ITEMS_QTY) FROM ITEMS_SOLD
STEP 5: Execute the query in SQL.
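Rolling up along a hierarchy replaces each member with its parent and sums the measure. The Python sketch below assumes a hypothetical location hierarchy (branch city, then state, then country) and hypothetical city-level sales; each call to roll_up climbs exactly one level.

```python
from collections import Counter

# Hypothetical location hierarchy: branch_city -> state -> country.
CITY_TO_STATE = {"Guntur": "Andhra Pradesh", "Vijayawada": "Andhra Pradesh", "Chennai": "Tamil Nadu"}
STATE_TO_COUNTRY = {"Andhra Pradesh": "India", "Tamil Nadu": "India"}

# Hypothetical base facts at the lowest level: (branch_city, sold_qty)
SALES = [("Guntur", 10), ("Vijayawada", 5), ("Chennai", 8), ("Guntur", 2)]


def roll_up(totals, parent_of):
    """Climb one level of the hierarchy by summing children into their parent."""
    out = Counter()
    for member, qty in totals.items():
        out[parent_of[member]] += qty
    return dict(out)


city_totals = Counter()
for city, qty in SALES:
    city_totals[city] += qty

state_totals = roll_up(city_totals, CITY_TO_STATE)
country_totals = roll_up(state_totals, STATE_TO_COUNTRY)
print(dict(city_totals))
print(state_totals)
print(country_totals)
```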



Experiment 5: Dicing - project 2-D view of data

AIM: Implement cube operations.

ALGORITHM:
STEP 1: SELECT ITEM_TYPE, PURCHASE_DATE, SOLD_QTY
STEP 2: SELECT BRANCH_CITY, PURCHASE_DATE, SOLD_QTY
STEP 3: Execute the query in SQL.
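Where a slice fixes a single value of one dimension, a dice selects a sub-cube by placing conditions on two or more dimensions. The Python sketch below, assuming hypothetical cells (item_type, branch_city, purchase_date, sold_qty), keeps only the chosen item types and cities and then projects the sub-cube onto a branch_city by purchase_date view by summing over item_type.

```python
from collections import defaultdict

# Hypothetical cube cells: (item_type, branch_city, purchase_date, sold_qty)
SALES = [
    ("TV", "Guntur", "2023-01", 10),
    ("TV", "Vijayawada", "2023-02", 5),
    ("Phone", "Guntur", "2023-02", 7),
    ("Phone", "Chennai", "2023-01", 3),
    ("Laptop", "Guntur", "2023-01", 2),
]


def dice(rows, item_types, cities):
    """Select a sub-cube on two dimensions, then project it onto a 2-D view
    {branch_city: {purchase_date: qty}} by summing over item_type."""
    view = defaultdict(lambda: defaultdict(int))
    for item, city, date, qty in rows:
        if item in item_types and city in cities:  # conditions on two dimensions
            view[city][date] += qty
    return {city: dict(cells) for city, cells in view.items()}


sub_cube = dice(SALES, {"TV", "Phone"}, {"Guntur", "Vijayawada"})
print(sub_cube)
```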



Experiment 6: Creating Star Schema and Snowflake Schema.

AIM: Implementation of Star Schema and Snowflake Schema.

Theory:

Schema Modeling Techniques: Schemas in Data Warehouses
1. Third Normal Form
2. Star Schemas
3. Optimizing Star Queries

A schema is a collection of database objects, including tables, views, indexes, and synonyms.

Star Schemas

The star schema is the simplest data warehouse schema. It is called a star schema because the entity-relationship diagram of this schema resembles a star, with points radiating from a central table. The center of the star consists of a large fact table, and the points of the star are the dimension tables.

A star schema is characterized by one or more very large fact tables that contain the primary information in the data warehouse, and a number of much smaller dimension tables (or lookup tables), each of which contains information about the entries for a particular attribute in the fact table.

A star query is a join between a fact table and a number of dimension tables. Each dimension table is joined to the fact table using a primary key to foreign key join, but the dimension tables are not joined to each other. The cost-based optimizer recognizes star queries and generates efficient execution plans for them.

A typical fact table contains keys and measures. For example, in the sh sample schema, the fact table, sales, contains the measures quantity_sold, amount, and cost, and the keys cust_id, time_id, prod_id, channel_id, and promo_id. The dimension tables are customers, times, products, channels, and promotions. The product dimension table, for example, contains information about each product number that appears in the fact table.

A star join is a primary key to foreign key join of the dimension tables to a fact table. The main advantages of star schemas are that they:



1. Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
2. Provide highly optimized performance for typical star queries.
3. Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data-warehouse schema contain dimension tables.

Star schemas are used for both simple data marts and very large data warehouses.

Figure: Star Schema (graphical representation of a star schema)
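To make the star shape concrete, the sketch below builds a reduced, hypothetical version of the sh example in Python's sqlite3 module: three dimension tables (customers, products, times) around a sales fact table holding foreign keys and the measures quantity_sold and amount. The column subset, city names and values are assumptions, not the real sh schema. The star query joins each dimension only to the fact table, as the theory above describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
-- Dimension tables: the points of the star
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_city TEXT);
CREATE TABLE products  (prod_id INTEGER PRIMARY KEY, prod_name TEXT, prod_category TEXT);
CREATE TABLE times     (time_id TEXT PRIMARY KEY, calendar_year INTEGER);

-- Fact table: the center of the star (foreign keys + numeric measures)
CREATE TABLE sales (
    cust_id       INTEGER REFERENCES customers(cust_id),
    prod_id       INTEGER REFERENCES products(prod_id),
    time_id       TEXT    REFERENCES times(time_id),
    quantity_sold INTEGER,
    amount        REAL
);

INSERT INTO customers VALUES (1, 'Guntur'), (2, 'Chennai');
INSERT INTO products  VALUES (10, 'TV', 'Electronics'), (20, 'Shirt', 'Apparel');
INSERT INTO times     VALUES ('2023-01-15', 2023), ('2024-03-02', 2024);
INSERT INTO sales VALUES (1, 10, '2023-01-15', 2, 600.0),
                         (2, 20, '2023-01-15', 5, 100.0),
                         (1, 20, '2024-03-02', 1, 20.0);
""")

# Star query: every dimension joins only to the fact table (PK -> FK join),
# and dimensions are never joined to each other.
STAR_QUERY = """
SELECT c.cust_city, p.prod_category, t.calendar_year, SUM(s.amount)
FROM sales s
JOIN customers c ON c.cust_id = s.cust_id
JOIN products  p ON p.prod_id = s.prod_id
JOIN times     t ON t.time_id = s.time_id
GROUP BY c.cust_city, p.prod_category, t.calendar_year
ORDER BY c.cust_city, p.prod_category, t.calendar_year
"""
result = conn.execute(STAR_QUERY).fetchall()
for row in result:
    print(row)
```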



Snowflake Schemas

The snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake.

Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a products table, a product_category table, and a product_manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance.

Figure: Snowflake Schema (graphical representation of a snowflake schema)

Note: Oracle Corporation recommends you choose a star schema over a snowflake schema unless you have a clear reason not to.
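The snowflake example in the text (products split into products, product_category and product_manufacturer) can be sketched with Python's sqlite3 module as follows. Table contents, the manufacturer names and the sales rows are hypothetical. The point to notice is the extra join: a total per category now needs fact to products to product_category, where a star schema would need only the single fact to products join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked product dimension: category and manufacturer moved out
CREATE TABLE product_category     (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product_manufacturer (manufacturer_id INTEGER PRIMARY KEY, manufacturer_name TEXT);
CREATE TABLE products (
    prod_id         INTEGER PRIMARY KEY,
    prod_name       TEXT,
    category_id     INTEGER REFERENCES product_category(category_id),
    manufacturer_id INTEGER REFERENCES product_manufacturer(manufacturer_id)
);
CREATE TABLE sales (prod_id INTEGER REFERENCES products(prod_id), amount REAL);

INSERT INTO product_category     VALUES (1, 'Electronics'), (2, 'Apparel');
INSERT INTO product_manufacturer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO products VALUES (10, 'TV', 1, 1), (11, 'Radio', 1, 2), (20, 'Shirt', 2, 2);
INSERT INTO sales VALUES (10, 600.0), (11, 50.0), (20, 100.0), (20, 40.0);
""")

# Two joins are needed to reach the category name from the fact table.
SNOWFLAKE_QUERY = """
SELECT c.category_name, SUM(s.amount)
FROM sales s
JOIN products p         ON p.prod_id = s.prod_id
JOIN product_category c ON c.category_id = p.category_id
GROUP BY c.category_name
ORDER BY c.category_name
"""
by_category = conn.execute(SNOWFLAKE_QUERY).fetchall()
print(by_category)
```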




Experiment 7: Creating Fact Table.

AIM: Implementation of Fact Table

Theory:

Fact Tables

A fact table typically has two types of columns: those that contain numeric facts (often called measurements), and those that are foreign keys to dimension tables. A fact table contains either detail-level facts or facts that have been aggregated. Fact tables that contain aggregated facts are often called summary tables. A fact table usually contains facts with the same level of aggregation.

Though most facts are additive, they can also be semi-additive or non-additive. Additive facts can be aggregated by simple arithmetical addition; a common example is sales. Non-additive facts cannot be added at all; an example is averages. Semi-additive facts can be aggregated along some of the dimensions and not along others; an example is inventory levels, where you cannot tell what a level means simply by looking at it.

Creating a New Fact Table

You must define a fact table for each star schema. From a modeling standpoint, the primary key of the fact table is usually a composite key that is made up of all of its foreign keys.
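The sketch below illustrates these points with Python's sqlite3 module, assuming a hypothetical inventory_fact table with dimensions products, times and stores. Its composite primary key is made up of all three foreign keys, quantity_sold is additive, and stock_on_hand is a semi-additive inventory level: it can be summed across stores for one day, but across days the sketch averages the daily totals instead of adding them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (prod_id INTEGER PRIMARY KEY, prod_name TEXT);
CREATE TABLE times    (time_id TEXT PRIMARY KEY);
CREATE TABLE stores   (store_id INTEGER PRIMARY KEY, city TEXT);

-- Fact table: foreign keys plus measures. The composite primary key is
-- made up of all the foreign keys, giving one row per (product, day, store).
CREATE TABLE inventory_fact (
    prod_id       INTEGER REFERENCES products(prod_id),
    time_id       TEXT    REFERENCES times(time_id),
    store_id      INTEGER REFERENCES stores(store_id),
    quantity_sold INTEGER,  -- additive measure
    stock_on_hand INTEGER,  -- semi-additive measure (inventory level)
    PRIMARY KEY (prod_id, time_id, store_id)
);

INSERT INTO products VALUES (1, 'TV');
INSERT INTO times    VALUES ('2023-01-01'), ('2023-01-02');
INSERT INTO stores   VALUES (1, 'Guntur'), (2, 'Vijayawada');
INSERT INTO inventory_fact VALUES
    (1, '2023-01-01', 1, 3, 20), (1, '2023-01-01', 2, 1, 10),
    (1, '2023-01-02', 1, 2, 18), (1, '2023-01-02', 2, 4, 6);
""")

# Additive: quantity_sold can be summed across every dimension.
total_sold = conn.execute("SELECT SUM(quantity_sold) FROM inventory_fact").fetchone()[0]

# Semi-additive: stock_on_hand can be summed across stores for one day ...
stock_per_day = conn.execute(
    "SELECT time_id, SUM(stock_on_hand) FROM inventory_fact GROUP BY time_id ORDER BY time_id"
).fetchall()
# ... but not across days, so average the daily totals over time instead.
avg_daily_stock = sum(qty for _, qty in stock_per_day) / len(stock_per_day)

# The composite key rejects a second row for the same (product, day, store).
try:
    conn.execute("INSERT INTO inventory_fact VALUES (1, '2023-01-01', 1, 9, 9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(total_sold, stock_per_day, avg_daily_stock, duplicate_rejected)
```

Adding the two daily stock totals (30 + 24) would give 54 units, a number that never existed on any single day; that is why the time dimension gets an average here.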



Figure: A common example of a sales fact table with the dimension tables customers, products, promotions, times, and channels.

ADDITIONAL PROGRAMS

Experiment-1: Write a program to implement the Apriori algorithm using WEKA

Procedure:
Step 1: Click the WEKA icon; the WEKA GUI Chooser appears. Open the Explorer and load the dataset into WEKA.
Step 2: Choose the Associate tab.
Step 3: Select Apriori from "Choose", then click Start.
Step 4: View the output in the Associator output frame.
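WEKA's Associate tab runs Apriori for you; to show what happens inside, here is a from-scratch Python sketch of the two phases (frequent itemsets, then rules). It is not WEKA's implementation: it works on hypothetical market-basket transactions stored as Python sets, and the min_support and min_conf thresholds are illustrative choices.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for every frequent itemset."""
    level = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while level:
        # Count the candidates and keep those meeting min_support.
        kept = {c: support(c, transactions) for c in level}
        kept = {c: s for c, s in kept.items() if s >= min_support}
        frequent.update(kept)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in kept for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_conf):
    """Rules X => Y with confidence = support(X u Y) / support(X)."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[lhs]
                if conf >= min_conf:
                    out.append((set(lhs), set(itemset - lhs), round(conf, 2)))
    return out


freq = apriori(TRANSACTIONS, min_support=0.6)
for lhs, rhs, conf in rules(freq, min_conf=0.7):
    print(sorted(lhs), "=>", sorted(rhs), conf)
```

With these thresholds every single item and every pair is frequent, the triple {bread, butter, milk} (support 0.4) is not, and each pair yields two rules with confidence 0.75.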



Experiment-2: Write a program to implement FP Growth using WEKA

Procedure:
Step 1: Click the WEKA icon; the WEKA GUI Chooser appears. Open the Explorer and load the dataset into WEKA.
Step 2: Select the Associate tab.
Step 3: Click "Choose" and select FPGrowth under associations.
Step 4: Click Start to get the output.
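FP-Growth finds the same frequent itemsets as Apriori without generating candidates: it compresses the transactions into a prefix tree (the FP-tree) and mines it recursively through conditional pattern bases. The Python sketch below is a from-scratch illustration on hypothetical transactions, not WEKA's FPGrowth; min_count is an absolute support count, and the sketch skips refinements such as the single-path shortcut.

```python
from collections import defaultdict

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]


class Node:
    """One FP-tree node: an item, its count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted, min_count):
    """Build an FP-tree from (items, count) pairs.
    Returns the header table (item -> its nodes) and item frequencies."""
    freq = defaultdict(int)
    for items, cnt in weighted:
        for i in items:
            freq[i] += cnt
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, cnt in weighted:
        # Keep frequent items only, most frequent first (ties broken by name).
        node = root
        for i in sorted((i for i in items if i in freq), key=lambda i: (-freq[i], i)):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += cnt
    return header, freq


def fp_growth(weighted, min_count, suffix=()):
    """Yield (frequent itemset, support count) without candidate generation."""
    header, freq = build_tree(weighted, min_count)
    for item in sorted(freq, key=lambda i: (freq[i], i)):
        itemset = (item,) + suffix
        yield frozenset(itemset), freq[item]
        # Conditional pattern base: the prefix path above every node of `item`.
        base = []
        for node in header[item]:
            path, p = [], node.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        yield from fp_growth(base, min_count, itemset)


patterns = dict(fp_growth([(t, 1) for t in TRANSACTIONS], min_count=3))
for itemset, count in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```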



Experiment-3: Write a program to implement DECISION TREE using WEKA.

Procedure:
Step 1: Click the WEKA icon; the WEKA GUI Chooser appears. Open the Explorer and load the dataset into WEKA.
Step 2: Choose the Classify tab.
Step 3: Select "Use training set" in Test options.
Step 4: Click "Choose" in the Classifier panel.
Step 5: The available groups of classifiers are displayed; select trees.
Step 6: Select J48 from trees.
Step 7: Click Start.
Step 8: In the Result list, locate the new entry (for example "11:37:42 - trees.J48").
Step 9: Right-click the entry and select Visualize tree.
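J48 is WEKA's implementation of C4.5, which chooses splits by gain ratio and prunes the tree. As a simpler illustration of how such a tree is grown, the Python sketch below implements ID3 (plain information gain, no pruning) on a small hypothetical nominal dataset with attributes outlook, temperature, humidity and windy and class play. It is a teaching sketch, not a reproduction of J48's output.

```python
from collections import Counter
from math import log2

# Hypothetical nominal training data: outlook, temperature, humidity, windy -> play
ATTRS = ["outlook", "temperature", "humidity", "windy"]
ROWS = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def entropy(rows):
    """Shannon entropy (bits) of the class label, stored in the last column."""
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in Counter(r[-1] for r in rows).values())


def info_gain(rows, a):
    """Entropy reduction from splitting rows on attribute index a."""
    remainder = 0.0
    for v in {r[a] for r in rows}:
        subset = [r for r in rows if r[a] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder


def id3(rows, attrs):
    """Return a class label (leaf) or (attribute_name, {value: subtree})."""
    labels = Counter(r[-1] for r in rows)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]  # pure node, or majority vote
    best = max(attrs, key=lambda a: info_gain(rows, a))
    rest = [a for a in attrs if a != best]
    return (ATTRS[best], {
        v: id3([r for r in rows if r[best] == v], rest)
        for v in sorted({r[best] for r in rows})
    })


def classify(tree, row):
    """Walk the tree for one row (raises KeyError on an unseen value)."""
    while isinstance(tree, tuple):
        name, branches = tree
        tree = branches[row[ATTRS.index(name)]]
    return tree


tree = id3(ROWS, list(range(len(ATTRS))))
print(tree)
```

On this data outlook has the highest information gain, so it becomes the root; the sunny branch then splits on humidity and the rainy branch on windy, while overcast is already a pure "yes" leaf.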
