Data Warehouse & Mining Lab Manual - WordPress.com · 2018-09-27 · RapidMiner tool for data...
Roll No:____
Name:__________________
Sem:_______Section______
Data Warehouse & Mining Lab Manual
Prof. Almas Ansari Page 1
Data Warehouse & Mining Lab Manual 2018
CERTIFICATE Certified that this file is submitted by
Shri/Ku.___________________________________________________________
Roll No. ________a student of VII Semester final year of the course Computer
Science & Engineering as a part of PRACTICAL as prescribed by the Rashtrasant
Tukadoji Maharaj Nagpur University for the subject Data Warehouse & Mining in
the laboratory of ___________________________________during the academic year
_________________________ and that I have instructed him/her for the said work,
from time to time and I found his/her progress to be satisfactory.
I have also assessed the said work and I am satisfied that it is up to
the standard envisaged for the course.
Date:
Signature & Name of Subject Teacher
Signature & Name of HOD
Anjuman College of Engineering and Technology
Vision
To be a centre of excellence for developing quality technocrats with moral and social
ethics, to face the global challenges for the sustainable development of society.
Mission
To create conducive academic culture for learning and identifying career goals.
To provide quality technical education, research opportunities and imbibe
entrepreneurship skills contributing to the socio-economic growth of the Nation.
To inculcate values and skills, that will empower our students towards development
through technology.
Vision and Mission of the Department
Vision:
To achieve excellent standards of quality education in the field of computer science
and engineering, aiming towards development of ethically strong technical experts
contributing to the profession in the global society.
Mission:
To create outcome based education environment for learning and identifying career
goals.
Provide the latest tools in a learning ambience to enhance innovation, problem solving
skills, leadership qualities, team spirit and ethical responsibilities.
Inculcating awareness through innovative activities in the emerging areas of
technology.
Program Educational Objectives (PEOs)
The graduates will have a strong foundation in mathematical, scientific and
engineering fundamentals necessary to formulate, solve and analyze engineering
problems in their careers.
Graduates will be able to create and design computer support systems and impart
knowledge and skills to analyze, design, test and implement various software
applications.
Graduates will work productively as computer science engineers towards betterment
of society exhibiting ethical qualities.
Program Specific Outcomes (PSOs)
Foundation of mathematical concepts: To use mathematical methodologies and
techniques for computing and solving problems using suitable mathematical analysis,
data structures, databases and algorithms as per the requirement.
Foundation of Computer System: The capability and ability to interpret and
understand the fundamental concepts and methodology of computer systems and
programming. Students can understand the functionality of hardware and software
aspects of computer systems, networks and security.
Foundations of Software development: The ability to grasp the software development
lifecycle and methodologies of software system and project development.
PROGRAM: CSE
DEGREE: B.E.
COURSE: DATA WAREHOUSING AND MINING
SEMESTER: VII
CREDITS: 2
COURSE CODE: BECSE401T
COURSE TYPE: REGULAR
COURSE AREA/DOMAIN: DATA MINING
CONTACT HOURS: 2 hours/week
CORRESPONDING LAB COURSE CODE: BECSE401P
LAB COURSE NAME: DATA WAREHOUSING AND MINING LAB
COURSE PRE-REQUISITES: DATABASE MANAGEMENT SYSTEMS (SEM V)
LAB COURSE OBJECTIVES:
To familiarize students with the basic concepts of Data mining and Warehousing.
To explain and demonstrate various mining algorithms on real world data.
To brief students about the future trends in the fields of data mining.
COURSE OUTCOMES: Data Warehousing and Mining Lab
After completion of this course the students will be able to:
SNO  DESCRIPTION  (BLOOM'S TAXONOMY LEVEL)
CO.1 Create a dataset for any application in the .arff format. (LEVEL 6)
CO.2 Describe various preprocessing techniques and statistical techniques and apply those techniques on the given data set. (LEVEL 1, 3)
CO.3 Apply various association rule mining algorithms on the given data set. (LEVEL 3)
CO.4 Apply various classification algorithms on the given data set. (LEVEL 3)
CO.5 Apply various clustering algorithms on the given data set. (LEVEL 3)
CO.6 Create an application using outlier analysis. (LEVEL 6)
Lab Instructions:
Make entry in the Log Book as soon as you enter the Laboratory.
All the students should sit according to their Roll Numbers.
All the students are supposed to enter the terminal number in the Log Book.
Do not change the terminal on which you are working.
Strictly observe the instructions given by the Faculty / Lab. Instructor.
Take permission before entering the lab and keep your belongings in the
racks.
NO FOOD OR DRINK IN ANY FORM is allowed in the lab.
TURN OFF CELL PHONES! If you must carry one, please keep it in your bag.
Avoid all horseplay in the laboratory. Do not misbehave in the computer
laboratory. Work quietly.
Save often and keep your files organized.
Don't change settings, and surf safely.
Do not reboot, turn off, or move any workstation or PC.
Do not load any software on any lab computer (without prior permission of
Faculty and Technical Support Personnel). Only Lab Operators and Technical
Support Personnel are authorized to carry out these tasks.
Do not reconfigure the cabling/equipment without prior permission.
Do not play games on systems.
Turn off the machine once you are done using it.
Violation of the above rules and etiquette guidelines will result in disciplinary
action.
Continuous Assessment Practical
Exp No.  NAME OF EXPERIMENT  Date  Sign  Remark
1. Demonstration of preprocessing on an .arff file using student data (student.arff)
2. To perform the statistical analysis of data
3. Demonstration of association rule mining using the Apriori algorithm on supermarket data.
4. Demonstration of the FP-Growth algorithm on supermarket data
5. To perform classification by decision tree induction using Weka tools.
6. To perform classification using the Bayesian classification algorithm using R.
7. To perform cluster analysis by the k-means method using R.
8. To perform hierarchical clustering using R programming.
9. Study of Regression Analysis using R programming.
10. Outlier detection using R programming.
Content Beyond Syllabus
11. Study of and introduction to the leading open-source RapidMiner tool for data mining solutions
CONTENTS
Exp No.  NAME OF EXPERIMENT
1. Demonstration of preprocessing on an .arff file using student data (student.arff)
2. To perform the statistical analysis of data
3. Demonstration of association rule mining using the Apriori algorithm on supermarket data.
4. Demonstration of the FP-Growth algorithm on supermarket data
5. To perform classification by decision tree induction using Weka tools.
6. To perform classification using the Bayesian classification algorithm using R.
7. To perform cluster analysis by the k-means method using R.
8. To perform hierarchical clustering using R programming.
9. Study of Regression Analysis using R programming.
10. Outlier detection using R programming.
Content Beyond Syllabus
11. Study of and introduction to the leading open-source RapidMiner tool for data mining solutions
EXPERIMENT NO – 1
Aim: - Demonstration of preprocessing on an .arff file using student data.
The procedure for creating an ARFF file for Weka is quite simple.
Note: This is for an XLSX file/dataset containing alphanumeric values.
1) If you have an XLSX file, first convert it into a CSV (Comma Separated Values)
file.
2) Open the CSV file with a text editor, e.g. Notepad++.
3) Add the relation header at the top of the file, e.g. @relation compile-weka.filters.unsupervised.attribute
4) After that, add one @attribute declaration for each column (attribute) in your XLSX file,
e.g. @attribute max numeric @attribute min numeric @attribute mean numeric @attribute
median numeric. This means the file has four columns excluding the class label.
5) Add the class label attribute, e.g. @attribute CLASS {0,1}. This has 2 classes, namely 0 and 1. After
that, add the @data line above the data rows and save the file with the .arff extension.
A complete example of the ARFF header can be as follows.
Dataset student.arff
@relation student
@attribute age {<30,30-40,>40}
@attribute income {low, medium, high}
@attribute student {yes, no}
@attribute credit-rating {fair, excellent}
@attribute buyspc {yes, no}
@data
<30, high, no, fair, no
<30, high, no, excellent, no
30-40, high, no, fair, yes
>40, medium, no, fair, yes
>40, low, yes, fair, yes
>40, low, yes, excellent, no
30-40, low, yes, excellent, yes
<30, medium, no, fair, no
<30, low, yes, fair, no
>40, medium, yes, fair, yes
<30, medium, yes, excellent, yes
30-40, medium, no, excellent, yes
30-40, high, yes, fair, yes
>40, medium, no, excellent, no
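The conversion steps above can also be sketched in Python. This is a minimal illustration and not part of Weka: the function name csv_to_arff and its nominal_values parameter are hypothetical, and any column not listed as nominal is assumed to be numeric.

```python
import csv
import io


def csv_to_arff(csv_text, relation, nominal_values):
    """Convert CSV text (first row = column names) into ARFF text.

    nominal_values maps a column name to its list of allowed values.
    Columns missing from it are declared numeric (an assumption of this sketch).
    """
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]

    lines = ["@relation " + relation]
    for name in header:
        if name in nominal_values:
            values = ",".join(nominal_values[name])
            lines.append("@attribute " + name + " {" + values + "}")
        else:
            lines.append("@attribute " + name + " numeric")
    lines.append("@data")
    lines.extend(",".join(row) for row in data)
    return "\n".join(lines) + "\n"


# A few rows of the student data, as they would look after export to CSV.
sample_csv = """age,income,student,credit-rating,buyspc
<30,high,no,fair,no
30-40,high,no,fair,yes
>40,medium,no,fair,yes
"""

arff_text = csv_to_arff(
    sample_csv,
    relation="student",
    nominal_values={
        "age": ["<30", "30-40", ">40"],
        "income": ["low", "medium", "high"],
        "student": ["yes", "no"],
        "credit-rating": ["fair", "excellent"],
        "buyspc": ["yes", "no"],
    },
)
print(arff_text)
```

Writing arff_text to a file named student.arff would give a file that can be opened from the Weka Explorer. Values containing spaces or commas would need quoting, which this sketch does not handle.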
OUTPUT:
Paste Output Screenshot here
Viva Voce Question
1. What is preprocessing? Why is it necessary?
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
2. How do you create .arff and .csv files? What are the full forms of ARFF and CSV?
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
Signature of Subject Teacher
EXPERIMENT NO – 2
Aim: - To perform the statistical analysis of data (Discretization, Missing Values,
Numeric Transform).
Theory: This experiment illustrates some of the basic data preprocessing operations that can be
performed using the WEKA Explorer. The sample dataset used for this example is the student data
available in ARFF format.
Step 1: Loading the data. We can load the dataset into Weka by clicking the Open file button in the
Preprocess tab and selecting the appropriate file.
Step 2: Once the data is loaded, Weka will recognize the attributes, and during the scan of the data
it will compute some basic statistics on each attribute. The left panel shows
the list of recognized attributes, while the top panel indicates the names of the base relation or table
and the current working relation (which are the same initially).
Step 3: Clicking on an attribute in the left panel will show the basic statistics for that attribute. For
categorical attributes the frequency of each attribute value is shown, while for continuous attributes
we can obtain the min, max, mean, standard deviation, etc.
Step 4: The visualization in the bottom right panel shows a cross-tabulation across two
attributes.
Note: we can select another attribute using the dropdown list.
Step 5: Selecting or filtering attributes
Removing an attribute: When we need to remove an attribute, we can do this by using the attribute
filters in Weka. In the Filter panel, click on the Choose button. This will show a popup window with
a list of available filters.
Scroll down the list and select the “weka.filters.unsupervised.attribute.Remove” filter.
Step 6:
a) Next, click the text box immediately to the right of the Choose button. In the resulting dialog box
enter the index of the attribute to be filtered out.
b) Make sure that the invertSelection option is set to false. Then click OK. The filter box will now
show the filter with its index, for example “Remove -R 7” if index 7 was entered.
c) Click the Apply button to apply the filter to the data. This will remove the attribute and create a new
working relation.
d) Save the new working relation as an ARFF file by clicking the Save button on the
top panel (student.arff).
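The statistics of Step 3 and the attribute removal of Steps 5 and 6 can be imitated outside Weka. The following is a minimal Python sketch, not Weka's implementation: the function names are hypothetical, and the numeric marks column is invented purely so that a continuous attribute is available for illustration.

```python
from collections import Counter
from statistics import mean, stdev


def summarize(values):
    """Value counts for a nominal column, min/max/mean/stdev for a numeric one."""
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        # At least one value is not a number: treat the column as nominal.
        return dict(Counter(values))
    return {
        "min": min(numbers),
        "max": max(numbers),
        "mean": mean(numbers),
        "stdev": stdev(numbers),  # sample standard deviation
    }


def remove_attribute(header, rows, index):
    """Drop the attribute at a 1-based index, like Remove with -R <index>."""
    keep = [i for i in range(len(header)) if i != index - 1]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows


# "marks" is a hypothetical numeric column added only for this example.
header = ["age", "income", "student", "marks"]
rows = [
    ["<30", "high", "no", "61"],
    ["30-40", "high", "no", "74"],
    [">40", "medium", "no", "58"],
    [">40", "low", "yes", "83"],
]

for position, name in enumerate(header):
    column = [row[position] for row in rows]
    print(name, summarize(column))

# Remove the second attribute (income) and show the new working relation.
new_header, new_rows = remove_attribute(header, rows, 2)
print(new_header)
print(new_rows)
```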
Discretization
Sometimes association rule mining can only be performed on categorical data. This requires
performing discretization on numeric or continuous attributes. In the following example let us
discretize the age attribute.
Let us divide the values of the age attribute into three bins (intervals). First load
the dataset (student.arff) into Weka and select the age attribute.
Activate the filter dialog box and select “weka.filters.unsupervised.attribute.Discretize” from the list.
To change the defaults for the filter, click on the box immediately to the right of the Choose button.
We enter the index of the attribute to be discretized. In this case the attribute is age, so we must
enter '1', corresponding to the age attribute.
Enter '3' as the number of bins. Leave the remaining field values as they are.
Click the OK button.
Click Apply in the filter panel. This will result in a new working relation with the selected attribute
partitioned into 3 bins.
Save the new working relation in a file called student-data-discretized.arff
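What dividing an attribute into three bins means can be sketched in a few lines of Python. The sketch assumes equal-width binning (understood to be the default behaviour of the unsupervised Discretize filter, though this is worth checking against the Weka documentation). The function name and the list of ages are hypothetical.

```python
def equal_width_bins(values, bins):
    """Assign each value to one of `bins` equal-width intervals.

    Returns (bin index per value, list of interior cut points).
    """
    low, high = min(values), max(values)
    width = (high - low) / bins
    indices = []
    for value in values:
        if width == 0:
            index = 0  # all values identical: everything falls in one bin
        else:
            # The maximum value would land in bin `bins`, so clamp it.
            index = min(int((value - low) / width), bins - 1)
        indices.append(index)
    cut_points = [low + k * width for k in range(1, bins)]
    return indices, cut_points


# Hypothetical numeric ages, used only to illustrate the binning.
ages = [22, 25, 31, 38, 45, 52, 58]
indices, cut_points = equal_width_bins(ages, bins=3)

print("cut points:", cut_points)
for age, index in zip(ages, indices):
    print(age, "-> bin", index + 1)
```

Each numeric value is replaced by the label of its interval, which turns the attribute into a categorical one suitable for association rule mining.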
The following screenshot shows the effect of discretization.
Paste Screen shot of Discretization
Paste Screen shot of Missing Values
Paste Screen shot Numeric Transform
Viva Voce Question
1. List the preprocessing techniques.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
2. Briefly describe the function of each preprocessing technique.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
Signature of Subject Teacher
EXPERIMENT NO – 3
Aim: - Demonstration of association rule mining using the Apriori algorithm on
supermarket data.
NAME
weka.associations.Apriori
SYNOPSIS
Class implementing an Apriori-type algorithm. Iteratively reduces the minimum support until it finds
the required number of rules with the given minimum confidence.
The algorithm has an option to mine class association rules. It is adapted as explained in the second
reference.
For more information see:
R. Agrawal, R. Srikant: Fast Algorithms for Mining Association Rules in Large Databases. In: 20th
International Conference on Very Large Data Bases, 478-499, 1994.
Bing Liu, Wynne Hsu, Yiming Ma: Integrating Classification and Association Rule Mining. In:
Fourth International Conference on Knowledge Discovery and Data Mining, 80-86, 1998.
OPTIONS
minMetric -- Minimum metric score. Consider only rules with scores higher than this value.
verbose -- If enabled the algorithm will be run in verbose mode.
numRules -- Number of rules to find.
lowerBoundMinSupport -- Lower bound for minimum support.
classIndex -- Index of the class attribute. If set to -1, the last attribute is taken as class attribute.
outputItemSets -- If enabled the itemsets are output as well.
car -- If enabled class association rules are mined instead of (general) association rules.
doNotCheckCapabilities -- If set, associator capabilities are not checked before associator is built
(Use with caution to reduce runtime).
removeAllMissingCols -- Remove columns with all missing values.
significanceLevel -- Significance level. Significance test (confidence metric only).
treatZeroAsMissing -- If enabled, zero (that is, the first value of a nominal) is treated in the same
way as a missing value.
delta -- Iteratively decrease support by this factor. Reduces support until min support is reached or
required number of rules has been generated.
metricType -- Set the type of metric by which to rank rules. Confidence is the proportion of the
examples covered by the premise that are also covered by the consequence (Class association rules
can only be mined using confidence). Lift is confidence divided by the proportion of all examples
that are covered by the consequence. This is a measure of the importance of the association that is
independent of support. Leverage is the proportion of additional examples covered by both the
premise and consequence above those expected if the premise and consequence were independent of
each other. The total number of examples that this represents is presented in brackets following the
leverage. Conviction is another measure of departure from independence. Conviction is given by
P(premise)P(!consequence) / P(premise, !consequence).
upperBoundMinSupport -- Upper bound for minimum support. Start iteratively decreasing minimum
support from this value.
This experiment illustrates some of the basic elements of association rule
mining using WEKA. The sample dataset used for this example is contactlenses.arff
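The behaviour described in the synopsis and options (start at upperBoundMinSupport, lower the support by delta until numRules rules with the minimum confidence are found or lowerBoundMinSupport is reached) can be sketched in Python. This is a simplified illustration written from that description and not Weka's actual code: the function names, the parameter names and the five market baskets are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support}."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    while current:
        # Count every candidate and keep those that reach the threshold.
        level = {}
        for candidate in current:
            support = sum(candidate <= t for t in transactions) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and prune any
        # candidate that has an infrequent k-subset.
        k = len(next(iter(level))) + 1 if level else 0
        joined = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def rules(frequent, min_confidence):
    """Generate (premise, consequence, confidence) from frequent itemsets."""
    found = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for premise in map(frozenset, combinations(itemset, size)):
                confidence = support / frequent[premise]
                if confidence >= min_confidence:
                    found.append((premise, itemset - premise, confidence))
    return found


def mine(transactions, num_rules, min_confidence,
         upper=1.0, lower=0.1, delta=0.05):
    """Lower the minimum support step by step until enough rules are found."""
    transactions = [frozenset(t) for t in transactions]
    min_support = upper
    while True:
        found = rules(frequent_itemsets(transactions, min_support), min_confidence)
        next_support = round(min_support - delta, 10)  # avoid float drift
        if len(found) >= num_rules or next_support < lower:
            break
        min_support = next_support
    found.sort(key=lambda rule: -rule[2])
    return min_support, found[:num_rules]


# Hypothetical market baskets, used only for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]

support, top = mine(baskets, num_rules=1, min_confidence=0.9)
print("stopped at minimum support", support)
for premise, consequence, confidence in top:
    print(sorted(premise), "=>", sorted(consequence), "confidence", confidence)
```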
Step 1: Open the data file in Weka Explorer. It is presumed that the required data fields have been
discretized. In this example it is the age attribute.
Step 2: Clicking on the Associate tab will bring up the interface for the association rule algorithms.
Step 3: We will use the Apriori algorithm. This is the default algorithm.
Step 4: In order to change the parameters for the run (for example support, confidence, etc.), we click on
the text box immediately to the right of the Choose button.
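The four rule metrics described under metricType (confidence, lift, leverage and conviction) can be computed by hand for a single rule. The sketch below follows the definitions given in the option text. The function name and the five market baskets are hypothetical, and the premise is assumed to occur in at least one transaction.

```python
def rule_metrics(transactions, premise, consequence):
    """Confidence, lift, leverage and conviction of premise => consequence."""
    n = len(transactions)
    premise, consequence = set(premise), set(consequence)

    p_premise = sum(premise <= t for t in transactions) / n
    p_consequence = sum(consequence <= t for t in transactions) / n
    p_both = sum((premise | consequence) <= t for t in transactions) / n
    # Probability that the premise holds but the consequence does not.
    p_premise_not_consequence = p_premise - p_both

    confidence = p_both / p_premise
    lift = confidence / p_consequence
    leverage = p_both - p_premise * p_consequence
    if p_premise_not_consequence > 0:
        conviction = p_premise * (1 - p_consequence) / p_premise_not_consequence
    else:
        conviction = float("inf")  # the rule is never violated
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Hypothetical market baskets, used only for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]

metrics = rule_metrics(baskets, {"diaper"}, {"beer"})
for name, value in metrics.items():
    print(name, round(value, 3))
```

A lift above 1 or a positive leverage indicates that premise and consequence occur together more often than they would if they were independent.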
The following screenshot shows the association rules that were generated when the Apriori algorithm is
applied on the given dataset.
Paste Screen shot
Paste Screen shot
Paste Screen shot
Viva Voce Question
1. List all association mining techniques.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
2. Briefly describe the steps of the Apriori algorithm.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
Signature of Subject Teacher
EXPERIMENT NO – 4
Aim: Demonstration of FP Growth algorithm on supermarket data
NAME
weka.associations.FPGrowth
SYNOPSIS
Class implementing the FP-growth algorithm for finding large item sets without
candidate generation. Iteratively reduces the minimum support until it finds the
required number of rules with the given minimum metric. For more information see:
J. Han, J. Pei, Y. Yin: Mining frequent patterns without candidate generation. In:
Proceedings of the 2000 ACM SIGMOD International Conference on Management of
Data, 1-12, 2000.
OPTIONS
findAllRulesForSupportLevel -- Find all rules that meet the lower bound on minimum
support and the minimum metric constraint. Turning this mode on will disable the
iterative support reduction procedure to find the specified number of rules.
transactionsMustContain -- Limit input to FPGrowth to those transactions (instances)
that contain these items. Provide a comma separated list of attribute names.
numRulesToFind -- The number of rules to output
minMetric -- Minimum metric score. Consider only rules with scores higher than this
value.
rulesMustContain -- Only print rules that contain these items. Provide a comma
separated list of attribute names.
useORForMustContainList -- Use OR instead of AND for transactions/rules must
contain lists.
lowerBoundMinSupport -- Lower bound for minimum support as a fraction or number
of instances.
positiveIndex -- Set the index of binary valued attributes that is to be considered the
positive index. Has no effect for sparse data (in this case the first index, i.e. non-zero
values, is always treated as positive). Also has no effect for unary valued attributes
(i.e. when using the Weka Apriori-style format for market basket data, which uses
missing value "?" to indicate absence of an item).
doNotCheckCapabilities -- If set, associator capabilities are not checked before
associator is built (Use with caution to reduce runtime).
maxNumberOfItems -- The maximum number of items to include in frequent item
sets. -1 means no limit.
delta -- Iteratively decrease support by this factor. Reduces support until min support
is reached or required number of rules has been generated.
metricType -- Set the type of metric by which to rank rules. Confidence is the
proportion of the examples covered by the premise that are also covered by the
consequence (Class association rules can only be mined using confidence). Lift is
confidence divided by the proportion of all examples that are covered by the
consequence. This is a measure of the importance of the association that is
independent of support. Leverage is the proportion of additional examples covered by
both the premise and consequence above those expected if the premise and
consequence were independent of each other. The total number of examples that this
represents is presented in brackets following the leverage. Conviction is another
measure of departure from independence.
upperBoundMinSupport -- Upper bound for minimum support as a fraction or number
of instances. Start iteratively decreasing minimum support from this value.
Step 1: Open the data file in Weka Explorer. It is presumed that the required data fields
have been discretized.
Step 2: Clicking on the Associate tab will bring up the interface for the association rule
algorithms.
Step 3: We will use the FP-Growth algorithm.
Step 4: In order to change the parameters for the run (for example support, confidence, etc.),
we click on the text box immediately to the right of the Choose button.
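How FP-growth finds frequent item sets without candidate generation can be sketched compactly in Python. This is a teaching sketch of the general technique (build an FP-tree, then recursively mine conditional pattern bases) and not the weka.associations.FPGrowth implementation. The class, function names and the five market baskets are hypothetical, and support is expressed as an absolute count.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and a link to its parent."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Build an FP-tree; return (header table, counts of frequent items)."""
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            counts[item] += weight
    counts = {item: c for item, c in counts.items() if c >= min_count}

    root = Node(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, weight in weighted_transactions:
        # Keep frequent items only, most frequent first, so transactions
        # that share a prefix also share a path in the tree.
        ordered = sorted((i for i in set(items) if i in counts),
                         key=lambda i: (-counts[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += weight
    return header, counts


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Return {itemset: support count} for all frequent itemsets."""
    header, counts = build_tree(weighted_transactions, min_count)
    result = {}
    for item, nodes in header.items():
        itemset = suffix | {item}
        result[itemset] = counts[item]
        # Conditional pattern base: the prefix path above each node,
        # weighted by that node's count.
        conditional = []
        for node in nodes:
            path = []
            parent = node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        result.update(fp_growth(conditional, min_count, itemset))
    return result


# Hypothetical market baskets, used only for illustration.
baskets = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]

frequent = fp_growth([(basket, 1) for basket in baskets], min_count=3)
for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), frequent[itemset])
```

Rules would then be generated from these item sets and ranked by the chosen metricType, as in the Apriori experiment.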
Paste Screen shot
Paste Screen shot
Viva Voce Question
1. What is concept hierarchy?
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
2. Briefly explain SUPPORT and CONFIDENCE.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
Signature of Subject Teacher
EXPERIMENT NO – 5
Aim:- To perform classification by decision tree induction using the Weka tool.
Steps involved in this experiment:
NAME
weka.classifiers.trees.J48
SYNOPSIS
Class for generating a pruned or unpruned C4.5 decision tree. For more information, see
Ross Quinlan (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, CA.
OPTIONS
seed -- The seed used for randomizing the data when reduced-error pruning is used.
unpruned -- Whether pruning is performed.
confidenceFactor -- The confidence factor used for pruning (smaller values incur more pruning).
numFolds -- Determines the amount of data used for reduced-error pruning. One fold is used for pruning, the rest for growing the tree.
numDecimalPlaces -- The number of decimal places to be used for the output of numbers in the model.
batchSize -- The preferred number of instances to process if batch prediction is being performed. More or fewer instances may be provided, but this gives implementations a chance to specify a preferred batch size.
reducedErrorPruning -- Whether reduced-error pruning is used instead of C4.5 pruning.
useLaplace -- Whether counts at leaves are smoothed based on Laplace.
doNotMakeSplitPointActualValue -- If true, the split point is not relocated to an actual data value. This can yield substantial speed-ups for large datasets with numeric attributes.
debug -- If set to true, classifier may output additional info to the console.
subtreeRaising -- Whether to consider the subtree raising operation when pruning.
saveInstanceData -- Whether to save the training data for visualization.
binarySplits -- Whether to use binary splits on nominal attributes when building the trees.
doNotCheckCapabilities -- If set, classifier capabilities are not checked before classifier is built (Use with caution to reduce runtime).
minNumObj -- The minimum number of instances per leaf.
useMDLcorrection -- Whether MDL correction is used when finding splits on numeric attributes.
collapseTree -- Whether parts are removed that do not reduce training error.
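As background for these options: C4.5, the algorithm J48 implements, chooses which attribute to split on using the gain ratio (information gain divided by split information). The following is a minimal Python sketch of that criterion for nominal attributes only. It is not Weka's code; the function names and the toy rows are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` (a list of dicts) on a nominal attribute."""
    n = len(rows)
    base = entropy([r[target] for r in rows])

    # Group the class labels by attribute value.
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])

    # Expected entropy after the split, weighted by branch size.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = base - remainder

    # Split information penalises attributes with many distinct values.
    split_info = entropy([r[attribute] for r in rows])
    return gain / split_info if split_info > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: "dept" predicts the class, "shift" does not.
    rows = [
        {"dept": "a", "shift": "x", "cls": "yes"},
        {"dept": "a", "shift": "y", "cls": "yes"},
        {"dept": "b", "shift": "x", "cls": "no"},
        {"dept": "b", "shift": "y", "cls": "no"},
    ]
    for attr in ("dept", "shift"):
        print(attr, gain_ratio(rows, attr, "cls"))
```

Options such as minNumObj and confidenceFactor act on top of this criterion by restricting which splits are allowed and by pruning the grown tree. Neither is modelled in this sketch.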
Step1: We begin the experiment by loading the data (employee.arff) into Weka.
Step2: Next we select the “Classify” tab and click the “Choose” button to select the “J48” classifier.
Step3: Now we specify the various parameters. These can be specified by clicking in the text box to
the right of the Choose button. In this example, we accept the default values. The default version does
perform some pruning but does not perform reduced-error pruning.
Step4: Under “Test options” in the main panel, we select 10-fold cross-validation as our
evaluation approach. Since we don't have a separate evaluation data set, this is necessary to get a
reasonable idea of the accuracy of the generated model.
Step5: We now click “Start” to generate the model. The ASCII version of the tree, as well as the evaluation
statistics, will appear in the right panel when the model construction is complete.
Step6: Note that the classification accuracy of the model is about 69%. This indicates that more work
may be needed (either in preprocessing or in selecting the correct parameters for the classification).
Step7: Weka also lets us view a graphical version of the classification tree. This can be done
by right-clicking the last result set and selecting “Visualize tree” from the pop-up menu.
Step8: We will use our model to classify the new instances.
Step9: In the main panel, under “Test options”, click the “Supplied test set” radio button and then
click the “Set” button. This will show a pop-up window which will allow you to open the file
containing the test instances.
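The 10-fold cross-validation selected in Step 4 can be illustrated outside Weka. The following is a minimal Python sketch of the idea only. Weka's own procedure may differ (for example, by stratifying the folds). All names here are hypothetical, and a trivial majority-class predictor stands in for the decision tree.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=1):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(rows, labels, train_fn, k=10, seed=1):
    """Return accuracy estimated by k-fold cross-validation.

    train_fn(train_rows, train_labels) must return a function row -> label.
    """
    correct = 0
    for test_idx in k_fold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        # Train on every instance that is NOT in the current fold.
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [l for i, l in enumerate(labels) if i not in held_out]
        predict = train_fn(train_rows, train_labels)
        # Test only on the held-out fold.
        correct += sum(1 for i in test_idx if predict(rows[i]) == labels[i])
    return correct / len(rows)


def majority_class(train_rows, train_labels):
    """Stand-in learner: always predicts the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda row: most_common


if __name__ == "__main__":
    rows = list(range(20))                  # placeholder instances
    labels = ["yes"] * 14 + ["no"] * 6
    print(cross_validate(rows, labels, majority_class, k=10))
```

Every instance is used for testing exactly once and for training in the other nine rounds, which is why the method gives a usable accuracy estimate when no separate test set is available.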
Paste Screen shot
Viva Voce Question
1. List the classification methods.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
2. What is the difference between classification and prediction?
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
Signature of Subject Teacher
EXPERIMENT NO – 6
Aim:- To perform classification with the Bayesian classification algorithm using R.
Viva Voce Question
1. List the preprocessing techniques.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
2. Briefly describe the function of each preprocessing technique.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
Signature of Subject Teacher
EXPERIMENT NO – 7
Aim:- To perform cluster analysis by the k-means method using R.
Viva Voce Question
1. List the preprocessing techniques.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
2. Briefly describe the function of each preprocessing technique.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
Signature of Subject Teacher
EXPERIMENT NO – 8
Aim:- To perform hierarchical clustering using R programming.
Viva Voce Question
1. List the preprocessing techniques.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
2. Briefly describe the function of each preprocessing technique.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
Signature of Subject Teacher
EXPERIMENT NO – 9
Aim:- Study of Regression Analysis using R programming.
Viva Voce Question
1. List the preprocessing techniques.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
2. Briefly describe the function of each preprocessing technique.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
Signature of Subject Teacher
EXPERIMENT NO – 10
Aim:- Outlier detection using R programming.
Viva Voce Question
1. List the preprocessing techniques.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
____________________________________________________________
2. Briefly describe the function of each preprocessing technique.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
Signature of Subject Teacher