Data Warehouse & Mining Lab Manual - WordPress.com · 2018-09-27 · RapidMiner tool for data...

Roll No:____

Name:__________________

Sem:_______Section______

Data Warehouse & Mining Lab Manual

Prof. Almas Ansari Page 1

Data Warehouse & Mining Lab Manual 2018

CERTIFICATE

Certified that this file is submitted by

Shri/Ku.___________________________________________________________

Roll No. ________a student of VII Semester final year of the course Computer

Science & Engineering as a part of PRACTICAL as prescribed by the Rashtrasant

Tukadoji Maharaj Nagpur University for the subject Data Warehouse & Mining in

the laboratory of ___________________________________during the academic year

_________________________ and that I have instructed him/her for the said work

from time to time and found his/her progress to be satisfactory.

I have also assessed the said work and am satisfied that it is up to

the standard envisaged for the course.

Date: -

Signature & Name of Subject Teacher          Signature & Name of HOD

Anjuman College of Engineering and Technology

Vision

To be a centre of excellence for developing quality technocrats with moral and social

ethics, to face the global challenges for the sustainable development of society.

Mission

To create conducive academic culture for learning and identifying career goals.

To provide quality technical education, research opportunities and imbibe

entrepreneurship skills contributing to the socio-economic growth of the Nation.

To inculcate values and skills, that will empower our students towards development

through technology.

Vision and Mission of the Department

Vision:

To achieve excellent standards of quality education in the field of computer science

and engineering, aiming towards development of ethically strong technical experts

contributing to the profession in the global society.

Mission:

To create outcome based education environment for learning and identifying career

goals.

Provide latest tools in a learning ambience to enhance innovations, problem solving

skills, leadership qualities, team spirit and ethical responsibilities.

Inculcating awareness through innovative activities in the emerging areas of

technology.

Program Educational Objectives (PEOs)

The graduates will have a strong foundation in mathematical, scientific and

engineering fundamentals necessary to formulate, solve and analyze engineering

problems in their career.

Graduates will be able to create and design computer support systems and impart

knowledge and skills to analyze, design, test and implement various software

applications.

Graduates will work productively as computer science engineers towards betterment

of society exhibiting ethical qualities.

Program Specific Outcomes (PSOs)

Foundation of mathematical concepts: To use mathematical methodologies and

techniques for computing and solving problems using suitable mathematical analysis,

data structures, database and algorithms as per the requirement.

Foundation of Computer System: The capability and ability to interpret and

understand the fundamental concepts and methodology of computer systems and

programming. Students can understand the functionality of hardware and software

aspects of computer systems, networks and security.

Foundations of Software development: The ability to grasp the software development

lifecycle and methodologies of software system and project development.

PROGRAM: CSE

DEGREE: B.E

COURSE: DATA WAREHOUSING AND MINING

SEMESTER: VII

CREDITS: 2

COURSE CODE: BECSE401T

COURSE TYPE: REGULAR

COURSE AREA/DOMAIN: DATA MINING

CONTACT HOURS: 2 hours/week

CORRESPONDING LAB COURSE CODE: BECSE401P

LAB COURSE NAME: DATA WAREHOUSING AND MINING LAB

COURSE PRE-REQUISITES:

C.CODE    COURSE NAME                    DESCRIPTION    SEM

          DATABASE MANAGEMENT SYSTEMS                   V

LAB COURSE OBJECTIVES:

To familiarize students with the basic concepts of Data mining and Warehousing.

To explain and demonstrate various mining algorithms on real world data.

To brief students about the future trends in the fields of data mining.

COURSE OUTCOMES: Data warehousing and mining lab

After completion of this course the students will be able -

SNO    DESCRIPTION    BLOOM'S TAXONOMY LEVEL

CO.1   Create a dataset for any application in the .arff format.    LEVEL 6

CO.2   Describe various preprocessing techniques and statistical techniques and apply those techniques on the given data set.    LEVEL 1,3

CO.3   Apply various association rule mining algorithms on the given data set.    LEVEL 3

CO.4   Apply various classification algorithms on the given data set.    LEVEL 3

CO.5   Apply various clustering algorithms on the given data set.    LEVEL 3

CO.6   Create an application using outlier analysis.    LEVEL 6

Lab Instructions:

Make entry in the Log Book as soon as you enter the Laboratory.

All the students should sit according to their Roll Numbers.

All the students are supposed to enter the terminal number in the Log Book.

Do not change the terminal on which you are working.

Strictly observe the instructions given by the Faculty / Lab. Instructor.

Take permission before entering the lab and keep your belongings in the

racks.

NO FOOD, DRINK, IN ANY FORM is allowed in the lab.

TURN OFF CELL PHONES! If you need to use it, please keep it in bags.

Avoid all horseplay in the laboratory. Do not misbehave in the computer

laboratory. Work quietly.

Save often and keep your files organized.

Don't change settings and surf safely.

Do not reboot, turn off, or move any workstation or PC.

Do not load any software on any lab computer (without prior permission of

Faculty and Technical Support Personnel). Only Lab Operators and Technical

Support Personnel are authorized to carry out these tasks.

Do not reconfigure the cabling/equipment without prior permission.

Do not play games on systems.

Turn off the machine once you are done using it.

Violation of the above rules and etiquette guidelines will result in disciplinary

action.

Continuous Assessment Practical

Exp No    NAME OF EXPERIMENT    Date    Sign    Remark

1. Demonstration of preprocessing on .arff file using student data .arff

2. To perform the statistical analysis of data

3. Demonstration of association rule mining using Apriori algorithm on supermarket data.

4. Demonstration of FP Growth algorithm on supermarket data

5. To perform the classification by decision tree induction using Weka tools.

6. To perform classification using Bayesian classification algorithm using R.

7. To perform the cluster analysis by k-means method using R.

8. To perform the hierarchical clustering using R programming.

9. Study of Regression Analysis using R programming.

10. Outlier detection using R programming.

Content Beyond Syllabus

11. Study of and introduction to the leading open-source RapidMiner tool for data mining solutions

CONTENTS

Exp No    NAME OF EXPERIMENT

1. Demonstration of preprocessing on .arff file using student data .arff

2. To perform the statistical analysis of data

3. Demonstration of association rule mining using Apriori algorithm on supermarket data.

4. Demonstration of FP Growth algorithm on supermarket data

5. To perform the classification by decision tree induction using Weka tools.

6. To perform classification using Bayesian classification algorithm using R.

7. To perform the cluster analysis by k-means method using R.

8. To perform the hierarchical clustering using R programming.

9. Study of Regression Analysis using R programming.

10. Outlier detection using R programming.

Content Beyond Syllabus

11. Study of and introduction to the leading open-source RapidMiner tool for data mining solutions

EXPERIMENT NO – 1

Aim: - Demonstration of preprocessing on an .arff file using student data.

The procedure for creating an ARFF file for Weka is quite simple.

Note: This is for an XLSX file/dataset containing alphanumeric values.

1) If you have an XLSX file, first convert it into a CSV (Comma Separated Values) file.

2) Open the CSV file with a text editor, e.g. Notepad++.

3) Add the relation header at the top of the file, e.g. @relation compile-weka.filters.unsupervised.attribute

4) After that, add one @attribute header for each column (attribute) of your XLSX file, not for each instance, each on its own line, e.g.

@attribute max numeric

@attribute min numeric

@attribute mean numeric

@attribute median numeric

This means the file has four columns excluding the class label.

5) Add the class label attribute, e.g. @attribute CLASS {0,1}. This declares 2 classes, namely 0 and 1. After that, add the @data line directly above the data rows (delete the CSV column-name row if there is one) and save the file with the .arff extension.
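The manual steps above can also be sketched in code. The following Python function is only an illustration, not part of Weka: the name csv_to_arff, the sample column names and the rule "a column is numeric if every value parses as a number" are all assumptions made for this example.

```python
import csv
import io


def csv_to_arff(csv_text, relation, class_column):
    """Convert CSV text (first row = column names) into ARFF text.

    Assumption for this sketch: a column is declared numeric when every
    value parses as a number; all other columns, and the class column,
    are declared nominal.
    """
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(csv_text.strip()))]
    header, data = rows[0], rows[1:]

    def is_number(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    lines = ["@relation " + relation, ""]
    for index, name in enumerate(header):
        column = [row[index] for row in data]
        if name != class_column and all(is_number(v) for v in column):
            lines.append("@attribute " + name + " numeric")
        else:
            # Nominal attribute: distinct values in order of first appearance.
            labels = list(dict.fromkeys(column))
            lines.append("@attribute " + name + " {" + ",".join(labels) + "}")

    lines += ["", "@data"]
    lines += [",".join(row) for row in data]  # the CSV header row is dropped
    return "\n".join(lines) + "\n"


# Hypothetical sample data matching the four columns named in step 4.
sample = """max,min,mean,median,CLASS
9.5,1.0,4.2,4.0,0
7.1,0.5,3.3,3.1,1
"""
print(csv_to_arff(sample, relation="measurements", class_column="CLASS"))
```

The result can be saved with the .arff extension and opened in Weka in the usual way. Values that contain spaces or commas would need quoting, which this sketch does not handle.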

A complete example of the ARFF header can be as follows.

Dataset: student.arff

@relation student
@attribute age {<30,30-40,>40}
@attribute income {low, medium, high}
@attribute student {yes, no}
@attribute credit-rating {fair, excellent}
@attribute buyspc {yes, no}

@data
<30, high, no, fair, no
<30, high, no, excellent, no
30-40, high, no, fair, yes
>40, medium, no, fair, yes
>40, low, yes, fair, yes
>40, low, yes, excellent, no
30-40, low, yes, excellent, yes
<30, medium, no, fair, no
<30, low, yes, fair, no
>40, medium, yes, fair, yes
<30, medium, yes, excellent, yes
30-40, medium, no, excellent, yes
30-40, high, yes, fair, yes
>40, medium, no, excellent, no

OUTPUT:

Paste Output Screenshot here

Viva Voce Question

1. What is preprocessing? Why is it necessary?

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

2. How do you create .arff and .csv files? What are the full forms of .arff and .csv?

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

Signature of Subject Teacher

EXPERIMENT NO – 2

Aim: - To perform the statistical analysis of data (Discretization, Missing Values,

Numeric Transform)

Theory: This experiment illustrates some of the basic data preprocessing operations that can be performed using WEKA Explorer. The sample dataset used for this example is the student data available in ARFF format.

Step1: Loading the data. We can load the dataset into Weka by clicking the Open file button in the Preprocess interface and selecting the appropriate file.

Step2: Once the data is loaded, Weka recognizes the attributes and, while scanning the data, computes some basic statistics on each attribute. The left panel shows the list of recognized attributes, while the top panel indicates the names of the base relation (or table) and the current working relation (which are the same initially).

Step3: Clicking on an attribute in the left panel shows the basic statistics for that attribute. For categorical attributes, the frequency of each attribute value is shown, while for continuous attributes we can obtain the min, max, mean, standard deviation, etc.

Step4: The visualization in the bottom-right panel shows a cross-tabulation across two attributes.

Note: We can select another attribute using the dropdown list.

Step5: Selecting or filtering attributes

Removing an attribute: When we need to remove an attribute, we can do this by using the attribute filters in Weka. In the Filter panel, click the Choose button; this will show a popup window with a list of available filters.

Scroll down the list and select the “weka.filters.unsupervised.attribute.Remove” filter.

Step 6:

a) Next, click the text box immediately to the right of the Choose button. In the resulting dialog box, enter the index of the attribute to be filtered out.

b) Make sure that the invert selection option is set to false. Then click OK. In the filter box you will now see “Remove -R 7” (where 7 is the index that was entered).

c) Click the Apply button to apply the filter to the data. This will remove the attribute and create a new working relation.

d) Save the new working relation as an ARFF file by clicking the Save button on the top panel (student.arff).
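To make the effect of Steps 5 and 6 concrete, the removal can be mimicked on a small in-memory table. This is a minimal sketch, not Weka code: remove_attributes is a hypothetical helper, and it assumes 1-based attribute indices as typed into the Weka dialog, with an invert flag standing in for the invert selection option.

```python
def remove_attributes(names, rows, indices, invert=False):
    """Drop the columns at the given 1-based indices.

    With invert=True the selection is inverted: only the listed
    columns are kept and every other column is dropped.
    """
    selected = {i - 1 for i in indices}  # convert to 0-based positions
    keep = [i for i in range(len(names)) if (i in selected) == invert]
    new_names = [names[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_names, new_rows


# Two rows in the layout of the student dataset (illustrative values).
names = ["age", "income", "student", "credit-rating", "buyspc"]
rows = [["<30", "high", "no", "fair", "no"],
        ["30-40", "high", "no", "fair", "yes"]]

new_names, new_rows = remove_attributes(names, rows, indices=[2])
print(new_names)    # attribute 2 (income) is gone
print(new_rows[0])
```

Calling the helper with invert=True and the same index would instead keep only the income column, which is why the manual asks for invert selection to be false.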

Discretization

Sometimes association rule mining can only be performed on categorical data. This requires performing discretization on numeric or continuous attributes. In the following example, let us discretize the age attribute.

Let us divide the values of the age attribute into three bins (intervals). First load the dataset into Weka (student.arff) and select the age attribute.

Activate the filter dialog box and select “weka.filters.unsupervised.attribute.Discretize” from the list.

To change the defaults for the filter, click on the box immediately to the right of the Choose button.

We enter the index of the attribute to be discretized. In this case the attribute is age, so we must enter ‘1’, corresponding to the age attribute.

Enter ‘3’ as the number of bins. Leave the remaining field values as they are.

Click the OK button.

Click Apply in the filter panel. This will result in a new working relation with the selected attribute partitioned into 3 bins.

Save the new working relation in a file called student-data-discretized.arff

The following screenshot shows the effect of discretization.

Paste Screen shot of Discretization
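The idea behind splitting a numeric attribute into three bins can be illustrated with equal-width binning. The sketch below is an assumption-laden illustration rather than Weka's implementation: the function name, the sample ages and the choice to put a value that falls exactly on a cut point into the lower bin are all choices made for this example.

```python
def equal_width_bins(values, bins=3):
    """Split the range [min, max] into `bins` intervals of equal width.

    Returns (cut_points, bin_indices). A value equal to a cut point is
    placed in the lower bin, so intervals behave like (a, b].
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    cut_points = [lo + width * k for k in range(1, bins)]
    # The bin index of a value is the number of cut points it exceeds.
    indices = [sum(v > c for c in cut_points) for v in values]
    return cut_points, indices


# Hypothetical numeric ages (the manual's student.arff stores age as ranges).
ages = [22, 25, 31, 38, 45, 52]
cuts, assigned = equal_width_bins(ages, bins=3)
print("cut points:", cuts)
for age, b in zip(ages, assigned):
    print(age, "-> bin", b + 1)
```

With equal-width binning the bins cover equally long intervals but need not contain equally many instances.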

Paste Screen shot of Missing Values

Paste Screen shot Numeric Transform

Viva Voce Question

1. List the preprocessing techniques.

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

2. Briefly describe the function of each preprocessing technique.

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

Signature of Subject Teacher

EXPERIMENT NO – 3

Aim: - Demonstration of association rule mining using Apriori algorithm on

supermarket data.

NAME

weka.associations.Apriori

SYNOPSIS

Class implementing an Apriori-type algorithm. Iteratively reduces the minimum support until it finds

the required number of rules with the given minimum confidence.

The algorithm has an option to mine class association rules. It is adapted as explained in the second

reference.

For more information see:

R. Agrawal, R. Srikant: Fast Algorithms for Mining Association Rules in Large Databases. In: 20th

International Conference on Very Large Data Bases, 478-499, 1994.

Bing Liu, Wynne Hsu, Yiming Ma: Integrating Classification and Association Rule Mining. In:

Fourth International Conference on Knowledge Discovery and Data Mining, 80-86, 1998.

OPTIONS

minMetric -- Minimum metric score. Consider only rules with scores higher than this value.

verbose -- If enabled the algorithm will be run in verbose mode.

numRules -- Number of rules to find.

lowerBoundMinSupport -- Lower bound for minimum support.

classIndex -- Index of the class attribute. If set to -1, the last attribute is taken as class attribute.

outputItemSets -- If enabled the itemsets are output as well.

car -- If enabled class association rules are mined instead of (general) association rules.

doNotCheckCapabilities -- If set, associator capabilities are not checked before associator is built

(Use with caution to reduce runtime).

removeAllMissingCols -- Remove columns with all missing values.

significanceLevel -- Significance level. Significance test (confidence metric only).

treatZeroAsMissing -- If enabled, zero (that is, the first value of a nominal) is treated in the same

way as a missing value.

delta -- Iteratively decrease support by this factor. Reduces support until min support is reached or

required number of rules has been generated.

metricType -- Set the type of metric by which to rank rules. Confidence is the proportion of the

examples covered by the premise that are also covered by the consequence (Class association rules

can only be mined using confidence). Lift is confidence divided by the proportion of all examples

that are covered by the consequence. This is a measure of the importance of the association that is

independent of support. Leverage is the proportion of additional examples covered by both the

premise and consequence above those expected if the premise and consequence were independent of

each other. The total number of examples that this represents is presented in brackets following the

leverage. Conviction is another measure of departure from independence. Conviction is given by

P(premise)P(!consequence) / P(premise, !consequence).

upperBoundMinSupport -- Upper bound for minimum support. Start iteratively decreasing minimum support from this value.

This experiment illustrates some of the basic elements of association rule mining using WEKA. The sample dataset used for this example is contactlenses.arff
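The four rule metrics described under metricType above can be computed from simple counts. The following sketch is illustrative only; the function name and its count-based arguments are assumptions for this example and are not Weka's API.

```python
def rule_metrics(n_total, n_premise, n_consequence, n_both):
    """Metrics for a rule premise => consequence, computed from counts.

    n_total        number of examples
    n_premise      examples covered by the premise
    n_consequence  examples covered by the consequence
    n_both         examples covered by both
    """
    p_premise = n_premise / n_total
    p_consequence = n_consequence / n_total
    p_both = n_both / n_total

    confidence = n_both / n_premise
    lift = confidence / p_consequence
    leverage = p_both - p_premise * p_consequence
    # P(premise, !consequence): premise holds but the consequence does not.
    p_premise_not_consequence = (n_premise - n_both) / n_total
    if p_premise_not_consequence > 0:
        conviction = p_premise * (1 - p_consequence) / p_premise_not_consequence
    else:
        conviction = float("inf")  # the rule is never violated
    return {"confidence": confidence, "lift": lift,
            "leverage": leverage, "conviction": conviction}


# Hypothetical counts: 100 examples, premise in 40, consequence in 50, both in 30.
metrics = rule_metrics(n_total=100, n_premise=40, n_consequence=50, n_both=30)
for name, value in metrics.items():
    print(name, round(value, 4))
```

A lift above 1 and a positive leverage both indicate that premise and consequence occur together more often than they would if they were independent.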

Step1: Open the data file in Weka Explorer. It is presumed that the required data fields have been discretized. In this example it is the age attribute.

Step2: Clicking on the Associate tab will bring up the interface for the association rule algorithms.

Step3: We will use the Apriori algorithm. This is the default algorithm.

Step4: In order to change the parameters for the run (for example support, confidence, etc.), we click on the text box immediately to the right of the Choose button.

The following screenshot shows the association rules that were generated when the Apriori algorithm is applied on the given dataset.

Paste Screen shot
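The behaviour described in the synopsis, where the minimum support is lowered step by step until enough rules with the required confidence are found, can be sketched in plain Python. This is a simplified illustration and not Weka's implementation: the function names, the parameter defaults and the tiny market-basket data are assumptions chosen for this example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support}."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]
    frequent = {}
    while current:
        level = {}
        for candidate in current:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        keys = list(level)
        if not keys:
            break
        size = len(keys[0]) + 1
        # Join frequent k-itemsets, then prune candidates that have an
        # infrequent k-item subset (the Apriori property).
        joined = {a | b for a, b in combinations(keys, 2) if len(a | b) == size}
        current = [c for c in sorted(joined, key=sorted)
                   if all(frozenset(s) in level for s in combinations(c, size - 1))]
    return frequent


def rules(frequent, min_confidence):
    """Generate (premise, consequence, confidence) from frequent itemsets."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for premise in map(frozenset, combinations(sorted(itemset), r)):
                confidence = support / frequent[premise]
                if confidence >= min_confidence:
                    found.append((sorted(premise), sorted(itemset - premise), confidence))
    return found


def mine(transactions, num_rules, min_confidence, upper=1.0, lower=0.1, delta=0.05):
    """Lower the support from `upper` in steps of `delta` until enough rules exist."""
    transactions = [set(t) for t in transactions]
    step = 0
    while True:
        support = round(upper - step * delta, 10)
        found = rules(frequent_itemsets(transactions, support), min_confidence)
        if len(found) >= num_rules or round(support - delta, 10) < lower:
            return support, found
        step += 1


# Hypothetical market-basket transactions.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
support, found = mine(baskets, num_rules=1, min_confidence=0.9)
print("stopped at minimum support", support)
for premise, consequence, confidence in found:
    print(premise, "=>", consequence, "confidence", confidence)
```

The parameters upper, lower, delta and num_rules play the roles of upperBoundMinSupport, lowerBoundMinSupport, delta and numRules in the option list, but the loop is only an approximation of how those options interact.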

Paste Screen shot

Paste Screen shot

Viva Voce Question

1. List all association rule mining techniques.

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

2. Briefly describe the steps of the Apriori algorithm.

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

Signature of Subject Teacher

EXPERIMENT NO – 4

Aim: Demonstration of FP Growth algorithm on supermarket data

NAME

weka.associations.FPGrowth

SYNOPSIS

Class implementing the FP-growth algorithm for finding large item sets without

candidate generation. Iteratively reduces the minimum support until it finds the

required number of rules with the given minimum metric. For more information see:

J. Han, J. Pei, Y. Yin: Mining frequent patterns without candidate generation. In:

Proceedings of the 2000 ACM-SIGMOD International Conference on Management of

Data, 1-12, 2000.

OPTIONS

findAllRulesForSupportLevel -- Find all rules that meet the lower bound on minimum

support and the minimum metric constraint. Turning this mode on will disable the

iterative support reduction procedure to find the specified number of rules.

transactionsMustContain -- Limit input to FPGrowth to those transactions (instances)

that contain these items. Provide a comma separated list of attribute names.

numRulesToFind -- The number of rules to output

minMetric -- Minimum metric score. Consider only rules with scores higher than this

value.

rulesMustContain -- Only print rules that contain these items. Provide a comma

separated list of attribute names.

useORForMustContainList -- Use OR instead of AND for transactions/rules must

contain lists.

lowerBoundMinSupport -- Lower bound for minimum support as a fraction or number

of instances.

positiveIndex -- Set the index of binary valued attributes that is to be considered the

positive index. Has no effect for sparse data (in this case the first index (i.e. non-zero

values) is always treated as positive). Also has no effect for unary valued attributes

(i.e. when using the Weka Apriori-style format for market basket data, which uses

missing value "?" to indicate absence of an item).

doNotCheckCapabilities -- If set, associator capabilities are not checked before

associator is built (Use with caution to reduce runtime).

maxNumberOfItems -- The maximum number of items to include in frequent item

sets. -1 means no limit.

delta -- Iteratively decrease support by this factor. Reduces support until min support

is reached or required number of rules has been generated.

metricType -- Set the type of metric by which to rank rules. Confidence is the

proportion of the examples covered by the premise that are also covered by the

consequence(Class association rules can only be mined using confidence). Lift is

confidence divided by the proportion of all examples that are covered by the

consequence. This is a measure of the importance of the association that is

independent of support. Leverage is the proportion of additional examples covered by

both the premise and consequence above those expected if the premise and

consequence were independent of each other. The total number of examples that this

represents is presented in brackets following the leverage. Conviction is another

measure of departure from independence.

upperBoundMinSupport -- Upper bound for minimum support as a fraction or number

of instances. Start iteratively decreasing minimum support from this value.

Step1: Open the data file in Weka Explorer. It is presumed that the required data fields have been discretized.

Step2: Clicking on the Associate tab will bring up the interface for the association rule algorithms.

Step3: We will use the FP-Growth algorithm.

Step4: In order to change the parameters for the run (for example support, confidence, etc.), we click on the text box immediately to the right of the Choose button.
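FP-Growth avoids candidate generation by first compressing the transactions into a prefix tree (the FP-tree). The sketch below shows only that tree-building stage, under stated assumptions: the class and function names are hypothetical, ties in item frequency are broken alphabetically, and the later mining of conditional pattern bases is omitted.

```python
from collections import Counter


class Node:
    """One FP-tree node: an item, its count, and its child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Pass 1: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = Node(None)
    # Pass 2: insert each transaction with its frequent items ordered by
    # descending frequency, so that common prefixes share tree nodes.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent


def dump(node, depth=0):
    """Return the tree as indented 'item:count' lines."""
    lines = []
    for item in sorted(node.children):
        child = node.children[item]
        lines.append("  " * depth + item + ":" + str(child.count))
        lines.extend(dump(child, depth + 1))
    return lines


# Hypothetical market-basket transactions.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
root, frequent = build_fp_tree(baskets, min_count=2)
print("\n".join(dump(root)))
```

Because three of the four baskets start with bread after reordering, they share a single bread node, which is the compression that lets FP-Growth count patterns without generating candidate itemsets.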

Paste Screen shot

Paste Screen shot

Viva Voce Question

1. What is concept hierarchy?

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

2. Briefly explain SUPPORT and CONFIDENCE.

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

Signature of Subject Teacher

EXPERIMENT NO – 5

Aim:- To perform the classification by decision tree induction using weka tools. Steps involved in this experiment:

NAME

weka.classifiers.trees.J48

SYNOPSIS

Class for generating a pruned or unpruned C4.5 decision tree. For more information, see

Ross Quinlan (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo, CA.

OPTIONS

seed -- The seed used for randomizing the data when reduced-error pruning is used.

unpruned -- Whether pruning is performed.

confidenceFactor -- The confidence factor used for pruning (smaller values incur more pruning).

numFolds -- Determines the amount of data used for reduced-error pruning. One fold is used for pruning, the rest for growing the tree.

numDecimalPlaces -- The number of decimal places to be used for the output of numbers in the model.

batchSize -- The preferred number of instances to process if batch prediction is being performed. More or fewer instances may be provided, but this gives implementations a chance to specify a preferred batch size.

reducedErrorPruning -- Whether reduced-error pruning is used instead of C4.5 pruning.

useLaplace -- Whether counts at leaves are smoothed based on Laplace.

doNotMakeSplitPointActualValue -- If true, the split point is not relocated to an actual data value. This can yield substantial speed-ups for large datasets with numeric attributes.

debug -- If set to true, classifier may output additional info to the console.

subtreeRaising -- Whether to consider the subtree raising operation when pruning.

saveInstanceData -- Whether to save the training data for visualization.

binarySplits -- Whether to use binary splits on nominal attributes when building the trees.

doNotCheckCapabilities -- If set, classifier capabilities are not checked before classifier is built (Use with caution to reduce runtime).

minNumObj -- The minimum number of instances per leaf.

useMDLcorrection -- Whether MDL correction is used when finding splits on numeric attributes.

collapseTree -- Whether parts are removed that do not reduce training error.
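
J48 builds a C4.5 tree, and C4.5 picks the attribute to split on with an entropy-based measure (gain ratio). The following is a minimal Python sketch of that measure, not Weka's implementation. The records, attribute names and function names are hypothetical and are not taken from employee.arff.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attr, target):
    """Information gain of splitting rows on attr, divided by the split information."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    gain = entropy([r[target] for r in rows]) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy([r[attr] for r in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy records (assumption, for illustration only).
rows = [
    {"dept": "sales", "senior": "yes", "status": "high"},
    {"dept": "sales", "senior": "no", "status": "low"},
    {"dept": "it", "senior": "yes", "status": "high"},
    {"dept": "it", "senior": "no", "status": "low"},
]
best = max(["dept", "senior"], key=lambda a: gain_ratio(rows, a, "status"))
print("attribute chosen for the root split:", best)
```

In this toy data the attribute "senior" separates the two classes perfectly, so it gets the higher gain ratio and would be chosen for the root node.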

Step-1: We begin the experiment by loading the data (employee.arff) into Weka.

Step-2: Next we select the “Classify” tab and click the “Choose” button to select the “J48” classifier (listed under “trees”).

Step-3: Now we specify the various parameters. These can be specified by clicking in the text box to the right of the “Choose” button. In this example, we accept the default values. The default version does perform some pruning but does not perform reduced-error pruning.

Step-4: Under “Test options” in the main panel, we select 10-fold cross-validation as our evaluation approach. Since we don't have a separate evaluation data set, this is necessary to get a reasonable idea of the accuracy of the generated model.

Step-5: We now click “Start” to generate the model. The ASCII version of the tree as well as the evaluation statistics will appear in the right panel when the model construction is complete.

Step-6: Note that the classification accuracy of the model is about 69%. This indicates that more work may be needed (either in preprocessing or in selecting the parameters for the classification).

Step-7: Weka also lets us view a graphical version of the classification tree. This can be done by right-clicking the last entry in the result list and selecting “Visualize tree” from the pop-up menu.

Step-8: We will use our model to classify the new instances.

Step-9: In the main panel, under “Test options”, click the “Supplied test set” radio button and then click the “Set” button. This will show a pop-up window which will allow you to open the file containing the test instances.
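
The 10-fold cross-validation chosen in the steps above can be sketched in Python as follows. This is an illustration of the idea only: Weka's own fold construction may differ (for example by shuffling and stratifying the data), the labels are made up, and a majority-class baseline stands in for the decision tree to keep the sketch short.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Yield (train, test) index lists so that every instance is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


def cross_val_accuracy(labels, k):
    """Accuracy of a majority-class baseline estimated by k-fold cross-validation."""
    correct = 0
    for train, test in k_fold_indices(len(labels), k):
        # "Train" the baseline: remember the most frequent class in the training folds.
        majority = Counter(labels[j] for j in train).most_common(1)[0][0]
        # Evaluate only on the held-out fold.
        correct += sum(labels[j] == majority for j in test)
    return correct / len(labels)


# Hypothetical class labels (assumption, for illustration only).
labels = ["yes"] * 14 + ["no"] * 6
print("cross-validated accuracy:", cross_val_accuracy(labels, 10))
```

Each instance is used for testing exactly once and for training in the other nine rounds, which is why the method gives a usable accuracy estimate without a separate test file.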

Paste Screen shot

Viva Voce Question

1. List the classification methods.

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

2. What is the difference between classification and prediction?

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

Signature of Subject Teacher

EXPERIMENT NO – 6

Aim: - To perform classification using Bayesian classification algorithm using R.
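
As an illustration of the technique named in the aim, here is a minimal naive Bayes sketch for categorical attributes. It is written in Python rather than R, uses made-up records, and applies Laplace (add-one) smoothing. All function and attribute names are hypothetical.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, target):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(r[target] for r in rows)
    value_counts = defaultdict(Counter)  # (attribute, class) -> Counter of values
    values = defaultdict(set)            # attribute -> set of observed values
    for r in rows:
        for attr, val in r.items():
            if attr != target:
                value_counts[(attr, r[target])][val] += 1
                values[attr].add(val)
    return class_counts, value_counts, values


def predict(model, instance):
    """Return the class c that maximises P(c) * product of P(value | c)."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(c)
        for attr, val in instance.items():
            # Laplace (add-one) smoothing avoids zero probabilities.
            score *= (value_counts[(attr, c)][val] + 1) / (n_c + len(values[attr]))
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical training records (assumption, for illustration only).
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
]
model = train_naive_bayes(rows, "play")
print(predict(model, {"outlook": "sunny", "windy": "no"}))
```

The classifier treats the attributes as independent given the class, which is the "naive" assumption that lets the posterior be computed from simple counts.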

Viva Voce Question

1. List the preprocessing techniques.

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

2. Briefly describe the function of each preprocessing technique.

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

Signature of Subject Teacher

EXPERIMENT NO – 7

Aim:- To perform the cluster analysis by k-means method using R.
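
As an illustration of the k-means method named in the aim, here is a minimal sketch of Lloyd's algorithm in Python rather than R. The points are made up, and the initial centroids are simply the first k points so that the run is repeatable; this is an assumption of the sketch, since practical implementations usually pick random starting centroids.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's algorithm on 2-D points; initial centroids are the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, assignment


# Hypothetical 2-D observations (assumption, for illustration only).
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, assignment = kmeans(points, 2)
print(assignment)
print(centroids)
```

The two steps, assigning points to the nearest centroid and recomputing each centroid as a mean, repeat until the cluster membership stops changing.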

Viva Voce Question

1. List the preprocessing techniques.

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

2. Briefly describe the function of each preprocessing technique.

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

Signature of Subject Teacher

EXPERIMENT NO – 8

Aim:- To perform the hierarchical clustering using R programming.
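
As an illustration of the hierarchical clustering named in the aim, here is a minimal agglomerative sketch in Python rather than R. It works on made-up one-dimensional values and uses single linkage; the linkage method is a choice made for this sketch, and other linkages (complete, average) change only the `distance` helper.

```python
def single_linkage(points, num_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point
    merges = []                                   # record of (cluster_a, cluster_b, distance)

    def distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(abs(points[i] - points[j]) for i in a for j in b)

    while len(clusters) > num_clusters:
        pairs = [(distance(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        d, i, j = min(pairs)                      # closest pair of clusters
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges


# Hypothetical one-dimensional observations (assumption, for illustration only).
values = [1.0, 2.0, 9.0, 10.0, 25.0]
clusters, merges = single_linkage(values, 3)
print(clusters)
```

The recorded merge distances are the heights at which clusters join, which is the information a dendrogram displays.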

Viva Voce Question

1. List the preprocessing techniques.

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

2. Briefly describe the function of each preprocessing technique.

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

Signature of Subject Teacher

EXPERIMENT NO – 9

Aim:- Study of Regression Analysis using R programming.
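
As an illustration of the regression analysis named in the aim, here is a minimal sketch of simple linear regression by ordinary least squares, written in Python rather than R. The data values and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for the model y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Fraction of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical observations (assumption, for illustration only).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_line(xs, ys)
print("intercept:", round(intercept, 3), "slope:", round(slope, 3))
print("R squared:", round(r_squared(xs, ys, intercept, slope), 4))
```

The slope is the covariance of x and y divided by the variance of x, and the intercept forces the line through the point of means.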

Viva Voce Question

1. List the preprocessing techniques.

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

2. Briefly describe the function of each preprocessing technique.

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

Signature of Subject Teacher

EXPERIMENT NO – 10

Aim:- Outlier detection using R programming.
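
As an illustration of the outlier detection named in the aim, here is a minimal sketch of the interquartile-range (box plot) rule, written in Python rather than R. The rule and its 1.5 multiplier are one common choice among several outlier tests, and the data values are made up.

```python
import statistics


def iqr_outliers(values, factor=1.5):
    """Return the values lying more than factor * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements with one suspicious value (assumption, for illustration only).
data = [10, 12, 11, 13, 12, 11, 14, 13, 12, 95]
outliers = iqr_outliers(data)
print(outliers)
```

Because the quartiles are hardly affected by a single extreme value, the rule flags that value without being distorted by it, unlike a test based on the mean and standard deviation.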

Viva Voce Question

1. List the preprocessing techniques.

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

____________________________________________________________

2. Briefly describe the function of each preprocessing technique.

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

________________________________________________________________

Signature of Subject Teacher