Rapid Miner.docx

Describes the RapidMiner tool for data mining.

Study of Data Mining Tool: RapidMiner

RapidMiner, formerly YALE (Yet Another Learning Environment), is an environment for machine learning, data mining, text mining, predictive analytics, and business analytics. It is used for research, education, training, rapid prototyping, application development, and industrial applications. In a poll by KDnuggets, a data-mining newspaper, RapidMiner ranked second among data mining/analytic tools used for real projects in 2009 [1] and first in 2010 [2]. It is distributed under the AGPL open source license and has been hosted by SourceForge since 2004.

The RapidMiner project was started in 2001 by Ralf Klinkenberg, Ingo Mierswa, and Simon Fischer at the Artificial Intelligence Unit of the University of Dortmund. In 2006, Ingo Mierswa and Ralf Klinkenberg founded the company Rapid-I, which is now the main contributor among the more than 30 international developers working on RapidMiner.

Purpose

RapidMiner provides data mining and machine learning procedures including: data loading and transformation (ETL), data preprocessing and visualization, modelling, evaluation, and deployment. The data mining processes can be made up of arbitrarily nestable operators, described in XML files and created in RapidMiner's graphical user interface (GUI). RapidMiner is written in the Java programming language. It also integrates learning schemes and attribute evaluators of the Weka machine learning environment and statistical modelling schemes of the R-Project.
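The idea of arbitrarily nestable operators described in XML can be sketched in a few lines of Python. This is only an illustration of the concept: the element names, attribute names, operator classes, and the repository entry "my_data" are assumptions made for the example and should not be read as RapidMiner's exact file schema.

```python
import xml.etree.ElementTree as ET


def operator(name, op_class, params=None, children=()):
    """Build one <operator> element; nested operators become child elements."""
    node = ET.Element("operator", {"name": name, "class": op_class})
    for key, value in (params or {}).items():
        ET.SubElement(node, "parameter", {"key": key, "value": str(value)})
    node.extend(children)
    return node


# Hypothetical process: a root operator that contains a retrieval step
# and a clustering step (names and classes are illustrative only).
process = operator("Root", "process", children=[
    operator("Retrieve", "retrieve", {"repository_entry": "my_data"}),
    operator("Clustering", "k_means", {"k": 2}),
])

xml_text = ET.tostring(process, encoding="unicode")
print(xml_text)
```

Because every operator may carry further operators as children, the same helper covers both a flat chain and a deeply nested process.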

The Community Edition of RapidMiner is a toolkit for data mining. It can define analytical steps (similar to R) and generate graphs (similar to MS Excel). It is also used for analyzing data generated by high-throughput instruments used in processes such as genotyping, proteomics, and mass spectrometry.

Example applications:

Bypassing its data mining functions and having RapidMiner generate figures.
Exploring data in Microsoft Excel fashion ("knowledge discovery").
Constructing custom data analysis workflows.
Calling RapidMiner functions from programs written in other languages/systems (e.g. Perl).

Features:

Broad collection of data mining algorithms such as decision trees and self-organizing maps.
Overlapping histograms, tree charts and 3D scatter plots.
Many varied plugins, such as a text plugin for doing text analysis.

Applications

RapidMiner can be used for text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining. RapidMiner was rated as the fifth most used text mining software (6%) by Rexer's Annual Data Miner Survey in 2010.

RapidMiner is used in many fields, including the electronics industry, energy industry, automobile industry, commerce, aviation, telecommunications, banking and insurance, production, the IT industry, market research, and the pharmaceutical industry.

Properties

Some properties of RapidMiner are:

written in Java
knowledge discovery processes are modeled as operator trees
internal XML representation ensures standardized interchange format of data mining experiments
scripting language allows for automatic large-scale experiments
multi-layered data view concept ensures efficient and transparent data handling
graphical user interface, command line mode (batch mode), and Java API for using RapidMiner from other programs
plugin and extension mechanisms, several plugins already exist
plotting facility offering a large set of high-dimensional visualization schemes for data and models
applications include text mining, multimedia mining, feature engineering, data stream mining and tracking drifting concepts, development of ensemble methods, and distributed data mining

GUI

RapidMiner provides a GUI to design an analytical pipeline (the "operator tree"). The GUI generates an XML (eXtensible Markup Language) file that defines the analytical processes the user wishes to apply to the data. This file is then read by RapidMiner to run the analyses automatically.

While the analyses are running, the GUI can also be used to interactively control and inspect the running processes.
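How a stored process file could be read back and walked as an operator tree can be sketched as follows. The XML snippet, its element and attribute names, and the operator names are hypothetical stand-ins chosen for the example; the sketch only shows the general pattern of visiting each operator before its nested children, not RapidMiner's actual loader.

```python
import xml.etree.ElementTree as ET

# Hypothetical process description (not RapidMiner's exact schema).
PROCESS_XML = """
<operator name="Root" class="process">
  <operator name="Retrieve" class="retrieve">
    <parameter key="repository_entry" value="my_data"/>
  </operator>
  <operator name="Validation" class="validation">
    <operator name="Tree" class="decision_tree"/>
  </operator>
</operator>
"""


def execution_order(node, depth=0):
    """Yield (depth, name, parameters) for every operator, parents before children."""
    params = {p.get("key"): p.get("value") for p in node.findall("parameter")}
    yield depth, node.get("name"), params
    for child in node.findall("operator"):
        yield from execution_order(child, depth + 1)


root = ET.fromstring(PROCESS_XML)
steps = list(execution_order(root))
for depth, name, params in steps:
    # Indentation mirrors the nesting level in the operator tree.
    print("  " * depth + name, params)
```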

RapidMiner can also be called from other programs and processes, for example from a Perl program. The Java application programming interface (API) provides interfaces for applying operators individually, so there is no need to create an operator tree; this makes it possible to bypass the GUI and control analytical processes directly. Individual RapidMiner functions can also be called directly from the command line.

Software Versions

RapidMiner is open-source and is offered free of charge as a Community Edition released under the GNU AGPL. There is also an Enterprise Edition offered under a proprietary commercial license, to allow integration into closed-source solutions.

Extensions

RapidMiner can be extended with additional plugins. The program suite contains around 15 extensions that extend its applicability to text mining, image processing, time series processing, web mining, statistics, visualization, semantics, parallelization of computation, automatic process design (PaREn Automatic System Construction Wizard), and others.

Several of the extensions can be found directly in the application in an extension manager. The other extensions can be downloaded from their respective developers.

Starting User Interface

Starting Home Screen

This screen appears when you start RapidMiner.

Now click on New Process so that the screen below appears.

Expand Repository Access and select Retrieve. Then, in the parameters, select repository entry and choose the database.

Now select your operation. Here, k-means clustering is selected from Modelling > Clustering and Segmentation > k-Means. The operators are then connected as shown below.

Now the outputs are connected.

Now click on Run to see the result.

The result for k-means clustering is shown below.

Only two clusters are formed (because we have selected K=2).
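To make the role of K concrete, here is a minimal k-means sketch in plain Python (Lloyd's algorithm). It is not the code behind the RapidMiner operator; the toy points, the naive initialization from the first k points, and the function name are assumptions made for illustration.

```python
import math


def k_means(points, k, iterations=100):
    """Lloyd's algorithm: returns (centroids, labels) for a list of tuples."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


# Toy data (invented for the example): two well-separated groups.
data = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
centroids, labels = k_means(data, k=2)
print(labels)
print(centroids)
```

With k=2 the function can never return more than two distinct labels, which is why the result view shows exactly two clusters. A naive initialization like this one can converge to a poor solution on harder data, so real implementations usually restart from several random initializations.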

In a similar way, we can also use a decision tree for classification.

The output is:
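Independently of the RapidMiner result view, the principle of decision tree classification can be sketched in plain Python. The sketch uses an ID3-style split on categorical attributes, which is an assumption chosen for brevity and not necessarily what the RapidMiner operator does; the toy rows, labels, and function names are likewise invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())


def build_tree(rows, labels, attributes):
    """ID3-style induction on categorical attributes; leaves are class labels."""
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def remainder(attr):
        # Weighted entropy of the partitions created by splitting on attr.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    best = min(attributes, key=remainder)  # lowest remainder = highest gain
    rest = [a for a in attributes if a != best]
    tree = {"split_on": best, "branches": {}}
    for value in sorted({row[best] for row in rows}):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*subset)
        tree["branches"][value] = build_tree(list(sub_rows), list(sub_labels), rest)
    return tree


def classify(tree, row):
    # Follow branches until a leaf (a plain class label) is reached.
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["split_on"]]]
    return tree


# Toy training data (invented for the example).
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
    {"outlook": "overcast", "windy": "yes"},
]
labels = ["play", "play", "play", "stay", "play", "play"]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
print(classify(tree, {"outlook": "rain", "windy": "yes"}))  # stay
```

On this data the tree splits on outlook first and needs the windy attribute only in the rain branch, which is the kind of structure a tree chart makes visible.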

We can also perform stratified sampling.

The output is:
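As a rough illustration of what stratified sampling means, the following plain-Python sketch draws the same fraction from every class so that the class proportions of the full data set are preserved in the sample. The function name, its parameters, and the toy data are assumptions for this example, not RapidMiner's implementation.

```python
import random
from collections import defaultdict


def stratified_sample(rows, label_of, fraction, seed=0):
    """Draw the same fraction from every class so class proportions are preserved."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[label_of(row)].append(row)
    sample = []
    for label, members in strata.items():
        n = max(1, round(len(members) * fraction))  # keep at least one per class
        sample.extend(rng.sample(members, n))
    return sample


# Toy data (invented): 80 "yes" examples and 20 "no" examples.
rows = [("a%d" % i, "yes") for i in range(80)] + [("b%d" % i, "no") for i in range(20)]
sample = stratified_sample(rows, label_of=lambda r: r[1], fraction=0.1)
counts = {label: sum(1 for r in sample if r[1] == label) for label in ("yes", "no")}
print(counts)  # {'yes': 8, 'no': 2}
```

The 80:20 ratio of the full data reappears as 8:2 in the sample, whereas a plain random sample of ten rows could easily contain no "no" examples at all.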

We can also apply rule learning.

The output is:
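The text does not say which rule learner was used, so the sketch below substitutes OneR, one of the simplest rule learning schemes, purely to show what learned rules look like. For each attribute it maps every value to its majority class and keeps the attribute whose rules make the fewest training errors. The toy data and the function name are invented for the example.

```python
from collections import Counter, defaultdict


def one_r(rows, labels, attributes):
    """OneR: pick the single attribute whose value -> majority-class rules err least."""
    best = None
    for attr in attributes:
        by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[attr]][label] += 1
        # One rule per attribute value: predict the most frequent class.
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        # Training errors: everything that is not the majority class.
        errors = sum(sum(c.values()) - max(c.values()) for c in by_value.values())
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best


# Toy training data (invented for the example).
rows = [
    {"humidity": "high", "windy": "no"},
    {"humidity": "high", "windy": "yes"},
    {"humidity": "normal", "windy": "no"},
    {"humidity": "normal", "windy": "yes"},
    {"humidity": "high", "windy": "no"},
    {"humidity": "normal", "windy": "no"},
]
labels = ["stay", "stay", "play", "play", "play", "play"]

attribute, rules, errors = one_r(rows, labels, ["humidity", "windy"])
for value, predicted in rules.items():
    print("if %s = %s then %s" % (attribute, value, predicted))
print("training errors:", errors)
```

Here the humidity rules misclassify one training example while the windy rules misclassify two, so the learner keeps "if humidity = high then stay" and "if humidity = normal then play".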