Stefan Jakšić - jaksamoowe@gmail.com; Nenad Ivanović - nenadpeuau@gmail.com 1/21 Department of...

Post on 22-Dec-2015


KNIME: a data mining platform

Stefan Jakšić - jaksamoowe@gmail.com; Nenad Ivanović - nenadpeuau@gmail.com

Department of Computer Science, School of Electrical Engineering, University of Belgrade

The problems we consider

Ability to access various data sources
Data preprocessing capability
Integration of different techniques
Ability to operate on large datasets: scalability
Good data and model visualization
Extensibility
Interoperability with other systems
Active development community
Cost


Importance of data mining

What is data mining?
Data mining is used for:
  competition analysis
  market research
  economic trends
  consumer behavior
  industry research

“One of the most revolutionary developments”


The future of data mining

“One of 10 technologies that will change the world”

Factors that affect the growth of data mining:
  The explosive growth in data collection
  The storing of data in data warehouses
  The availability of increased access to data from the Web
  The wish to increase market share in a globalized economy
  Off-the-shelf commercial data mining software
  Growth in computing power and storage capacity


Tanagra

Data source aspect: weak
  No support for JDBC, Access, MySQL, Oracle, CSV
  Can handle only medium-sized data sets
  No support for Linux or Mac OS
Functionality aspect:
  Data and model visualisation at a very low level
Usability aspect:
  Human interaction: manual
  No interoperability
  Low extensibility


RapidMiner (YALE)

Data source aspect:
  Does not support ODBC and Access data sources
Usability aspect:
  Does not support PMML
  Very little guidance in the data mining process
  Bugs reported by users

Data source characteristics

Usability characteristics


Weka

Data source aspect:
  Does not support Excel, Access, ODBC, MySQL, Oracle
Functionality aspect:
  Supports most required algorithms
  Is not capable of multi-relational data mining
Usability aspect:
  Does not support PMML
  Extensible, which is a plus


KNIME as a solution

Better than the others because it:
  Uses a simple and intuitive GUI
  Offers easy node configuration and execution
  Is based on the Eclipse platform
  Comes with many relevant examples
  Provides useful help: node descriptions
  Is good for beginners

Originality

Integration of Python, R, Perl, and Java snippets
Portability: PMML, XML
KNIME Cluster Execution: gain in performance

KNIME allows users to:
  visually create data flows
  selectively execute analysis steps
  inspect results
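The idea of building a data flow and executing only selected steps can be sketched outside KNIME. The following is a minimal Python illustration, not KNIME's API: the `Node` class, its `execute` method, and the three toy steps are hypothetical names chosen for this sketch. Executing a node runs only that node and the upstream nodes it depends on, and each node keeps its result so it can be inspected afterward.

```python
from typing import Any, Callable, List, Optional


class Node:
    """Hypothetical workflow step: a function plus the nodes feeding it."""

    def __init__(self, name: str, func: Callable[..., Any],
                 inputs: Optional[List["Node"]] = None) -> None:
        self.name = name
        self.func = func
        self.inputs = inputs or []
        self.executed = False
        self.result: Any = None

    def execute(self) -> Any:
        """Run this node, executing upstream nodes first if they have not run."""
        if not self.executed:
            args = [node.execute() for node in self.inputs]
            self.result = self.func(*args)
            self.executed = True
        return self.result


# A three-step flow: read -> keep even values -> sum.
reader = Node("reader", lambda: [1, 2, 3, 4, 5, 6])
even_filter = Node("even_filter",
                   lambda rows: [r for r in rows if r % 2 == 0],
                   [reader])
total = Node("total", lambda rows: sum(rows), [even_filter])

# Selectively execute only up to the filter; the downstream node stays idle.
even_filter.execute()
print(even_filter.result)  # intermediate result, available for inspection
print(total.executed)      # 'total' was not among the selected steps
```

Storing the result on each node is what makes both selective execution and inspection cheap in this sketch: a later call to `total.execute()` reuses the filter's stored output instead of recomputing it.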


Time is on KNIME’s side

More and more companies use it
Intensive development of new software features:
  KNIME Enterprise Server
  KNIME Cluster Execution
Open source: easily extensible
Modules for text and image processing

Example

Labeled areas of the KNIME workbench screenshot:
  List of all projects
  Workspace of the currently active project
  Detailed description of the selected node
  List of projects available on the server
  Console showing notifications and errors in the project
  List of all existing nodes, grouped by functionality
  Palette of basic functions


Example

To open a new project, choose New from the File menu.

Select New KNIME Project and click Next.

Enter the project name and click Finish.


Example

Click Browse to select the path to the file.

Once the input file is defined, the node moves to the ready state.

Executing the node moves it to the third state.
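The node life cycle described on this slide can be sketched as a small state machine. The state names are partly assumptions: the slide names only the ready state and a "third state", so `NOT_CONFIGURED` and `EXECUTED` are labels chosen for this illustration, as are the `FileReaderNode` class and its methods. None of this is KNIME's API.

```python
from enum import Enum
from typing import Optional


class NodeState(Enum):
    NOT_CONFIGURED = 1  # assumed name: no input file chosen yet
    READY = 2           # named on the slide: input file defined
    EXECUTED = 3        # assumed name for the slide's "third state"


class FileReaderNode:
    """Hypothetical stand-in for a file-reading node."""

    def __init__(self) -> None:
        self.state = NodeState.NOT_CONFIGURED
        self.path: Optional[str] = None

    def configure(self, path: str) -> None:
        # Defining the input file moves the node to the ready state.
        self.path = path
        self.state = NodeState.READY

    def execute(self) -> None:
        # An unconfigured node cannot be executed.
        if self.state is NodeState.NOT_CONFIGURED:
            raise RuntimeError("configure the node before executing it")
        self.state = NodeState.EXECUTED


node = FileReaderNode()
node.configure("data.csv")  # hypothetical file name; nothing is read here
node.execute()
print(node.state.name)
```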


Example

After it is connected, the node is ready for execution.

After the node executes, a new column is added to the table (Document).


Example

Select the columns to filter.


Example

The number of rows has decreased as a result of filtering.
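For readers who want to see comparable steps outside the GUI, here is a minimal Python sketch using only the standard library. It is an analogy to the workflow on these slides, not a reproduction of it: the sample data, the column names, and the filter condition are all made up for illustration.

```python
import csv
import io

# Made-up sample data standing in for the input file chosen via Browse.
RAW = """id,title,year
1,Data Mining Basics,2005
2,Workflow Tools,2010
3,Open Source Systems,2007
"""

# Step 1: read the file into a list of row dictionaries.
rows = list(csv.DictReader(io.StringIO(RAW)))

# Step 2: add a new column derived from an existing one.
for row in rows:
    row["title_length"] = len(row["title"])

# Step 3: column filter, keeping only the columns of interest.
keep = ["title", "year", "title_length"]
rows = [{name: row[name] for name in keep} for row in rows]

# Step 4: row filter; fewer rows remain afterward.
before = len(rows)
rows = [row for row in rows if int(row["year"]) >= 2007]
after = len(rows)

print(before, after)
```

Comparing `before` and `after` mirrors what the last screenshot shows: the row count drops once the filter is applied.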


Conclusion

Data mining is not an automated process
Data mining needs appropriate software tools
Frequently more than one software tool is needed
KNIME is an effective solution for educational purposes
There is a lot of room for improvement in:
  Supporting various data sources
  Providing high-performance data mining
  Providing more domain-specific techniques
  Better support for business applications

Q & A

Do you have any questions?

Stefan Jakšić - jaksamoowe@gmail.com
Nenad Ivanović - nenadpeuau@gmail.com
