Knime: a data mining platform
Stefan Jakšić - jaksamoowe@gmail.com; Nenad Ivanović - nenadpeuau@gmail.com
Department of Computer Science, School of Electrical Engineering, University of Belgrade
The problems we consider
- Ability to access various data sources
- Data preprocessing capability
- Integration of different techniques
- Ability to operate on large datasets: scalability
- Good data and model visualization
- Extensibility
- Interoperability with other systems
- Active development community
- Cost
Importance of data mining
What is data mining?
Data mining is used for:
- competition analysis
- market research
- economic trends
- consumer behavior
- industry research
“One of the most revolutionary developments”
The future of data mining
“One of 10 technologies that will change the world”
Factors that affect the growth of data mining:
- The explosive growth in data collection
- The storing of data in data warehouses
- Increased access to data from the Web
- The wish to increase market share in a globalized economy
- Off-the-shelf commercial data mining software
- Growth in computing power and storage capacity
Tanagra
Data source aspect: weak
- No support for JDBC, Access, MySQL, Oracle or CSV
- Only medium-sized data sets can be handled
- No support for Linux or macOS
Functionality aspect:
- Data and model visualization at a very low level
Usability aspect:
- Human interaction: manual
- No interoperability
- Low extensibility
RapidMiner (YALE)
Data source aspect:
- Does not support ODBC and Access data sources
Usability aspect:
- Does not support PMML
- Very little guidance in the data mining process
- Bugs reported by users
(Figures: Data source characteristics; Usability characteristics)
Weka
Data source aspect:
- Does not support Excel, Access, ODBC, MySQL or Oracle
Functionality aspect:
- Supports most of the required algorithms
- Not capable of multi-relational data mining
Usability aspect:
- Does not support PMML
- Allows extensibility (a plus)
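Both RapidMiner and Weka are marked down here for lacking PMML support. PMML (Predictive Model Markup Language) is an XML standard for exchanging trained models, so a model built in one tool can be scored in another without re-implementing it. The sketch below is a minimal illustration, assuming a hand-written PMML 4.4 linear regression model (the field names `x`, `y` and the coefficients are hypothetical) and a scorer that understands only `NumericPredictor` terms; real PMML consumers support far more of the standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML document describing y = 1.5 + 2.0 * x.
# Element and attribute names follow the PMML RegressionModel schema.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header description="hypothetical example"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def score(pmml_text: str, record: dict) -> float:
    """Score one record with the first RegressionTable of a PMML document.

    Only NumericPredictor terms with the default exponent of 1 are handled.
    """
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    result = float(table.get("intercept", "0"))
    for pred in table.findall("pmml:NumericPredictor", NS):
        result += float(pred.get("coefficient")) * record[pred.get("name")]
    return result


print(score(PMML_DOC, {"x": 3.0}))  # 7.5
```

Any tool that can parse this XML recovers the same model, which is the interoperability that the PMML criterion is about.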
Knime as a solution
Better than the others because it:
- Has a simple and intuitive GUI
- Offers easy node configuration and execution
- Is based on the Eclipse platform
- Comes with many relevant examples
- Provides useful help (node descriptions)
- Is good for beginners
Originality
- Integration of Python, R, Perl and Java snippets
- Portability: PMML, XML
- KNIME Cluster Execution: a gain in performance
KNIME allows users to:
- visually create data flows
- selectively execute analysis steps
- inspect results
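The data-flow idea behind "selectively execute analysis steps" can be sketched in plain Python: each node is a function of its upstream nodes' outputs, and executing one node runs only the nodes it depends on, caching every output so it can be inspected afterwards. This is a minimal sketch of the concept under the assumption of an acyclic graph; the `Workflow` class and the node names are hypothetical and are not KNIME's implementation or API.

```python
from typing import Any, Callable, Dict, List, Sequence


class Workflow:
    """Hypothetical minimal data-flow graph with selective, cached execution.

    Assumes the graph is acyclic; nodes are identified by name.
    """

    def __init__(self) -> None:
        self.funcs: Dict[str, Callable[..., Any]] = {}
        self.inputs: Dict[str, List[str]] = {}
        self.results: Dict[str, Any] = {}  # cached outputs, inspectable after a run
        self.run_log: List[str] = []       # nodes in the order they actually ran

    def add_node(self, name: str, func: Callable[..., Any],
                 inputs: Sequence[str] = ()) -> None:
        self.funcs[name] = func
        self.inputs[name] = list(inputs)

    def execute(self, name: str) -> Any:
        """Run `name`, first running any upstream node that has not run yet."""
        if name not in self.results:
            args = [self.execute(upstream) for upstream in self.inputs[name]]
            self.results[name] = self.funcs[name](*args)
            self.run_log.append(name)
        return self.results[name]


wf = Workflow()
wf.add_node("reader", lambda: [3, 1, 4, 1, 5])
wf.add_node("sorter", sorted, ["reader"])
wf.add_node("summer", sum, ["reader"])

wf.execute("sorter")         # runs reader, then sorter; summer is left alone
print(wf.run_log)            # ['reader', 'sorter']
print(wf.results["sorter"])  # [1, 1, 3, 4, 5]
wf.execute("summer")         # reader's output is reused, so only summer runs
print(wf.run_log)            # ['reader', 'sorter', 'summer']
```

Caching outputs per node is what makes selective execution cheap: re-running a downstream step does not repeat the upstream work.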
Time is on Knime’s side
- More and more companies use it
- Intensive development of new software features:
  - KNIME Enterprise Server
  - KNIME Cluster Execution
- Open source, so easily extensible
- Modules for text and image processing
Example
- List of all projects
- Workspace of the currently active project
- Detailed description of the selected node
- List of projects available on the server
- Console showing notifications and errors in the project
- List of all available nodes, grouped by functionality
- Palette of basic functions
Example
1. To open a new project, choose New from the File menu.
2. Select New KNIME Project and click Next.
3. Enter the project name and click Finish.
Example
- Click Browse to choose the path to the file.
- Once the input file is defined, the node switches to the ready state.
- Executing the node moves it to the third state (executed).
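The three states described here correspond to KNIME's traffic-light node indicator: red (not configured), yellow (configured, ready to run) and green (executed). A minimal state-machine sketch, assuming only these three states; the `NodeState` and `FileReaderNode` names are hypothetical and not KNIME's API, and the reset of earlier output on reconfiguration is an assumption stated in the code.

```python
import os
import tempfile
from enum import Enum
from typing import List, Optional


class NodeState(Enum):
    NOT_CONFIGURED = "red"
    CONFIGURED = "yellow"  # the "ready" state
    EXECUTED = "green"


class FileReaderNode:
    """Hypothetical node that reads the lines of a text file and tracks its state."""

    def __init__(self) -> None:
        self.path: Optional[str] = None
        self.output: Optional[List[str]] = None
        self.state = NodeState.NOT_CONFIGURED

    def configure(self, path: str) -> None:
        # Defining the input file makes the node ready. Assumption: changing the
        # configuration also discards any earlier output (reset).
        self.path = path
        self.output = None
        self.state = NodeState.CONFIGURED

    def execute(self) -> List[str]:
        if self.state is NodeState.NOT_CONFIGURED:
            raise RuntimeError("configure the node before executing it")
        with open(self.path, encoding="utf-8") as f:
            self.output = f.read().splitlines()
        self.state = NodeState.EXECUTED
        return self.output


# Usage: write a small temporary file, then walk the node through its states.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("id,name\n1,alpha\n")

node = FileReaderNode()
print(node.state.value)  # red
node.configure(tmp.name)
print(node.state.value)  # yellow
print(node.execute())    # ['id,name', '1,alpha']
print(node.state.value)  # green
os.remove(tmp.name)
```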
Example
- Once connected, the node is ready for execution.
- After the node executes, a new column, Document, is added to the table.
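In table terms, a node like this maps each input row to the same row plus one computed cell. A minimal sketch in plain Python, assuming rows are dictionaries and using a hypothetical `append_column` helper; the column name `Document` follows the slide, but the computation (joining a title and a text field) is only an illustration, not what the KNIME node does internally.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def append_column(rows: List[Row], name: str,
                  compute: Callable[[Row], str]) -> List[Row]:
    """Return new rows, each with one extra column `name` computed from the row.

    The input rows are left unchanged, mirroring how a node produces a new
    output table instead of modifying its input table.
    """
    if rows and name in rows[0]:
        raise ValueError(f"column {name!r} already exists")
    return [{**row, name: compute(row)} for row in rows]


table = [
    {"title": "Intro", "text": "Data mining basics"},
    {"title": "Tools", "text": "Knime, Weka, Tanagra"},
]
result = append_column(table, "Document", lambda r: f"{r['title']}: {r['text']}")
print(list(result[0]))        # ['title', 'text', 'Document']
print(result[1]["Document"])  # Tools: Knime, Weka, Tanagra
```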
Example
Here we select the columns we want to filter.
Example
The number of rows has decreased as a result of filtering.
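Filtering rows keeps only those whose value in a chosen column passes a test, so the output table can have the same number of rows or fewer, never more. A minimal sketch, assuming dictionary rows, a hypothetical `filter_rows` helper and an example threshold of 0.5; it illustrates the idea and does not reproduce the configuration options of KNIME's filter nodes.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def filter_rows(rows: List[Row], column: str,
                keep: Callable[[Any], bool]) -> List[Row]:
    """Keep only the rows whose value in `column` satisfies `keep`."""
    return [row for row in rows if keep(row[column])]


table = [
    {"name": "a", "score": 0.91},
    {"name": "b", "score": 0.42},
    {"name": "c", "score": 0.77},
    {"name": "d", "score": 0.15},
]
filtered = filter_rows(table, "score", lambda v: v >= 0.5)
print(len(table), "->", len(filtered))    # 4 -> 2
print([row["name"] for row in filtered])  # ['a', 'c']
```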
Conclusion
- Data mining is not an automated process
- Data mining needs appropriate software tools, frequently more than one
- Knime is an effective solution for educational purposes
- There is a lot of room for improvement in:
  - supporting various data sources
  - providing high-performance data mining
  - providing more domain-specific techniques
  - better support for business applications
Q & A
Do you have any questions?