Harnessing Big Data with KNIME · 2017-05-23 · Big Data, IoT, and the three V Variety: –KNIME...
Copyright © 2015 KNIME.com AG
Harnessing Big Data with KNIME
Tobias Kötter
KNIME.com
Agenda
• The three V’s of Big Data
• Big Data Extension and Database Nodes
• Demo
Variety, Volume, Velocity
Variety:
• integrating heterogeneous data (and tools)
Volume:
• from small files...
• ...to distributed data repositories (Hadoop)
• bring the tools to the data
Velocity:
• from distributing computationally heavy workloads...
• ...to real-time scoring of millions of records/sec.
Variety
Variety
• Data Integration
– Small (ASCII)
– Proprietary (XLS, SAS...)
– Medium (Databases)
– Large (Hive, Impala, ParStream, HP Vertica...)
– Diverse (Numbers, Texts, Images, Networks, Sequences...)
• Tool Integration
– Native
– Legacy, in-house
– R, Python, MATLAB, ...
Volume
Every Minute…
IoT
Big Data Support
• KNIME Database Nodes
– in-database processing
– preconfigured connectors
• KNIME Big Data Extension
– packages the drivers/libraries required for HDFS, Hive, and Impala access
• Spark MLlib integration (coming soon)
Velocity
Velocity
• Computationally heavy analytics:
– Distributed execution of one workflow branch
– Parallel execution of workflow branches
• Hosted analytics/prediction:
– Web service deployment of workflows
• High-demand scoring/prediction:
– Continuous execution of workflow parts
– High-performance scoring using generic workflows
– High-performance scoring of predictive models
KNIME Cluster Execution: Distributed Data
KNIME Cluster Execution: Distributed Analytics
Deployed Workflows
Application Access
• Custom API
• WSDL/SOAP based
Continuous Scoring using Workflows
• Exposes a workflow fragment as a RESTful web service
• Deployed on KNIME Server (v4.0 – 1H2015)
High Performance Scoring via Workflows
• Streaming Executor
• Deployed via KNIME Server (v4.1 – 2H2015/2016)
• Record-based (or small-batch) processing
• Exposed as a RESTful web service
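The record (or small-batch) based processing described above can be sketched in Python. This is a minimal illustration of the idea, not the KNIME Streaming Executor: the names `batches`, `stream_score`, and `toy_model` are hypothetical, and the toy model stands in for a real predictive model that scores a whole small batch at once.

```python
from itertools import islice


def batches(records, size):
    """Yield lists of at most `size` records from any (possibly endless) iterable."""
    it = iter(records)
    while batch := list(islice(it, size)):
        yield batch


def stream_score(records, score_batch, batch_size=2):
    """Score records in small batches, emitting results as soon as each batch is done."""
    for batch in batches(records, batch_size):
        for record, score in zip(batch, score_batch(batch)):
            yield {**record, "score": score}


def toy_model(batch):
    # Hypothetical stand-in for a trained model: y = 2x + 1.
    return [2.0 * r["x"] + 1.0 for r in batch]


incoming = ({"id": i, "x": float(i)} for i in range(5))  # e.g. records arriving over time
for result in stream_score(incoming, toy_model):
    print(result)
```

Because the input is consumed lazily, a result is available after each batch rather than after the whole table has been read; the batch size trades per-call overhead against latency.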
High Performance Scoring using Models
• Deployed on KNIME Server (v4.0 – 1H2015)
• KNIME PMML Scoring via compiled PMML
• Exposed as a RESTful web service
• Partnership with Zementis
– ADAPA Real Time Scoring
– UPPI Big Data Scoring Engine
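To illustrate what scoring from a PMML model involves, here is a minimal Python sketch that reads a small PMML `RegressionModel` once and turns it into a plain scoring function, loosely mirroring the "translate once, score many times" idea behind compiled PMML. It is a toy under stated assumptions: the model is made up, only `NumericPredictor` terms of a single `RegressionTable` are handled, and this is not how KNIME or Zementis implement scoring.

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML 4.2 model: y = 1.5 + 2.0*x1 - 0.5*x2^2
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header/>
  <DataDictionary numberOfFields="3">
    <DataField name="x1" optype="continuous" dataType="double"/>
    <DataField name="x2" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="x1"/>
      <MiningField name="x2"/>
      <MiningField name="y" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="x1" coefficient="2.0"/>
      <NumericPredictor name="x2" exponent="2" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"p": "http://www.dmg.org/PMML-4_2"}


def compile_regression(pmml_text):
    """Parse the model once and return a function that scores one record."""
    root = ET.fromstring(pmml_text)
    table = root.find("p:RegressionModel/p:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    # (field name, coefficient, exponent); PMML's default exponent is 1.
    terms = [(p.get("name"), float(p.get("coefficient")), int(p.get("exponent", "1")))
             for p in table.findall("p:NumericPredictor", NS)]

    def score(record):
        return intercept + sum(c * record[n] ** e for n, c, e in terms)

    return score


score = compile_regression(PMML_DOC)
print(score({"x1": 1.0, "x2": 2.0}))  # 1.5 + 2.0*1.0 - 0.5*2.0**2 = 1.5
```

The XML is parsed only once; every subsequent call to `score` is plain arithmetic, which is what makes precompiled models attractive for high-volume scoring.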
Big Data, IoT, and the three V’s
Variety:
– KNIME inherently well-suited: open platform
– broad data source/type support
– extensive tool integration
Volume:
– Big Data Extensions cover Hadoop-based data integration and aggregation
– Big Data Executors will address model building and streaming execution
Velocity:
– from distributed execution of heavy workflows...
– ...to high-performance scoring of predictive models.
Big Data Extension and Database Nodes
Database Port Types
Database JDBC Connection Port (light red)
• Connection information
Database Connection Port (dark red)
• Connection information
• SQL statement
Database Connection Ports can be connected to Database JDBC Connection Ports, but not vice versa.
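The one-way compatibility follows from what each port carries: a Database Connection Port holds everything a Database JDBC Connection Port holds, plus an SQL statement. A minimal Python sketch of that relationship, using hypothetical class names rather than KNIME's actual API:

```python
from dataclasses import dataclass


# Hypothetical model of the two KNIME database port types (not KNIME's API).
@dataclass
class JdbcConnectionPort:
    """Light red port: connection information only."""
    jdbc_url: str
    user: str


@dataclass
class DatabaseConnectionPort(JdbcConnectionPort):
    """Dark red port: connection information plus an SQL statement."""
    sql: str


def can_connect(output_port: JdbcConnectionPort, input_type: type) -> bool:
    # An output fits an input if it carries at least everything the input
    # expects, i.e. if it is an instance of the input's port type.
    return isinstance(output_port, input_type)


conn = JdbcConnectionPort("jdbc:postgresql://localhost:5432/demo", "knime")
query = DatabaseConnectionPort("jdbc:postgresql://localhost:5432/demo", "knime",
                               "SELECT * FROM sales")

print(can_connect(query, JdbcConnectionPort))     # True: dark red into light red
print(can_connect(conn, DatabaseConnectionPort))  # False: light red lacks the SQL
```

Modelling the richer port as a subtype makes the "but not vice versa" rule fall out of ordinary type compatibility.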
Database JDBC Connection Port View
Database Connection Port View
Copy SQL statement
Database Connectors
• Nodes to connect to specific databases
– Bundle the necessary JDBC drivers
– Easy to use
– DB-specific behavior/capability
• Hive and Impala connectors are part of the commercial Big Data Extension
• General Database Connector
– Can connect to any JDBC source
– Register new JDBC drivers via the preferences page
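A dedicated connector node already knows its database's JDBC driver class and URL format; with the General Database Connector you provide both yourself. The sketch below shows the kind of information involved for two well-known drivers (PostgreSQL and Hive's HiveServer2 driver). The helper `jdbc_settings` and the host and database names are hypothetical; check your driver's documentation for additional URL options.

```python
from typing import Optional, Tuple

# Driver class, URL template, and default port for two well-known JDBC drivers.
JDBC_DRIVERS = {
    "postgresql": ("org.postgresql.Driver", "jdbc:postgresql://{host}:{port}/{db}", 5432),
    "hive": ("org.apache.hive.jdbc.HiveDriver", "jdbc:hive2://{host}:{port}/{db}", 10000),
}


def jdbc_settings(db_type: str, host: str, db: str,
                  port: Optional[int] = None) -> Tuple[str, str]:
    """Return (driver class, JDBC URL) for a hypothetical connection."""
    driver, template, default_port = JDBC_DRIVERS[db_type]
    url = template.format(host=host, port=port if port is not None else default_port, db=db)
    return driver, url


print(jdbc_settings("hive", "localhost", "default"))
# ('org.apache.hive.jdbc.HiveDriver', 'jdbc:hive2://localhost:10000/default')
```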
Register JDBC Driver
Open KNIME and go to File -> Preferences
Increase the connection timeout for long-running database operations
Reader/Writer
• Table selection
• Load data into KNIME
• Create table as select
• Insert/append data
• Delete rows from table
• Update values in table
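The reader/writer operations listed above correspond to standard SQL statements. A minimal sketch using Python's built-in `sqlite3` module and a hypothetical `sales` table; it only illustrates the SQL side of each operation, not the KNIME nodes themselves, whose generated SQL may differ.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# Insert/append data
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 120.0), ("south", 80.0), ("north", 50.0), ("west", 10.0)])

# Create table as select
con.execute("CREATE TABLE big_sales AS SELECT * FROM sales WHERE amount >= 50")

# Delete rows from table
con.execute("DELETE FROM big_sales WHERE region = ?", ("south",))

# Update values in table
con.execute("UPDATE big_sales SET amount = amount + 10 WHERE region = ?", ("north",))
con.commit()

# Table selection + load data into the client
rows = con.execute("SELECT region, amount FROM big_sales ORDER BY amount").fetchall()
print(rows)  # [('north', 60.0), ('north', 130.0)]
```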
Hive/Impala Loader
• Upload a KNIME data table to Hive/Impala
• Part of the commercial Big Data Extension
Hive/Impala Loader
Partitioning influences performance
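Partitioning matters because Hive stores each partition of a table in its own directory named `<column>=<value>`, so a query that filters on the partition column only has to read the matching directories (partition pruning). A simplified Python sketch of that layout; the table path and data are hypothetical, and real Hive additionally tracks partitions in its metastore.

```python
from collections import defaultdict


def partition_rows(rows, partition_col, table_dir):
    """Group rows into Hive-style <col>=<value> directories."""
    partitions = defaultdict(list)
    for row in rows:
        path = f"{table_dir}/{partition_col}={row[partition_col]}"
        # The partition value lives in the path, not in the data files.
        partitions[path].append({k: v for k, v in row.items() if k != partition_col})
    return dict(partitions)


def prune(partitions, partition_col, value):
    """Keep only the directories a filter on the partition column must scan."""
    suffix = f"/{partition_col}={value}"
    return {path: rows for path, rows in partitions.items() if path.endswith(suffix)}


rows = [{"year": 2014, "amount": 5}, {"year": 2015, "amount": 7}, {"year": 2015, "amount": 3}]
parts = partition_rows(rows, "year", "/user/hive/warehouse/sales")
print(prune(parts, "year", 2015))
# {'/user/hive/warehouse/sales/year=2015': [{'amount': 7}, {'amount': 3}]}
```

A partition column with a moderate number of distinct values that queries commonly filter on (such as a date or year) gives the most benefit; a very fine-grained column produces many tiny directories instead.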
Manipulation
• Filter rows and columns
• Join tables/queries
• Sort your data
• Write your own query
• Aggregate your data
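Chaining manipulation steps need not move any data: each step can wrap the incoming query as a subquery, and the combined statement only runs when the result is finally read. A simplified Python sketch of this nesting with `sqlite3`; the `wrap` helper, the aliases, and the table are hypothetical, and the SQL that KNIME actually generates may look different.

```python
import sqlite3


def wrap(inner_sql, template, step):
    """Wrap the incoming query as an aliased subquery of the next step."""
    return template.format(src=f"({inner_sql}) AS t{step}")


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL, rep TEXT)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                [("north", 120.0, "ann"), ("south", 15.0, "bob"), ("west", 40.0, "cy")])

query = "SELECT * FROM sales"                                       # table selection
query = wrap(query, "SELECT * FROM {src} WHERE amount > 20", 1)     # row filter
query = wrap(query, "SELECT region, amount FROM {src}", 2)          # column filter
query = wrap(query, "SELECT * FROM {src} ORDER BY amount DESC", 3)  # sorter
print(query)

# The composed query runs only here, when the result is loaded into the client.
rows = con.execute(query).fetchall()
print(rows)  # [('north', 120.0), ('west', 40.0)]
```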
Database GroupBy – DB Specific Aggregation Methods
SQLite: 7 aggregation functions
PostgreSQL: 25 aggregation functions
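Which aggregation methods are offered depends on what the database itself supports. As an illustration, the following sketch runs a grouped query using SQLite's classic built-in aggregate functions (avg, count, group_concat, max, min, sum, total); `COUNT(*)` returns the number of rows per group. The `sales` table is hypothetical, and newer SQLite releases add a few more aggregates.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 10), ("north", 30), ("south", 5)])

query = """
SELECT region,
       COUNT(*)             AS n_rows,       -- number of rows per group
       SUM(amount)          AS sum_amount,   -- integer sum
       AVG(amount)          AS avg_amount,   -- always a float
       MIN(amount)          AS min_amount,
       MAX(amount)          AS max_amount,
       TOTAL(amount)        AS total_amount, -- like SUM, but always a float
       GROUP_CONCAT(amount) AS amounts       -- comma-separated values
FROM sales
GROUP BY region
ORDER BY region
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```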
Database GroupBy – Aggregation Method Description
Database GroupBy – Manual Aggregation
Returns the number of rows per group
Database GroupBy – Pattern Based Aggregation
Tick this option if the search pattern is a regular expression; otherwise it is treated as a string with wildcards ('*' and '?')
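The difference between the two pattern modes can be sketched in Python: a wildcard pattern is translated into an equivalent regular expression ('*' for any sequence of characters, '?' for exactly one), while a regular expression is used as-is. The helper names are hypothetical, and this sketch assumes the pattern must match the whole column name.

```python
import re


def wildcard_to_regex(pattern):
    # Escape all regex metacharacters, then restore the two wildcards:
    # '*' matches any sequence of characters, '?' exactly one character.
    return re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")


def match_columns(columns, pattern, is_regex=False):
    """Return the columns whose full name matches the search pattern."""
    regex = pattern if is_regex else wildcard_to_regex(pattern)
    return [c for c in columns if re.fullmatch(regex, c)]


cols = ["sales_2014", "sales_2015", "region", "cost_2015"]
print(match_columns(cols, "sales_*"))                   # ['sales_2014', 'sales_2015']
print(match_columns(cols, "*_201?"))                    # ['sales_2014', 'sales_2015', 'cost_2015']
print(match_columns(cols, r"(sales|cost)_2015", True))  # ['sales_2015', 'cost_2015']
```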
Database GroupBy – Type Based Aggregation
Matches all cells
Matches all numeric cells
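Type-based selection picks columns by their data type instead of their name. A minimal sketch with `sqlite3` that reads the declared column types and builds a `SUM` aggregation over the numeric columns only; the table, the `NUMERIC_TYPES` set, and the helper name are assumptions for illustration.

```python
import sqlite3

# Declared SQLite column types treated as numeric in this sketch (an assumption).
NUMERIC_TYPES = {"INTEGER", "REAL", "NUMERIC"}


def columns_by_type(con, table, numeric_only):
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    # The table name is interpolated directly, so use trusted names only.
    info = con.execute(f"PRAGMA table_info({table})").fetchall()
    return [name for _, name, col_type, *_ in info
            if not numeric_only or col_type.upper() in NUMERIC_TYPES]


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL, qty INTEGER, note TEXT)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                [("north", 1.5, 2, "x"), ("south", 2.5, 3, "y")])

print(columns_by_type(con, "sales", numeric_only=False))  # all columns
numeric = columns_by_type(con, "sales", numeric_only=True)
sql = "SELECT " + ", ".join(f"SUM({c}) AS sum_{c}" for c in numeric) + " FROM sales"
print(sql)                          # SELECT SUM(amount) AS sum_amount, SUM(qty) AS sum_qty FROM sales
print(con.execute(sql).fetchone())  # (4.0, 5)
```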
Database GroupBy – Custom Aggregation Function
Utility
• Drop table
– missing table handling
– cascade option
• Execute any SQL statement, e.g. DDL
• Manipulate existing queries
Executes several queries separated by ; and a new line
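Executing several statements in one go requires splitting the script first. A minimal Python sketch that, following the rule above, splits only where a ';' ends a line, so a semicolon inside a line (such as in a string literal) is kept; the helper name and table are hypothetical. `DROP TABLE IF EXISTS` shows one way to handle a missing table; SQLite has no `CASCADE` option for `DROP TABLE`, whereas PostgreSQL, for example, does.

```python
import re
import sqlite3


def split_statements(script):
    """Split an SQL script where a ';' is followed by a line break (or the end)."""
    parts = re.split(r";[ \t]*(?:\r?\n|$)", script)
    return [p.strip() for p in parts if p.strip()]


script = """DROP TABLE IF EXISTS tmp;
CREATE TABLE tmp (id INTEGER, note TEXT);
INSERT INTO tmp VALUES (1, 'a;b');
"""

con = sqlite3.connect(":memory:")
for statement in split_statements(script):
    # IF EXISTS: dropping a table that is not there is not an error.
    con.execute(statement)

print(con.execute("SELECT * FROM tmp").fetchall())  # [(1, 'a;b')]
```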
In-Database Processing
Loads your pre-processed data into KNIME
HDFS File Handling
• KNIME & Extensions -> KNIME File Handling Nodes
• HDFS Connection and HDFS File Permission nodes are part of the commercial Big Data Extension
HDFS File Handling
Virtual Machines
• Hortonworks: http://hortonworks.com/products/hortonworks-sandbox/
• Cloudera: http://www.cloudera.com/content/cloudera/en/downloads/quickstart_vms.html
• VirtualBox: https://www.virtualbox.org/
• VMware Player: https://www.vmware.com/
Demo
Resources
• KNIME (www.knime.org)
• BLOG for news, tips and tricks (www.knime.org/blog)
• FORUM for questions and answers (tech.knime.org/forum)
• EXAMPLE SERVER for example workflows
• LEARNING HUB (www.knime.org/learning-hub)
• KNIME TV channel on YouTube
• KNIME on Twitter: @KNIME
• KNIME on Facebook: https://www.facebook.com/KNIMEanalytics