Big Data & KNIME
![Page 1: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/1.jpg)
512.231.6000 - 512.231.6010 fax - www.pervasive.com
Big Data & KNIME
Michael Hoskins, CTO Pervasive Software
KNIME User Conf, Zurich, 1 February 2012
![Page 2: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/2.jpg)
Big Data and the Digital Data Revolution
• Every two days we create as much
information as we did from the
dawn of civilization until 2003 – Eric Schmidt, Google, 2010
![Page 3: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/3.jpg)
How Big? Surging to Exabytes
![Page 4: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/4.jpg)
Data Inflation
![Page 5: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/5.jpg)
• Where is all this Big Data
coming from?
![Page 6: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/6.jpg)
The Internet is a Driver
![Page 7: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/7.jpg)
The Real Culprit: an Internet of Things, aka Machine-Generated Data
![Page 8: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/8.jpg)
• What to do with all this Big
Data?
![Page 9: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/9.jpg)
Analyze it!
![Page 10: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/10.jpg)
Using Machine Learning Techniques
• Association rule learning
• Classification
• Cluster analysis
• Crowdsourcing
• Data fusion and data integration
• Data mining
• Ensemble learning
• Genetic algorithms
• Natural language processing (NLP)
• Neural networks
• Network analysis
• Optimization
• Pattern recognition
• Predictive modeling
• Regression
• Sentiment analysis
• Signal processing
• Spatial analysis
• Statistics
• Supervised learning
• Simulation
• Time series analysis
• Unsupervised learning
• Visualization
![Page 11: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/11.jpg)
To Predict the Future
![Page 12: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/12.jpg)
• What does Big Data mean to
you and KNIME?
![Page 13: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/13.jpg)
Big Data means a new Data Science
![Page 15: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/15.jpg)
• What is Pervasive doing
about this?
![Page 16: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/16.jpg)
Introducing Pervasive DataRush™
DataRush is a parallel dataflow platform that eliminates performance bottlenecks in your data-intensive applications
• Scalable: Performance dynamically scales with increased core/server
counts. No change to the code.
• High Throughput: Patented parallel dataflow technology enables fast,
deep analysis of large data sets with no limit on input data size.
• Cost Efficient: Fully exploit commodity multicore servers – save
significant capital and energy costs via efficient node utilization.
• Easy to Implement: DataRush takes care of complex parallel
processing issues at design time: hides threading complexity; no
deadlocks; runs on any platform – including Hadoop; etc.
• Extensible: DataRush is a component-based platform with an open API
so you can easily extend it for your own needs.
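To make the dataflow idea concrete, here is a minimal sketch in plain Python. It does not use the DataRush API; the function names (`source`, `transform`, `sink`, `run_pipeline`) are hypothetical and only illustrate the general pattern of operators connected by bounded queues, each running concurrently. Note that CPython threads show the structure of a dataflow graph but do not deliver the multicore speedups described above.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def source(records, out_q):
    """Emit every record downstream, then signal end of stream."""
    for record in records:
        out_q.put(record)
    out_q.put(_DONE)


def transform(fn, in_q, out_q):
    """Apply fn to each incoming record and pass the result on."""
    while True:
        item = in_q.get()
        if item is _DONE:
            out_q.put(_DONE)
            return
        out_q.put(fn(item))


def sink(in_q, results):
    """Collect records until the end-of-stream sentinel arrives."""
    while True:
        item = in_q.get()
        if item is _DONE:
            return
        results.append(item)


def run_pipeline(records, fn):
    """Wire source -> transform -> sink with bounded queues and run them."""
    q1 = queue.Queue(maxsize=64)  # bounded queues give back-pressure
    q2 = queue.Queue(maxsize=64)
    results = []
    threads = [
        threading.Thread(target=source, args=(records, q1)),
        threading.Thread(target=transform, args=(fn, q1, q2)),
        threading.Thread(target=sink, args=(q2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each operator only reads from its input queue and writes to its output queue, so the stages share no mutable state and need no explicit locks. That is the property a dataflow engine relies on to hide threading complexity from the person designing the graph.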
![Page 17: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/17.jpg)
Pervasive DataRush Plug-in for KNIME
[Diagram labels: DataRush Plug-Ins; Drag and Drop; High performance nodes; DataRush for KNIME; Predictive Analytics]
![Page 18: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/18.jpg)
Genomic Analysis: Align and Assemble
![Page 19: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/19.jpg)
Scalable Predictive Analytics
![Page 20: Big Data & KNIME · 512.231.6000 - 512.231.6010 fax - Big Data & KNIME Michael Hoskins, CTO Pervasive Software KNIME User Conf, Zurich, 1 February 2012 . Big Data and the Digital](https://reader033.fdocuments.in/reader033/viewer/2022042006/5e6fcec250dc3067576a69de/html5/thumbnails/20.jpg)
Demo of Big Data in DataRush for KNIME
• KNIME with distributed (next-gen v6)
DataRush reads more than 120 million historical airline
flight records at scale from native HDFS on
our test Hadoop cluster, then performs a
Linear Regression and Visualization.
Runtime = 47 seconds!
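The slide does not say which columns the regression used or how DataRush computes it. As a rough sketch of why linear regression suits very large inputs, the plain-Python function below fits a one-variable least-squares line in a single streaming pass, keeping only five running sums in memory. The function name and the idea of feeding it `(x, y)` pairs are assumptions made for illustration, not part of the demo.

```python
def fit_simple_linear_regression(pairs):
    """Fit y = slope * x + intercept by ordinary least squares.

    `pairs` can be any iterable of (x, y) numbers, including a generator
    that streams records from disk, because only running sums are kept.
    """
    n = 0
    sum_x = sum_y = sum_xx = sum_xy = 0.0
    for x, y in pairs:
        n += 1
        sum_x += x
        sum_y += y
        sum_xx += x * x
        sum_xy += x * y

    if n < 2:
        raise ValueError("need at least two points")

    denom = n * sum_xx - sum_x * sum_x
    if denom == 0:
        raise ValueError("all x values are identical")

    slope = (n * sum_xy - sum_x * sum_y) / denom
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept
```

Because the running sums from separate chunks of data can simply be added together, the same computation can be split across cores or cluster nodes and merged at the end. On very large or poorly scaled data, raw sums like these can lose floating-point precision, so a production implementation would likely use a more numerically stable formulation.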