Big Data - Fast Machine Learning at Scale + Couchbase
Fast Machine Learning with
by Fujio Turner
@FujioTurner
Current & Future Problems
Churn Prediction
Truth and Veracity
Recommendations
Online Advertisement
News Aggregation
Scalability
Content Discovery/Search
Intelligent Learning
Machine Learning for Medicine
Source: Abhishek Shivkumar
Who is LexisNexis?
LexisNexis is a provider of legal, tax, regulatory, news, and business information and analysis to legal, corporate, government, accounting, and academic markets.
LexisNexis has been in business since 1977 and has over 30,000 employees worldwide.
What is HPCC Systems?
LexisNexis Risk is the division of LexisNexis that focuses on data, Big Data processing, linking, and vertical expertise. It supports HPCC Systems as an open source project under the Apache 2.0 License.
http://hpccsystems.com/
Problems
Data from 10,000+ Different Sources
Different Needs for the Data
Different Levels of Proficiency
Lots of Data
Normalized / Denormalized
Structured / Unstructured
Solutions
DEDUP, JOIN, INDEX, COUNT, REGEX, K-Means, BETWEEN, GROUP, CASE, Custom
1 Easy Language (ECL) or SQL, R, Java, Python, C++, SAS
Reliable Data Distribution & Processing System that scales to exabytes+
Machine Learning Built-in
Regression: Linear Regression
Classification: Naive Bayes, Perceptron, Decision Trees, Logistic Regression
Clustering: K-Means, KD Trees, Agglomerative/Hierarchical
Association Analysis: AprioriN, EclatN, Rules
http://hpccsystems.com/ml
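To make the K-Means entry in the list above concrete, here is a minimal sketch of the standard Lloyd's iteration in plain Python. It is not HPCC Systems' ECL machine learning library; the function name `kmeans`, its parameters, and the sample points are hypothetical and chosen only for illustration.

```python
import random


def _nearest(point, centroids):
    """Return the index of the centroid closest to a 2-D point."""
    return min(
        range(len(centroids)),
        key=lambda i: (point[0] - centroids[i][0]) ** 2
        + (point[1] - centroids[i][1]) ** 2,
    )


def kmeans(points, k, iterations=20, seed=0):
    """Cluster 2-D points into k groups (hypothetical helper, Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[_nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        updated = []
        for i, members in enumerate(clusters):
            if members:
                updated.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                updated.append(centroids[i])
        if updated == centroids:  # no centroid moved, so we have converged
            break
        centroids = updated
    # Label every point against the final centroids.
    labels = [_nearest(p, centroids) for p in points]
    return centroids, labels


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
    print(kmeans(sample, k=2))
```

On a cluster, the assignment step is the part that parallelizes naturally, since each node can label its own slice of the data before the centroids are recomputed.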
Michael Payne, of Clemson University, on high-speed machine learning with PB-BLAS in HPCC Systems.
http://youtu.be/s_HWlMwi6iI
Thor: “I can query all or part of your data.”
Single Threaded
Hard Disk
Index (optional)
Roxie: “I’m sub-second fast.”
Multi-Threaded
Hard Disk, In-memory, SSD
Index (optional)
Either/Both
Cluster Architecture
Sort, Count, Group, Classification, Join
(ROXIE) 0.27 seconds to (THOR) a few hours
Query is Completed in a Single Job, Asynchronously
[Diagram: Index of ~/facebook_2013, ~/facebook_2013, and ~/twitter_2013, with Country = 'US' filters and a Join]
SORT, GROUP, DEDUP, JOIN, MERGE, BETWEEN, LENGTH, REGEX, ROUND, SUM, COUNT, TRIM, WHEN, AVE, CASE, NORMALIZE, DENORMALIZE, K-MEANS, more ...
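As a rough illustration of how a filter, a DEDUP, and a JOIN from the list above can be chained into one job, here is a small sketch in plain Python. It is not ECL and does not use any HPCC Systems API. The record fields (`user_id`, `country`, `likes`, `tweets`), the helper names, and the choice to filter both datasets on Country = 'US' are assumptions made for the example.

```python
def filter_country(records, country):
    """Keep only the records whose 'country' field matches."""
    return [r for r in records if r["country"] == country]


def dedup(records, key):
    """Drop later records that repeat an already-seen key value."""
    seen = set()
    unique = []
    for r in records:
        if r[key] not in seen:
            seen.add(r[key])
            unique.append(r)
    return unique


def inner_join(left, right, key):
    """Hash join: index the right side by key, then probe it with the left."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    joined = []
    for l in left:
        for r in index.get(l[key], []):
            joined.append({**l, **r})  # merge the fields of both records
    return joined


def us_users_on_both(facebook, twitter):
    """Filter both (hypothetical) datasets to US, dedup, and join on user_id."""
    fb = dedup(filter_country(facebook, "US"), "user_id")
    tw = dedup(filter_country(twitter, "US"), "user_id")
    return inner_join(fb, tw, "user_id")


if __name__ == "__main__":
    facebook_2013 = [
        {"user_id": 1, "country": "US", "likes": 10},
        {"user_id": 2, "country": "JP", "likes": 5},
        {"user_id": 1, "country": "US", "likes": 10},
        {"user_id": 3, "country": "US", "likes": 7},
    ]
    twitter_2013 = [
        {"user_id": 1, "country": "US", "tweets": 42},
        {"user_id": 3, "country": "CA", "tweets": 3},
        {"user_id": 4, "country": "US", "tweets": 8},
    ]
    print(us_users_on_both(facebook_2013, twitter_2013))
```

Filtering before joining keeps the join input small, which is the same reason an index on the filtered field helps at scale.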
+
Watch how to install HPCC Systems in 5 Minutes
http://www.youtube.com/watch?v=8SV43DCUqJg
Download HPCC Systems Open Source
Community Edition: http://hpccsystems.com/download/
or
Source Code: https://github.com/hpcc-systems
+
Common Big Data Setup
What is Couchbase?
Open Source
Memcached Built-In w/ Replicas
Flexible Schema (JSON)
Key/Value & Distributed
Cross Data Center Replication
SQL++ (N1QL)
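The ideas of a distributed key/value store, flexible-schema JSON documents, and replicas can be sketched with a toy in-memory model. This is not the Couchbase SDK and does not reproduce Couchbase's actual partitioning scheme. The class `ToyCluster`, the partition count, the node names, and the key format are all hypothetical.

```python
import json
import zlib

NUM_PARTITIONS = 64  # hypothetical partition count, chosen for the example


class ToyCluster:
    """In-memory stand-in for a distributed key/value store with replicas."""

    def __init__(self, node_names, replicas=1):
        if replicas >= len(node_names):
            raise ValueError("need more nodes than replicas")
        self.names = list(node_names)
        self.nodes = {name: {} for name in self.names}
        self.replicas = replicas

    def owners(self, key):
        """Hash the key to a partition, then pick an active node plus replicas."""
        partition = zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS
        first = partition % len(self.names)
        return [
            self.names[(first + i) % len(self.names)]
            for i in range(self.replicas + 1)
        ]

    def set(self, key, document):
        """Store the document as JSON on the active node and every replica."""
        payload = json.dumps(document)
        for name in self.owners(key):
            self.nodes[name][key] = payload

    def get(self, key, failed=()):
        """Read from the first owner that is not marked as failed."""
        for name in self.owners(key):
            if name not in failed and key in self.nodes[name]:
                return json.loads(self.nodes[name][key])
        return None


if __name__ == "__main__":
    cluster = ToyCluster(["node-a", "node-b", "node-c"], replicas=1)
    # Documents under different keys do not need to share a schema.
    cluster.set("user::1", {"name": "Ada", "country": "US"})
    cluster.set("user::2", {"name": "Lin", "tags": ["ml", "search"]})
    active = cluster.owners("user::1")[0]
    print(cluster.get("user::1", failed=(active,)))  # served by the replica
```

Because every client can compute the owner of a key from the key itself, reads and writes go straight to the right node without a central lookup.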
+
Sub-Millisecond
SQL++ (N1QL)
JSON
Distributed & Reliable
Distributed & Reliable
1 Language
Flexible Data Types
Ready for the Future
XDCR
Couchbase Mobile
Embedded JSON NoSQL Database
+ Sync Data Online / Offline
+ Sync & Channel Data Peer-To-Peer
+ Sync Data Peer-To-Peer (directly)
Couchbase Mobile + HPCC Systems
Process & Store Data to Scale
Learning More - Couchbase Server & Lite
Install in 5 Minutes
Download: http://couchbase.com/download
Source Code: https://github.com/couchbase
https://www.youtube.com/user/CouchbaseVideo
Mountain View, CA / San Francisco, CA