Big Data - Fast Machine Learning at Scale + Couchbase

Fast Machine Learning with by Fujio Turner @FujioTurner

Transcript of Big Data - Fast Machine Learning at Scale + Couchbase

Page 1: Big Data - Fast Machine Learning at Scale + Couchbase

Fast Machine Learning with

by Fujio Turner

@FujioTurner

Page 2: Big Data - Fast Machine Learning at Scale + Couchbase

Current & Future Problems

Churn Prediction

Truth and Veracity

Recommendations

Online Advertisement

News Aggregation

Scalability

Content Discovery/Search

Intelligent Learning

Machine Learning for Medicine

Source: Abhishek Shivkumar

Page 3: Big Data - Fast Machine Learning at Scale + Couchbase

Who is LexisNexis?

LexisNexis is a provider of legal, tax, regulatory, news, and business information and analysis to legal, corporate, government, accounting, and academic markets.

LexisNexis has been in business since 1977 and has over 30,000 employees worldwide.

What is HPCC Systems?

LexisNexis Risk is the division of LexisNexis that focuses on data, Big Data processing, linking, and vertical expertise. It supports HPCC Systems as an open source project under the Apache 2.0 License.

http://hpccsystems.com/

Page 4: Big Data - Fast Machine Learning at Scale + Couchbase

Problems

Data from 10,000+ Different Sources

Different Needs for the Data

Different Levels of Proficiency

Lots of Data

Page 5: Big Data - Fast Machine Learning at Scale + Couchbase

Solutions

Data from 10,000+ Different Sources: Normalized / Denormalized, Structured / Unstructured

Different Needs for the Data: DEDUP, JOIN, INDEX, COUNT, REGEX, K-Means, BETWEEN, GROUP, CASE, Custom

Different Levels of Proficiency: 1 Easy Language (ECL) or SQL, R, Java, Python, C++, SAS

A Lot of Data: Reliable Data Distribution & Processing System that scales to exabytes+

Page 6: Big Data - Fast Machine Learning at Scale + Couchbase

Machine Learning Built-in

Regression: Linear Regression

Classification: Naive Bayes, Perceptron, Decision Trees, Logistic Regression

Clustering: K-Means, KD Trees, Agglomerative/Hierarchical

Association Analysis: AprioriN, EclatN, Rules

http://hpccsystems.com/ml

Michael Payne, of Clemson University, on high-speed machine learning with PB-BLAS in HPCC Systems:

http://youtu.be/s_HWlMwi6iI
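K-Means appears among the built-in clustering algorithms listed above. As a rough illustration of what that algorithm does (not of how HPCC Systems implements it), here is a minimal pure-Python sketch of the standard assign-then-update loop. The function names, the sample points, and the starting centroids are all assumptions made for this example.

```python
import math

def assign(points, centroids):
    """Group every point under the index of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, initial_centroids, max_iter=100):
    """Repeat assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Return the final centroids with the matching assignment.
    return centroids, assign(points, centroids)

# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

The starting centroids are passed in explicitly so the run is repeatable; a real deployment would normally pick them at random or with a seeding heuristic.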

Page 7: Big Data - Fast Machine Learning at Scale + Couchbase

Cluster Architecture

Thor: “I can query all or part of your data.”

Single-Threaded

Hard Disk

Index (optional)

Roxie: “I’m sub-second fast.”

Multi-Threaded

Hard Disk, In-Memory, or SSD

Index (optional)

Either/Both

Page 8: Big Data - Fast Machine Learning at Scale + Couchbase

Query is Completed in a Single Job, Asynchronously

Data: ~/facebook_2013, ~/twitter_2013, Index of ~/facebook_2013

Filter: Country = ‘US’

Steps: Join, Sort, Count, Group, Classification

Time: (ROXIE) 0.27 seconds to (THOR) a few hours

Operations: SORT, GROUP, DEDUP, JOIN, MERGE, BETWEEN, LENGTH, REGEX, ROUND, SUM, COUNT, TRIM, WHEN, AVE, CASE, NORMALIZE, DENORMALIZE, K-MEANS, more…
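The slide above chains a filter (Country = ‘US’), an index, a join, a sort, and a count into one job. The sketch below walks through the same kinds of steps in plain Python on a handful of made-up records. It is only meant to show what each step does to the data, not how Thor or Roxie executes it. The field names (`user_id`, `likes`, `tweets`), the helper functions, and the records themselves are all hypothetical.

```python
from collections import Counter

# Hypothetical records standing in for ~/facebook_2013 and ~/twitter_2013.
facebook_2013 = [
    {"user_id": 1, "country": "US", "likes": 10},
    {"user_id": 2, "country": "DE", "likes": 4},
    {"user_id": 3, "country": "US", "likes": 7},
]
twitter_2013 = [
    {"user_id": 1, "country": "US", "tweets": 120},
    {"user_id": 3, "country": "US", "tweets": 5},
    {"user_id": 4, "country": "US", "tweets": 42},
]

def us_only(records):
    """Filter step: keep records where country is 'US'."""
    return [r for r in records if r["country"] == "US"]

def build_index(records, key):
    """Index step: map each key value to its record for fast lookup."""
    return {r[key]: r for r in records}

def join_on(left, right_index, key):
    """Join step: merge each left record with its indexed match, if any."""
    return [{**row, **right_index[row[key]]}
            for row in left if row[key] in right_index]

facebook_index = build_index(us_only(facebook_2013), "user_id")
joined = join_on(us_only(twitter_2013), facebook_index, "user_id")

# Sort step: most active Twitter users first.
ranked = sorted(joined, key=lambda r: r["tweets"], reverse=True)

# Group and count step: joined records per country.
counts = Counter(r["country"] for r in ranked)

print(ranked)
print(counts)
```

Building the index once and probing it during the join avoids rescanning one dataset for every record of the other, which is the usual reason to index before joining.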

Page 9: Big Data - Fast Machine Learning at Scale + Couchbase

Watch how to install HPCC Systems in 5 Minutes

http://www.youtube.com/watch?v=8SV43DCUqJg

Download HPCC Systems Open Source

Community Edition: http://hpccsystems.com/download/

or

Source Code: https://github.com/hpcc-systems

Page 10: Big Data - Fast Machine Learning at Scale + Couchbase

+

Common Big Data Setup

Page 11: Big Data - Fast Machine Learning at Scale + Couchbase

What is Couchbase?

Open Source

Page 12: Big Data - Fast Machine Learning at Scale + Couchbase

What is Couchbase?

Memcached Built-In

Open Source

Page 13: Big Data - Fast Machine Learning at Scale + Couchbase

What is Couchbase?

Memcached Built-In w/ Replicas

Open Source

Page 14: Big Data - Fast Machine Learning at Scale + Couchbase

What is Couchbase?

Memcached Built-In w/ Replicas

Flexible Schema (JSON)

Open Source

Page 15: Big Data - Fast Machine Learning at Scale + Couchbase

What is Couchbase?

Memcached Built-In w/ Replicas

Flexible Schema (JSON)

Key/Value & Distributed

Cross Data Center Replication

Open Source

Page 16: Big Data - Fast Machine Learning at Scale + Couchbase

What is Couchbase?

Memcached Built-In w/ Replicas

Flexible Schema (JSON)

Key/Value & Distributed

Cross Data Center Replication

SQL++ (N1QL)

Open Source
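To make "Key/Value", "Flexible Schema (JSON)", and "SQL++ (N1QL)" a little more concrete, here is a small sketch that uses a plain Python dictionary as a stand-in for a bucket. It does not use the Couchbase SDK and does not talk to a server. The key format, the document fields, the `profiles` bucket name, and the helper functions are assumptions for illustration, and the N1QL text is held as a string that is never executed.

```python
import json

# In-memory stand-in for a key/value bucket (NOT the Couchbase SDK).
bucket = {}

def upsert(key, document):
    """Store a document as JSON text under its key."""
    bucket[key] = json.dumps(document)

def get(key):
    """Fetch a document by key and decode it from JSON."""
    return json.loads(bucket[key])

# Flexible schema: the two documents do not share the same fields.
upsert("user::1", {"name": "Ann", "country": "US"})
upsert("user::2", {"name": "Bo", "country": "DE",
                   "devices": ["phone", "tablet"]})

# A query of this shape could be written in N1QL against a hypothetical
# `profiles` bucket; it is shown as text only and is not run here.
N1QL_QUERY = 'SELECT p.name FROM `profiles` AS p WHERE p.country = "US"'

# The same filter applied by hand to the stand-in bucket.
us_names = [doc["name"]
            for doc in map(json.loads, bucket.values())
            if doc.get("country") == "US"]

print(get("user::2"))
print(us_names)
```

Key lookups go straight to one document, while the query form scans or uses an index over many documents, which is why the two access styles are listed separately.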

Page 17: Big Data - Fast Machine Learning at Scale + Couchbase

+

Sub-Millisecond

SQL++ (N1QL)

JSON

Distributed & Reliable

Distributed & Reliable

1 Language

Flexible Data Types

Ready for the Future

XDCR

Page 18: Big Data - Fast Machine Learning at Scale + Couchbase

Couchbase Mobile

Embedded JSON NoSQL Database

Page 19: Big Data - Fast Machine Learning at Scale + Couchbase

Couchbase Mobile

Embedded JSON NoSQL Database

+ Sync Data Online / Offline

+ Sync Data Peer-To-Peer (directly)

+ Sync & Channel Data Peer-To-Peer

Page 20: Big Data - Fast Machine Learning at Scale + Couchbase

Couchbase Mobile + HPCC Systems

Process & Store Data to Scale

Page 21: Big Data - Fast Machine Learning at Scale + Couchbase

INSTALL in 5 Minutes

Download: http://couchbase.com/download

Source Code: https://github.com/couchbase

Learning More - Couchbase Server & Lite: https://www.youtube.com/user/CouchbaseVideo

Mountain View, CA; San Francisco, CA