Big data (overview) - (MOSG)

Big data - Overview -

2016/03/04 Mulodo Vietnam Co., Ltd.

“Big data”

What is “Big data”?

Types:
- Science: LHC (Large Hadron Collider)
- Medical: Gene analysis
- Market (IT?): Business use

History of Data processing

50’s
- “BI: Business Intelligence” (1958)

80’s
- “DSS: Decision support system” (80’s)
- “SQL86” (1986)
- “Knowledge Discovery in Databases” (1989)
- “BI (Redefinition)” (1989)

90’s
- “Data Warehouse” (1990)
- “OLAP: online analytical processing” (1993)
- “Improvement of computing power” (90’s)
- “Price reduction of storage” (90’s)
- “Data Mining” (1996)

2000’s
- “Spread of The Internet” (00’s)
- ‘Google: Big data stack 1.0’ (00’s)
- “MapReduce framework” (2004)
- “Independence of Hadoop project from Nutch” (2006)
- “Amazon: S3” (2006)
- “Explosive prosperity of EC” (00’s)

2010’s
- “Big data” in ‘The Economist (UK)’ (2010)
- “Google: BigQuery” (2010)
- “fluentd” (2011)
- “Amazon: Redshift” (2012)
- “DMP: data management platform” (10’s)
- “Google: Big data stack 2.0-3.0” (10’s)
- “Apache Crunch, Impala, Presto, ...” (10’s)

Let's look back on the history of Big data
(especially storage and query engines)

[Timeline figure: 80's / 90's / 00's / 10's]

80's: SQL(86)
- Easy to use, structured, rule-based.
- Independent from storage.

00's: Map Reduce, big data stack/GFS
- Data became too huge to handle on a usual RDBMS.
- Processes HUGE data in a batch-like way (for huge logs).
- But it is proprietary.

00's: Hadoop, HBase
- Open source products!
- We need source. We love freedom.

00's: we want something easy to use
- E-commerce requires huge data analysis.
- M/R is too heavy to use...

Hive, Pig
- Hive: SQL -> (M/R) -> Result
- Pig: original language <=> (M/R)
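The Hive flow above (SQL -> M/R -> Result) can be pictured with a small sketch. This is not Hive's actual implementation, only an illustration of how a `GROUP BY` count might break down into map, shuffle and reduce steps. The `LOGS` rows, the column names and all function names are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical access-log rows; the column names are assumptions for illustration.
LOGS = [
    {"user": "a", "page": "/top"},
    {"user": "b", "page": "/cart"},
    {"user": "a", "page": "/top"},
    {"user": "c", "page": "/top"},
]


def map_phase(rows):
    """Map: emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for row in rows:
        yield row["page"], 1


def shuffle_phase(pairs):
    """Shuffle: collect all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: aggregate each key's values (COUNT(*) is a sum of 1s)."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_page(rows):
    # Roughly: SELECT page, COUNT(*) FROM logs GROUP BY page
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_page(LOGS))
```

Even this tiny query needs three separate steps when written by hand, which is one way to read "M/R is too heavy to use" and why a SQL front end is attractive.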

10's: big data stack/CFS, Dremel, BigQuery
- We want to analyze huge data interactively.
- Google announced Dremel for interactive analysis of huge data.

Dremel
1. Divides the SQL query across shards.
2. Processes them in parallel.

It is not a wrapper of M/R; it processes SQL in a massively parallel way (i.e. a full scan for each query across thousands of servers, without an index).
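The two Dremel steps above (divide the query across shards, process them in parallel) can be sketched as a scatter-gather aggregation. This is only a toy model under stated assumptions, not Dremel's or BigQuery's real design: the `SHARDS` data, the `sales`-style rows and the function names are hypothetical, and threads on one machine stand in for thousands of servers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table split into shards; each shard is a list of rows.
SHARDS = [
    [{"item": "book", "amount": 120}, {"item": "pen", "amount": 3}],
    [{"item": "desk", "amount": 300}],
    [{"item": "lamp", "amount": 45}, {"item": "chair", "amount": 150}],
]


def scan_shard(shard, min_amount):
    """Full scan of one shard with no index; returns a partial (count, total)."""
    count = 0
    total = 0
    for row in shard:
        if row["amount"] >= min_amount:
            count += 1
            total += row["amount"]
    return count, total


def run_query(shards, min_amount):
    # Roughly: SELECT COUNT(*), SUM(amount) FROM sales WHERE amount >= min_amount
    # Step 1: the same sub-query is sent to every shard.
    # Step 2: the shards are scanned in parallel.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: scan_shard(s, min_amount), shards))
    # Merge the partial results from every shard into the final answer.
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return count, total


if __name__ == "__main__":
    print(run_query(SHARDS, 100))
```

Each shard returns only a small partial result, so the merge step stays cheap even when the scans themselves touch every row.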

10's: Presto, Impala
- Open source products!
- We need source. We love freedom.

Add social circumstances to this figure.

[Timeline figure: 80's / 90's / 00's / 10's]
- Storage and query engines: SQL(86), Map Reduce, big data stack/GFS, Hadoop, HDFS, HBase, S3, big data stack/CFS, Dremel, BigQuery, Presto, Impala, Redshift
- Analysis: BI, DSS, DWH, Data Mining
- Social circumstances: Improvement of computing power, Price reduction of storage, Spread of The Internet, Explosive prosperity of EC

Many requests, many solutions...

But now you can think about which solution is better for your project. (I hope)

How to use Big data

A) How to aggregate data?
- Huge amount of data
- Too high frequency data

B) How to maintain data?
- Data will increase...
- Query engine cost, storage cost
- Data check cost

C) How to analyze data? (what for?)
- UI / UX
- Understanding of business requirements

How to aggregate data

<Libevent shock>
parallel -> event driven.
* similar to “parallel -> USB”

Fluentd
- Async
- (Pseudo) realtime <-> Periodic batch

Other
- logstash
- Lambda and Kinesis (AWS)
- ...
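The contrast between (pseudo) realtime and periodic batch can be illustrated with a small buffering sketch. This is not Fluentd's API or implementation. `BufferedForwarder`, `chunk_size` and `flush_interval` are hypothetical names for this example, and the sketch is single-threaded with the clock passed in by hand, whereas a real forwarder would flush from an event loop or background worker.

```python
class BufferedForwarder:
    """Hypothetical log forwarder: buffer events, then flush them in chunks."""

    def __init__(self, sink, chunk_size=3, flush_interval=1.0):
        self.sink = sink                      # callable that receives a list of events
        self.chunk_size = chunk_size          # flush once this many events are buffered
        self.flush_interval = flush_interval  # or once this many seconds have passed
        self.buffer = []
        self.last_flush = 0.0

    def emit(self, event, now):
        """Buffer one event; flush immediately if the chunk is full."""
        self.buffer.append(event)
        if len(self.buffer) >= self.chunk_size:
            self.flush(now)

    def tick(self, now):
        """Called periodically; flushes if the interval has elapsed."""
        if self.buffer and now - self.last_flush >= self.flush_interval:
            self.flush(now)

    def flush(self, now):
        """Hand the buffered events to the sink as one chunk."""
        chunk, self.buffer = self.buffer, []
        self.last_flush = now
        if chunk:
            self.sink(chunk)


delivered = []
fw = BufferedForwarder(delivered.append, chunk_size=3, flush_interval=1.0)
fw.emit("e1", now=0.0)
fw.emit("e2", now=0.1)
fw.tick(now=0.5)  # interval not reached yet, nothing is sent
fw.tick(now=1.2)  # interval reached, the two buffered events are sent
for name in ("e3", "e4", "e5"):
    fw.emit(name, now=1.3)  # the third event fills the chunk and triggers a flush
```

With a short `flush_interval` and a small `chunk_size` the behaviour approaches realtime delivery; with large values it behaves like a periodic batch.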

How to analyze data

UI / UX <solution set for log monitoring>
* ELK: Elasticsearch + logstash + Kibana
* Fluentd + Norikra + GrowthForecast

Next:
* Trying some storage
* Trying to build a system design
* Diving into some solutions
