Alex Cheng of Baidu: "Big Data: A New Frontier"

Big Data: A New Frontier Alex Cheng, VP Baidu 2013-4-12


Transcript of Alex Cheng of Baidu: "Big Data: A New Frontier"

Page 1: Alex Cheng of Baidu: "Big Data: A New Frontier"

Big Data: A New Frontier

Alex Cheng, VP Baidu 2013-4-12

Page 2: Alex Cheng of Baidu: "Big Data: A New Frontier"
Page 3: Alex Cheng of Baidu: "Big Data: A New Frontier"

5 billion+ Search Queries

~4 million Posts on PostBar

~500 million Users

100 million+ Mobile Search Users

~500,000 Business Clients

Every day at Baidu

Page 4: Alex Cheng of Baidu: "Big Data: A New Frontier"

Storage

Processing

Analytics & Prediction

Data Intelligence

Volume

Velocity

Variety

Value

Page 5: Alex Cheng of Baidu: "Big Data: A New Frontier"

Web Pages & Links 100+ PB

Logs 100+ PB

UGC 1 PB

Web, News, PostBar, Encyclopedia, Knows

Searches, Clicks, Posts etc.

1 petabyte = 2x National Library of China

Page 6: Alex Cheng of Baidu: "Big Data: A New Frontier"

Logs 100+ PB

UGC 1+ PB

[Chart: data volume by year, 2005 to 2012, drawn in 100 PB units]

•  95% of the data was created within the last 3 years

•  100 PB of new data is processed every day

Growth: 100%+ YoY
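
The two figures on this slide can be cross-checked with a little arithmetic. The sketch below assumes that total data volume multiplies by a constant factor each year, which is a simplification; the slide itself only states "100%+ YoY". Under that assumption, the share of all data created in the last n years is 1 - g^(-n) for a yearly growth factor g. The function names are illustrative and not from the talk.

```python
def share_created_recently(growth_factor: float, years: int) -> float:
    """Share of all data created in the last `years` years, assuming the
    total volume multiplies by `growth_factor` every year (an assumption)."""
    return 1.0 - growth_factor ** (-years)


def growth_factor_for_share(share: float, years: int) -> float:
    """Yearly growth factor needed so that `share` of all data is at most
    `years` years old, under the same constant-growth assumption."""
    return (1.0 - share) ** (-1.0 / years)


# Exactly 100% YoY growth (volume doubles each year), 3-year window.
doubling_share = share_created_recently(2.0, 3)

# Growth factor implied by "95% of the data was created within the last 3 years".
implied_factor = growth_factor_for_share(0.95, 3)

print(f"share at 100% YoY: {doubling_share:.3f}")
print(f"implied yearly factor for 95%: {implied_factor:.3f}")
```

With plain doubling, 87.5% of the data would be at most three years old. Reaching 95% needs a yearly factor of roughly 2.7, which fits the "+" in "100%+ YoY".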

Page 7: Alex Cheng of Baidu: "Big Data: A New Frontier"

Hardware Innovations

•  Custom ARM-based Servers

•  Gigabit Switches

•  Custom SSD/Flash Storage

TCO -25% Density +70%

PUE 1.18 / 1.37 (#1) Non-cooling hours 48%

Custom Rack Uptime Efficiency 10x

Performance 2x Cost -48%
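
For readers unfamiliar with PUE (power usage effectiveness): it is the ratio of total facility energy to the energy used by the IT equipment alone, so 1.0 would mean no overhead for cooling or power distribution. The sketch below only applies that standard definition to the two values quoted on the slide. What each value refers to is not stated in the transcript, so no interpretation is attached to them. The function names are illustrative.

```python
def overhead_per_it_kwh(pue: float) -> float:
    """Non-IT energy (cooling, power distribution, ...) spent per kWh of IT load."""
    return pue - 1.0


def it_share_of_total(pue: float) -> float:
    """Fraction of the facility's total energy that reaches IT equipment."""
    return 1.0 / pue


def total_energy_saving(pue_low: float, pue_high: float) -> float:
    """Relative reduction in total energy at `pue_low` versus `pue_high`
    for the same IT load."""
    return 1.0 - pue_low / pue_high


for pue in (1.18, 1.37):  # the two values quoted on the slide
    print(f"PUE {pue}: overhead {overhead_per_it_kwh(pue):.2f} kWh per IT kWh, "
          f"IT share {it_share_of_total(pue):.1%}")

saving = total_energy_saving(1.18, 1.37)
print(f"saving at equal IT load: {saving:.1%}")
```

At equal IT load, a facility at PUE 1.18 draws about 14% less total energy than one at 1.37.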

Page 8: Alex Cheng of Baidu: "Big Data: A New Frontier"

Baidu Cloud IDC Yangquan, Shanxi, China

Page 9: Alex Cheng of Baidu: "Big Data: A New Frontier"

Software Innovations

•  Global Optimization

•  Multiple Replication

•  Data Distribution

•  Partial Update

TRADITIONAL SERVER STACK

MONOLITHIC HW

TRADITIONAL RELATIONAL DATABASE

DIRECT RECORD ACCESS OR QUERIES

NEW SERVER STACK

MAPREDUCE

NOSQL DATABASE

PARALLEL RELATIONAL DATABASE

HADOOP

DISTRIBUTED HARDWARE
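
The slide names MapReduce as one layer of the new stack without describing it. As a reminder of the programming model, here is a minimal single-process sketch that counts terms in query logs. It is not Hadoop code and is not taken from the talk: the function names and the sample log lines are made up for illustration, and a real job would run the map and reduce phases in parallel across many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(log_line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (term, 1) pair for every term in one log line."""
    for term in log_line.lower().split():
        yield term, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence within one process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical query-log lines.
counts = run_job(["big data", "Big Data frontier"])
print(counts)
```

Because each map call looks at one line and each reduce call looks at one key, both phases can be spread over distributed hardware, which is the contrast the slide draws with the monolithic stack.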

Page 10: Alex Cheng of Baidu: "Big Data: A New Frontier"

Big Data + Web Search

•  Real-time online learning

•  Tens of billions of training samples

•  Billions of complex features

[Diagram, with Offline and Online parts: Query, Advanced Search Module, CTR-server, Logs, Feature extraction, Model Training, Models]
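
The slide pairs "real-time online learning" with a CTR server fed from logs. One common way to build such a loop, shown here purely as an illustration, is logistic regression updated one sample at a time, with string features hashed into a fixed-size weight vector so the model can absorb a very large feature space. The talk does not say which model Baidu uses; the class name, feature strings, bucket count, and learning rate below are all assumptions.

```python
import math
import zlib
from typing import List


class OnlineCTRModel:
    """Logistic regression trained by per-sample SGD over hashed features."""

    def __init__(self, num_buckets: int = 2 ** 18, learning_rate: float = 0.1):
        self.num_buckets = num_buckets
        self.learning_rate = learning_rate
        self.weights = [0.0] * num_buckets

    def _indices(self, features: List[str]) -> List[int]:
        # Hashing trick: map each feature string to a weight slot.
        return [zlib.crc32(f.encode("utf-8")) % self.num_buckets for f in features]

    def predict(self, features: List[str]) -> float:
        """Estimated click probability for one impression."""
        z = sum(self.weights[i] for i in self._indices(features))
        z = max(-35.0, min(35.0, z))  # keep exp() in a safe range
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features: List[str], clicked: bool) -> float:
        """Take one gradient step on the log loss for a logged impression."""
        p = self.predict(features)
        gradient = p - (1.0 if clicked else 0.0)
        for i in self._indices(features):
            self.weights[i] -= self.learning_rate * gradient
        return p


# Hypothetical logged impressions: one ad is clicked, the other is not.
model = OnlineCTRModel()
before = model.predict(["query=flights", "ad=travel"])
for _ in range(50):
    model.update(["query=flights", "ad=travel"], clicked=True)
    model.update(["query=flights", "ad=shoes"], clicked=False)
after_clicked = model.predict(["query=flights", "ad=travel"])
after_skipped = model.predict(["query=flights", "ad=shoes"])
print(before, after_clicked, after_skipped)
```

Each update touches only the weights of the features present in one sample, so the model can keep learning as new log entries arrive instead of waiting for a full retraining pass.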

Page 11: Alex Cheng of Baidu: "Big Data: A New Frontier"

Big Data + IME

•  Real-time Dictionary Updates

•  Dynamic Result Modeling

•  High-frequency Inputs Recommendation

[Diagram: User Input, NLP Module, On-Device Quick Search, Cloud-based Dictionary, Device-based Dictionary, Consolidated Search Result, Output]
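
The diagram shows candidates coming from both a device-based and a cloud-based dictionary before a consolidated result is output. A minimal sketch of such a consolidation step follows. The merge rule (keep the higher score per candidate, then rank), the scores, and the function name are assumptions for illustration; the transcript does not describe how the results are actually combined.

```python
from typing import Dict, List, Optional


def merge_candidates(
    device_hits: Dict[str, float],
    cloud_hits: Optional[Dict[str, float]],
    limit: int = 5,
) -> List[str]:
    """Combine on-device and cloud candidates into one ranked list.

    `cloud_hits` may be None, e.g. when the device is offline; the
    on-device results are then used on their own.
    """
    combined = dict(device_hits)
    for word, score in (cloud_hits or {}).items():
        # Keep the higher of the two scores for a shared candidate.
        combined[word] = max(score, combined.get(word, float("-inf")))
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]


# Hypothetical candidate scores for one user input.
device = {"data": 0.9, "date": 0.4}
cloud = {"data": 0.95, "database": 0.6}

online_result = merge_candidates(device, cloud)
offline_result = merge_candidates(device, None)
print(online_result, offline_result)
```

Falling back to the device dictionary when the cloud lookup is unavailable keeps input responsive, while the cloud side can still contribute newer or rarer entries when reachable.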

Page 12: Alex Cheng of Baidu: "Big Data: A New Frontier"

Voice

Images

•  10+ Billion Training Examples

•  Heterogeneous Features

•  Intensive Computing

Deep Learning

Page 13: Alex Cheng of Baidu: "Big Data: A New Frontier"

The Future of Big Data

“Digital Universe”

[Chart: size of the “Digital Universe” in exabytes, 2009 to 2020; vertical axis marked at 10,000, 20,000, 30,000 and 40,000 exabytes]

Machine-generated Sensor Data

“Anytime, Anywhere, Any Devices”: Smartphone, Smart Home, Wearable Devices, Smart Car … …