Large Scale Mapreduce Data Processing at Quantcast


Transcript of "Large Scale Mapreduce Data Processing at Quantcast"

Page 1: Large Scale Mapreduce Data Processing at Quantcast

Ron Bodkin, Think Big Analytics, [email protected]

11/5/10 · © 2010 Quantcast, Think Big Analytics

Page 2: Outline

• An Internet-scale business
• Applications
• Data
• Infrastructure

Page 3: The Personalized Media Economy

Online Advertising Ecosystem

Media is transitioning from a "one size fits all" broadcast model to dynamic real-time choice.

Page 4: Online Advertising Ecosystem

[Diagram: Marketers; Media Agencies; Demand Side Platforms / Trading Desks; Exchanges / Sell Side Platforms / Networks; Publishers]

Page 5: Money Follows Media Consumption

Globally, hundreds of billions of dollars of ad spend will shift: a $30B opportunity.

Page 6: Enter Quantcast

• Launched September 2006 to enable addressable advertising at scale
• First we had to fix audience measurement
• Launched a free service based on direct measurement of media consumption
• Use machine learning to infer audience characteristics

Page 7: Industry Adoption of Quantcast

World's favorite audience measurement service

Page 8: Numerous Challenges

24x7 availability, latency, switch fabric, procurement, constant hardware failures, BGP engineering, monitoring, alerting, bare-metal imaging, rollback, logistics, recoverability/retry, scheduling, distributed profiling, data licensing, security, development APIs, IDEs (online query & large batch), data schema evolution, compression, provenance, metadata management, source control, control systems, privacy, IP protection, capacity management, system visibility, auditing, data collection, storage, global operations, maintenance tasks, recoverability, software deployment, metrics, disaster recovery, business continuity, regression testing, branch management, server specification

Page 9: Data Rich Environment

• Media consumption events: 300+ billion / month
• Internet users measured: 1.2 billion / month
• Millions of sites continually measured, every day
• Avg. processing load: 300,000 transactions / second

Page 10: Response Times

[Chart: response time (milliseconds to days) vs. data volume per calculation (kilobytes to petabytes) for Planner Query, Live Traffic Modeling, Lookalike Scoring, Lookalike Modeling, and Measurement Profiles]

Page 11: Organic Data Growth

[Chart: daily data volumes in billions, April 2007 through July 2010, on a scale of 0 to 10 billion]

Page 12: Divide and Conquer: Quantcast Functional Teams

Publisher, Pipeline, Edge, Marketer, Inference, Cluster, Planner, Lookalikes, Ops, Audience Match, Live Traffic Modeling, Security, Eng

Page 13: Lookalike Pipeline

[Pipeline diagram: model configuration; training sandbox (10,000 converters); model; trained models; scoring (500 TB, 100s of concurrent models); 10M potential converters; 1 billion Internet users; multi-PB store; 10 TB/day]

The sandbox allows comparisons between models on a fixed data set. Challengers to the production "hero" are developed in the sandbox and can be rapidly deployed to become the new "hero".

Page 14: Measurement Pipelines

[Pipeline diagram: multi-PB store; 10 TB/day; statistical inference; web site profiles; reference data; Tsunami]

Page 15: Live Traffic Modeling

[Diagram: exchanges and publishers send real-time opportunities (1 MM/s) to a bid agent, which responds in <20 ms using short-term state (refresh O(ms)) and input from the lookalike pipelines (refresh O(hours))]

Page 16: Technology Stack

[Stack diagram: HDFS and KFS storage; Hadoop MapReduce with the Hadoop + RMR data path; RMR data format and API; MRFlow (scheduling, workflow, dependencies); LogTone; indexed files; Greenplum RDBMS; reports, web access, and response]

Page 17: Quantcast Reporting

A billion users on millions of sites

Page 18: Learning ∝ experimentation

• 2 days: new model development
• 6 hours: to process 100 TB with a new Map-Reduce job
• Hours: live performance assessment
• Minutes: new model in production
• 2 weeks: to influence billions of real-time decisions every day and millions of dollars of advertising spend

Page 19: Analyst Tools

• Greenplum SQL database used for analysis
  – Valuable secondary use for quick iterative exploration
  – Allows parallel import for reasonable speed (this has been an Achilles heel for traditional SQL databases)
  – Used to manipulate tables up to 1-2 TB
• Often hard to load/unload data quickly at TB scale
• When evaluating BI/DB tools:
  – make sure you test requests that exceed the memory cache
  – and run a parallel workload
• Evaluating Datameer, Hive, Pig, Goto Metrics

Page 20: Map-Reduce 101

Map:
• P1: "the quick brown fox" → (quick, 1) (brown, 1) (fox, 1)
• P2: "hi fox" → (hi, 1) (fox, 1)
• P3: "home page brown" → (home, 1) (page, 1) (brown, 1)

Sort groups identical keys together, e.g. (brown, 1) (brown, 1) and (fox, 1) (fox, 1).

Reduce sums each group: (brown, 2) (quick, 1) (fox, 2) (home, 1) (hi, 1) (page, 1)

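The same word-count flow can be expressed against Hadoop's Java MapReduce API. This is a minimal, self-contained sketch of the classic example (not Quantcast's production code): the mapper emits (word, 1) pairs, the framework sorts and groups them by key, and the reducer sums each group.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      // Emit (word, 1) for every token in the input line.
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      // The framework has already sorted and grouped by key; just sum the counts.
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```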
Page 21: …Distributed Sort Execution

• Map-Reduce provides a programming model
• The heart is sorting data and grouping: "the dash in map-reduce"
• Map-Reduce allows for efficient sequential disk access in processing
• Not as abstract as SQL
• More flexible than custom data flows

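A rough illustration of what that "dash" does: every map output key is assigned to a reduce partition, and keys are sorted within each partition so a reducer sees all values for one key together. The class below just mirrors Hadoop's default hash partitioning; it is a sketch, not anything Quantcast-specific.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each map output key to one of numPartitions reducers. Within a
// partition the framework then sorts by key, so reduce() sees one key with
// all of its values grouped together.
public class WordPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    // Mask the sign bit so the partition index is always non-negative.
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}
// Wired in with: job.setPartitionerClass(WordPartitioner.class);
```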
Page 22: Hadoop

• Open source Java-based map-reduce
  – Scheduling
  – Execution
  – Monitoring
  – Recovery
• Core includes a distributed file system
• Came from the Nutch project (open source crawler)

Page 23: Custom Evolution at Quantcast

• Relational Map/Reduce (RMR) and MRFlow provide a productive API
• Replaced the Hadoop data path, including sort
• Need to get more processing out of the hardware
• The Hadoop project focused on scaling to 10,000 machines first, then addressed performance
  – E.g., Yahoo uses 10,000 cores for its web map (450 TB sort)
• Push common patterns down into the platform

Page 24: Power Law Distributions

Naïve jobs will take forever to process the most frequent values, so:
• Throw out "stop words"
• Cache common values (see the sketch below)
• Break into stages

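One common way to "cache common values" is in-mapper combining: aggregate counts in a small hash map inside the mapper and emit them once per task, so the heaviest keys of a power-law distribution do not flood the shuffle with one record per occurrence. The sketch below is a generic illustration under that assumption; the stop-word list and the word-count framing are hypothetical, not Quantcast's actual jobs.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// In-mapper combining: aggregate locally so very frequent ("head") keys are
// emitted once per map task instead of once per occurrence.
public class SkewAwareCountMapper extends Mapper<Object, Text, Text, IntWritable> {

  // Hypothetical stop-word list: values so common they are simply dropped.
  private static final Set<String> STOP_WORDS =
      new HashSet<String>(Arrays.asList("the", "a", "and"));

  private final Map<String, Integer> counts = new HashMap<String, Integer>();

  @Override
  protected void map(Object key, Text value, Context context) {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      String token = itr.nextToken();
      if (STOP_WORDS.contains(token)) {
        continue;  // drop "stop words" entirely, as the slide suggests
      }
      Integer c = counts.get(token);
      counts.put(token, c == null ? 1 : c + 1);
      // For truly huge inputs, flush and clear the map whenever it grows too large.
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // Flush the local cache once at the end of the map task.
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      context.write(new Text(e.getKey()), new IntWritable(e.getValue()));
    }
  }
}
```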
Page 25: Other Compute Lessons

• Scale up gradually, starting locally: it's easy to waste massive resources in a big cluster
• Job chains get complex
  – Need good workflow, dependency management, and scheduling (a minimal chaining sketch follows)
• Managing data and tracing its creation
  – Always a problem in any data system
  – With a more universal storage cloud, visibility improves
  – We keep metadata and keep refining APIs to support it
• Anomaly detection: controls

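A minimal sketch of chaining two MapReduce jobs, where the second stage reads the first stage's output and only launches if the first succeeded. Real pipelines, as the slide notes, need a proper workflow and dependency manager; the identity Mapper/Reducer classes and path arguments here are stand-ins.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Two-stage chain: stage 2 consumes stage 1's output directory and only runs
// if stage 1 completed successfully.
public class TwoStagePipeline {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path(args[0]);
    Path intermediate = new Path(args[1]);
    Path output = new Path(args[2]);

    Job stage1 = new Job(conf, "stage-1");
    stage1.setJarByClass(TwoStagePipeline.class);
    stage1.setMapperClass(Mapper.class);    // identity; replace with real mapper
    stage1.setReducerClass(Reducer.class);  // identity; replace with real reducer
    FileInputFormat.addInputPath(stage1, input);
    FileOutputFormat.setOutputPath(stage1, intermediate);
    if (!stage1.waitForCompletion(true)) {
      System.exit(1);  // fail fast: never launch stage 2 on a failed stage 1
    }

    Job stage2 = new Job(conf, "stage-2");
    stage2.setJarByClass(TwoStagePipeline.class);
    stage2.setMapperClass(Mapper.class);
    stage2.setReducerClass(Reducer.class);
    FileInputFormat.addInputPath(stage2, intermediate);
    FileOutputFormat.setOutputPath(stage2, output);
    System.exit(stage2.waitForCompletion(true) ? 0 : 1);
  }
}
```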
Page 26: Sidebar: Colossal Pipe

• 3rd generation open source map/reduce framework
• Built by Think Big Analytics
• Builds on API lessons learned from RMR and MRFlow
• Dependency analysis
• POJO-style and fluid programming
• Avro and JSON object bindings
• https://github.com/ThinkBigAnalytics/colossal-pipe

Page 27: Data Storage

• You can't back up 100 TB of cluster data "offline"
• Data corruption from distributed file system bugs is a major risk
• Erroneous jobs and operators are the next biggest risks

Page 28: Distributed Filesystems

• Quantcast runs two different filesystem implementations (on the same machines)
• KFS: open source high performance distributed FS
• Hadoop DFS: open source component of Hadoop
• Both support data replication to n machines
• Both are simple: reliability > complex "features"
• Evaluating commercial offerings from MapR, Goto Metrics, Appistry…

Page 29: Data Access Patterns

• Map-reduce leverages the efficiency of sequential access (you can shard to speed it up)
• Random access is appropriate for near real-time access
• Indexing is typically required for random reads and can speed up sequential reads too
• Flash (SSD) can be a great enabler for random reads
• Join access patterns vary depending on strategy (merge, hash, nested loop); a map-side hash join sketch follows

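As one concrete case of the hash strategy: when one side of a join is small enough to fit in memory, it can be loaded once per map task and probed per record, avoiding the shuffle entirely. This is a generic sketch; the file name `reference.tsv` and the tab-separated "key, attribute" layout are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-side hash join: load the small reference table into memory in setup(),
// then join each record of the large input with a hash lookup.
public class HashJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

  private final Map<String, String> reference = new HashMap<String, String>();

  @Override
  protected void setup(Context context) throws IOException {
    // In practice the small file would be shipped to each task (e.g. via the
    // distributed cache); a plain local read keeps the sketch short.
    BufferedReader in = new BufferedReader(new FileReader("reference.tsv"));
    String line;
    while ((line = in.readLine()) != null) {
      String[] parts = line.split("\t", 2);
      if (parts.length == 2) {
        reference.put(parts[0], parts[1]);
      }
    }
    in.close();
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] parts = value.toString().split("\t", 2);
    if (parts.length < 2) {
      return;  // skip malformed records
    }
    String joined = reference.get(parts[0]);
    if (joined != null) {
      // Emit the large-side record enriched with the small-side attribute.
      context.write(new Text(parts[0]), new Text(parts[1] + "\t" + joined));
    }
  }
}
```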
Page 30: High Performance Platform

• Real-time events: 225 thousand / second
• Analytical throughput: 2 PB / day
• Multiple global datacenters: ultra-high availability with advanced traffic management

Page 31: Compute as a Utility

• PixelTone: availability and latency of the real-time architecture; ensure self-healing, rapid failover of our multi-master design (no single point of failure)
• LogTone: management of data consolidation and freshness; have to "close the books"
• ClusterTone: utilization, performance, reliability, and efficiency of our massively parallel distributed processing platform
• AppTone: performance, latency, and user experience of all customer-facing applications

Page 32: Constantly Scaling Infrastructure

Whatever you have, it's never enough.

[Chart: average terabytes processed per day, Q3 '09 through Q2 '10, on a scale of 500 to 1,250 TB]

Page 33: Quantcast Compute Environments

• Thousands of nodes in 13 data centers globally
• Eight people to manage and develop code
• Compute cluster: 3,000 cores, 3 PB storage
• Buy volume commodity components and use smart software to provide reliability
• Costs of people, machines, and hosting are equal
  – Power costs: $0.06/kWh in Seattle vs. $0.15/kWh in SF

Page 34: Cloud Use: Amazon EC2

• Running RMR + Hadoop at about 50% of the throughput of the hosted cluster
• Provides disaster recovery capability (backup copy of data)
• Available for surge capacity
• Has matured a lot since 2008 tests, which hit network problems

Page 35: Comprehensive Monitoring

350K metrics updated every 15 seconds

Page 36: Management Challenges

• Rare events happen all the time
  – Undetected bit errors in SATA disks (~10⁻¹⁴ bit error rate)
  – Buggy network controllers (how bad?)
  – Bit errors in network transfer
  – Memory errors
  – Must checksum all data (e.g., turn on memory error detection); see the sketch below
• Need to detect and avoid slow components…
  – Slow for an instant? (<1 s)
  – Intermittently (minutes)
  – Chronically (hours to forever)

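"Must checksum all data" boils down to recording a checksum when data is written and re-verifying it after every hop (disk, network, memory). A minimal sketch using Java's built-in CRC32; in practice a stronger digest or the filesystem's own block checksums would be used, and the command-line arguments here are illustrative.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

// Compute a CRC32 over a file's bytes and compare it with the checksum
// recorded at write time, so silent bit errors are caught instead of
// propagating downstream.
public class ChecksumCheck {

  static long crc32Of(String path) throws IOException {
    CheckedInputStream in =
        new CheckedInputStream(new FileInputStream(path), new CRC32());
    byte[] buf = new byte[64 * 1024];
    while (in.read(buf) != -1) {
      // Reading drives the checksum; nothing else to do per block.
    }
    in.close();
    return in.getChecksum().getValue();
  }

  public static void main(String[] args) throws IOException {
    long expected = Long.parseLong(args[1]);  // checksum recorded when the file was written
    long actual = crc32Of(args[0]);
    if (actual != expected) {
      throw new IOException("checksum mismatch: expected " + expected + ", got " + actual);
    }
    System.out.println("checksum OK: " + actual);
  }
}
```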
Page 37: Summary

Quantcast has built a business applying big data and analytics to change online advertising, leveraging:

• Large-scale map-reduce processing
• Near real-time response for responsiveness
• Metrics-driven data science
• A variety of data sets

11/5/10 · © 2010 Quantcast, Think Big Analytics