Big data analytics - The Hong Kong Institution of Engineers

ASTRI Proprietary

Big data analytics

Andrew WAT
Director, Data Analytics
Security and Data Sciences
26-Aug-2015



Introduction of ASTRI

•  Hong Kong Applied Science and Technology Research Institute
•  Founded by the HKSAR Government in 2000
•  Mission: enhancing Hong Kong's competitiveness through applied research

R&D Competences
•  IC Design
•  Opto-electronics
•  Electronics Components
•  Software and Systems
•  Security and Data Sciences
•  Communication Technologies

Area of Applications
•  Financial technologies
•  Intelligent manufacturing
•  Next generation network
•  Medical and health

People
•  Staff: 500, of which R&D: 420 (~85%)
•  R&D staff by degree: PhD 25% (105), Master's degree 50% (210), Bachelor's degree 25% (105)


4 Kinds of Projects

         


Cash Rebate Scheme

•  30% cash rebate from ITC

ITF-funded Seed Project

•  Forward-looking/exploratory work to provide foundation work for future projects
•  Capped at 2.8M HKD

ITF-funded Platform Project

•  Source of funds: industry contribution ≥ 10% (from ≥ 1 company); funded by ITC ≤ 90%
•  ASTRI owns all IP rights, but industry partners can license the IP non-exclusively

Industry Collaborative Project

•  Source of funds: industry contribution 30-50%; funded by ITC 50-70%
•  At 30% (30% : 70%): the industry partner can exclusively license the foreground IP for a period
•  At 50% (50% : 50%): the industry partner can own the foreground IP

Contract Research

•  Source of funds: industry contribution 100%
•  The industry partner can own the foreground IP

•  Two types of funding:
   •  Annual recurrent budget of 140M HKD
   •  Project-based funding totalling ~270M HKD (at least 20% contribution from industry)
•  FY15/16 new projects: 43
•  4 kinds of projects


Big Data refers to datasets whose size and complexity render them difficult or prohibitively expensive to process using prevailing solutions

Data size: on the order of petabytes (2^50 bytes, or ~10^15 bytes)*

* or more!

Big data analytics (BDA) is the extraction and use of valuable information from Big Data (structured or unstructured datasets on the order of petabytes*, too large for prevailing tools to capture, store, search, transfer, analyze and visualize)

** In 2013
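As a quick check on the petabyte figures quoted above, the binary definition (2 to the power 50) and the decimal one (10 to the power 15) differ by roughly 12.6%. A minimal Python sketch; the variable names are illustrative and not from the slides:

```python
# Compare the binary (2**50) and decimal (10**15) definitions of a petabyte.
BINARY_PB = 2 ** 50     # 1,125,899,906,842,624 bytes (also called a pebibyte)
DECIMAL_PB = 10 ** 15   # 1,000,000,000,000,000 bytes

# How much larger the binary petabyte is than the decimal one.
ratio = BINARY_PB / DECIMAL_PB

if __name__ == "__main__":
    print(f"2^50 bytes  = {BINARY_PB:,}")
    print(f"10^15 bytes = {DECIMAL_PB:,}")
    print(f"ratio       = {ratio:.4f}")
```

Either definition gives the same order of magnitude, which is all the slide relies on.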

Big Data Analytics


Big Data Analytics Need

Source: ASTRI

Time and money needed for each new data analytics requirement

End Users
•  Domain experts
•  Define the requirements

Software Developers

Data Scientists
•  Hard to find
•  Limited domain knowledge
•  Specialty in data mining, machine learning, statistics

IT Struggles with Big Data
•  79% of businesses with 501-1000 employees, and 55% of those with more than 3000 workers, say their IT departments view big data as a "significant challenge"


Challenges

Information Sources

Mobile, Transactional Data ($ € ¥), Search, Texts, CRM, SCM, ERP, Images, Email, Social Media, IT Ops, Audio, Video

Traditional RDB?

Data Warehouse?

Business Intelligence Tools?


Data Silos in big organizations

Data silos: CRM Data, Supply Chain Data, Financial Performance Data, Sales Data, Marketing Data, Call Center Data

Vendors: Vendor A, Vendor A, Vendor A, Vendor B, Vendor C, Vendor D

$ Applications

•  No holistic view of all data
•  Vendor lock-in and interoperability issues
•  System integration issues
•  Need skill sets for proprietary systems


Hadoop, which includes an open-source implementation of MapReduce, has become a de facto standard for processing Big Data across clusters of commodity servers. In MapReduce, the whole data processing job is mapped into many small fragments of work, which are executed in parallel by multiple "workers"; the partial results from each worker are then reduced (combined) to produce the final result.
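The map-then-reduce idea can be sketched on a single machine with a word count. This is a toy illustration only, not Hadoop's API: the function names are made up for this example, and a thread pool stands in for the cluster's workers.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_fragment(lines):
    """Map step: count the words in one fragment of the input."""
    counts = defaultdict(int)
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
    return dict(counts)


def reduce_counts(partials):
    """Reduce step: merge the partial counts produced by every worker."""
    total = defaultdict(int)
    for partial in partials:
        for word, n in partial.items():
            total[word] += n
    return dict(total)


def word_count(lines, n_workers=3):
    # Split the input into fragments, one per worker (round-robin).
    fragments = [lines[i::n_workers] for i in range(n_workers)]
    # Each worker processes its own fragment independently.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_fragment, fragments))
    return reduce_counts(partials)


if __name__ == "__main__":
    text = ["big data is big", "data about data", "big clusters process data"]
    print(word_count(text))
```

On a real cluster the fragments would be blocks of a distributed file and the workers separate machines, but the split, map, and merge structure is the same.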

Apache Hadoop


What  Apache  Hadoop  Looks  Like


What Non-technical Users Are Looking For …

Plug & play software appliance with GUI

Via wizards and templates:
•  Users input data source URL or upload data from browser
•  Select a processing type from the function list
•  Fill in function parameters
•  Define job execution properties
•  Submit job to Hadoop job system
•  Receive notification about job progress
•  Visualize the results
•  Generate report
•  Modify job and re-run job if necessary
•  Define post-processing task
•  Loop for chained jobs

… Enhanced Hadoop-in-a-Box!!
•  Commodity hardware
•  BDA job set-up and progress monitoring
•  BDA results


Big Data Analytics in ASTRI

•  >30 engineers and researchers working on big data analytics
•  Project Highlights
   •  ASTRI-HP Joint Lab developing an easy-to-use and versatile big data analytics platform for HP big data product
   •  Establishing a cloud-based streaming data analytics platform for recommendation systems, predictive analysis, sentiment analysis, etc.


What we do

•  Focus on 3 things
   •  Big Data Platform development
      •  Deep integration and orchestration of different big data components in the ecosystem
      •  Make it easier to use for non-technical business analysts, with an additional abstraction layer, application programming interface and user interface
   •  Big Data Algorithm development
      •  Supplement the existing library with our own algorithms or verified university research results
   •  Domain applications
      •  Apply the platform and algorithms in domain industries like retail, investment services, etc.


Big Data Analytics Platform Development

Data
•  CRM
•  Marketing
•  Social Media
•  Transactional Data

Analytics
•  Drag & drop to design an analytics workflow with NO coding required
•  Define analytics process, execution, analytics results
•  Parallel data analytics engine
•  Analytics library for knowledge sharing and non-technical users

Visualization
•  See and understand analytics results and your data
•  Visual analytics in a few clicks


Product Recommendation in Amazon.com

Workflow:
•  Data Import (Source: RDBMS, Destination: HDFS): customer transaction records
•  Mahout ML: collaborative filtering based recommendation
•  Visualization: recommended products

Personalized recommendation
•  Recommend products based on the customer's previous transactions
•  Collaborative filtering based recommendation by Apache Mahout
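To make the collaborative filtering step concrete, here is a minimal item-based sketch in plain Python. It is not Mahout's API and not the deck's actual implementation: the purchase data, function names, and the choice of cosine similarity over co-purchase sets are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical transaction history: customer -> set of purchased products.
purchases = {
    "alice": {"book", "lamp", "pen"},
    "bob": {"book", "pen"},
    "carol": {"lamp", "desk"},
    "dave": {"book", "desk", "pen"},
}


def item_similarity(purchases):
    """Cosine similarity between items, based on which customers bought both."""
    item_buyers = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            item_buyers[item].add(customer)
    sim = defaultdict(dict)
    for a, b in combinations(sorted(item_buyers), 2):
        common = len(item_buyers[a] & item_buyers[b])
        if common:
            score = common / sqrt(len(item_buyers[a]) * len(item_buyers[b]))
            sim[a][b] = score
            sim[b][a] = score
    return sim


def recommend(customer, purchases, top_n=2):
    """Rank items the customer has not bought by similarity to items they have."""
    sim = item_similarity(purchases)
    owned = purchases[customer]
    scores = defaultdict(float)
    for item in owned:
        for other, score in sim[item].items():
            if other not in owned:
                scores[other] += score
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    print(recommend("alice", purchases))
```

At production scale the similarity computation is the expensive part, which is why it is typically run as a distributed job over the transaction history stored in HDFS.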

Customer transaction history

Data source: Amazon.com handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers.

Objective: Recommend products to customers to boost sales.

Big Data Analytics Example


Know your data

Example – Import Data

•  Drag and drop "Data Migration"
•  Simple configuration


Analyze your data

Analytics Library

•  Recommender: recommendation based on user profiles, movie info, and viewing history
•  Classification
•  Sentiment Analysis
•  Clustering


Recommendation based on user profiles, movie info, and viewing history  

An example of a Workflow  

Analyze your data


Running Data Analytics

•  Monitor running status
•  Pause/Resume/Cancel a running analytics job


Visualize Analytics Results

•  See and understand analytics results and your data
•  Visual analytics in a few clicks

Visualization


Media data analysis

We build a streaming data flow that analyzes tweets and news feeds and generates sentiment analytics results in real time

•  Use Twitter APIs and news feed handlers to collect data, with configurable parameters such as keywords, topics, time duration, etc.

•  Analyze the media data for discovering hot topics and performing sentiment analysis
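The two steps above can be sketched as a keyword filter feeding a sliding time window. This is a hedged illustration, not the platform's implementation: the word lists, the `SlidingSentiment` class, and the sample messages are hypothetical, and no real Twitter or news feed API is called.

```python
from collections import deque

# Tiny illustrative lexicon; a real system would use a trained model.
POSITIVE = {"good", "great", "gain", "up", "love"}
NEGATIVE = {"bad", "loss", "down", "fraud", "hate"}


def score(text):
    """Return (number of positive words) minus (number of negative words)."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


class SlidingSentiment:
    """Average sentiment of messages matching a keyword within a time window."""

    def __init__(self, keyword, window_seconds):
        self.keyword = keyword.lower()
        self.window = window_seconds
        self.events = deque()  # (timestamp, score) pairs, oldest first

    def add(self, timestamp, text):
        # Keep only messages that mention the tracked keyword.
        if self.keyword in text.lower():
            self.events.append((timestamp, score(text)))
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window:
            self.events.popleft()

    def average(self):
        if not self.events:
            return 0.0
        return sum(s for _, s in self.events) / len(self.events)


if __name__ == "__main__":
    tracker = SlidingSentiment("bitcoin", window_seconds=24 * 3600)
    tracker.add(0, "bitcoin is great")
    tracker.add(3600, "bitcoin loss is bad")
    tracker.add(7200, "weather is good")  # ignored: keyword not present
    print(tracker.average())
```

Timestamps are passed in explicitly (in seconds) so the window logic can be exercised without a live feed.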

Potential Collaborators: Fintech, Retail, Marketing firms

Sentiment analysis and trend prediction

What is the sentiment of financial innovation, bitcoin and e-cheque in the last 24 hours, and how about the next hour?


Log data analysis

We build a streaming data flow for analyzing system logs from different machines

•  Use log collectors to correlate data from different machines/applications/sensors on a streaming big data platform

•  Dig out important information such as system errors and abnormal user or process activities, then generate alerts and suggestions to support decisions
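A minimal sketch of the collect-and-alert idea, assuming a simple line format of timestamp, host, level, and message. The format, the `error_alerts` function, and the threshold are assumptions for illustration; the slides do not specify them.

```python
import re
from collections import Counter

# Assumed log format: "<ISO timestamp> <host> <LEVEL> <message>"
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+(INFO|WARN|ERROR)\s+(.*)$")


def parse(line):
    """Return (timestamp, host, level, message), or None if the line is malformed."""
    m = LINE_RE.match(line)
    return m.groups() if m else None


def error_alerts(lines, threshold=2):
    """Count ERROR lines per host and flag hosts at or above the threshold."""
    errors = Counter()
    for line in lines:
        parsed = parse(line)
        if parsed and parsed[2] == "ERROR":
            errors[parsed[1]] += 1
    return {host: n for host, n in errors.items() if n >= threshold}


if __name__ == "__main__":
    logs = [
        "2015-08-26T10:00:01 web01 INFO request served",
        "2015-08-26T10:00:02 web01 ERROR disk write failed",
        "2015-08-26T10:00:03 db01 ERROR connection refused",
        "2015-08-26T10:00:04 web01 ERROR disk write failed",
        "garbage line",
    ]
    print(error_alerts(logs))
```

Malformed lines are skipped rather than raising, since logs merged from many machines and applications rarely share a perfectly clean format.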

Potential Collaborators: Fintech, ISP, MSSP, equipment vendors

Log sources: sensor log, performance log, application log, security log

Security/performance analysis:
•  Predict system failure
•  Detect system error
•  Alert user and other processing


[Figure labels: Low interaction honeypots; Corporate Network; High interaction honeypots in sandboxes; Security Analytics; Honeypot Data Management; Cloud Service Provider; Honeypot data; Captured malware samples; Malware analysis]


 

Design Manufacturing

Virtual Prototyping

Electronics Components

Vision Solutions

Industrial BD Analytics

Industry 4.0

Preventive maintenance, Statistical Process Control, etc.


Case Study: Marriage & Birth Prediction

Traditionally, Chinese people like to give gold bracelets or jewelry accessories as gifts to celebrate a new marriage or a newborn baby

A good prediction of the marriage and birth rates for the coming years (or months) gives a better estimate of demand for luxury gifts, leading to:

•  Better planning of logistics and supply chain
•  Better-informed distribution channel and promotion strategies

Conduct an experiment to test how accurate a prediction we can achieve

•  Use HK data to do the analysis and make predictions
•  If it works, apply the same methodology to other cities and review

Source: ASTRI


Prediction with HK data

HK Census data:
•  Birth data: 1961 – 2013
•  Marriage data: 1976 – 2013 => not enough data received, looking for more from Census

Approach:
•  Analyze data characteristics
•  Apply appropriate mathematical and statistical models
•  Predict trend and do forecasting

Source: ASTRI


Birth Prediction - Result

Source:  Astri  


Birth Prediction – Result Analysis

Source: ASTRI

[Chart labels: Our prediction of crude birth rate; Fitted value; Census figure of crude birth rate for 2014: 8.6]

•  Prediction (7.957) consistent with census figure (8.6)
•  Can further improve by incorporating more data
   •  Demographic data
   •  Fertility rate of different age groups, etc.
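To put a number on how close the prediction is, the gap between the two figures quoted on this slide can be computed directly. The sketch below uses only the slide's two values; the variable names are illustrative.

```python
predicted = 7.957  # predicted crude birth rate (from the slide)
census = 8.6       # census crude birth rate for 2014 (from the slide)

# Absolute gap, and the gap as a fraction of the census figure.
abs_error = abs(predicted - census)
rel_error = abs_error / census

if __name__ == "__main__":
    print(f"absolute error: {abs_error:.3f}")
    print(f"relative error: {rel_error:.1%}")
```

The prediction is about 0.64 below the census figure, a relative error of roughly 7.5%.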


Prediction with mainland data

Collect mainland data from online sources
•  Period: 1950 – 2012
•  Cities/provinces: Beijing (北京), Jilin (吉林), Zhejiang (浙江), Hubei (湖北)
•  Data: crude birth rate

Approach:

•  Apply the same analysis and prediction method
•  Predict the birth rate for 2013
•  Verify the prediction against another online source

Source: ASTRI


Case Study: Prediction Results

30 May 2014
Source: ASTRI

Region      2013             2014
Beijing     9.59 (8.93)      10.10 (9.75)
Jilin       6.40 (5.36)      7.04 (6.62)
Zhejiang    10.65 (10.01)    11.14 (10.51)
Hubei       11.71 (11.08)    12.38 (11.89)

•  Correct trend prediction
•  80% confidence interval


Some commercial projects

(1) Industry: Retail
•  Project type: 50:50 Industrial Collaboration Program (ICP) Project
•  Major deliverables:
   •  Close integration with the customer's existing DBMS for data collection
   •  Customer profiling, item recommendations, trend analysis/prediction, etc.

(2) Industry: Investment services
•  Project type: Contract Service
•  Major deliverables:
   •  Loose integration with the customer's existing BI systems
   •  Company media analysis and sentiment analysis


Disclaimer

The information contained in this presentation is intended solely for your reference and may be subject to change without further notice. Such information's truthfulness, accuracy or completeness is not guaranteed and it may not contain all the material information concerning Hong Kong Applied Science and Technology Research Institute Company Limited and/or its affiliates (collectively, "ASTRI"). ASTRI makes no representation or warranty regarding, and assumes no responsibility or liability for, the truthfulness, accuracy or completeness of any information contained herein. In addition, the information may contain projections and forward-looking statements that may reflect ASTRI’s current views with respect to future events and financial performance. These views are based on current assumptions which may change over time. ASTRI makes no assurance that such future events will occur, that such projections will be achieved, or that ASTRI’s assumptions are correct. Lastly, this presentation does not constitute an offer made by ASTRI whatsoever (including an offer relating to ASTRI's technologies and/or services).


End of Presentation

Thank you. Questions are welcome.

Corporate website: www.astri.org
Contact: Andrew Wat, [email protected], 34062998