ASTRI Proprietary
Big data analytics
Andrew WAT, Director, Data Analytics, Security and Data Sciences, 26-Aug-2015
Introduction to ASTRI
• Hong Kong Applied Science and Technology Research Institute
• Founded by the HKSAR Government in 2000
• Mission: enhancing HK’s competitiveness through applied research
R&D Competences
• IC Design
• Opto-electronics
• Electronics Components
• Software and Systems
• Security and Data Sciences
• Communication Technologies
Areas of Application
• Financial technologies
• Intelligent manufacturing
• Next generation network
• Medical and health
People
• Staff: 500; R&D: 420 (~85%)
• PhD: 25%; Master's degree: 50%
Breakdown of R&D staff by degree: PhD 25% (105), Master's degree 50% (210), Bachelor's degree 25% (105)
4 Kinds of Projects
Cash Rebate Scheme
• 30% cash rebate from ITC
ITF-funded Seed Project
• Forward-looking/exploratory work that lays the foundation for future projects
• Capped at 2.8M HKD
ITF-funded Platform Project
• Source of funds: industry contribution ≥ 10% (≥ 1 company); funded by ITC ≤ 90%
• ASTRI owns all IP rights, but industry partners can license the IP non-exclusively
Industry Collaborative Project
• Source of funds: industry contribution 30-50%; funded by ITC 50-70% (industry : ITC = 30% : 70% or 50% : 50%)
• 30% industry contribution: the industry partner can exclusively license the foreground IP for a period
• 50% industry contribution: the industry partner can own the foreground IP
Contract Research
• Source of funds: industry contribution 100%
• The industry partner can own the foreground IP
• Two types of funding:
  • Annual recurrent budget of 140M HKD
  • Project-based funding totalling ~270M HKD (at least 20% contribution from industry)
• FY15/16 new projects: 43
• 4 kinds of projects
Big Data refers to datasets whose size and complexity make them difficult or prohibitively expensive to process using prevailing solutions.
Data size: on the order of petabytes ($2^{50}$ bytes, or $\sim 10^{15}$ bytes)*
Big data analytics (BDA) is the extraction and use of valuable information from Big Data (structured or unstructured datasets on the order of petabytes*, too large for prevailing tools to capture, store, search, transfer, analyze and visualize).
* or more!
** In 2013
Big Data Analytics
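The two sizes quoted for a petabyte differ by about 12.6%: $2^{50}$ bytes is the binary reading (strictly a pebibyte, PiB, in IEC terms) and $10^{15}$ bytes is the decimal SI reading. A quick Python check of that arithmetic; the constant names are illustrative only:

```python
# Compare the binary (2**50) and decimal (10**15) readings of "one petabyte".
BINARY_PB = 2 ** 50    # 1,125,899,906,842,624 bytes (strictly a pebibyte, PiB)
DECIMAL_PB = 10 ** 15  # 1,000,000,000,000,000 bytes (SI petabyte)

ratio = BINARY_PB / DECIMAL_PB
print(f"2**50  = {BINARY_PB:,} bytes")
print(f"10**15 = {DECIMAL_PB:,} bytes")
print(f"ratio  = {ratio:.4f}")  # 1.1259: the binary unit is ~12.6% larger
```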
Big Data Analytics Need
Source: Astri
Roles involved:
• End users (domain experts): define the requirements
• Data scientists: hard to find; limited domain knowledge; specialize in data mining, machine learning and statistics
• Software developers
Time and money are needed for every new data analytics requirement.
IT Struggles with Big Data
• 79% of businesses with 501-1000 employees, and 55% of those with more than 3000 workers, say their IT departments view big data as a “significant challenge”.
Challenges
Information Sources: Mobile, Transactional Data, Search Texts, CRM, SCM, ERP, Images, Email, Social Media, IT Ops, Audio, Video
Traditional RDB?
Data Warehouse?
Business Intelligence Tools?
Data Silos in big organizations
Siloed data: CRM Data, Supply Chain Data, Financial Performance Data, Sales Data, Marketing Data, Call Center Data
Applications from different vendors (Vendor A, Vendor B, Vendor C, Vendor D)
No holistic view of all data
Vendor lock-in and interoperability issues
System integration issues
Need skill sets for proprietary systems
Hadoop, which contains an open-source implementation of MapReduce, has become a de facto standard for processing Big Data across clusters of commodity servers. In MapReduce, a processing job is split into many small fragments of work that are executed in parallel by multiple “workers” (the map phase), and the workers' partial results are then combined (the reduce phase) to produce the final result.
Apache Hadoop
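The map/reduce split can be illustrated with the classic word-count example. This is a minimal single-machine Python sketch, not Hadoop's Java API: a thread pool stands in for the cluster's workers (in CPython, threads do not give true CPU parallelism), and the function names `map_fragment`, `shuffle`, `reduce_group` and `word_count` are hypothetical. It also shows the shuffle step, in which Hadoop groups intermediate key/value pairs by key between the two phases.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_fragment(lines):
    """Map: emit (word, 1) for every word in one fragment of the input."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(pairs):
    """Shuffle: group intermediate values by key between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Reduce: combine the partial results for one key."""
    return key, sum(values)


def word_count(lines, n_workers=3):
    # Split the input into roughly equal fragments, one per worker.
    fragments = [lines[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        mapped = pool.map(map_fragment, fragments)
        groups = shuffle(chain.from_iterable(mapped))
        reduced = pool.map(lambda kv: reduce_group(*kv), groups.items())
        return dict(reduced)


docs = ["big data big insight", "data flows to workers", "workers reduce data"]
print(word_count(docs))
```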
What Apache Hadoop Looks Like
What Non-technical Users Are Looking For …
Plug & play software appliance with GUI
Input a data source URL or upload data from the browser
Select a processing type from the function list
Fill in function parameters
Define job execution properties
Submit job to Hadoop job system
Receive notification about job progress
Visualize the results
Generate report
Modify and re-run the job if necessary
Define post-processing task
Via wizards and templates
Loop for chained jobs
… Enhanced Hadoop-in-a-Box!!
Commodity hardware
BDA job set-up and progress monitoring
BDA results
• More than 30 engineers and researchers working on big data analytics
• Project highlights:
  • ASTRI-HP Joint Lab developing an easy-to-use and versatile big data analytics platform for HP's big data product
  • Establishing a cloud-based streaming data analytics platform for recommendation systems, predictive analysis, sentiment analysis, etc.
Big Data Analytics in ASTRI
What we do
• Focus on 3 things:
• Big Data Platform development
  • Deep integration and orchestration of the different components in the big data ecosystem
  • Make the platform easier for non-technical business analysts to use, with an additional abstraction layer, application programming interfaces and a user interface
• Big Data Algorithm development
  • Supplement existing libraries with our own algorithms or verified university research results
• Domain applications
  • Apply the platform and algorithms in domain industries such as retail, investment services, etc.
Big Data Analytics Platform Development
Data: CRM, Marketing, Social Media, Transactional Data
Analytics: drag & drop to design an analytics workflow, with NO coding required
Define Analytics Process Execution
Analytics Results
Visualization: see and understand analytics results and your data; visual analytics in a few clicks
Parallel Data Analytics Engine
Analytics library for knowledge sharing and non-technical users
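A drag-and-drop analytics workflow is naturally a directed acyclic graph (DAG): each step runs once the steps feeding it have finished, and independent steps can run in parallel. The slides do not say how the platform represents workflows; this is a minimal sketch, assuming hypothetical step names and a placeholder `run_step` in place of real analytics jobs, using Python's standard `graphlib` module (Python 3.9+):

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps whose output it consumes.
workflow = {
    "import_crm": set(),
    "import_transactions": set(),
    "join_customer_data": {"import_crm", "import_transactions"},
    "cluster_customers": {"join_customer_data"},
    "visualize": {"cluster_customers"},
}


def run_step(name, inputs):
    """Placeholder for a real analytics job; here it only records its lineage."""
    return f"{name}({', '.join(sorted(inputs))})" if inputs else name


def execute(dag):
    results = {}
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    while sorter.is_active():
        # Every step in this batch has all its inputs ready, so the batch could run in parallel.
        for step in sorter.get_ready():
            results[step] = run_step(step, [results[dep] for dep in dag[step]])
            sorter.done(step)
    return results


print(execute(workflow)["visualize"])
```

Each batch returned by `get_ready()` contains steps with no unfinished dependencies; a parallel engine would dispatch that batch to workers instead of looping over it.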
Product Recommendation on Amazon.com
Workflow: Data Import (customer transaction records; source: RDBMS, destination: HDFS) → Mahout ML (collaborative filtering based recommendation) → Recommended products → Visualization
Personalized recommendation
• Recommend products based on the customer’s previous transactions
• Collaborative filtering based recommendation by Apache Mahout
Data source: customer transaction history. Amazon.com handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers.
Objective: recommend products to customers to boost sales.
Big Data Analytics Example
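The recommendation step relies on collaborative filtering, which Apache Mahout provides in Java. The sketch below is not Mahout's API; it is a minimal plain-Python illustration of item-based collaborative filtering on binary purchase data, assuming a hypothetical `purchases` table: items are compared by the cosine similarity of their buyer sets, and a customer's unseen items are ranked by their summed similarity to items the customer already bought.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical purchase history: customer -> set of products bought.
purchases = {
    "alice": {"book", "lamp", "pen"},
    "bob": {"book", "pen"},
    "carol": {"lamp", "desk"},
    "dave": {"book", "pen", "desk"},
    "erin": {"pen", "desk"},
}


def item_similarities(history):
    """Cosine similarity between items, treating each item as a binary vector over customers."""
    buyers = defaultdict(set)
    for customer, items in history.items():
        for item in items:
            buyers[item].add(customer)
    sims = {}
    for a in buyers:
        for b in buyers:
            if a != b:
                overlap = len(buyers[a] & buyers[b])
                sims[a, b] = overlap / sqrt(len(buyers[a]) * len(buyers[b]))
    return sims


def recommend(customer, history, top_n=2):
    """Rank items the customer has not bought by summed similarity to items they own."""
    sims = item_similarities(history)
    owned = history[customer]
    all_items = set().union(*history.values())
    scores = {
        candidate: sum(sims[item, candidate] for item in owned)
        for candidate in all_items - owned
    }
    # Highest score first; ties broken alphabetically for a deterministic result.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


print(recommend("bob", purchases))  # ['desk', 'lamp']
```

Here bob has bought `book` and `pen`; `desk` ranks first because it shares two buyers (dave and erin) with `pen`.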
Know your data Example – Import Data
Drag and drop “Data Migration”; simple configuration
Analyze your data
Analytics Library
Recommender
Recommendation based on user profiles, movie info, and viewing history
Classification
Sentiment Analysis
Clustering
…
Recommendation based on user profiles, movie info, and viewing history
An example of a Workflow
Analyze your data
Running Data Analytics
Monitor running status
Pause/Resume/Cancel a running analytics job
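The slides do not describe how Pause/Resume/Cancel are implemented. A minimal sketch, assuming a simple job lifecycle with four hypothetical states and a hypothetical `AnalyticsJob` class, shows how such controls can be guarded by a transition table so that, for example, a cancelled job cannot be resumed:

```python
from enum import Enum


class JobState(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    CANCELLED = "cancelled"
    FINISHED = "finished"


# Hypothetical transition table: action -> {allowed current state: resulting state}.
TRANSITIONS = {
    "pause": {JobState.RUNNING: JobState.PAUSED},
    "resume": {JobState.PAUSED: JobState.RUNNING},
    "cancel": {JobState.RUNNING: JobState.CANCELLED, JobState.PAUSED: JobState.CANCELLED},
    "finish": {JobState.RUNNING: JobState.FINISHED},
}


class AnalyticsJob:
    def __init__(self, name):
        self.name = name
        self.state = JobState.RUNNING
        self.history = [self.state]

    def apply(self, action):
        """Move to the next state, rejecting actions that are invalid in the current state."""
        allowed = TRANSITIONS[action]
        if self.state not in allowed:
            raise ValueError(f"cannot {action} a job that is {self.state.value}")
        self.state = allowed[self.state]
        self.history.append(self.state)
        return self.state


job = AnalyticsJob("movie-recommendation")
job.apply("pause")
job.apply("resume")
job.apply("cancel")
print([s.value for s in job.history])  # ['running', 'paused', 'running', 'cancelled']
```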
Visualize Analytics Results
See and understand analytics results and your data
Visual analytics in a few clicks
Visualization
Media data analysis
We build a streaming data flow that analyzes tweets and newsfeeds and generates sentiment analytics results in real time
• Use Twitter APIs and newsfeed handlers to collect data, with parameters such as keywords, topics, time duration, etc.
• Analyze the media data for discovering hot topics and performing sentiment analysis
Potential Collaborators: Fintech, Retail, Marketing firms
Sentiment analysis and trend prediction
What is the sentiment towards financial innovation, bitcoin and e-cheques over the last 24 hours, and what will it be in the next hour?
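Answering "what is the sentiment over the last 24 hours" on a stream amounts to keeping a time window of scored posts and evicting those that fall out of it. A minimal sketch, assuming a tiny hypothetical word lexicon (a production system would more likely use a trained classifier) and hypothetical posts; `SlidingSentiment` is an illustrative name, not part of any Twitter or newsfeed API:

```python
from collections import deque
from datetime import datetime, timedelta

# Tiny hypothetical lexicon: word -> sentiment weight.
LEXICON = {"good": 1, "great": 2, "secure": 1, "bad": -1, "scam": -2, "risky": -1}


def score(text):
    """Sum the lexicon weights of the words in one post."""
    return sum(LEXICON.get(word.strip(".,!?").lower(), 0) for word in text.split())


class SlidingSentiment:
    """Average sentiment of posts mentioning a keyword within a sliding time window."""

    def __init__(self, keyword, window=timedelta(hours=24)):
        self.keyword = keyword.lower()
        self.window = window
        self.events = deque()  # (timestamp, score), oldest first

    def add(self, timestamp, text):
        if self.keyword in text.lower():
            self.events.append((timestamp, score(text)))
        # Evict posts older than the window, measured back from the newest timestamp.
        while self.events and timestamp - self.events[0][0] > self.window:
            self.events.popleft()

    def average(self):
        if not self.events:
            return 0.0
        return sum(s for _, s in self.events) / len(self.events)


tracker = SlidingSentiment("bitcoin")
start = datetime(2015, 8, 26, 9, 0)
tracker.add(start, "Bitcoin is a scam")
tracker.add(start + timedelta(hours=20), "Bitcoin payments look secure and good")
tracker.add(start + timedelta(hours=30), "Great day for bitcoin!")
print(tracker.average())  # 2.0
```

The first post is evicted when the third arrives 30 hours after it, so the average covers only the most recent 24 hours.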
Log data analysis
We build a streaming data flow for analyzing system logs from different machines
• Use log collectors to correlate data from different machines/applications/sensors on a streaming big data platform
• Dig out important information, such as system errors and abnormal user or process activities, then generate alerts and suggestions to support insightful decisions
Potential Collaborators: Fintech, ISP, MSSP, equipment vendors
Inputs: sensor logs, performance logs, application logs, security logs
Security/performance analysis: predict system failures, detect system errors, alert users and trigger other processing
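Correlating logs from several machines usually starts with merging their time-ordered streams into one timeline. A minimal sketch, assuming the logs are already parsed into hypothetical `(timestamp, source, level, message)` tuples, uses `heapq.merge` for the merge and a simple sliding-window threshold rule (three ERROR records within 30 seconds, an arbitrary choice) as a stand-in for the error-detection and alerting step:

```python
import heapq
from collections import deque

# Hypothetical parsed records: (epoch_seconds, source, level, message); each stream is time-ordered.
app_log = [(100, "app-1", "INFO", "request ok"), (130, "app-1", "ERROR", "db timeout"),
           (135, "app-1", "ERROR", "db timeout")]
perf_log = [(110, "perf", "WARN", "cpu 91%"), (132, "perf", "ERROR", "disk latency high")]
security_log = [(120, "sec", "INFO", "login alice"), (300, "sec", "ERROR", "failed login bob")]


def error_bursts(streams, window=30, threshold=3):
    """Merge time-ordered streams; alert when >= threshold errors fall within `window` seconds."""
    recent = deque()  # timestamps of recent ERROR records, oldest first
    alerts = []
    for ts, source, level, message in heapq.merge(*streams):
        if level != "ERROR":
            continue
        recent.append(ts)
        while ts - recent[0] > window:
            recent.popleft()
        if len(recent) >= threshold:
            alerts.append((ts, source, message, len(recent)))
    return alerts


for alert in error_bursts([app_log, perf_log, security_log]):
    print(alert)
```

The burst at timestamps 130, 132 and 135 spans the application and performance logs, a cross-machine pattern that neither log shows on its own; the isolated error at 300 does not trigger an alert.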
Low-interaction honeypots
Corporate Network
High-interaction honeypots in sandboxes
Security Analytics
Honeypot Data Management
Cloud Service Provider
Honeypot data
Captured malware samples
Malware analysis
Design Manufacturing
Virtual Prototyping
Electronics Components
Vision Solutions
Industrial BD Analytics
Industry 4.0
Preventive maintenance, Statistical Process Control, etc.
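Statistical Process Control typically relies on control charts: limits are estimated from in-control data, and new measurements outside them are flagged. A minimal sketch of a simplified Shewhart-style chart, assuming hypothetical diameter measurements and estimating sigma with the sample standard deviation (textbook individuals charts often estimate it from the average moving range instead):

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Lower limit, centre line and upper limit at +/- k sigma, estimated from in-control data."""
    centre = mean(baseline)
    sigma = stdev(baseline)
    return centre - k * sigma, centre, centre + k * sigma


def out_of_control(samples, limits):
    """Return (index, value) for every sample outside the control limits."""
    lower, _, upper = limits
    return [(i, x) for i, x in enumerate(samples) if x < lower or x > upper]


# Hypothetical diameters (mm) of a machined part while the process is in control.
baseline = [10.02, 9.98, 10.01, 9.99, 10.00, 10.03, 9.97, 10.00]
limits = control_limits(baseline)  # roughly (9.94, 10.00, 10.06)
new_samples = [10.01, 9.99, 10.12, 10.00]
print(out_of_control(new_samples, limits))  # [(2, 10.12)]
```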
Case Study: Marriage & Birth Prediction
Traditionally, Chinese people like to present gold bracelets or jewelry accessories as gifts to celebrate new marriages and newborn babies
Good prediction of new marriage / birth rates in the coming years (months) => better estimate of demand for luxury gifts, leading to:
• Better planning of logistics and supply chain
• Better-informed distribution channel / promotion strategies
Conduct an experiment to test how accurate the prediction can be
• Use HK data to do the analysis and make predictions
• If it works, apply the same methodology to other cities and review
Source: Astri
Prediction with HK data
HK Census data:
• Birth data: 1961 – 2013
• Marriage data: 1976 – 2013 => not enough data received; looking for more from the Census
Approach:
• Analyze data characteristics
• Apply appropriate mathematical and statistical models
• Predict the trend and forecast
Source: Astri
Birth Prediction - Result
Source: Astri
Birth Prediction – Result Analysis
Source: Astri
Our prediction of crude birth rate
Census figure of crude birth rate for 2014: 8.6
• Prediction (7.957) is close to the census figure (8.6), about 7.5% lower
• Can be further improved by incorporating more data, e.g.:
  • Demographic data
  • Fertility rates of different age groups, etc.
Fitted Value
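The crude birth rate is the number of live births in a year per 1,000 people in the mid-year population. A short sketch of that definition, plus the relative gap between the prediction (7.957) and the 2014 census figure (8.6) quoted above; the births and population inputs are hypothetical, not census figures:

```python
def crude_birth_rate(live_births, mid_year_population):
    """Live births per 1,000 people in the mid-year population."""
    return 1000 * live_births / mid_year_population


def relative_error(predicted, actual):
    """Signed error as a fraction of the actual value."""
    return (predicted - actual) / actual


print(round(crude_birth_rate(60_000, 7_000_000), 2))  # hypothetical inputs: 8.57
print(f"{relative_error(7.957, 8.6):.1%}")  # -7.5%
```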
Prediction with mainland data
Collect mainland data from online sources:
• Period: 1950 – 2012
• Regions: Beijing (北京), Jilin (吉林), Zhejiang (浙江), Hubei (湖北)
• Data: crude birth rate
Approach:
• Apply the same analysis and prediction method
• Predict the birth rate for 2013
• Verify the prediction with another online source
Source: Astri
Case Study: Prediction Results
Source: Astri
Beijing: 2013: 9.59 (8.93) 2014: 10.10 (9.75)
Jilin: 2013: 6.40 (5.36) 2014: 7.04 (6.62)
Zhejiang: 2013: 10.65 (10.01) 2014: 11.14 (10.51)
Hubei: 2013: 11.71 (11.08) 2014: 12.38 (11.89)
• Correct trend prediction
• 80% confidence interval
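The slides do not say how the 80% interval was computed. A minimal sketch, assuming normally distributed forecast errors with a standard deviation estimated from hypothetical in-sample residuals, uses `statistics.NormalDist` to get the two-sided 80% multiplier (about 1.2816). Strictly, this is a prediction interval for the next value, and because it ignores the uncertainty in the fitted trend itself it is somewhat narrower than a full regression prediction interval:

```python
from statistics import NormalDist, stdev


def prediction_interval(forecast, residuals, coverage=0.80):
    """Two-sided normal interval: forecast +/- z * sd(residuals)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.2816 for 80% coverage
    half_width = z * stdev(residuals)
    return forecast - half_width, forecast + half_width


# Hypothetical in-sample residuals and point forecast, for illustration only.
residuals = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0]
low, high = prediction_interval(10.0, residuals)
print(f"80% interval: [{low:.2f}, {high:.2f}]")
```

Under this assumption, an observed value should fall inside such an interval about 80% of the time.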
Some commercial projects
(1) Industry: Retail
• Project type: 50:50 Industrial Collaboration Program (ICP) Project
• Major deliverables:
  • Close integration with the customer’s existing DBMS for data collection
  • Customer profiling, item recommendations, trend analysis/prediction, etc.
(2) Industry: Investment services
• Project type: Contract Service
• Major deliverables:
  • Loose integration with the customer’s existing BI systems
  • Company media analysis and sentiment analysis
Disclaimer
The information contained in this presentation is intended solely for your reference and may be subject to change without further notice. Such information's truthfulness, accuracy or completeness is not guaranteed and it may not contain all the material information concerning Hong Kong Applied Science and Technology Research Institute Company Limited and/or its affiliates (collectively, "ASTRI"). ASTRI makes no representation or warranty regarding, and assumes no responsibility or liability for, the truthfulness, accuracy or completeness of any information contained herein. In addition, the information may contain projections and forward-looking statements that may reflect ASTRI’s current views with respect to future events and financial performance. These views are based on current assumptions which may change over time. ASTRI makes no assurance that such future events will occur, that such projections will be achieved, or that ASTRI’s assumptions are correct. Lastly, this presentation does not constitute an offer made by ASTRI whatsoever (including an offer relating to ASTRI's technologies and/or services).
End of Presentation Thank you. Questions are welcome.
Corporate website: www.astri.org Contact: Andrew Wat [email protected] 34062998