Big Data eXposed: From big data to smart data
1© 2013 eXelate Inc. Confidential and Proprietary. #bdx2013
From Big data to Smart data
A journey into the eXelate cloud
Motty Cohen, Chief Architect, eXelate
eXelate is the smart data company that powers smarter digital marketing decisions worldwide
Advertiser 1st Party Data
Data Providers
Offline Data
Online Data
Media Platforms
Modeling
Scoring
Segmentation
Analytics
Distribution
Marketing
Data Exchange Platform
• Demographic
  • Age: 40-55
  • Urbanicity: Suburban
  • Income: High
  • Education: Graduate Plus
  • Employment: Management
• Interest
  • Sport
  • Travels
  • Wines
  • Gadgets
• Intent
  • Travel to Barcelona
  • 4-star resort
Smart Data: Accurate & actionable audience segmentation
Our journey begins in the browser
The Internet
Inside eXelate Cloud: Real-time Serving & Smart data delivery
Get Event Info
Add History Data
Apply Rules & Models
Sell to buyers
200ms
100+ platforms
~500K Rules
~20K Segments
5B Events/Day
~850M Unique Users
14TB Storage
27GB daily
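To put the figures on this slide in perspective, the short calculation below converts 5B events per day into an average per-second rate and applies Little's law to the 200ms budget. It is only a back-of-the-envelope sketch: it assumes traffic is spread evenly over the day (real traffic has peaks) and that every event uses the full 200ms, neither of which the slide states.

```python
# Figures taken from the slide.
EVENTS_PER_DAY = 5_000_000_000
LATENCY_BUDGET_SECONDS = 0.200

SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: load is uniform across the day, so this is an average, not a peak.
avg_events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

# Little's law: average requests in flight = arrival rate * time in system.
# Assumption: each event takes the whole 200ms budget (a worst case).
avg_events_in_flight = avg_events_per_second * LATENCY_BUDGET_SECONDS

print(f"average rate: {avg_events_per_second:,.0f} events/s")
print(f"in flight at 200ms each: {avg_events_in_flight:,.0f} events")
```

Under these assumptions the whole serving tier has to sustain roughly 58K events per second on average, with on the order of 11.5K events being processed at any instant.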
Challenges
Big Data
Relevancy
Access Time
On-demand Analytics
big data = noise
smart data = signal
Challenge 1: Relevancy
Grabbing the relevant audience
on site, on time
Generating Models
Model
Model
Model
Data Mining
Analytics
Create Models
eXtream
Netezza tables
Running Analytics on
Amazon
Java Packages
Real time segmentation: Running rules and models
Basic Rules
Association Rules
Analytic Models
Model
Model
Model
Real-time scoring
Real-time learning
Can we run all these within the limited time frame?
~500K Rules
Complex Models
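The question on this slide, whether all rules fit in the time frame, can be illustrated with a minimal sketch of rule evaluation under a deadline. Everything here is hypothetical: the `Rule` type, the `evaluate_rules` function, and the example rules are illustrative names, not eXelate's implementation. The function stops as soon as the budget is spent and reports how many rules it managed to evaluate.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a segment name plus a predicate over a user profile."""
    segment: str
    predicate: Callable[[Dict[str, object]], bool]


def evaluate_rules(
    user: Dict[str, object], rules: List[Rule], budget_seconds: float
) -> Tuple[List[str], int]:
    """Evaluate rules in order until the time budget runs out.

    Returns the matched segment names and the number of rules evaluated,
    so the caller can tell whether the pass was complete.
    """
    deadline = time.monotonic() + budget_seconds
    matched: List[str] = []
    for index, rule in enumerate(rules):
        if time.monotonic() >= deadline:
            return matched, index  # budget exhausted before this rule
        if rule.predicate(user):
            matched.append(rule.segment)
    return matched, len(rules)


# Illustrative rules loosely modelled on the audience example earlier in the deck.
example_rules = [
    Rule("age_40_55", lambda u: 40 <= u["age"] <= 55),
    Rule("travel", lambda u: "travel" in u["interests"]),
    Rule("gadgets", lambda u: "gadgets" in u["interests"]),
]
example_user = {"age": 45, "interests": {"travel", "wines"}}

matched, evaluated = evaluate_rules(example_user, example_rules, budget_seconds=0.2)
print(matched, evaluated)
```

With ~500K rules, a single pass like this is unlikely to finish inside one request, which is what motivates spreading the work over time.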
Continuous Incremental Segmentation
Users Info
Serving Cluster
Segmentation Cluster
0MQ
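One possible reading of this slide is that the serving cluster hands user updates to a separate segmentation cluster over a message queue, and that each update triggers only a slice of the rule set, so that full coverage is reached over several events. The sketch below illustrates that reading only. It is an assumption, not the architecture as built: `queue.Queue` from the standard library stands in for 0MQ, and `IncrementalSegmenter`, `slice_size`, and the per-user cursor are hypothetical.

```python
import queue
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical rule shape: (segment name, predicate over the user profile).
Rule = Tuple[str, Callable[[Dict[str, object]], bool]]


class IncrementalSegmenter:
    """Evaluates a slice of the rules per event and remembers where it stopped."""

    def __init__(self, rules: List[Rule], slice_size: int) -> None:
        self.rules = rules
        self.slice_size = slice_size
        self.cursor: Dict[str, int] = {}         # user id -> next rule index
        self.segments: Dict[str, Set[str]] = {}  # user id -> segments found so far

    def process(self, user_id: str, profile: Dict[str, object]) -> Set[str]:
        start = self.cursor.get(user_id, 0)
        end = min(start + self.slice_size, len(self.rules))
        found = self.segments.setdefault(user_id, set())
        for name, predicate in self.rules[start:end]:
            if predicate(profile):
                found.add(name)
        # Wrap around once every rule has been visited for this user.
        self.cursor[user_id] = 0 if end == len(self.rules) else end
        return found


def drain(inbox: "queue.Queue", segmenter: IncrementalSegmenter) -> None:
    """Consume every pending (user_id, profile) message from the inbox."""
    while True:
        try:
            user_id, profile = inbox.get_nowait()
        except queue.Empty:
            return
        segmenter.process(user_id, profile)


rules: List[Rule] = [
    ("age_40_55", lambda u: 40 <= u["age"] <= 55),
    ("travel", lambda u: "travel" in u["interests"]),
    ("wines", lambda u: "wines" in u["interests"]),
]
inbox: "queue.Queue" = queue.Queue()  # stand-in for the 0MQ link
segmenter = IncrementalSegmenter(rules, slice_size=2)

profile = {"age": 45, "interests": {"travel", "wines"}}
inbox.put(("u1", profile))  # first event covers rules 0-1
inbox.put(("u1", profile))  # second event covers rule 2
drain(inbox, segmenter)
print(sorted(segmenter.segments["u1"]))
```

The design choice shown is that no single event pays for the full rule set; the cost is that a user's segment list is only complete after enough events have arrived.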
Challenge 2: Fast access to distributed big storage
User Object
• User Info
  • Segments, Delivery info, Intermediate results
  • Object size: tens to hundreds of KB
  • ~850M unique users (UU)
• Access time
  • Read / Write within a few ms
• Availability
  • For any machine in the cluster
  • For any cluster in every data center
Aerospike: Frontend storage for fast access
Aerospike Cluster
Serving Cluster
XDR: Cross Data Center Replication
Optimized for SSD, Indexed in RAM
Smart Eviction Policy
Fast read/writes: 500K+ TPS
Key-value NoSQL distributed DB
Replicated storage across data centers
US WEST CA
US CENTRAL TX
EUROPE NL
US EAST NY
Aerospike XDR: Cross Datacenter Replication
Challenge 3: On demand analytics
Show me the data, Now!
optiX: Interactive data analytics
On Demand Calculation
Data Center
Elasticsearch: Using a search engine for counting.
Netezza DWH
Aggregator
ES Cluster (30 Nodes)
Reporter
S3
Loader
optiX
REST
FTP
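Why a search engine is a good fit for counting can be shown with a toy inverted index: each segment maps to the set of users that carry it, and an audience count is the size of an intersection of those sets. This is a plain in-memory sketch of the idea, not Elasticsearch code and not the optiX implementation; `SegmentIndex` and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class SegmentIndex:
    """Toy inverted index: segment name -> set of user ids (a posting list)."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)

    def add_user(self, user_id: int, segments: Iterable[str]) -> None:
        for segment in segments:
            self._postings[segment].add(user_id)

    def count(self, *segments: str) -> int:
        """Count users that belong to ALL of the given segments."""
        if not segments:
            return 0
        # Start from the smallest posting list to keep intersections cheap.
        lists = sorted((self._postings.get(s, set()) for s in segments), key=len)
        result = set(lists[0])
        for posting in lists[1:]:
            result &= posting
        return len(result)


index = SegmentIndex()
index.add_user(1, ["age_40_55", "travel", "wines"])
index.add_user(2, ["age_40_55", "gadgets"])
index.add_user(3, ["travel", "wines"])

print(index.count("travel", "wines"))
print(index.count("age_40_55", "travel"))
```

A real search engine adds sharding, compression, and on-disk storage on top of this structure, but the counting query has the same shape.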
What have we seen so far?
• Data relevancy
  • Real-time scoring
  • Parallel processing
  • Split processing over time
• Big data access time
  • Front end, Replicated, Aerospike cluster
• On-demand analytics
  • Change your schema to optimize query time
  • Move processing from querying to loading phase
  • Trade off: Space + Processing -> Performance
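The last point, moving processing from the querying phase to the loading phase, can be sketched as follows. The example precomputes single-segment and pairwise counts while records are loaded, so that a query becomes a dictionary lookup. It is a minimal illustration under stated assumptions: `PrecomputedCounts` is a hypothetical name, and limiting the precomputation to pairs is a choice made for brevity, not something the slides specify.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Optional


class PrecomputedCounts:
    """Pay at load time (extra space and processing) to make queries O(1)."""

    def __init__(self) -> None:
        self.single: Counter = Counter()  # segment -> number of users
        self.pair: Counter = Counter()    # (segment_a, segment_b) -> number of users

    def load_user(self, segments: Iterable[str]) -> None:
        unique = sorted(set(segments))
        self.single.update(unique)
        # Each user with k segments adds k*(k-1)/2 pair entries: the space cost.
        self.pair.update(combinations(unique, 2))

    def count(self, a: str, b: Optional[str] = None) -> int:
        if b is None or a == b:
            return self.single[a]
        return self.pair[tuple(sorted((a, b)))]


counts = PrecomputedCounts()
counts.load_user(["age_40_55", "travel", "wines"])
counts.load_user(["age_40_55", "gadgets"])
counts.load_user(["travel", "wines"])

print(counts.count("travel", "wines"))
print(counts.count("wines", "age_40_55"))
```

The quadratic growth in stored pairs is the "Space + Processing" side of the trade-off; the constant-time lookup is the "Performance" side.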
Thank You
Questions?