Telco analytics at scale
www.subex.com
Telco Analytics @Scale
Harikumar, Director Platform & Architecture, Nov 2016
Private & Confidential
Subex Intro
Subex BSS/OSS Portfolio
Data Crunching - Use Cases & Latency
[Chart: use cases positioned by latency and algorithmic complexity. Latency bands: Real Time (milliseconds), Near Real Time (seconds), Micro Batch (minutes), Batch (hours to days). Use cases shown: Reporting, Aggregation, Rule Engine, Profiling, Machine Learning, Audits, Graph/Network Analysis, Text Search, Natural Language Processing.]
Stream Processing & Complex Event Processing (CEP)
Event Processing in the Eventful World
• (Aggregated) event data is combined/correlated with:
  • Users
  • Assets
  • Threats
  • Vulnerabilities
  • Location
  • Historical data
Techniques
• Rule engine
• Event filtering
• Event aggregation and transformation
• Operating on stored and streaming data
• SQL-like semantics over stream data
• Supervised/unsupervised machine learning
• Applying known models
• Event pattern detection
• Detecting event relationships
Areas
• Real-time fraud detection
• Real-time rating
• Security Information and Event Management (SIEM)
• Sensor data/IoT
• DPI data: metadata, content, flow correlation
• M2M data
• Data fraud: malware
• Transaction risk scoring
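To make the filtering, aggregation, and pattern-detection techniques above concrete, here is a minimal sketch of one real-time fraud-detection rule. It filters call events down to international calls, aggregates them per subscriber over a sliding time window, and raises an alert once a count threshold is crossed. The event fields, the country-code check, and the `InternationalBurstRule` class are hypothetical illustrations, not Subex's implementation, and the sketch assumes events arrive in timestamp order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Optional


@dataclass
class CallEvent:
    subscriber: str
    destination: str  # dialled country code, e.g. "44" (hypothetical field)
    timestamp: float  # event time in seconds


class InternationalBurstRule:
    """Hypothetical CEP rule: alert when a subscriber makes more than
    `threshold` international calls within a sliding window of
    `window_s` seconds. Assumes events arrive in timestamp order."""

    def __init__(self, home_code: str, threshold: int, window_s: float):
        self.home_code = home_code
        self.threshold = threshold
        self.window_s = window_s
        # Aggregation state: recent matching event times per subscriber.
        self.recent: Dict[str, Deque[float]] = defaultdict(deque)

    def on_event(self, e: CallEvent) -> Optional[str]:
        # Event filtering: only international calls are relevant.
        if e.destination == self.home_code:
            return None
        window = self.recent[e.subscriber]
        window.append(e.timestamp)
        # Slide the window: evict events that are window_s or more seconds old.
        while window and window[0] <= e.timestamp - self.window_s:
            window.popleft()
        # Condition evaluation and action (here, an alert string).
        if len(window) > self.threshold:
            return f"ALERT {e.subscriber}: {len(window)} intl calls in {self.window_s}s"
        return None


rule = InternationalBurstRule(home_code="91", threshold=3, window_s=60)
events = [CallEvent("A", "44", t) for t in (0, 10, 20, 30)]
events.append(CallEvent("B", "91", 35))  # domestic call, filtered out
alerts: List[str] = []
for ev in events:
    alert = rule.on_event(ev)
    if alert:
        alerts.append(alert)
print(alerts)
```

A production CEP engine would also partition this per-subscriber state across nodes and tolerate late or out-of-order events; the sketch keeps everything in one process for clarity.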
Stream Processing – Key Considerations
• Keep the data moving (low latency), in memory:
  • Distributed message queues
  • Distributed in-memory caches
  • Distributed in-memory stores
• Scalable, highly available distributed stream processing (partition data and scale; data safety and high availability)
• Handle stream imperfections (delayed, missing, out-of-order data)
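One common way to handle delayed and out-of-order data is event-time windowing with a watermark. The sketch below is a plain-Python illustration, not any particular engine's API: it counts events in tumbling windows, lets the watermark trail the largest event time seen by a fixed allowed lateness, emits a window once the watermark passes its end, and counts anything arriving after that as dropped. The class name and both parameters are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TumblingWindowCounter:
    """Event-time tumbling-window counter with a watermark (illustrative).

    The watermark is max(event time seen) - allowed_lateness. A window
    [start, start + window_s) is emitted once its end is <= the watermark;
    events for windows that are already closed are counted as dropped.
    """

    def __init__(self, window_s: int, allowed_lateness: int):
        self.window_s = window_s
        self.allowed_lateness = allowed_lateness
        self.max_event_time = float("-inf")
        self.open: Dict[int, int] = defaultdict(int)  # window start -> count
        self.emitted: List[Tuple[int, int]] = []      # (window start, count)
        self.dropped = 0

    def watermark(self) -> float:
        return self.max_event_time - self.allowed_lateness

    def on_event(self, event_time: int) -> None:
        start = event_time - event_time % self.window_s
        if start + self.window_s <= self.watermark():
            # The window has already closed: too late to include.
            self.dropped += 1
            return
        self.open[start] += 1
        self.max_event_time = max(self.max_event_time, event_time)
        self._fire()

    def _fire(self) -> None:
        wm = self.watermark()
        for start in sorted(self.open):  # sorted() copies keys, so pop is safe
            if start + self.window_s <= wm:
                self.emitted.append((start, self.open.pop(start)))


counter = TumblingWindowCounter(window_s=10, allowed_lateness=5)
for t in [1, 4, 12, 7, 18, 3, 26]:  # 7 and 3 arrive out of order
    counter.on_event(t)
print(counter.emitted, counter.dropped)
```

Here the event at time 7 arrives after the one at 12 but is still counted because it falls within the allowed lateness, while the event at time 3 arrives after its window was emitted and is dropped. A larger allowed lateness admits more late data at the cost of holding windows open (and in memory) longer.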
ETL @ Scale – In-Memory Distributed Cache
Problem statement(s)
• Scale the ETL enrichment/lookups layer
• High throughput + streaming low latency
• Support multiple access mechanisms:
  • GET/PUT
  • SQL
  • Views
[Diagram: three ETL JVMs and three cache JVMs; the cache tier reads, writes, and updates against an RDBMS.]
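A minimal sketch of the enrichment-lookup pattern, assuming a read-through/write-through cache: keys are hash-partitioned across several in-process dicts standing in for the cache JVMs, and misses fall back to an RDBMS, modelled here with an in-memory SQLite table. The `subscriber` table, its columns, and the `PartitionedCache` and `enrich` names are hypothetical, and only GET/PUT access is shown (not SQL or views).

```python
import sqlite3
import zlib
from typing import Dict, List, Optional


class PartitionedCache:
    """Stand-in for a distributed in-memory cache: keys are hash-partitioned
    across n_nodes dicts (one per hypothetical cache JVM). Reads are
    read-through and writes are write-through to the backing RDBMS."""

    def __init__(self, db: sqlite3.Connection, n_nodes: int = 3):
        self.db = db
        self.nodes: List[Dict[str, str]] = [{} for _ in range(n_nodes)]
        self.hits = 0
        self.misses = 0

    def _node(self, key: str) -> Dict[str, str]:
        # Stable hash, so a key always maps to the same node.
        return self.nodes[zlib.crc32(key.encode()) % len(self.nodes)]

    def get(self, key: str) -> Optional[str]:
        node = self._node(key)
        if key in node:
            self.hits += 1
            return node[key]
        self.misses += 1
        row = self.db.execute(
            "SELECT plan FROM subscriber WHERE msisdn = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        node[key] = row[0]  # populate the cache on a miss (read-through)
        return row[0]

    def put(self, key: str, value: str) -> None:
        # Write-through: update the backing store first, then the cache.
        self.db.execute(
            "INSERT OR REPLACE INTO subscriber (msisdn, plan) VALUES (?, ?)",
            (key, value),
        )
        self.db.commit()
        self._node(key)[key] = value


def enrich(record: dict, cache: PartitionedCache) -> dict:
    """ETL enrichment step: attach the subscriber's plan to a usage record."""
    return {**record, "plan": cache.get(record["msisdn"]) or "UNKNOWN"}


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE subscriber (msisdn TEXT PRIMARY KEY, plan TEXT)")
db.executemany("INSERT INTO subscriber VALUES (?, ?)",
               [("919800000001", "POSTPAID"), ("919800000002", "PREPAID")])
cache = PartitionedCache(db)

records = [{"msisdn": "919800000001", "mb": 120},
           {"msisdn": "919800000001", "mb": 40},
           {"msisdn": "919800000003", "mb": 5}]
enriched = [enrich(r, cache) for r in records]
print(enriched, cache.hits, cache.misses)
```

Misses for unknown keys are not cached here, so repeated lookups of a missing subscriber still reach the database; caching negative results is a common refinement.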
Rule Engine – In-Memory Aggregation
[Diagram: an event/input data record passes through event filters; the filtered records are evaluated by Rule 1, Rule 2, … Rule n, via an aggregation layer, condition evaluation, and actions. Aggregation state is held in shared memory (shown three times), organized as page pools of 8K, 16K, 32K, … 256K, holding key/value pairs as byte streams through a SerDe, with an IM log.]
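The flow in this diagram (event filters, then per-rule aggregation, condition evaluation, and actions, over state kept in size-classed page pools) can be modelled in a few dozen lines of Python. This is an illustrative sketch under stated assumptions, not the engine's design: the page pools are assumed to be powers of two from 8K to 256K, JSON stands in for the SerDe, one dict per size class stands in for shared memory, and `Rule`, `run`, `SharedMemoryStore`, and the record fields are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed size classes: powers of two from 8 KiB to 256 KiB, mirroring the
# 8K/16K/32K ... 256K page pools on the slide.
PAGE_SIZES = [8192 << i for i in range(6)]


def page_class(n_bytes: int) -> int:
    """Return the smallest page size that fits a serialized value."""
    for size in PAGE_SIZES:
        if n_bytes <= size:
            return size
    raise ValueError(f"{n_bytes} bytes exceeds the largest page size")


class SharedMemoryStore:
    """Toy key/value store: aggregates are kept as serialized byte streams,
    bucketed into one pool per page-size class."""

    def __init__(self) -> None:
        self.pools: Dict[int, Dict[str, bytes]] = {s: {} for s in PAGE_SIZES}
        self.index: Dict[str, int] = {}  # key -> page class holding the value

    def put(self, key: str, value: dict) -> None:
        data = json.dumps(value).encode()  # serialize (JSON as a stand-in SerDe)
        size = page_class(len(data))
        old = self.index.get(key)
        if old is not None and old != size:
            del self.pools[old][key]  # value grew or shrank into another class
        self.pools[size][key] = data
        self.index[key] = size

    def get(self, key: str, default: dict) -> dict:
        size = self.index.get(key)
        if size is None:
            return dict(default)
        return json.loads(self.pools[size][key])  # deserialize


@dataclass
class Rule:
    name: str
    key: Callable[[dict], str]  # aggregation key, e.g. subscriber number
    condition: Callable[[dict], bool]  # evaluated on the running aggregate
    action: Callable[[str, dict], str]


def run(records: List[dict], filters: List[Callable[[dict], bool]],
        rules: List[Rule], store: SharedMemoryStore) -> List[str]:
    fired: List[str] = []
    for rec in records:
        if not all(f(rec) for f in filters):  # event filters
            continue
        for rule in rules:  # Rule 1 .. Rule n
            k = f"{rule.name}:{rule.key(rec)}"
            agg = store.get(k, {"count": 0, "bytes": 0})
            agg["count"] += 1  # aggregation layer
            agg["bytes"] += rec["bytes"]
            store.put(k, agg)
            if rule.condition(agg):  # condition evaluation
                fired.append(rule.action(k, agg))  # actions
    return fired


store = SharedMemoryStore()
heavy = Rule("heavy_user", key=lambda r: r["msisdn"],
             condition=lambda a: a["bytes"] > 1000,
             action=lambda k, a: f"{k} exceeded 1000 bytes ({a['bytes']})")
records = [{"msisdn": "A", "type": "DATA", "bytes": 600},
           {"msisdn": "A", "type": "VOICE", "bytes": 900},
           {"msisdn": "A", "type": "DATA", "bytes": 500}]
fired = run(records, [lambda r: r["type"] == "DATA"], [heavy], store)
print(fired)
```

As written, a rule fires again on every later event once its condition holds; a real engine would typically suppress repeats or reset the aggregate after acting. Size-classed pools trade some wasted space inside each page for cheap allocation and low fragmentation.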
Data Placement Strategies
Application Data
• Application configuration data (rule libraries, DNA configurations, configurations): MySQL
• Application-generated data (alarms, discrepancies): MySQL
• Operations data (application-generated, infrastructure monitoring), i.e. logs, audit, metrics: Solr
• Application aggregations (summary/pre-aggregated data): Hive tables
• Statistical profiles, in-memory aggregation files: HDFS
Traditional Telco Data
• Telco entity data with update semantics: HBase/MySQL
• Telco historic transaction data: Hive with ORC file format, partitioned by date, stored in HDFS
• Switch input raw files: HDFS
Other Sources
• Social media
• DPI flow data
• Location data
• IoT sensor data
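Partitioning historic transaction data by date pays off through partition pruning: a query bounded by date only needs to read the matching partition directories. Hive stores each partition as a `column=value` subdirectory under the table location. The sketch below assumes a partition column named `dt` and a hypothetical table root, and mimics pruning in plain Python rather than calling Hive.

```python
from datetime import date, timedelta
from typing import List


def partition_path(table_root: str, day: date) -> str:
    """Hive-style partition directory for one day: <root>/dt=YYYY-MM-DD."""
    return f"{table_root}/dt={day.isoformat()}"


def daily_partitions(table_root: str, start: date, end: date) -> List[str]:
    """All daily partition directories between start and end, inclusive."""
    n_days = (end - start).days + 1
    return [partition_path(table_root, start + timedelta(days=d)) for d in range(n_days)]


def prune(partition_dirs: List[str], start: date, end: date) -> List[str]:
    """Partition pruning: keep only directories whose dt value is in range,
    so a date-bounded query never opens files in the other partitions."""
    kept = []
    for path in partition_dirs:
        dt = date.fromisoformat(path.rsplit("dt=", 1)[1])
        if start <= dt <= end:
            kept.append(path)
    return kept


root = "/warehouse/cdr_history"  # hypothetical table location
november = daily_partitions(root, date(2016, 11, 1), date(2016, 11, 30))
scanned = prune(november, date(2016, 11, 14), date(2016, 11, 16))
print(len(november), scanned)
```

A three-day query over a month of data touches 3 of the 30 partitions; with ORC files inside each partition, column projection and stripe-level statistics reduce the bytes read further.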
Data Flow
[Diagram: end-to-end data flow. Labels recoverable from the slide:
• Sources and ingest: landing directory (SAN/HDFS) read by Apache Flume (Dir source); message queue read by Flume (Kafka source); DB sources via Sqoop/CDC tools; Apache Kafka as the distributed message queue; Flume – Spark sink
• Processing: Apache Spark Streaming with ETL adaptors, a distributed cache, pre-aggregation, audits, the in-memory rule engine, and analytics applications …; operational metrics (OM) at the data load stage and the Spark Streaming stage
• Storage (data lake): HDFS (including raw file backup), Hive tables, HBase tables, Solr search indexes, MySQL reference DB, application data
• Data management: submitting Spark jobs; data access via Hive/Presto]
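The data flow on this slide is built from Flume, Kafka, and Spark Streaming; the sketch below uses none of those APIs. It is a plain-Python simulation, with assumed record formats and hypothetical names, of the micro-batch loop the diagram describes: drain a batch from a message queue, keep a raw backup, parse and apply a rule, write accepted rows to a table, and count operational metrics (OM) at the load and processing stages.

```python
import queue
from collections import Counter
from typing import Dict, List

metrics: Counter = Counter()          # operational metrics (OM) per stage
raw_backup: List[str] = []            # stands in for "HDFS - raw file backup"
table: List[Dict[str, object]] = []   # stands in for a Hive/HBase table
alerts: List[str] = []                # rule-engine output


def load_stage(source: queue.Queue, batch_size: int) -> List[str]:
    """Data load stage: drain up to batch_size raw records from the queue."""
    batch: List[str] = []
    while len(batch) < batch_size:
        try:
            batch.append(source.get_nowait())
        except queue.Empty:
            break
    raw_backup.extend(batch)  # keep raw input before any transformation
    metrics["load.records"] += len(batch)
    return batch


def process_stage(batch: List[str]) -> None:
    """Streaming stage: parse (ETL) and evaluate a rule for one micro-batch.
    Assumed record format: "msisdn,type,duration_seconds"."""
    for line in batch:
        parts = line.split(",")
        if len(parts) != 3 or not parts[2].isdigit():
            metrics["process.rejected"] += 1  # malformed record
            continue
        msisdn, kind, duration = parts[0], parts[1], int(parts[2])
        table.append({"msisdn": msisdn, "type": kind, "duration": duration})
        metrics["process.accepted"] += 1
        if duration > 3600:  # hypothetical rule: calls longer than one hour
            alerts.append(f"long call from {msisdn}")
    metrics["process.batches"] += 1


source: queue.Queue = queue.Queue()
for line in ["A,VOICE,120", "B,VOICE,4000", "C,SMS,x", "D,DATA,30", "E,VOICE,60"]:
    source.put(line)

while not source.empty():
    process_stage(load_stage(source, batch_size=2))

print(dict(metrics), alerts)
```

Counting rejected records as an operational metric, rather than silently dropping them, is what makes a malformed feed visible to operations; in the real pipeline these counters would be exported per stage rather than held in a process-local `Counter`.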
Data Platform – Business and Domain Packaging
[Diagram: layered platform architecture. Components shown (grouping approximate):
• Data Acquisition/Ingest; Data Federation F/W; Real-time message-based processing; Distributed Message Queue
• Data Processing: Distributed Stream Processing (Apache Spark); Flexible ETL; Rule Processing (In-Memory); Pre-aggregation; Distributed Cache; Common Data Model
• Storage: Hadoop (HDFS, Hive, HBase); Document Store; Graph Data Store
• Engines and models: Analytics Engine; Reconciliation Engine; BPM Workflow Engine; Machine Learning; Enterprise Search; Real-time Continuous Query (CEP); Analytic Models
• Applications: Network Analysis; ROC Insights; Real-time Rating; Profiling; Cloud Metering; Risk Scoring
• Data Visualization & Analysis: Mobile F/W; ROC View; Case Management
• Integration: Standard APIs (EAI & WS); ESB; API Mgmt; Cloud Connectors
• Control Panel: Operations & Admin; Resource Mgmt; Data Security; Audit & Logging; Scheduler; Multi-tenancy; Authorization & Authentication
• Infrastructure: On-premise OS/Servers/Network/Storage; IaaS (public/private cloud)]