Cassandra Summit 2014: Turkcell Curio, Real-Time Targeted Mobile Marketing Platform Implemented in...
CURIO:
A Mobile Marketing Platform
Ülker ÖZGEN ÇİFTÇİ
TURKCELL
About Turkcell
9 COUNTRIES, 71.3 MILLION MOBILE SUBSCRIBERS
EUROPE’S SECOND LARGEST OPERATOR
400 DEVELOPERS
Content
• Mobile marketing platform : Curio
• Curio’s architecture (Storm + Kafka + Cassandra)
• Use cases about Cassandra
About Curio
• Mobile marketing platform
• Now serving 80+ mobile applications in production
• Nearly 100 million transactions/day
• Real time interaction with users (via Push Notifications)
Example Analytics Data
Apache Kafka
• Distributed publish-subscribe messaging system
• Open sourced by LinkedIn
Apache Kafka – Features
• Fast
• Scalable
• Durable
• Distributed by Design
Apache Storm
• Distributed fault-tolerant realtime computation system
• Open sourced by Twitter
• Written in Clojure and Java
Apache Storm – Features
• Runs "Topologies"
• Clustered Structure
• Master node Nimbus
• Worker node Supervisor
• State is kept in Zookeeper
Apache Storm – Features II
• Integrates with any queueing and database system
• Kestrel, RabbitMQ / AMQP, Kafka, JMS, ...
• Simply connect with your database
• Simple API / Trident API
Apache Storm – Features III
• Scalable
• Benchmarks clocked Storm at over 1,000,000 tuples/second/node
• Fault-tolerant
• Guarantees your data will be processed (exactly-once processing is guaranteed by the Trident API)
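The difference between "will be processed" and "exactly once" can be illustrated with a small sketch. This is plain Python, not the Storm or Trident API, and the class and method names are hypothetical. It assumes batches are applied in order and that a replayed batch carries the same transaction id, which is the idea behind Trident's transactional state: keep the id of the last applied batch next to the value and skip a batch that was already applied.

```python
class TransactionalCounter:
    """Counter that ignores a replay of the batch it applied last."""

    def __init__(self):
        self.value = 0
        self.last_txid = None  # id of the last batch applied to `value`

    def apply_batch(self, txid, increments):
        # A replayed batch carries the same txid; skipping it keeps
        # at-least-once delivery from turning into double counting.
        if txid == self.last_txid:
            return self.value
        self.value += sum(increments)
        self.last_txid = txid
        return self.value


counter = TransactionalCounter()
counter.apply_batch(1, [1, 1, 1])
counter.apply_batch(2, [1, 1])
counter.apply_batch(2, [1, 1])  # replay after a simulated failure
print(counter.value)
```

Without the txid check, the replayed batch would be counted twice.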
Curio Topologies
• Visit Topology (heavy reads & writes to Cassandra)
• 24 parallel and partitioned tasks processing raw data
• 12 parallel consolidating tasks processing the pre-processed data
• Push Topology
• 5 parallel tasks for sending push notifications
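A rough sketch of what "partitioned" tasks could mean in practice: every tuple with the same key is routed to the same task. The slides do not say which key Curio partitions on; the visitor code used here, the function name, and the sample events are assumptions for illustration only.

```python
import zlib

RAW_TASKS = 24  # parallel, partitioned tasks processing raw data (from the slide)


def task_for(visitor_code, num_tasks=RAW_TASKS):
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    return zlib.crc32(visitor_code.encode("utf-8")) % num_tasks


# Hypothetical raw events: (visitor_code, kind)
events = [
    ("visitor-a", "session_start"),
    ("visitor-b", "screen_hit"),
    ("visitor-a", "event"),
]
assignments = [(code, kind, task_for(code)) for code, kind in events]
for row in assignments:
    print(row)
```

Routing by key keeps all data for one visitor on one task, so that task can update the visitor's session state without coordinating with the others.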
Topology To Cassandra
• All stored in Cassandra
• Mobile application launching/closing (creating and ending session)
• Page navigations (creating and ending screen hits)
• Event triggers (creating events)
• Counted values (relevant summary tables)
Curio Use Cases
• Use Case – I
• Calculating online user counts in real time
• Use Case – II
• Calculating active user counts in real time
Use Case I:
Counting Online Users
Requirement
• Counting online users for each mobile application
• A user is online if, within the session timeout duration, the user:
• Opens a session
• Navigates through screens
• Triggers events
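The rule above can be written as a single predicate. This is a minimal sketch; the 30-minute timeout and the function name are assumptions, since the slides do not state the session timeout value.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed value, not from the slides


def is_online(last_activity, now, timeout=SESSION_TIMEOUT):
    """last_activity: time of the latest session open, screen hit, or event."""
    return timedelta(0) <= now - last_activity < timeout


now = datetime(2014, 9, 11, 12, 0)  # arbitrary example time
print(is_online(now - timedelta(minutes=5), now))
print(is_online(now - timedelta(minutes=45), now))
```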
First Implementation
• Store online requests into a single table
• Default compaction strategy for the table is SizeTieredCompactionStrategy
First Implementation (cont.)
• Insert with a TTL for each request that is encountered as "online"
• Do deletion for session end requests
• Use a count query when online counts are requested
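The three steps of the first implementation can be simulated in memory to show the access pattern. This is a sketch only: it does not talk to Cassandra, and the class, the key layout, and the 1800-second TTL are assumptions.

```python
class OnlineTable:
    """In-memory stand-in for a table whose rows expire after a TTL."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.rows = {}  # (app, visitor) -> expiry time in seconds

    def insert(self, app, visitor, now):
        # Re-inserting refreshes the expiry, like writing the row again with a new TTL.
        self.rows[(app, visitor)] = now + self.ttl

    def delete(self, app, visitor):
        # Session end: remove the row explicitly.
        self.rows.pop((app, visitor), None)

    def count(self, app, now):
        # Stand-in for the count query: rows of this app that have not expired.
        return sum(
            1 for (a, _), expiry in self.rows.items() if a == app and expiry > now
        )


table = OnlineTable(ttl_seconds=1800)
table.insert("app1", "u1", now=0)
table.insert("app1", "u2", now=100)
table.insert("app1", "u3", now=200)
table.delete("app1", "u2")  # u2 ended the session
print(table.count("app1", now=300))
```

In Cassandra, both explicit deletes and expired TTL rows leave tombstones until compaction removes them, which is likely one reason this write pattern is sensitive to the compaction strategy.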
First Problem
• Storm performs insertions, updates and deletions on the "visit_online" table
• The performance of these queries became 100 times worse than before
• The cause was identified as "SizeTieredCompactionStrategy"
Solution I
• Use "LeveledCompactionStrategy"
• Storm query times returned to normal, around 50 msec/tuple
Second Problem
• Count queries started timing out under heavy traffic
• e.g. applications with 500,000 transactions/day
The Solution
• Redesign the online table:
• Add a new column, ordered DESC, to identify "online" status: valid_through
• TTL duration is changed to 3 days (no longer session_timeout)
The Solution (cont.)
• Do not perform manual deletions
• Insert single row for each transaction (session, screen, event)
• No more count query! Calculation performed in Java by selecting fewer records with:
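The query itself is not reproduced in this transcript. As an illustration only, the idea can be sketched in memory: each transaction adds one row, rows are read newest-first by valid_through, and only rows that are still valid are counted in application code. The sketch assumes valid_through is the transaction time plus the session timeout, which the slides do not confirm; the function names and the timeout value are hypothetical.

```python
from itertools import takewhile

SESSION_TIMEOUT = 1800  # seconds; assumed value


def record(rows, visitor, ts, timeout=SESSION_TIMEOUT):
    # One row per transaction (session, screen, event); nothing is deleted.
    rows.append((ts + timeout, visitor))


def online_count(rows, now):
    ordered = sorted(rows, reverse=True)  # mimics the DESC ordering on valid_through
    # Stop at the first row that is no longer valid instead of scanning everything.
    live = takewhile(lambda row: row[0] > now, ordered)
    return len({visitor for _, visitor in live})


rows = []
record(rows, "u1", ts=0)
record(rows, "u1", ts=600)
record(rows, "u2", ts=100)
record(rows, "u3", ts=1000)
print(online_count(rows, now=2000))
```

Because the rows are ordered by valid_through, the read touches only the still-valid head of the data, and duplicates per visitor are collapsed in the application rather than by a count query.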
Use Case II:
Counting Active Users
Requirement
• Users have unique "visitor_code" for each application
• The most requested report is "active user count", so it must perform well
• Active users = new users + returning users
• New users can be summed up daily
• Returning users need calculation with time interval
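The formula above as a small worked example. The daily figures and the returning-user count are made-up sample numbers, and the function name is hypothetical.

```python
def active_users(daily_new, returning, start_day, end_day):
    # New users are pre-aggregated per day, so they are simply summed.
    # ISO date strings compare correctly as plain strings.
    new_total = sum(
        count for day, count in daily_new.items() if start_day <= day <= end_day
    )
    return new_total + returning


daily_new = {"2014-09-01": 120, "2014-09-02": 95, "2014-09-03": 110}  # sample data
print(active_users(daily_new, returning=400, start_day="2014-09-01", end_day="2014-09-02"))
```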
Solution
• Create a new table for storing session intervals
• Use a column for storing the previous session time: previous_timestamp
• This column is ordered DESC
Solution (cont.)
• Calculate returning users by:
• Running query for each date in the selected date interval
• Select records whose previous login time is earlier than the start time of the query
View from GUI
Thank You
@ulkeroz