Using Cassandra for RTB systems
Uploaded by nader-ganayem
![Page 1: Using Cassandra for RTB systems](https://reader034.fdocuments.in/reader034/viewer/2022052307/555ec87fd8b42a74708b5490/html5/thumbnails/1.jpg)
Real Time Bidding with Apache Cassandra
![Page 2: Using Cassandra for RTB systems](https://reader034.fdocuments.in/reader034/viewer/2022052307/555ec87fd8b42a74708b5490/html5/thumbnails/2.jpg)
Introducing RTB
RTB @ Kenshoo:
- Concepts
- Architecture
- Challenges
![Page 3: Using Cassandra for RTB systems](https://reader034.fdocuments.in/reader034/viewer/2022052307/555ec87fd8b42a74708b5490/html5/thumbnails/3.jpg)
Real Time Bidding (RTB)
● Real-time bidding is a dynamic auction process in which each impression is bid for in (near) real time, as opposed to a static auction
● Kenshoo participates in Facebook Exchange (FBX)
● In FBX, each bid has a lifetime of 120ms. All transactions have to complete within that period, and the winning ad is presented to the user.
● Kenshoo employs ad re-targeting, where search engine campaigns are extended to the social network, giving our customers a much higher ROI
![Page 4: Using Cassandra for RTB systems](https://reader034.fdocuments.in/reader034/viewer/2022052307/555ec87fd8b42a74708b5490/html5/thumbnails/4.jpg)
Flow
WebSite
![Page 5: Using Cassandra for RTB systems](https://reader034.fdocuments.in/reader034/viewer/2022052307/555ec87fd8b42a74708b5490/html5/thumbnails/5.jpg)
![Page 6: Using Cassandra for RTB systems](https://reader034.fdocuments.in/reader034/viewer/2022052307/555ec87fd8b42a74708b5490/html5/thumbnails/6.jpg)
RTB Logical Architecture
RTB
RTB Front
Bidder, Win, Error, Opt Out, Pixel Matcher
RTB Backend
RTB Brain
RTB Reporter
Cassandra
Cookie to Segment(s)
Bid decision Trees
Campaigns Metadata
![Page 7: Using Cassandra for RTB systems](https://reader034.fdocuments.in/reader034/viewer/2022052307/555ec87fd8b42a74708b5490/html5/thumbnails/7.jpg)
Focus on RTB Cassandra
RTB @ Kenshoo:
- Architecture
- Challenges
![Page 8: Using Cassandra for RTB systems](https://reader034.fdocuments.in/reader034/viewer/2022052307/555ec87fd8b42a74708b5490/html5/thumbnails/8.jpg)
Requirements
● Handle 25K+ requests within the 120ms bid time frame, including network latencies
● Ability to scale up to 1M requests per minute while keeping the current latency
● Handle ~10K writes/second with low latency
● Multi-DC configuration; all nodes must be synced in real time
● Seamless operations: compactions and repairs
● High security
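To make the numbers above easier to compare, here is a minimal arithmetic sketch. The 120ms deadline, 1M requests per minute and ~10K writes/second come from the slides; the network and application timings passed to `remaining_budget_ms` are hypothetical values chosen only for illustration.

```python
# Figures taken from the slides.
BID_DEADLINE_MS = 120
TARGET_REQUESTS_PER_MINUTE = 1_000_000
WRITES_PER_SECOND = 10_000


def per_second(per_minute: float) -> float:
    """Convert a per-minute rate into a per-second rate."""
    return per_minute / 60


def remaining_budget_ms(deadline_ms: float, network_ms: float, app_ms: float) -> float:
    """Time left for database reads once network and application time are spent."""
    return deadline_ms - network_ms - app_ms


if __name__ == "__main__":
    print(f"Target read rate: {per_second(TARGET_REQUESTS_PER_MINUTE):.0f} requests/second")
    # Hypothetical split: 40ms on the network, 30ms in the application.
    print(f"Budget left for C*: {remaining_budget_ms(BID_DEADLINE_MS, 40, 30)} ms")
```

The point of the split is that the database only gets whatever is left of the 120ms after the network round trip and the bidder logic, so the per-read latency target is much tighter than the bid deadline itself.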
![Page 9: Using Cassandra for RTB systems](https://reader034.fdocuments.in/reader034/viewer/2022052307/555ec87fd8b42a74708b5490/html5/thumbnails/9.jpg)
C* Physical Architecture
(US) West Region
App App
VPN
App Internet
(US) East Region
App App
VPN
App
FBX WEST FBX EAST
GRE
![Page 10: Using Cassandra for RTB systems](https://reader034.fdocuments.in/reader034/viewer/2022052307/555ec87fd8b42a74708b5490/html5/thumbnails/10.jpg)
C* Cluster Information
● Cassandra version 1.2.6
● Oracle Java 7
● Manual tokens (vnodes are coming soon)
● Multi-DC configuration
● Network Topology
● DC connectivity between VPCs via Linux GRE
● Amazon c3.2xlarge instance type
● Ubuntu 13.10 with ext4
● SSD (ephemeral)
The Ring
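Manual token assignment in a multi-DC ring can be sketched as follows. This is an illustration, not the configuration used in the talk: it assumes Murmur3Partitioner (token range -2^63 to 2^63-1), which the slides do not state, and the datacenter names are hypothetical. Each datacenter gets evenly spaced tokens, and each additional datacenter is shifted by a small offset so that no two nodes share a token.

```python
from typing import Dict, List

# Assumption: Murmur3Partitioner, whose tokens span -2**63 .. 2**63 - 1.
MIN_TOKEN = -2**63
RING_SIZE = 2**64


def balanced_tokens(node_count: int, offset: int = 0) -> List[int]:
    """Evenly spaced initial tokens for one datacenter, shifted by `offset`."""
    return [MIN_TOKEN + (i * RING_SIZE) // node_count + offset
            for i in range(node_count)]


def multi_dc_tokens(nodes_per_dc: Dict[str, int]) -> Dict[str, List[int]]:
    """Give every datacenter its own balanced ring, offset by its index
    so that tokens never collide across datacenters."""
    return {dc: balanced_tokens(count, offset=index)
            for index, (dc, count) in enumerate(nodes_per_dc.items())}


if __name__ == "__main__":
    # Hypothetical datacenter names and sizes.
    for dc, tokens in multi_dc_tokens({"us-west": 4, "us-east": 4}).items():
        for token in tokens:
            print(dc, token)
```

Each printed value would go into the `initial_token` setting of one node's `cassandra.yaml`. With vnodes enabled this manual step is no longer needed.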
![Page 11: Using Cassandra for RTB systems](https://reader034.fdocuments.in/reader034/viewer/2022052307/555ec87fd8b42a74708b5490/html5/thumbnails/11.jpg)
C* Cluster Network Between Sites
● For security reasons, we:
○ Do not use EC2Snitch or EC2MultiRegionSnitch
○ Connect the nodes via VPN (Linux GRE)
● Linux GRE is fast and reliable, and provides high throughput (~1Gb/s)
![Page 12: Using Cassandra for RTB systems](https://reader034.fdocuments.in/reader034/viewer/2022052307/555ec87fd8b42a74708b5490/html5/thumbnails/12.jpg)
C* Cluster Storage
● We started with Amazon EBS:
○ With a small number of nodes (up to 4), you want persistent storage so that you avoid running repairs if you lose a node
○ 4 EBS devices in a RAID10 configuration: provides up to 1000 IOPS, with bursts of up to 2000 IOPS
○ Cheap in AWS
● 8 nodes with ephemeral devices:
○ Lower risk: if you lose a node, recovery isn’t as heavy on the whole cluster
○ We used RAID0
○ Higher performance (double that of EBS)
○ Free, bundled with the instances
![Page 13: Using Cassandra for RTB systems](https://reader034.fdocuments.in/reader034/viewer/2022052307/555ec87fd8b42a74708b5490/html5/thumbnails/13.jpg)
C* Cluster Storage continued
● 16 nodes with ephemeral devices:
○ When load became heavy, we grew to 16 nodes
○ Compactions and repairs harmed the cluster latency
○ We had to use Provisioned IOPS devices for C* maintenance
● C3 instance type with SSD:
○ Came just in time, providing ephemeral SSD storage
○ Solved our performance problems and enabled seamless compactions and repairs
○ Amazon currently has limited deployment of this hardware, and nodes are not stable
○ Not yet available in all regions
○ Deploying C3 nodes is not always possible due to AWS capacity issues
○ Amazon promised to resolve the C3 issues next month
![Page 14: Using Cassandra for RTB systems](https://reader034.fdocuments.in/reader034/viewer/2022052307/555ec87fd8b42a74708b5490/html5/thumbnails/14.jpg)
C* Cluster Performance
![Page 15: Using Cassandra for RTB systems](https://reader034.fdocuments.in/reader034/viewer/2022052307/555ec87fd8b42a74708b5490/html5/thumbnails/15.jpg)
Monitoring
● We rely heavily on DataStax OpsCenter
● We export OpsCenter metrics for graphing
● We wrote our own read/write speed test, which runs against a separate dedicated keyspace on each node, to detect bottlenecks and problematic nodes
● We sample the data separately from the application to determine whether a problem originates in C* or in the application
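A per-node speed test of the kind described above could look roughly like this. It is a sketch, not the tool from the talk: `speed_test` times arbitrary `write` and `read` callables, and in a real setup those callables would wrap driver calls against the dedicated test keyspace on one node (that wiring is assumed and not shown). All function names and the threshold are hypothetical.

```python
import time
from statistics import median
from typing import Callable, Dict, List


def time_calls(op: Callable[[int], object], iterations: int) -> List[float]:
    """Run op(i) `iterations` times and return each latency in milliseconds."""
    samples = []
    for i in range(iterations):
        start = time.perf_counter()
        op(i)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def speed_test(write: Callable[[int], object],
               read: Callable[[int], object],
               iterations: int = 100) -> Dict[str, float]:
    """Time writes first, then reads of the same keys, and summarise both."""
    writes = time_calls(write, iterations)
    reads = time_calls(read, iterations)
    return {
        "write_median_ms": median(writes),
        "write_max_ms": max(writes),
        "read_median_ms": median(reads),
        "read_max_ms": max(reads),
    }


def slow_nodes(results: Dict[str, Dict[str, float]], threshold_ms: float) -> List[str]:
    """Names of nodes whose median read or write latency exceeds the threshold."""
    return sorted(
        node for node, stats in results.items()
        if stats["read_median_ms"] > threshold_ms
        or stats["write_median_ms"] > threshold_ms
    )


if __name__ == "__main__":
    # In-memory stand-in for a node, used only to exercise the harness.
    store: Dict[int, int] = {}
    stats = speed_test(lambda i: store.__setitem__(i, i), lambda i: store[i], 50)
    print(stats)
    print(slow_nodes({"node-1": stats}, threshold_ms=5.0))
```

Running the same test against every node and comparing the results separates a slow node from a slow application, since the test bypasses the application path entirely.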
![Page 16: Using Cassandra for RTB systems](https://reader034.fdocuments.in/reader034/viewer/2022052307/555ec87fd8b42a74708b5490/html5/thumbnails/16.jpg)
What have we learned
● Storage:
○ Use SSD:
■ It provides high and stable disk performance
■ Neutralizes compaction and repair effects on the cluster
■ Worth the money
● Network:
○ Use the highest-bandwidth VPN possible
○ GRE is great (lacks encryption, but provides the best bandwidth)
● Maintenance:
○ Run compaction daily: it does miracles for performance under heavy load
○ If you are not on SSD, disable Thrift on the node before running compaction
○ Do compactions in sequence, node by node
○ On high-load systems, avoid repair where possible; it is better to decommission and recommission a node than to run repair!
○ If you have to repair, always use the “-pr” flag and, if possible, use the incremental repair option (requires heavy scripting)
● Monitoring:
○ Write a sampler and speed tester for each node to detect bottlenecks and the sources of performance issues
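The maintenance advice above can be sketched as a small rolling script. This is an illustration under assumptions, not the script used in the talk: it assumes `nodetool` is on the PATH and can reach each node with `-h`, and the host addresses are hypothetical. It uses only the standard `disablethrift`, `compact`, `enablethrift` and `repair -pr` subcommands, and defaults to a dry run that only lists the commands.

```python
import subprocess
from typing import List


def compaction_commands(host: str, on_ssd: bool) -> List[List[str]]:
    """nodetool commands for one node. Off SSD, Thrift is disabled around the
    compaction so the node serves no client traffic while compacting."""
    base = ["nodetool", "-h", host]
    commands = []
    if not on_ssd:
        commands.append(base + ["disablethrift"])
    commands.append(base + ["compact"])
    if not on_ssd:
        commands.append(base + ["enablethrift"])
    return commands


def repair_command(host: str) -> List[str]:
    """Primary-range repair, for the cases where repair cannot be avoided."""
    return ["nodetool", "-h", host, "repair", "-pr"]


def run_rolling(hosts: List[str], on_ssd: bool, dry_run: bool = True) -> List[List[str]]:
    """Compact nodes strictly one after another, never in parallel."""
    executed = []
    for host in hosts:
        for command in compaction_commands(host, on_ssd):
            if not dry_run:
                subprocess.run(command, check=True)  # stop on the first failure
            executed.append(command)
    return executed


if __name__ == "__main__":
    # Hypothetical node addresses; dry run prints the plan without executing it.
    for command in run_rolling(["10.0.0.1", "10.0.0.2"], on_ssd=False):
        print(" ".join(command))
```

Note that if `compact` fails on a non-SSD node, this sketch stops before `enablethrift` runs, so a production version would need to re-enable Thrift in its error handling.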
![Page 17: Using Cassandra for RTB systems](https://reader034.fdocuments.in/reader034/viewer/2022052307/555ec87fd8b42a74708b5490/html5/thumbnails/17.jpg)
Thank you