Real World Cassandra
![Page 1: Real World Cassandra](https://reader034.fdocuments.in/reader034/viewer/2022052601/55972fcc1a28ab5e518b4695/html5/thumbnails/1.jpg)
the prospect engine for brands.
Cassandra in Online Advertising: Real Time Bidding
![Page 2: Real World Cassandra](https://reader034.fdocuments.in/reader034/viewer/2022052601/55972fcc1a28ab5e518b4695/html5/thumbnails/2.jpg)
Who are we?
Costa Sevdinoglou & Edward Capriolo
![Page 3: Real World Cassandra](https://reader034.fdocuments.in/reader034/viewer/2022052601/55972fcc1a28ab5e518b4695/html5/thumbnails/3.jpg)
Impressions look like…
![Page 4: Real World Cassandra](https://reader034.fdocuments.in/reader034/viewer/2022052601/55972fcc1a28ab5e518b4695/html5/thumbnails/4.jpg)
A High Level look at RTB
1. Browsers visit Publishers and create impressions.
2. Publishers sell impressions via Exchanges.
3. Exchanges serve as auction houses for the impressions.
4. On behalf of the marketer, m6d bids on the impressions via the
auction house. If m6d wins, we display our ad in the
browser.
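The four steps above can be sketched as a toy auction. This is only an illustration: the slides do not say how the exchange picks a winner, so the highest-bid-wins rule, the `Bid` type, the bidder names, and the prices are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    bidder: str       # hypothetical bidder name
    price_cpm: float  # hypothetical bid price


def run_auction(bids):
    """Return the highest bid, or None when nobody bid (assumed rule)."""
    return max(bids, key=lambda b: b.price_cpm, default=None)


# One impression offered by an exchange, two bidders responding.
bids = [Bid("m6d", 2.50), Bid("other-bidder", 1.75)]
winner = run_auction(bids)
print(winner.bidder if winner else "no bids")
```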
![Page 5: Real World Cassandra](https://reader034.fdocuments.in/reader034/viewer/2022052601/55972fcc1a28ab5e518b4695/html5/thumbnails/5.jpg)
Performance and Data
• Billions and billions of bid requests a day
• A single request can result in multiple Cassandra Operations!
• One cluster is just under 10TB and growing
• Low-latency requirement: typically below 120 ms
• Limited data available to m6d via the exchange
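To get a feel for these numbers, daily volume can be converted to a per-second rate. The figures below are placeholders, not m6d's real traffic: 2 billion requests a day and 3 Cassandra operations per request are assumptions picked only to make the arithmetic concrete.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(per_day):
    """Convert a daily count to an average per-second rate."""
    return per_day / SECONDS_PER_DAY


requests_per_day = 2_000_000_000  # assumption, not from the slides
ops_per_request = 3               # assumption, not from the slides

requests_per_sec = per_second(requests_per_day)
cassandra_ops_per_sec = requests_per_sec * ops_per_request

print(f"{requests_per_sec:,.0f} bid requests/s on average")
print(f"{cassandra_ops_per_sec:,.0f} Cassandra operations/s on average")
```

These are averages over a day; peak traffic would be higher.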
![Page 6: Real World Cassandra](https://reader034.fdocuments.in/reader034/viewer/2022052601/55972fcc1a28ab5e518b4695/html5/thumbnails/6.jpg)
Segment Data
Segments are how we assign product or service
affinity to a group of users. Users we consider to be
like-minded with respect to a given brand will be
placed in the same segment.
Segment Data is just one component of our
overarching data model.
Segments help to reduce the number of calculations
we do in real time.
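A minimal sketch of why segments save real-time work: if membership is computed ahead of time, the bid-time check is a simple lookup. The in-memory dictionary, user IDs, and segment names here are hypothetical stand-ins, not m6d's actual data model.

```python
# Precomputed membership: user ID -> set of segment names (all hypothetical).
segments_by_user = {
    "user-123": {"brand-a-prospects", "brand-b-prospects"},
    "user-456": {"brand-a-prospects"},
}


def in_segment(user_id, segment):
    """Bid-time check: a set lookup, with no affinity calculation."""
    return segment in segments_by_user.get(user_id, set())


print(in_segment("user-123", "brand-b-prospects"))
print(in_segment("user-456", "brand-b-prospects"))
print(in_segment("unknown-user", "brand-a-prospects"))
```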
![Page 7: Real World Cassandra](https://reader034.fdocuments.in/reader034/viewer/2022052601/55972fcc1a28ab5e518b4695/html5/thumbnails/7.jpg)
Old Approach for Segment Data
Limitations
• Periodically updated.
• Only a subsection of the data.
• Cluster performance is affected during a data push.
Application Nodes (Tomcat + MySQL)
Event Logs
Hadoop Aggregation
MySQL Data Push
![Page 8: Real World Cassandra](https://reader034.fdocuments.in/reader034/viewer/2022052601/55972fcc1a28ab5e518b4695/html5/thumbnails/8.jpg)
Cassandra Approach for Segment Data
Better!
• Updating in real time is now possible
• Distributed, not duplicated
• Less complexity to manage
• Storing more information
• We can now bid on users sooner!
Application Nodes (Tomcat + Less MySQL Usage)
Cassandra
![Page 9: Real World Cassandra](https://reader034.fdocuments.in/reader034/viewer/2022052601/55972fcc1a28ab5e518b4695/html5/thumbnails/9.jpg)
One Ring to rule them all
http://askyyy.blog.163.com/blog/static/1234575992010428819399/
![Page 10: Real World Cassandra](https://reader034.fdocuments.in/reader034/viewer/2022052601/55972fcc1a28ab5e518b4695/html5/thumbnails/10.jpg)
Peer-to-peer, per-operation replication
Fail fast, self-healing
Each write goes to all natural endpoints
Hinted handoff if destination is down
Read repair
No more: STOP SLAVE; SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE;
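The write path and hinted handoff can be illustrated with a toy model. This is not Cassandra's implementation: the `Node` class, the `write` and `replay_hints` functions, and the idea of keeping hints in a plain list are simplifications assumed for the sketch.

```python
class Node:
    """Toy replica: holds data plus hints for peers that were down."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}
        self.hints = []  # (target node name, key, value)


def write(replicas, coordinator, key, value):
    """Send the write to every replica; keep a hint for each one that is down."""
    for node in replicas:
        if node.up:
            node.data[key] = value
        else:
            coordinator.hints.append((node.name, key, value))


def replay_hints(coordinator, nodes_by_name):
    """Deliver stored hints to targets that are back up; keep the rest."""
    remaining = []
    for target, key, value in coordinator.hints:
        node = nodes_by_name[target]
        if node.up:
            node.data[key] = value
        else:
            remaining.append((target, key, value))
    coordinator.hints = remaining


a, b, c = Node("a"), Node("b"), Node("c")
nodes = {n.name: n for n in (a, b, c)}

b.up = False                                  # one replica is down
write([a, b, c], a, "user-123", "segment-x")  # a acts as coordinator
missed_while_down = "user-123" not in b.data

b.up = True                                   # replica comes back
replay_hints(a, nodes)
print(missed_while_down, b.data)
```

No operator step is needed in this model: the missed write reaches the recovered node when the hints are replayed.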
![Page 11: Real World Cassandra](https://reader034.fdocuments.in/reader034/viewer/2022052601/55972fcc1a28ab5e518b4695/html5/thumbnails/11.jpg)
Multi Data Center
No designing and managing complex replication topologies
create keyspace world
with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
and strategy_options={1:3, 2:3, 3:3};
The same process as a single data center
No log shipping, or separate processes to run
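The `strategy_options` above ask for three replicas in each of three data centers (named 1, 2, and 3). The slides do not discuss consistency levels, so the following is only a side calculation, assuming the usual majority rule for a quorum, to show what that replica layout implies.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


# Replicas per data center, copied from the keyspace definition above.
strategy_options = {"1": 3, "2": 3, "3": 3}

total_replicas = sum(strategy_options.values())
local_quorum = {dc: quorum(rf) for dc, rf in strategy_options.items()}
cluster_quorum = quorum(total_replicas)

print("total replicas per row:", total_replicas)
print("majority within each data center:", local_quorum)
print("majority across all data centers:", cluster_quorum)
```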
![Page 12: Real World Cassandra](https://reader034.fdocuments.in/reader034/viewer/2022052601/55972fcc1a28ab5e518b4695/html5/thumbnails/12.jpg)
Monitoring & Management
Many, many things to monitor with JMX
Nice command line tools
Most values can be tweaked at run time
![Page 13: Real World Cassandra](https://reader034.fdocuments.in/reader034/viewer/2022052601/55972fcc1a28ab5e518b4695/html5/thumbnails/13.jpg)
Capacity Planning
How many rows?
How many columns?
Size of average column
Latency requirements
Throughput: reads and writes per second
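A back-of-the-envelope sketch of how the first three inputs combine into a storage estimate. Every number below is a made-up placeholder, and the formula is a deliberate simplification: it ignores per-column overhead, indexes, and the free space compaction needs, so treat the result as a lower bound.

```python
def estimate_raw_bytes(rows, columns_per_row, avg_column_bytes, replication_factor):
    """Rough lower bound: payload bytes multiplied by the number of replicas."""
    return rows * columns_per_row * avg_column_bytes * replication_factor


# All inputs are hypothetical.
raw_bytes = estimate_raw_bytes(
    rows=100_000_000,
    columns_per_row=20,
    avg_column_bytes=50,
    replication_factor=3,
)
raw_gb = raw_bytes / 1e9

print(f"about {raw_gb:,.0f} GB before overhead")
```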
![Page 14: Real World Cassandra](https://reader034.fdocuments.in/reader034/viewer/2022052601/55972fcc1a28ab5e518b4695/html5/thumbnails/14.jpg)
Unit Tests FTW!
![Page 15: Real World Cassandra](https://reader034.fdocuments.in/reader034/viewer/2022052601/55972fcc1a28ab5e518b4695/html5/thumbnails/15.jpg)
Max 2 billion columns per row
Awesome
Unless you accidentally write 2 billion columns to a row key named “null”
Check the maxRowSize value in JMX
Watch logs for messages about compacting large rows
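One way to avoid the "null" row key accident is to validate keys before any write is issued. This is a sketch under assumptions: `validate_row_key` is a hypothetical helper, not part of any Cassandra client, and the rule of rejecting the literal string "null" is a guess at how such keys slip through (for example, a missing ID turned into text).

```python
def validate_row_key(key):
    """Return a cleaned row key, or raise ValueError for missing-looking keys."""
    if key is None:
        raise ValueError("row key is missing")
    text = str(key).strip()
    if text == "" or text.lower() == "null":
        raise ValueError(f"suspicious row key: {key!r}")
    return text


print(validate_row_key("user-123"))

try:
    validate_row_key("null")
except ValueError as err:
    print("rejected:", err)
```

A unit test that feeds missing IDs through the write path would catch the same mistake before it reaches production.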
![Page 16: Real World Cassandra](https://reader034.fdocuments.in/reader034/viewer/2022052601/55972fcc1a28ab5e518b4695/html5/thumbnails/16.jpg)
Local (NYC) Meetups
www.meetup.com/NYC-Cassandra-User-Group/