C* Summit EU 2013: From CQL to Time-Series Event Tracking and Aggregation Using Cassandra and Hadoop

Posted on 25-May-2015

Description

Speaker: Mick Semb Wever, Programmer at FINN.no

Video: http://www.youtube.com/watch?v=0ZymZ4OFcC4&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=14

FINN.no is a classifieds website and Norway's busiest website. This session walks through several product developments where C* (Cassandra) has proven to be the best choice, focusing on our primary C* use case: an in-house tracking solution that collects raw time-series data in C* and aggregates it minute by minute using Hadoop into a range of new datasets, from advert-centric statistics to user-centric behavioural analysis. I'll cover the final technical design, chosen after a number of development iterations involving Scribe, Thrift, Kafka, Hadoop, Pig and Mahout; the hurdles faced along the way; and the throughput and performance of today's systems.
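The core idea behind the tracking pipeline, bucketing raw time-series events by minute and counting them per advert, can be illustrated with a minimal in-memory Python sketch. It is not FINN.no's implementation, which ran as Hadoop jobs over data stored in Cassandra. The event shape `(timestamp, advert_id, event_type)` and the functions `minute_bucket` and `aggregate_by_minute` are hypothetical, chosen only for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def minute_bucket(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its minute (the time-series bucket)."""
    return ts.replace(second=0, microsecond=0)


def aggregate_by_minute(events):
    """Count events per (minute, advert_id, event_type).

    `events` is an iterable of (timestamp, advert_id, event_type) tuples,
    a hypothetical raw-event shape assumed for this sketch.
    """
    counts = Counter()
    for ts, advert_id, event_type in events:
        counts[(minute_bucket(ts), advert_id, event_type)] += 1
    return counts


if __name__ == "__main__":
    # Illustrative sample data only; not real tracking output.
    base = datetime(2013, 1, 1, 12, 0, tzinfo=timezone.utc)
    raw = [
        (base + timedelta(seconds=5), "ad-1", "view"),
        (base + timedelta(seconds=42), "ad-1", "view"),
        (base + timedelta(seconds=63), "ad-1", "view"),
        (base + timedelta(seconds=59), "ad-2", "click"),
    ]
    for (minute, advert_id, event_type), n in sorted(aggregate_by_minute(raw).items()):
        print(minute.isoformat(), advert_id, event_type, n)
```

In a MapReduce-style job, the same `(minute, advert_id, event_type)` tuple could act as the grouping key, letting each minute's counts be computed independently of the others.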

Transcript of C* Summit EU 2013: From CQL to Time-Series Event Tracking and Aggregation Using Cassandra and Hadoop