Fantasy League Sports with Big Data Technologies

FANTASY LEAGUE SPORTS Fantastical, fast, and furious fantasy stats. By: Silvia Oliveros

Transcript of Fantasy League Sports with Big Data Technologies

Page 1: Fantasy League Sports with Big Data Technologies

FANTASY LEAGUE SPORTS: Fantastical, fast, and furious fantasy stats.

By: Silvia Oliveros


WHY FANTASY LEAGUES?

• I like watching sports.

• Large fan base (41 million people).

• Simulate my own site with a 5-million-user base.


WEBSITE


PIPELINE

• DATA INGESTION: User Data (Information, Roster) and NFL Play-by-Play, ingested through Kafka
• REAL-TIME / SPEED LAYER: Spark Streaming
• BATCH LAYER: HDFS, Spark
• SERVING LAYER: Cassandra, Flask
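The layer names in this pipeline (speed, batch, serving) match the lambda architecture pattern: the batch layer periodically recomputes views from the full history, the speed layer covers only data that arrived since the last batch run, and the serving layer combines the two. A minimal pure-Python sketch of that combination step, assuming the view is fantasy points per user; `merged_points`, `batch_view`, and `speed_view` are hypothetical names, and plain dicts stand in for the Cassandra tables the deck actually uses.

```python
from typing import Mapping


def merged_points(batch_view: Mapping[str, float],
                  speed_view: Mapping[str, float]) -> dict[str, float]:
    """Serving-layer merge of per-user fantasy points (hypothetical helper).

    batch_view: totals recomputed from the full history (e.g., a nightly
        Spark job over HDFS).
    speed_view: points from plays that arrived after the last batch run
        (e.g., produced by Spark Streaming).
    Neither input is modified.
    """
    merged = dict(batch_view)  # copy so the batch view stays untouched
    for user, points in speed_view.items():
        merged[user] = merged.get(user, 0.0) + points
    return merged


if __name__ == "__main__":
    batch = {"alice": 102.5, "bob": 88.0}  # hypothetical nightly totals
    speed = {"bob": 6.0, "carol": 3.2}     # hypothetical live points
    print(merged_points(batch, speed))
    # {'alice': 102.5, 'bob': 94.0, 'carol': 3.2}
```

In the usual lambda design, once a batch run has absorbed the recent plays, the speed view for that window is discarded, which keeps the merge from double-counting.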


DATA INGESTION

• Sources: User Data (Information, Roster) and NFL Play-by-Play
• Ingested through Kafka


[Sample records: User Data (Roster) and Play-by-Play Data]


Why Kafka?

• Two consumers send the data to HDFS and to Spark Streaming.
• Potential real-time changes in roster information (future work).
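The two-consumer setup works because Kafka keeps an append-only log and each consumer group tracks its own read offset, so an HDFS consumer and a Spark Streaming consumer placed in separate groups each see every record. A pure-Python sketch of that behavior, assuming a single-partition topic; `TopicLog` and the group names are hypothetical and only model the semantics, they are not a Kafka client API.

```python
from collections import defaultdict


class TopicLog:
    """In-memory stand-in for a single-partition Kafka topic (hypothetical)."""

    def __init__(self) -> None:
        self._records: list[str] = []                       # append-only log
        self._offsets: dict[str, int] = defaultdict(int)    # per-group offset

    def produce(self, record: str) -> None:
        self._records.append(record)

    def poll(self, group: str, max_records: int = 100) -> list[str]:
        """Return the next unread records for `group` and advance its offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


if __name__ == "__main__":
    plays = TopicLog()
    for play in ["play-1", "play-2", "play-3"]:
        plays.produce(play)
    print(plays.poll("hdfs-archiver"))                   # ['play-1', 'play-2', 'play-3']
    print(plays.poll("spark-streaming", max_records=2))  # ['play-1', 'play-2']
    print(plays.poll("spark-streaming"))                 # ['play-3']
```

Because reading does not remove records, adding a third consumer later (for example, for the real-time roster changes) would not disturb the existing two.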




SPEED LAYER: REAL-TIME

• New play comes in.
• Look up the roster data.
• Generate fantasy information for the play.
• Aggregate points by user.
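The four speed-layer steps can be sketched as one function over a micro-batch of plays. This is plain Python, not the author's Spark Streaming job; in Spark Streaming the same steps would plausibly map onto a `transform` that joins plays against the roster data, followed by `reduceByKey`. The `Play` fields, the roster shape (player to owning users), and the scoring rule are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Play:
    """One play-by-play event; the fields are hypothetical."""
    player: str
    yards: int
    touchdown: bool


def play_points(play: Play) -> float:
    """Hypothetical scoring rule: 1 point per 10 yards, 6 per touchdown."""
    return play.yards / 10 + (6.0 if play.touchdown else 0.0)


def aggregate_batch(plays: list[Play],
                    owners_by_player: dict[str, list[str]]) -> dict[str, float]:
    """Score one micro-batch of plays and sum the points per user."""
    totals = defaultdict(float)
    for play in plays:                                  # new play comes in
        owners = owners_by_player.get(play.player, [])  # roster lookup
        points = play_points(play)                      # generate information
        for user in owners:                             # aggregate by user
            totals[user] += points
    return dict(totals)


if __name__ == "__main__":
    roster = {"Player A": ["alice", "bob"], "Player B": ["bob"]}
    batch = [Play("Player A", 25, True), Play("Player B", 10, False),
             Play("Player C", 40, False)]  # nobody rosters Player C
    print(aggregate_batch(batch, roster))  # {'alice': 8.5, 'bob': 9.5}
```

Keying the roster by player rather than by user makes the lookup one dictionary access per play, and a player owned by several users (in different leagues) credits each of them.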


BATCH LAYER

• Spark on top of HDFS
• Admin queries (updated once every 24 hours):
    • Top users
    • Demographic breakdown
• User and player queries:
    • Historical fantasy points per game / week
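Both query families reduce to group-by aggregations over the full history stored in HDFS. The sketch below shows that logic in plain Python over an in-memory list; in the deck these run as Spark jobs (for example, a `reduceByKey` over an RDD), and the `(user, week, points)` record layout and function names are assumptions.

```python
import heapq
from collections import defaultdict

# Hypothetical record layout: (user, week, fantasy points scored that week).
Record = tuple[str, int, float]


def top_users(records: list[Record], n: int) -> list[tuple[str, float]]:
    """Admin query: the n users with the most total points."""
    totals: dict[str, float] = defaultdict(float)
    for user, _week, points in records:
        totals[user] += points
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])


def points_by_week(records: list[Record], user: str) -> dict[int, float]:
    """User query: one user's historical points, ordered by week."""
    weekly: dict[int, float] = defaultdict(float)
    for rec_user, week, points in records:
        if rec_user == user:
            weekly[week] += points
    return dict(sorted(weekly.items()))


if __name__ == "__main__":
    history = [("alice", 1, 10.0), ("bob", 1, 7.5), ("alice", 2, 4.0),
               ("carol", 2, 20.0), ("bob", 2, 1.0)]
    print(top_users(history, 2))           # [('carol', 20.0), ('alice', 14.0)]
    print(points_by_week(history, "bob"))  # {1: 7.5, 2: 1.0}
```

Running these once every 24 hours, as the slide describes, trades freshness for the ability to scan the whole history; the speed layer covers the gap until the next run.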


SERVING LAYER

• Cassandra: multiple queries require different tables, each with an efficient schema for its query.
• Flask: API for both analysts and users of the website.
• D3 graphs
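"Different tables per query" is Cassandra's query-first modeling: the same fact is written once into each table whose partition key matches a query, so every read touches a single partition. A pure-Python sketch of that idea, assuming two hypothetical tables (`points_by_user`, partitioned by user and clustered by week; `leaderboard_by_week`, partitioned by week); dicts stand in for Cassandra, and none of these names come from the deck.

```python
from collections import defaultdict


class QueryTables:
    """Hypothetical denormalized tables, one per query pattern."""

    def __init__(self) -> None:
        # points_by_user: partition key = user, clustering key = week
        self.points_by_user: dict[str, dict[int, float]] = defaultdict(dict)
        # leaderboard_by_week: partition key = week, one row per user
        self.leaderboard_by_week: dict[int, dict[str, float]] = defaultdict(dict)

    def write(self, user: str, week: int, points: float) -> None:
        """Write the same fact into both tables (upsert semantics)."""
        self.points_by_user[user][week] = points
        self.leaderboard_by_week[week][user] = points

    def user_history(self, user: str) -> list[tuple[int, float]]:
        """Single-partition read; rows ordered by the clustering key (week)."""
        return sorted(self.points_by_user.get(user, {}).items())

    def week_leaders(self, week: int, limit: int) -> list[tuple[str, float]]:
        """Single-partition read; sorting here simulates a points ordering."""
        rows = self.leaderboard_by_week.get(week, {})
        return sorted(rows.items(), key=lambda kv: kv[1], reverse=True)[:limit]


if __name__ == "__main__":
    tables = QueryTables()
    tables.write("alice", 1, 10.0)
    tables.write("bob", 1, 12.0)
    tables.write("alice", 2, 7.0)
    print(tables.user_history("alice"))  # [(1, 10.0), (2, 7.0)]
    print(tables.week_leaders(1, 1))     # [('bob', 12.0)]
```

The cost of this design is duplicated writes; the benefit is that a Flask endpoint answering either query needs only one lookup.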


LESSONS LEARNED

• Technologies: Spark, Spark Streaming, Cassandra

• Scalability in Spark Streaming for different operations (number of records vs number of nodes)

• Spark Streaming's saveAs functions (such as saveAsTextFiles) write many small files to HDFS even after a repartition, so I wrote a function that appends the output to a single file.
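The author's append function is not shown. As background, `saveAsTextFiles` writes one output directory per batch interval, containing one `part-*` file per partition, which is where the small files come from. A minimal sketch of the same workaround, assuming a local filesystem stands in for HDFS; `append_parts` is a hypothetical name. On a real cluster the equivalent step could use `hdfs dfs -appendToFile`, or `hdfs dfs -getmerge` to merge a directory into one local file.

```python
import shutil
from pathlib import Path


def append_parts(batch_dir: Path, target: Path) -> int:
    """Append every part-* file in one batch directory to a single target file.

    Marker files such as _SUCCESS are skipped because they do not match
    'part-*'. Parts are processed in name order so the output is deterministic.
    Returns the number of part files appended.
    """
    parts = sorted(p for p in batch_dir.glob("part-*") if p.is_file())
    with target.open("ab") as out:  # binary append keeps bytes unchanged
        for part in parts:
            with part.open("rb") as src:
                shutil.copyfileobj(src, out)
    return len(parts)
```

Calling this once per completed batch directory keeps a single growing file per dataset instead of thousands of tiny ones; the consumed batch directories can then be deleted.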


SILVIA OLIVEROS

• M.S. Computer Engineering, Purdue University
• Developed Visual Analytics Tools for DHS Partners:
    • Coast Guard
    • Dietary Survey (NHANES)

[email protected]/soliverost