Luke Young Rhee: Hashtag-Hashtag Presentation


Transcript of Luke Young Rhee: Hashtag-Hashtag Presentation

Page 1: Luke Young Rhee: Hashtag-Hashtag Presentation

Hashtag-Hashtag

Luke Young Rhee

Page 2: Luke Young Rhee: Hashtag-Hashtag Presentation

Hashtag-Hashtag

The trending topics of trending topics

Luke Young Rhee

Page 3: Luke Young Rhee: Hashtag-Hashtag Presentation

Impact

Discovery

Page 4: Luke Young Rhee: Hashtag-Hashtag Presentation

Hashtag-Hashtag

The trending topics of trending topics

htht.tech

DEMO

Page 6: Luke Young Rhee: Hashtag-Hashtag Presentation

The Pipeline

Page 7: Luke Young Rhee: Hashtag-Hashtag Presentation

The Pipeline

{"text": "RT @Pozzzzzzz
 "id": 5469180, "time": "2016:09:30T...",
 "entities": {"user_mentions": ...
   "hashtags": [
     {"text": "yum", "indices": [32, 35] },
     {"text": "beer", "indices": [32, 36] }
   ] },

Tweet
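The tweet payload above carries its hashtags under "entities". A minimal Python sketch of pulling them out follows; the function name and the sample dictionary are hypothetical, and the sample keeps only the fields that are legible on the slide.

```python
def extract_hashtags(tweet):
    """Return the hashtag texts found under tweet["entities"]["hashtags"]."""
    entities = tweet.get("entities") or {}
    return [tag["text"] for tag in entities.get("hashtags", []) if "text" in tag]


# Hypothetical sample shaped like the slide's tweet; elided fields are left out.
sample_tweet = {
    "id": 5469180,
    "entities": {
        "hashtags": [
            {"text": "yum", "indices": [32, 35]},
            {"text": "beer", "indices": [32, 36]},
        ]
    },
}

print(extract_hashtags(sample_tweet))
```

Tweets without an "entities" block simply yield an empty list, which keeps later stages free of missing-key checks.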

Page 8: Luke Young Rhee: Hashtag-Hashtag Presentation

The Pipeline: kafka-connect-twitter

● Kafka Connect

● Data formatting

Page 9: Luke Young Rhee: Hashtag-Hashtag Presentation

The Pipeline: Kafka Streams

● Process

● Filter
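The two stages named on this slide can be sketched in plain Python. This is not the Kafka Streams API (which is Java), and the slide does not say what the filter removes; the sketch assumes that "process" reduces each tweet to a time and a hashtag list, and that "filter" drops tweets with no hashtags. All names are hypothetical.

```python
def to_record(tweet):
    """Process step (assumed): keep only the timestamp and the hashtag texts."""
    tags = [t["text"] for t in (tweet.get("entities") or {}).get("hashtags", [])]
    return {"time": tweet.get("time"), "hashtags": tags}


def process_and_filter(tweets):
    """Filter step (assumed): yield only records with at least one hashtag."""
    for tweet in tweets:
        record = to_record(tweet)
        if record["hashtags"]:
            yield record


# Hypothetical input: one tweet with hashtags, one without.
tweets = [
    {"time": "t1", "entities": {"hashtags": [{"text": "beer"}, {"text": "yum"}]}},
    {"time": "t2", "entities": {"hashtags": []}},
]
records = list(process_and_filter(tweets))
print(records)
```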

Page 10: Luke Young Rhee: Hashtag-Hashtag Presentation

The Pipeline: Druid

● Low latency

  ○ Ingestion

  ○ Analytics

● Scalable

Page 11: Luke Young Rhee: Hashtag-Hashtag Presentation

The Pipeline: Query (Table)

time    hashtags
...     ["beer", "yum"]
...     ["beer", "Lagunitas"]
...     ["cats", "cute", "notCrazy"]

Page 12: Luke Young Rhee: Hashtag-Hashtag Presentation

The Pipeline: Query (Filter)

time    hashtags
...     ["beer", "yum"]
...     ["beer", "Lagunitas"]
...     ["cats", "cute", "notCrazy"]

...     "beer"
...     "yum"
...     "beer"
...     "Lagunitas"

hashtags = "beer"
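The filter step on this slide keeps rows whose hashtag list contains "beer" and then expands each surviving row into one row per hashtag. A small Python sketch of that behaviour, using the slide's three rows (the function name is hypothetical):

```python
rows = [
    {"hashtags": ["beer", "yum"]},
    {"hashtags": ["beer", "Lagunitas"]},
    {"hashtags": ["cats", "cute", "notCrazy"]},
]


def filter_and_explode(rows, value):
    """Keep rows whose hashtag list contains `value`, then flatten their hashtags."""
    matched = [row for row in rows if value in row["hashtags"]]
    return [tag for row in matched for tag in row["hashtags"]]


exploded = filter_and_explode(rows, "beer")
print(exploded)  # ['beer', 'yum', 'beer', 'Lagunitas']
```

The row tagged only with cats-related hashtags drops out entirely, so what remains is every hashtag that co-occurs with "beer".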

Page 13: Luke Young Rhee: Hashtag-Hashtag Presentation

The Pipeline: Query (Count)

time    hashtags       count
...     "beer"         2
...     "yum"          1
...     "Lagunitas"    1
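The counts in this table follow from tallying the expanded hashtag values. A minimal sketch with the standard library, assuming the four expanded values from the filter slide as input:

```python
from collections import Counter

# Expanded hashtag values from the rows that matched hashtags = "beer".
exploded = ["beer", "yum", "beer", "Lagunitas"]

counts = Counter(exploded)
for tag, n in counts.items():
    print(tag, n)
```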

Page 14: Luke Young Rhee: Hashtag-Hashtag Presentation

The Pipeline: Query (Count + TopN)

time    hashtags       count
...     "beer"         2
...     "yum"          1
...     "Lagunitas"    1
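TopN keeps only the highest-count rows of the table above. The sketch below illustrates the idea in plain Python with `Counter.most_common`; it is a stand-in for the concept, not Druid's own TopN implementation, and the helper name is hypothetical.

```python
from collections import Counter


def top_n(tags, n):
    """Return the n most frequent hashtags as (hashtag, count) pairs."""
    return Counter(tags).most_common(n)


top = top_n(["beer", "yum", "beer", "Lagunitas"], 1)
print(top)  # [('beer', 2)]
```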

Page 15: Luke Young Rhee: Hashtag-Hashtag Presentation

The Pipeline

Page 16: Luke Young Rhee: Hashtag-Hashtag Presentation

The Pipeline

Kafka Connector

Kafka Streams

Kafka Indexing Service

pydruid
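The last component, pydruid, is a Python client that sends queries to Druid. As a rough illustration of what the filter, count, and TopN slides amount to, the sketch below builds a Druid native topN query as a plain dictionary. The datasource name "tweets", the interval, and the default threshold are assumptions for illustration, not values from the talk, and the talk's actual query may differ.

```python
import json


def build_topn_query(hashtag, threshold=10):
    """Build a Druid native topN query for hashtags co-occurring with `hashtag`."""
    return {
        "queryType": "topN",
        "dataSource": "tweets",               # assumed datasource name
        "intervals": ["2016-09-30/2016-10-01"],  # assumed interval
        "granularity": "all",
        "dimension": "hashtags",
        "filter": {"type": "selector", "dimension": "hashtags", "value": hashtag},
        "aggregations": [{"type": "count", "name": "count"}],
        "metric": "count",
        "threshold": threshold,
    }


query = build_topn_query("beer", threshold=5)
print(json.dumps(query, indent=2))
```

The resulting JSON is the kind of body that would be posted to a Druid broker; a client library such as pydruid assembles an equivalent request from keyword arguments.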

Page 17: Luke Young Rhee: Hashtag-Hashtag Presentation

Challenges

Kafka Streams / Serdes

+ Druid

Page 18: Luke Young Rhee: Hashtag-Hashtag Presentation

Druid Cluster

Page 19: Luke Young Rhee: Hashtag-Hashtag Presentation

Challenges

Kafka Streams / Serdes

+ Druid

Ingest: 1% Twitter ~ 1k - 2k /min

Query: ~ 92.6k rows

Success!

Page 20: Luke Young Rhee: Hashtag-Hashtag Presentation

Thanks!

Luke Young Rhee

University of California, Irvine: MS Mathematics

Nintex, Irvine: Test Analyst

Enjoy being near the ocean and getting lost in new cities