EDHREC @ Data Science MD
Transcript of EDHREC @ Data Science MD
![Page 1: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/1.jpg)
EDHREC, Magic: TG Recommendation Engine
(and data science on games)
Donald Miner, @donaldpminer
[email protected]
21st, 2015 - Data Science MD Meetup
Games & Stuff in Glen Burnie, MD
![Page 2: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/2.jpg)
About Don
![Page 3: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/3.jpg)
About Don, Planeswalker
![Page 4: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/4.jpg)
Talk agenda
• Background
• EDHREC Overview
• EDHREC Data Analysis
• EDHREC Architecture
• Data Science Application UX Lessons Learned
• Related Work in Magic and Other Domains
• Virtues of Data Science on Games
![Page 5: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/5.jpg)
Magic: The Gathering
• Trading card game
• First published in 1993
• 20 million players in 2015 (World of Warcraft has 7.1 million subscribers)
• Organized tournaments
• Secondary market
[Figure labels: 1993, $27,000]
![Page 6: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/6.jpg)
Elder Dragon Highlander / Commander
• One of the Magic “formats”
• Started independently from Wizards of the Coast (WOTC) in the late 2000s
• Officially supported starting in 2011
• Typically multiplayer
• 100-card singleton deck (instead of 60 cards with up to 4 copies of each card)
• Each deck has a single “commander” (unique to this format)
![Page 7: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/7.jpg)
Data Science
• Term popularized around 2008
• Represents a shift in how industry analyzes data
• A mix of computer science, machine learning, statistics, programming, visualization, and domain knowledge
![Page 8: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/8.jpg)
EDHREC Overview
![Page 9: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/9.jpg)
![Page 10: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/10.jpg)
EDHREC Deck Recommendations
![Page 11: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/11.jpg)
EDHREC Commander Stats
![Page 12: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/12.jpg)
EDHREC Card Stats
![Page 13: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/13.jpg)
EDHREC Recommendation Engine
![Page 14: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/14.jpg)
EDHREC Algorithm 1.0: User-based Collaborative Filtering
Image from http://blog.comsysto.com/2013/04/03/background-of-collaborative-filtering-with-mahout/
Analogy: Deck -> User, Card -> Item
Pros:
• Better at picking up bigger themes in decks
• Easy to implement
Cons:
• Had trouble discovering subtle deck themes
• Had trouble pointing out combos
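A minimal sketch of user-based collaborative filtering with decks playing the role of users and cards the role of items. The slide does not say how deck similarity or the neighborhood size were chosen; Jaccard similarity between card sets, the `k` and `top_n` parameters, and the toy decks in the usage note are assumptions for illustration, not EDHREC's code.

```python
from collections import Counter


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two card sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_based_recommend(target: set, decks: list[set], k: int = 2,
                         top_n: int = 5) -> list[tuple[str, float]]:
    """Recommend cards for `target` using its k most similar decks.

    Assumption: similarity is Jaccard on card sets and each neighbor
    "votes" for its cards with weight equal to its similarity.
    """
    # Rank the candidate decks by similarity to the target deck; keep the top k.
    neighbors = sorted(decks, key=lambda d: jaccard(target, d), reverse=True)[:k]
    scores: Counter = Counter()
    for deck in neighbors:
        sim = jaccard(target, deck)
        for card in deck - target:  # only suggest cards not already in the deck
            scores[card] += sim
    return scores.most_common(top_n)
```

For example, with made-up decks `{"Sol Ring", "Sanguine Bond", "Exquisite Blood", "Vampire Nighthawk"}`, `{"Sol Ring", "Sanguine Bond", "Exquisite Blood", "Blood Artist"}` and `{"Sol Ring", "Llanowar Elves", "Craterhoof Behemoth"}`, a target of `{"Sol Ring", "Sanguine Bond"}` gets "Exquisite Blood" ranked first, because both nearest neighbors contain it. Whole-deck similarity is what makes this approach good at broad themes and weak at small combos: a two-card combo barely moves the similarity of 100-card decks.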
![Page 15: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/15.jpg)
Recommendation Engine 2.0 Algorithm
31,000 decks
Step 1: Card Affinity Matrix
Jaccard / Tanimoto coefficient (a similarity, not a distance):
affinity(Sanguine Bond, Exquisite Blood) = (decks that contain Sanguine Bond AND Exquisite Blood) ÷ (decks that contain Sanguine Bond OR Exquisite Blood)
Repeat for every card combination (15,000 cards)
This matrix is built offline in batch and is the basis of the Card Analysis page
Image from http://blog.comsysto.com/2013/04/03/background-of-collaborative-filtering-with-mahout/
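A sketch of Step 1 using numpy (which the Python slide lists among EDHREC's libraries), assuming each deck is given as a set of card names. It builds a binary deck-by-card matrix, counts co-occurrences with one matrix product, and gets the OR count by inclusion-exclusion. The function name and input format are hypothetical.

```python
import numpy as np


def card_affinity_matrix(decks: list[set[str]]) -> tuple[list[str], np.ndarray]:
    """Build a card-by-card Jaccard (Tanimoto) similarity matrix from deck lists."""
    cards = sorted(set().union(*decks))
    index = {card: i for i, card in enumerate(cards)}

    # Binary incidence matrix: X[d, c] = 1 if deck d contains card c.
    X = np.zeros((len(decks), len(cards)), dtype=np.int64)
    for d, deck in enumerate(decks):
        for card in deck:
            X[d, index[card]] = 1

    both = X.T @ X          # both[i, j] = decks containing card i AND card j
    counts = np.diag(both)  # counts[i] = decks containing card i
    # Inclusion-exclusion: |i OR j| = |i| + |j| - |i AND j|.
    either = counts[:, None] + counts[None, :] - both
    # Every card in `cards` appears in at least one deck, so `either` is never 0.
    affinity = both / either
    return cards, affinity
```

A dense 15,000 × 15,000 float64 matrix has 225 million entries (about 1.8 GB), so at EDHREC's scale a sparse representation or a per-partition build is probably preferable; this dense version is only meant to show the arithmetic.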
![Page 16: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/16.jpg)
Recommendation Engine 2.0 Algorithm
31,000 decks
Step 2: Calculate Scores
1. Select each row of the Tanimoto matrix corresponding to a card in deck D
2. Sum the columns
3. Sort by score, display results
This gives each candidate card a score equal to the sum of its Tanimoto coefficients with the cards in D
I really have no idea what this algorithm is called… I’m not sure if it’s novel or not.
This is performed in real time.
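A sketch of the three scoring steps, assuming an affinity matrix and card list in the shape produced by a Step 1 build. Dropping cards that are already in the deck is an assumption (the slide only says "sort by score, display results"); without it, the deck's own cards would dominate because each has a self-affinity of 1.

```python
import numpy as np


def score_cards(deck: set[str], cards: list[str], affinity: np.ndarray,
                top_n: int = 10) -> list[tuple[str, float]]:
    """Score every card by summing its Tanimoto coefficients with the deck's cards."""
    index = {card: i for i, card in enumerate(cards)}
    rows = [index[c] for c in deck if c in index]  # 1. rows for cards in deck D
    scores = affinity[rows].sum(axis=0)            # 2. sum the columns
    ranked = np.argsort(-scores, kind="stable")    # 3. sort by score, highest first
    # Assumption: hide cards the deck already contains.
    return [(cards[i], float(scores[i])) for i in ranked if cards[i] not in deck][:top_n]
```

Only one row selection and a column sum happen per request, which is consistent with the slide's split between the expensive offline matrix build and the cheap real-time scoring.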
![Page 17: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/17.jpg)
Lessons learned: Taking out the garbage
A lot of garbage gets submitted to EDHREC:
• Decks with fewer than 20 cards
• Decks with invalid commanders
• Decks with illegal cards
The algorithms handle this well; problem cards rarely show up in recommendations.
However, pruning “worthless” decks significantly improves performance, because many of the algorithms involved are O(N^2).
General advice: Think about which pieces of data in your data set are worthless.
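A minimal pruning filter covering the three kinds of garbage listed above. The deck format (a dict with `"commander"` and `"cards"` keys) and the `legal_commanders` / `legal_cards` sets are hypothetical stand-ins for whatever card metadata is actually available.

```python
MIN_CARDS = 20  # decks with fewer cards than this are treated as garbage


def is_worth_keeping(deck: dict, legal_commanders: set[str], legal_cards: set[str]) -> bool:
    """Return True if a submitted deck passes the basic sanity checks."""
    cards = deck["cards"]
    return (
        len(cards) >= MIN_CARDS                          # not a stub deck
        and deck["commander"] in legal_commanders        # valid commander
        and all(card in legal_cards for card in cards)   # no illegal cards
    )


def prune(decks: list[dict], legal_commanders: set[str], legal_cards: set[str]) -> list[dict]:
    """Drop worthless decks before running the O(N^2) steps."""
    return [d for d in decks if is_worth_keeping(d, legal_commanders, legal_cards)]
```

Since the pairwise work grows quadratically, removing a fraction f of the decks cuts that work by roughly a factor of (1 - f)^2, which is why pruning pays off even when the algorithms already tolerate the junk.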
![Page 18: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/18.jpg)
Lessons learned: Partitioning (too much or too little)
Partitioning the user/deck space into subgroups is a great way to speed up recommendation engines.
• The 31,000 EDHREC decks are partitioned into 27 partitions (one per possible color combination)
• Algorithms typically run on a single partition (e.g., Red/Blue deck recommendations only come from other Red/Blue decks)
• However, themes that span color combinations get worse recommendations
Partitioning too deeply also causes problems. I tried partitioning by commander, and that was awful: new commanders, and themes that span commanders, suffer.
General advice: There is no good way to figure out a partition scheme; just try it out.
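A sketch of color-based partitioning, assuming each deck records its colors as a string of color letters such as `"UR"` (a hypothetical `"colors"` field). A frozenset key makes `"UR"` and `"RU"` land in the same partition; recommendations for a deck are then computed only against its own partition.

```python
from collections import defaultdict


def partition_by_colors(decks: list[dict]) -> dict[frozenset, list[dict]]:
    """Group decks by color combination (order of the letters does not matter)."""
    partitions: dict[frozenset, list[dict]] = defaultdict(list)
    for deck in decks:
        partitions[frozenset(deck["colors"])].append(deck)
    return partitions


def peers(target: dict, partitions: dict[frozenset, list[dict]]) -> list[dict]:
    """Decks that share the target's color combination, excluding the target itself."""
    return [d for d in partitions.get(frozenset(target["colors"]), []) if d is not target]
```

The trade-off from the slide shows up directly here: a Red/Blue deck never sees a Red/Blue/Black deck with the same theme, and a finer key (such as the commander) would shrink each partition further and leave new commanders with no peers at all.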
![Page 19: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/19.jpg)
EDHREC Architecture
![Page 20: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/20.jpg)
EDHREC Architecture
[Architecture diagram labels: Batch Processes (cron), Reddit Bot (praw), New Decks, Pre-calculated Stats, All Decks]
![Page 21: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/21.jpg)
Redis
• In-memory key/value data store
• Stores website state
• Utilized as a cache
• Stores all of the decks
• Stores all of the pre-computed stats
• Stores all metadata about Magic cards
• EDHREC serializes most things to common internal JSON data formats
• Very fast
• Very easy to use
• Good support in Python
• Getting harder to do “analysis”; going to move to a Redshift SQL database for analytical work
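A sketch of the "serialize to JSON, store in Redis" pattern. The `deck:<id>` key scheme and the deck dict layout are hypothetical. The functions accept any client with `set(key, value)` and `get(key)` methods; with redis-py that would be something like `client = redis.Redis(host="localhost", port=6379)`, whose `get` returns bytes (or `None`), which `json.loads` accepts.

```python
import json


def deck_key(deck_id: str) -> str:
    """Hypothetical key naming scheme: one Redis string value per deck."""
    return f"deck:{deck_id}"


def save_deck(client, deck_id: str, deck: dict) -> None:
    """Serialize a deck to JSON and store it under its key."""
    client.set(deck_key(deck_id), json.dumps(deck))


def load_deck(client, deck_id: str):
    """Fetch and decode a deck, or return None if the key does not exist."""
    raw = client.get(deck_key(deck_id))
    return None if raw is None else json.loads(raw)
```

Storing whole JSON documents keeps reads to a single key lookup, which suits the website; it is also why ad hoc "analysis" gets harder, since every question means loading and decoding documents in Python rather than running a query.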
![Page 22: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/22.jpg)
CherryPy
• “A Minimalist Python Web Framework”
• Runs the website
• Pulls data from Redis and then renders the results as HTML
• Most of the data from Redis is cached in in-memory objects (IPC to Redis is too slow)
• EDHREC runs 6 of these in parallel behind an NGINX round-robin proxy
• Very easy to use, doesn’t get in your way
• Very easy to expose Python data science
• Running into problems with maintainability due to my own sloppiness
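The slide says most Redis data is cached in in-memory objects because IPC to Redis is too slow. One common way to do that is a read-through cache with an expiry time; the sketch below is that generic technique, not EDHREC's code. The `fetch` callable (e.g. a function wrapping a Redis read) and the 300-second default TTL are assumptions.

```python
import time


class ReadThroughCache:
    """Keep fetched values in process memory to avoid a round trip per request."""

    def __init__(self, fetch, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._fetch = fetch      # loads a value on a miss, e.g. from Redis
        self._ttl = ttl_seconds  # assumed expiry; stats only change when batch jobs run
        self._clock = clock
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # fresh hit: no call to the backing store
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

With 6 CherryPy processes behind the proxy, each process holds its own copy of the cache, so memory use multiplies by the process count and different processes may briefly serve different versions of a value until their entries expire.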
![Page 23: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/23.jpg)
Python
• Programming language
• Plenty of good libraries for data analysis: numpy and pandas in this case
• Can handle the “full stack” well (from data analysis to web front end)
• PRAW is a great framework for building Reddit bots
• Most things run every few hours
![Page 24: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/24.jpg)
Amazon Web Services
• Infrastructure as a Service
• Easily spin up new servers with a pre-built operating system
• EDHREC runs on one m4.2xlarge (8 CPUs, 32 GB RAM, better network) at 10 cents per hour ($72/month)
• Great for recovering from failures
• Easy to upgrade the machine
• Very good uptime so far
• Easy to back up to S3
![Page 25: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/25.jpg)
Some observations about User Experience and AI applications
![Page 26: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/26.jpg)
LOL! Look at the dumb bot!
Lesson learned: Humans LOVE pointing out when the AI does something strange or wrong, even if it gets it right 90% of the time. Therefore, I am very conservative about what I end up publishing, as I’ve gotten burned a few times, which can be a shame sometimes.
(just a couple of examples)
![Page 27: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/27.jpg)
The apocalypse is near
• “EDHREC is ruining EDH/Commander”
• “EDHREC is taking the fun out of deck construction”
• “EDHREC kills conversation”
MapQuest takes the fun out of planning trips!
Mostly these are taken as compliments. AI is going to meet resistance from people who liked the manual labor. I don’t think the commentary is entirely off base… but...
![Page 28: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/28.jpg)
Sometimes too much is too much
• Over-engineering and doing too much is an easy trap
• You want to make it better and provide more “intelligence”
• Give users the ability to discover and find things: it increases user engagement and gives better results
Philosophy: EDHREC is a tool, not a solution. I’m starting to see my other data science projects this way.
Lesson learned: Spend more time on interactive “discovery tools” than on intelligent do-everything algorithms
![Page 29: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/29.jpg)
Interesting related things to look at
![Page 30: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/30.jpg)
RoboRosewater
• Rosewater is the name of Magic’s lead designer (Mark Rosewater)
• RoboRosewater is a “backwards” neural network, trained on Magic cards
![Page 31: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/31.jpg)
MTG Finance
Lots of analysis around Magic finance!
mtgstocks.com
![Page 32: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/32.jpg)
Diablo 3 build clustering
![Page 33: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/33.jpg)
Virtues of this whole thing
Community
• Most hobbies are defined by communities
• Technology can bring communities together
Self-Development
• Data has value, and getting valuable data is hard
• Hobby-based data is relatively easy to acquire (compared to, say, data used by health care companies)
• A great way to do real data science on real data (as opposed to synthetic data standing in for a more valuable data set)
Profit!
• Hobbyists are passionate about their hobby and willing to spend money on it
• They will pay for and support services they like
![Page 34: EDHREC @ Data Science MD](https://reader038.fdocuments.in/reader038/viewer/2022102322/58a45c5c1a28abb8288b473d/html5/thumbnails/34.jpg)